Lions - A Commentary On The Unix Operating System 197705

This document provides a commentary and overview of the UNIX operating system source code. It introduces the source code, which is divided into sections. It describes fundamental concepts like the processor, memory management, and initial conditions. It then summarizes selected code examples to illustrate processes, memory allocation, process switching, interrupts, traps, system calls, and software interrupts. The goal is to help students understand and navigate the UNIX source code.


A

COMMENTARY
ON THE
UNIX
OPERATING
SYSTEM
JOHN LIONS

The University of New South Wales


This booklet has been produced for students at the University
of New South Wales taking courses 6.602B and 6.657G.

It is intended as a companion to, and commentary on, the
booklet UNIX Operating System Source Code, Level Six.

The UNIX Software System was written by K. Thompson and
D. Ritchie of Bell Telephone Laboratories, Murray Hill, NJ. It
has been made available to the University of New South Wales
under a licence from the Western Electric Company.

THIS INFORMATION IS PROPRIETARY AND IS THE PROPERTY
OF BELL LABORATORIES, INC. IT IS TO BE USED
BY AUTHORIZED BELL LABS EMPLOYEES ONLY. ITS REPRODUCTION
OR DISCLOSURE TO UNAUTHORIZED PERSONS,
EITHER ORALLY OR IN WRITING, IS PROHIBITED.
J. LIONS DEPT. OF COMPUTER SCIENCE
THE UNIVERSITY OF NEW SOUTH WALES
CONTENTS

Preface

1. Introduction
    The UNIX Operating System  1-1
    Utilities  1-1
    Other Documentation  1-1
    UNIX Programmer's Manual  1-2
    UNIX Documents  1-2
    UNIX Operating System Source Code  1-2
    Source Code Sections  1-3
    Source Code Files  1-3
    Use of these notes  1-3
    A Note on Programming Standards  1-3

2. Fundamentals
    The Processor  2-1
    Processor Status Word  2-1
    General Registers  2-1
    Instruction Set  2-2
    Addressing Modes  2-3
    Unix Assembler  2-4
    Memory Management  2-4
    Segmentation Registers  2-4
    Page Description Register  2-5
    Memory Allocation  2-5
    Status Registers  2-5
    Initial Conditions  2-5
    Special Device Registers  2-5

3. Reading "C" Programs
    Some Selected Examples  3-1
    Example 1  3-1
    Example 2  3-1
    Example 3  3-2
    Example 4  3-3
    Example 5  3-3
    Example 6  3-4
    Example 7  3-4
    Example 8  3-4
    Example 9  3-5
    Example 10  3-5
    Example 11  3-5
    Example 12  3-6
    Example 13  3-6
    Example 14  3-6
    Example 15  3-6
    Example 16  3-7
    Example 17  3-7

4. An Overview
    Variable Allocation  4-1
    Global Variables  4-1
    The 'C' Preprocessor  4-1
    Section One  4-2
    The First Group of '.h' Files  4-2
    Assembly Language Files  4-2
    Other Files in Section One  4-2
    Section Two  4-3
    Section Three  4-3
    Section Four  4-3
    Section Five  4-4

5. Two Files
    The File 'malloc.c'  5-1
    Rules for List Maintenance  5-1
    malloc (2528)  5-2
    mfree (2556)  5-3
    In conclusion ...  5-3
    The File 'prf.c'  5-3
    printf (2340)  5-3
    printn (2369)  5-4
    putchar (2386)  5-4
    panic (2419)  5-5
    prdev (2433)  5-6
    deverror (2447)  5-6
    Included Files  5-6

6. Getting Started
    Operator Actions  6-1
    start (0612)  6-1
    main (1550)  6-2
    Processes  6-3
    Initialisation of proc[0]  6-3
    The story continues  6-4
    sched (1940)  6-4
    sleep (2066)  6-4
    swtch (2178)  6-4
    main revisited  6-5

7. Processes
    The Process Image  7-1
    The Proc Structure (0358)  7-2
    The user Structure (0413)  7-2
    The Per Process Data Area  7-2
    The Segments  7-3
    Execution of an Image  7-3
    Kernel Mode Execution  7-3
    User Mode Execution  7-3
    An Example  7-3
    Setting the Segmentation Registers  7-4
    estabur (1650)  7-4
    sureg (1739)  7-4
    newproc (1826)  7-5

8. Process Management
    Process Switching  8-1
    Interrupts  8-1
    Program Swapping  8-1
    Jobs  8-1
    Assembler Procedures  8-2
    savu (0725)  8-2
    retu (0740)  8-2
    aretu (0734)  8-2
    swtch (2178)  8-2
    setpri (2156)  8-2
    sleep (2066)  8-3
    wakeup (2113)  8-3
    setrun (2134)  8-3
    expand (2268)  8-3
    swtch revisited  8-4
    Critical Sections  8-4

9. Hardware Interrupts and Traps
    Hardware Interrupts  9-1
    The Interrupt Vector  9-2
    Interrupt Handlers  9-2
    Priorities  9-2
    Interrupt Priorities  9-2
    Rules for Interrupt Handlers  9-2
    Traps  9-3
    Assembly Language 'trap'  9-3
    Return  9-3

10. The Assembler "Trap" Routine
    Sources of Traps and Interrupts  10-1
    fuibyte (0814)  10-1
    fuiword (0844)  10-1
    Interrupts  10-2
    call (0776)  10-2
    User Program Traps  10-2
    The Kernel Stack  10-3

11. Clock Interrupts
    clock (3725)  11-1
    timeout (3845)  11-2

12. Traps and System Calls
    trap (2693)  12-1
    Kernel Mode Traps  12-1
    User Mode Traps  12-2
    System Calls  12-2
    System Call Handlers  12-3
    The File 'sys1.c'  12-3
    exec (3020)  12-3
    fork (3322)  12-4
    sbreak (3354)  12-4
    The Files 'sys2.c' and 'sys3.c'  12-4
    The File 'sys4.c'  12-4

13. Software Interrupts
    Anticipation  13-1
    Causation  13-1
    Effect  13-1
    Tracing  13-2
    Procedures  13-2
    A. Anticipation  13-2
    B. Causation  13-2
    C. Effect  13-2
    D. Tracing  13-2
    ssig (3614)  13-2
    kill (3630)  13-2
    signal (3949)  13-3
    psignal (3963)  13-3
    issig (3991)  13-3
    psig (4043)  13-3
    core (4094)  13-3
    grow (4136)  13-3
    exit (3219)  13-4
    rexit (3205)  13-4
    wait (3270)  13-4
    stop (4016)  13-5
    wait (3270) (continued)  13-5
    ptrace (4164)  13-5
    procxmt (4204)  13-6

14. Program Swapping
    Text Segments  14-1
    sched (1940)  14-2
    xswap (4368)  14-3
    xalloc (4433)  14-3
    xfree (4398)  14-3

15. Introduction to Basic I/O
    The File 'buf.h'  15-1
    devtab (4551)  15-1
    The File 'conf.h'  15-1
    The file 'conf.c'  15-1
    System generation  15-2
    swap (5196)  15-2
    Race Conditions  15-2
    Reentrancy  15-3
    For the Uninitiated  15-3
    Additional Reading  15-3

16. The RK Disk Driver
    Control Status Register (RKCS)  16-1
    Word Count Register (RKWC)  16-1
    Disk Address Register (RKDA)  16-2
    The file 'rk.c'  16-2
    rkstrategy (5389)  16-2
    rkaddr (5420)  16-2
    devstart (5096)  16-2
    rkintr (5451)  16-2
    iodone (5018)  16-2

17. Buffer Manipulation
    Flags  17-1
    A Cache-like Memory  17-1
    clrbuf (5038)  17-1
    incore (4899)  17-1
    getblk (4921)  17-1
    brelse (4869)  17-2
    binit (5055)  17-2
    bread (4754)  17-3
    breada (4773)  17-3
    bwrite (4809)  17-3
    bawrite (4856)  17-3
    bdwrite (4836)  17-3
    bflush (5229)  17-3
    physio (5259)  17-3

18. File Access and Control
    Section Four  18-1
    File Characteristics  18-1
    System Calls  18-2
    Control Tables  18-2
    file (5507)  18-2
    inode (5659)  18-2
    Resources Required  18-2
    Opening a File  18-3
    creat (5781)  18-3
    open1 (5804)  18-3
    open (5763)  18-3
    open1 revisited  18-3
    close (5846)  18-3
    closef (6643)  18-4
    iput (7344)  18-4
    Deletion of Files  18-4
    Reading and writing  18-4
    rdwr (5731)  18-5
    readi (6221)  18-5
    writei  18-6
    iomove (6364)  18-6
    bmap (6415)  18-6
    Leftovers  18-6

19. File Directories and Directory Files
    File Names  19-1
    The Directory Data Structure  19-1
    Directory Files  19-1
    namei (7518)  19-1
    Some Comments  19-2
    link (5909)  19-3
    wdir (7477)  19-3
    maknode (7455)  19-4
    unlink (3510)  19-4
    mknod (5952)  19-4
    access (6746)  19-4

20. File Systems
    The 'Super Block' (5561)  20-1
    The 'mount' table (0272)  20-1
    iinit (6922)  20-2
    Mounting  20-2
    smount (6086)  20-2
    Notes  20-2
    iget (7276)  20-2
    getfs (7167)  20-3
    update (7201)  20-3
    sumount (6144)  20-4
    Resource Allocation  20-4
    alloc (6956)  20-4
    itrunc (7414)  20-4
    free (7000)  20-5
    iput (7344)  20-5
    ifree (7134)  20-5
    iupdat (7374)  20-5

21. Pipes
    pipe (7723)  21-1
    readp (7758)  21-1
    writep (7805)  21-1
    plock (7862)  21-1
    prele (7882)  21-1

22. Character Oriented Special Files
    LP11 Line Printer Driver  22-1
    lpopen (8850)  22-1
    Notes  22-2
    lpoutput (8986)  22-2
    lpstart (8967)  22-2
    lpint (8976)  22-2
    lpwrite (8870)  22-3
    lpclose (8863)  22-3
    Discussion  22-3
    lpcanon (8879)  22-3
    For idle readers: A suggestion  22-4

23. Character Handling
    cinit (8234)  23-2
    getc (0930)  23-2
    putc (0967)  23-2
    Control Characters  23-3
    Graphic Characters  23-3
    Graphic Character Sets  23-3
    UNIX Conventions  23-3
    maptab (8117)  23-4
    partab (7947)  23-4

24. Interactive Terminals
    Interfaces  24-1
    The 'tty' Structure (7926)  24-1
    Note  24-2
    Initialisation  24-2
    stty (8183)  24-2
    sgtty (8201)  24-2
    klsgtty (8090)  24-2
    ttystty (8577)  24-2
    The DL11/KL11 Terminal Device Handler  24-2
    Device Registers  24-3
    Receiver Status Register (klrcsr)  24-3
    Receiver Data Buffer Register (klrbuf)  24-3
    Transmitter Status Register (kltcsr)  24-3
    Transmitter Data Buffer Register (kltbuf)  24-3
    UNIBUS Addresses  24-3
    Software Considerations  24-3
    Interrupt Vector Addresses  24-3
    Source Code  24-3
    klopen (8023)  24-4
    klclose (8055)  24-4
    klxint (8070)  24-4
    klrint (8078)  24-4

25. The File "tty.c"
    flushtty (8252)  25-1
    wflushtty (8217)  25-1
    Character Input  25-1
    ttread (8535)  25-1
    canon (8274)  25-1
    Previous character was not a backslash  25-2
    Previous character was a backslash  25-2
    Character ready  25-2
    line completed  25-2
    Notes  25-2
    ttyinput (8333)  25-2
    Character Output  25-3
    ttwrite (8550)  25-3
    ttstart (8505)  25-3
    ttrstrt (8486)  25-3
    ttyoutput (8373)  25-3
    Terminals with a restricted character set  25-3
    A. The test for 'TTLOWAT' (Line 8074)  25-4
    B. Inactive Terminals  25-5
    Well, that's all, folks ...  25-5

26. Suggested Exercises
    Section One  26-1
    Section Two  26-2
    Section Three  26-2
    Section Four  26-2
    Section Five  26-2
    General  26-2
PREFACE

This book is an attempt to explain in detail the nucleus of one of the
most interesting computer operating systems to appear in recent years.

It is the UNIX Time-sharing System, which runs on the larger models of
Digital Equipment Corporation's PDP11 computer system, and was
developed by Ken Thompson and Dennis Ritchie at Bell Laboratories. It
was first announced to the world in the July, 1974 issue of the
"Communications of the ACM".

Very soon in our experience with UNIX, it suggested itself as an
interesting candidate for formal study by students, for the following
reasons:

    it runs on a system which was already available to us;

    it is compact and accessible;

    it provides an extensive set of usable facilities;

    it is intrinsically interesting, and in fact breaks new ground in
    a number of areas.

Not least amongst the charms and virtues of the UNIX Time-sharing
System is the compactness of its source code. The source code for the
permanently resident "nucleus" of the system when only a small number
of peripheral devices is represented, is comfortably less than 9000
lines of code.

It has often been suggested that 10,000 lines of code represents the
practical limit in size for a program which is to be understood and
maintained by a single individual.

Most operating systems either exceed this limit by one or even two
orders of magnitude, or else offer the user a very limited set of
facilities, i.e. either the details of the system are inaccessible to
all but the most determined, dedicated and long-suffering student, or
else the system is rather specialised and of little intrinsic
interest.

There seem to be three main approaches to teaching Operating Systems.

First there is the "general principles" approach, wherein fundamental
principles are expounded, and illustrated by references to various
existing systems, (most of which happen to be outside the students'
immediate experience). This is the approach advocated by the COSINE
Committee, but in our view, many students are not mature or
experienced enough to profit from it.

The second approach is the "building block" approach, wherein the
students are enabled to synthesise a small scale or "toy" operating
system for themselves. While undoubtedly this can be a very valuable
exercise, if properly organised, it cannot but fail to encompass the
complexity and sophistication of real operating systems, and is
usually biased towards one aspect of operating system design, such as
process synchronisation.

The third approach is the "case study" approach. This is the one
originally recommended for the Systems Programming course in
"Curriculum '68", the report of the ACM Curriculum Committee on
Computer Science, published in the March, 1968 issue of the
"Communications of the ACM".

Ten years ago, this approach, which advocates devoting "most of the
course to the study of a single system" was unrealistic because the
cost of providing adequate student access to a suitable system was
simply too high.

Ten years later, the economic picture has changed significantly, and
the costs are no longer a decisive disadvantage if a minicomputer
system can be the subject of study. The considerable advantages of the
approach which undertakes a detailed analysis of an existing system
are now attainable.

In our opinion, it is highly beneficial for students to have the
opportunity to study a working operating system in all its aspects.

Moreover it is undoubtedly good for students majoring in Computer
Science, to be confronted at least once in their careers, with the
task of reading and understanding a program of major dimensions.

UNIX Operating System i-1 Preface


In 1976 we adopted UNIX as the subject for case study in our courses
in Operating Systems at the University of New South Wales. These notes
were prepared originally for the assistance of students in those
courses (6.602B and 6.657G).

The courses run for one semester each. Before entering either course,
students are presumed to have studied the PDP11 architecture and
assembly language, and to have had an opportunity to use the UNIX
operating system during exercises for earlier courses.

In general, students seem to find the new courses more onerous, but
much more satisfying than the previous courses based on the "general
principles" approach of the COSINE Committee.

Some mention needs to be made regarding the documentation provided by
the authors of the UNIX system. As reproduced for use on our campus,
this comprises two volumes of A4 size paper, with a total thickness of
3 cm, and a weight of 1250 grams.

A first observation is that the whole documentation is not
unreasonably transportable in a student's brief case. However it must
not be assumed that this amount of documentation, which is written in
a fresh, terse, whimsical style, is necessarily inadequate.

In fact the second observation (which is only made after considerable
experience) is that for reference purposes, the documentation is
remarkably comprehensive. However there is plenty of scope for
additional tutorial material, one part of which, it is hoped, is
satisfied by these notes.

The actual UNIX operating system source code is recorded in a separate
companion volume entitled "UNIX Operating System Source Code", which
was first printed in July, 1976. This is a specially edited selection
of code from the Level Six version of UNIX, as received by us in
December, 1975.

During 1976, an initial version of the present notes was distributed
in roneoed form, and only in the latter part of the year were the
facilities of the "nroff" text formatting program exploited. The
opportunity has recently been taken to revise and "nroff" the earlier
material, to make some revisions and corrections, and to integrate
them into their present form.

A decision had to be made quite early regarding the order of
presentation of the source code. The intention was to provide a
reasonably logical sequence for the student who wanted to learn the
whole system. With the benefit of hindsight, a great many improvements
in detail are still possible, and it is intended that these changes
will be made in some future edition.

It is our hope that this book will be of interest and value to many
students of the UNIX Time-sharing System. Although not prepared
primarily for use as a reference work, some will wish to use it as
such. The indices provided at the end should go some of the way
towards satisfying the requirement for reference material at this
level.

Since these notes refer to proprietary material administered by the
Western Electric Company, they can only be made available to licensees
of the UNIX Time-sharing System, and hence are unable to be published
through more usual channels.

Corrections, criticism and suggestions for improvement of these notes
will be very welcome.

Acknowledgements

The preparation of these notes has been encouraged and supported by
many of my colleagues and students including David Carrington, Doug
Crompton, Ian Hayes, David Horsfall, Peter Ivanov, Ian Johnstone,
Chris Maltby, Dave Milway, John O'Brien and Greg Rose.

Pat Mackie and Mary Powter did much of the initial typing, and Adele
Green has assisted greatly in the transfer of the notes to "nroff"
format.

David Millis and the Publications Section of the University of New
South Wales have assisted greatly with the mechanics of publication,
and Ian Johnstone and the Australian Graduate School of Management
provided facilities for the preparation of the final draft.

Throughout this project, my wife Marianne has given me unfailing moral
support and much practical support with proof-reading.

Finally Ken Thompson and Dennis Ritchie started it all.

To all the above, I wish to express my sincere thanks.

The co-operation of the "nroff" program must also be mentioned.
Without it, these notes could never have been produced in this form.
However it has yielded some of its more enigmatic secrets so
reluctantly, that the author's gratitude is indeed mixed. Certainly
"nroff" itself must provide a fertile field for future practitioners
of the program documenter's art.

John Lions
Kensington, NSW
May, 1977



CHAPTER ONE

Introduction

"UNIX" is the name of a time-sharing system for PDP11 computers,
written by Ken Thompson and Dennis Ritchie at Bell Laboratories. It
was described by them in the July, 1974 issue of the "Communications
of the ACM".

UNIX has proved to be effective, efficient and reliable in operation
and was in use at more than 150 installations by the end of 1976.

The amount of effort to write UNIX, while not inconsiderable in itself
(~10 man years up to the release of the Level Six system) is
insignificant when compared to other systems. (For instance, by 1968,
OS/360 was reputed to have consumed more than five man millennia and
TSS/360, another IBM operating system, more than one man millennium.)

Much of the effectiveness of UNIX derives from the simple and direct
implementation, by two people (presumably sharing the same office!)
using an appropriate high level language called "C", and restrained by
the very definite size limitations of the PDP11.

Not only is UNIX effective, but it is accessible in a way that most
other systems are not: the amount of material which must be mastered
in order to gain a reasonably deep understanding of the system is not
impossibly large. By way of comparison, OS/360 and its successors are
far too complex to be completely understood by any one individual.
Most major operating systems require many months of study before an
individual will be ready to make major modifications to the system.

Of course there are systems which are easier to understand than UNIX
but, it may be asserted, these are invariably much simpler and more
modest in what they attempt to achieve. As far as the list of features
offered to users is concerned, UNIX is in the "big league". In fact it
offers many features which are notable by their absence from some of
the well-known major systems.

The UNIX Operating System

The purpose of this document, and its companion, the "UNIX Operating
System Source Code", is to present in detail that part of the UNIX
time-sharing system which we choose to call the "UNIX Operating
System", namely the code which is permanently resident in the main
memory during the operation of UNIX. This code has the following major
functions:

    initialisation;
    process management;
    system calls;
    interrupt handling;
    input/output operations;
    file management.

Utilities

The remaining part of UNIX (which is much larger!) is composed of a
set of suitably tailored programs which run as "user programs", and
which, for want of a better term, may be termed "utilities".

Under this heading come a number of programs with a very strong
symbiotic relationship with the operating system such as

    the "shell" (the command language interpreter)

    "/etc/init" (the terminal configuration controller)

and a number of file system management programs such as:

    check    du       rmdir
    chmod    mkdir    sync
    clri     mkfs     umount
    df       mount    update

It should be pointed out that many of the functions carried out by the
above-named programs are regarded as operating system functions in
other computer systems, and that this certainly does contribute
significantly to the bulk of these other systems as compared with the
UNIX Operating System (in the way we have defined it).

Descriptions of the function and use of the above programs may be
found in the "UNIX Programmer's Manual" (UPM), either in Section I
(for the commonly used programs) or in Section VIII (for the programs
used only by the System Manager).

Other Documentation

These notes make frequent reference to the "UNIX Programmer's Manual"
(UPM), occasional reference to the "UNIX Documents" booklet, and
constant
reference to the "UNIX Operating System Source Code".

All these are relevant to a complete understanding of the system. In
addition, a full study of the assembly language routines requires
reference to the "PDP11 Processor Handbook", published by Digital
Equipment Corporation.

UNIX Programmer's Manual

The UPM is divided into eight major sections, preceded by a table of
contents and a KWIC (Key Word In Context) index. The latter is mostly
very useful but is occasionally annoying, as some indexed material
does not exist, and some existing material is not indexed.

Within each section of the manual, the material is arranged
alphabetically by subject name. The section number is conventionally
appended to the subject name, since some subjects appear in more than
one section, e.g. "CHDIR(I)" and "CHDIR(II)".

    Section I contains commands which either are recognised by the
    "shell" command interpreter, or are the names of standard user
    utility programs;

    Section II contains "system calls" which are operating system
    routines which may be invoked from a user program to obtain
    operating system service. A study of the operating system will
    render most of these quite familiar;

    Section III contains "subroutines" which are library routines
    which may be called from a user program. To the ordinary
    programmer, the distinctions between Sections II and III often
    appear somewhat arbitrary. Most of Section III is irrelevant to
    the operating system;

    Section IV describes "special files", which is another name for
    peripheral devices. Some of these are relevant, and some merely
    interesting. It depends where you are;

    Section V describes "File Formats and Conventions". A lot of
    highly relevant information is tucked away in this section;

    Sections VI and VII describe "User Maintained" programs and
    subroutines. No UNIXophile will ignore these sections, but they
    are not particularly relevant to the operating system;

    Section VIII describes "system maintenance" (software, not
    hardware!). There is lots of useful information here, especially
    if you are interested in how a UNIX installation is managed.

UNIX Documents

This is a somewhat miscellaneous collection of essays of varying
degrees of relevance:

    Setting up UNIX really belongs in Section VIII of the UPM (it's
    relevant);

    The UNIX Time-sharing System is an updated version of the original
    "Communications of the ACM" paper. It should be re-read at least
    once per month;

    UNIX for Beginners is useful if your UNIX experience is still
    limited;

    The tutorials on "C" and the editor, and the reference manuals for
    "C" and the assembler are highly useful unless you are completely
    expert;

    The UNIX I/O System provides a good overview of many features of
    the operating system;

    UNIX Summary provides a check list which will be useful in
    answering the question "what does an operating system do?"

UNIX Operating System Source Code

This is an edited version of the operating system as supplied by Bell
Laboratories.

The code selection presumes a "model" system consisting of:

    PDP11/40 processor;

    RK05 disk drives;

    LP11 line printer;

    PC11 paper tape reader/punch;

    KL11 terminal interface.

The principal editorial changes to the source code are as follows:

    the order of presentation of files has been changed;

    the order of material within several files has been changed;

    to a very limited extent, code has been transferred between files
    (with hindsight a lot more of this would have been desirable);

    about 5% of the lines have been shortened in various ways to less
    than 66 characters (by elimination of blanks, rearrangement of
    comments, splitting into two lines, etc.);

    a number of comments consisting of a line of underscore characters
    have been introduced, particularly at the end of procedures;

    the size of each file has been adjusted to an exact multiple of
    50 lines by padding with blank lines;


    a four digit line number has been inserted at the beginning of
    each line to identify it for cross-referencing.

The source code has been printed in double column format with fifty
lines per column, giving one hundred lines per sheet (or page). Thus
there is a convenient relationship between line numbers and sheet
numbers.

A number of summaries have been included at the beginning of the
Source Code volume:

    A Table of Contents showing files in order of appearance, together
    with the procedures they contain;

    An alphabetical list of procedures with line numbers;

    A list of Defined Symbols with their values;

    A Cross Reference Listing giving the line numbers where each
    symbol is used. (Reserved words in "C" and a number of commonly
    used symbols such as "p" and "u" have been omitted.)

Source Code Sections

The source code has been divided into five sections, each devoted
primarily to a single major aspect of the system.

The intention, which has been largely achieved, has been to make each
section sufficiently self-contained so that it may be studied as a
unit and before its successors have been mastered:

    Section One deals with system initialisation, and process
    management. It also contains all the assembly language routines;

    Section Two deals with interrupts, traps, system calls and signals
    (software interrupts);

    Section Three deals primarily with disk operations for program
    swapping and basic, block oriented input/output. It also deals
    with the manipulation of the pool of large buffers;

    Section Four deals with files and file systems: their creation,
    maintenance, manipulation and destruction;

    Section Five deals with "character special files", which is the
    UNIX term for slow speed peripheral devices which operate out of a
    common, character oriented, buffer pool.

The contents of each section is outlined in more detail in Chapter
Four.

Source Code Files

Each of the five sections just described consists of several source
code files. The name of each file includes a suffix which identifies
its type:

    ".s" denotes a file of assembly language statements;

    ".c" denotes a file of executable "C" language statements;

    ".h" denotes a file of "C" language statements which is not for
    separate compilation, but for inclusion in other ".c" files when
    they are compiled i.e. the ".h" files contain global declarations.

Use of these notes

These notes, which are intended to supplement the comments already
present in the source code, are not essential for understanding the
UNIX operating system. It is perfectly possible to proceed without
them, and you should attempt to do so as long as you can.

The notes are a crutch, to aid you when the going becomes difficult.
If you attempt to read each file or procedure on your own first, your
initial progress is likely to be slower, but your ultimate progress
much faster. Reading other people's programs is an art which should be
learnt and practised because it is useful!

A Note on Programming Standards

You will find that most of the code in UNIX is of a very high
standard. Many sections which initially seem complex and obscure,
appear in the light of further investigation and reflection, to be
perfectly obvious and "the only way to fly".

For this reason, the occasional comments in the notes on programming
style, almost invariably refer to apparent lapses from the usual
standard of near perfection.

What caused these? Sometimes it appears that the original code has
been patched expediently. More than once apparent lapses have proved
not to be such: the "bad" code has been found in fact to incorporate
some subtle feature which was not at all apparent initially. And some
allowance is certainly needed for occasional human weakness.

But on the whole you will find that the authors of UNIX, Ken Thompson
and Dennis Ritchie, have created a program of great strength,
integrity and effectiveness, which you should admire and seek to
emulate.

-oOo-


The Processor General Registers

The processor, which is designed around The processor incorporates a number of


a sixteen bit word length for instruc- sixteen bit registers of which eight
tions, data and program addresses, are accessible at any time as "general
incorporates a number of high speed registers". These are known as
registers.
r0, r1, r2, r3, r4, r5, r6 and r7.

The first six of the general registers are available for use as
accumulators, address pointers or index registers. The convention in
UNIX for the use of these registers is as follows:

    r0, r1 are used as temporary accumulators during expression
    evaluation, to return results from a procedure, and in some cases
    to communicate actual parameters during a procedure call;

    r2, r3, r4 are used for local variables during procedure
    execution. Their values are almost always stored upon procedure
    entry, and restored upon procedure exit;

    r5 is used as the head pointer to a "dynamic chain" of procedure
    activation records stored in the current stack. It is referred to
    as the "environment pointer".

The last two of the "general registers" do have a special significance
and are to all intents, "special purpose":

    r6 (also known as "sp") is used as the stack pointer. The PDP11/40
    processor incorporates two separate registers which may be used as
    "sp", depending on whether the processor is in kernel or user
    mode. No other one of the general registers is duplicated in this
    way;

    r7 (also known as "pc") is used as the program counter, i.e. the
    instruction address register.


CHAPTER TWO

Fundamentals

UNIX runs on the larger models of the PDP11 series of computers
manufactured by Digital Equipment Corporation. This chapter provides a
brief summary of certain selected features of these computers with
particular reference to the PDP11/40.

If the reader has not previously made the acquaintance of the PDP11
series then he is directed forthwith to the "PDP11 Processor
Handbook", published by DEC.

A PDP11 computer consists of a processor (also called a CPU) connected
to one or more memory storage units and peripheral controllers via a
bidirectional parallel communication line called the "Unibus".


Processor Status Word

This sixteen bit register has subfields which are interpreted as
follows:

    bits     description

    14,15    current mode (00 = kernel; 11 = user)
    12,13    previous mode (00 = kernel; 11 = user)
    5,6,7    processor priority (range 0..7)
    4        trap bit
    3        N, set if the previous result was negative
    2        Z, set if the previous result was zero
    1        V, set if the previous operation gave an overflow
    0        C, set if the previous operation gave a carry

The processor can operate in two different modes: kernel and user.
Kernel mode is the more privileged of the two and is reserved by the
operating system for its own use. The choice of mode determines:

    The set of memory management segmentation registers which is used
    to translate program virtual addresses to physical addresses;

    The actual register used as r6, the "stack pointer";

    Whether certain instructions such as "halt" will be obeyed.
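As a rough illustration of the processor status word layout tabulated
above, the following C sketch unpacks the fields of a sixteen bit
value. The names "psw_decode", "psw_set_priority" and "struct
psw_fields" are invented for this illustration and do not occur in the
UNIX source; only the bit positions are taken from the table.

```c
#include <assert.h>
#include <stdint.h>

enum { MODE_KERNEL = 0, MODE_USER = 3 };   /* 00 = kernel, 11 = user */

struct psw_fields {
    unsigned current_mode;    /* bits 14,15 */
    unsigned previous_mode;   /* bits 12,13 */
    unsigned priority;        /* bits 5,6,7 */
    unsigned trap;            /* bit 4 */
    unsigned n, z, v, c;      /* bits 3, 2, 1, 0 */
};

/* Split a processor status word into its subfields. */
struct psw_fields psw_decode(uint16_t ps)
{
    struct psw_fields f;
    f.current_mode  = (ps >> 14) & 03;
    f.previous_mode = (ps >> 12) & 03;
    f.priority      = (ps >> 5) & 07;
    f.trap          = (ps >> 4) & 01;
    f.n             = (ps >> 3) & 01;
    f.z             = (ps >> 2) & 01;
    f.v             = (ps >> 1) & 01;
    f.c             = ps & 01;
    return f;
}

/* Replace the priority field, leaving every other bit untouched. */
uint16_t psw_set_priority(uint16_t ps, unsigned pri)
{
    assert(pri <= 7);
    return (uint16_t)((ps & ~0340) | (pri << 5));
}
```

With these definitions the octal value 0170345 decodes as user mode
for both current and previous mode, priority 7, with Z and C set.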

UNIX Operating System 2-1 Fundamentals


Instruction Set

The PDP11 instruction set includes double, single and zero operand
instructions. Instruction length is usually one word, with some
instructions being extended to two or three words with additional
addressing information.

With single operand instructions, the operand is usually called the
"destination"; with double operand instructions, the two operands are
called the "source" and "destination". The various modes of addressing
are described later.

The following instructions have been used in the file "m40.s" i.e. the
file of assembly language support routines for use with the 11/40
processor. Note that N, Z, V and C are the condition codes i.e. bits
in the processor status word ("ps"), and that these are set as side
effects of many instructions besides just "bit", "cmp" and "tst"
(whose stated function is to set the condition codes).

    adc    Add the contents of the C bit to the destination;

    add    Add the source to the destination;

    ash    Shift the contents of the defined register left the number
           of times specified by the shift count. (A negative value
           implies a right shift.);

    ashc   Similar to "ash" except that two registers are involved;

    asl    Shift all bits one place to the left. Bit 0 becomes 0 and
           bit 15 is loaded into C;

    asr    Shift all bits one place to the right. Bit 15 is replicated
           and bit 0 is loaded into C;

    beq    Branch if equal, i.e. if Z = 1;

    bge    Branch if greater than or equal to, i.e. if N = V;

    bhi    Branch if higher, i.e. if C = 0 and Z = 0;

    bhis   Branch if higher or the same, i.e. if C = 0;

    bic    Clear each bit to zero in the destination that corresponds
           to a non-zero bit in the source;

    bis    Perform an "inclusive or" of source and destination and
           store the result in the destination;

    bit    Perform a logical "and" of the source and destination to
           set the condition codes;

    ble    Branch if less than or equal to, i.e. if Z = 1 or N is not
           equal to V;

    blo    Branch if lower (than zero), i.e. if C = 1;

    bne    Branch if not equal (to zero), i.e. if Z = 0;

    br     Branch to a location within the range (.-128, .+127) where
           "." is the current location;

    clc    Clear C;

    clr    Clear destination to zero;

    cmp    Compare the source and destination to set the condition
           codes. N is set if the source value is less than the
           destination value;

    dec    Subtract one from the contents of the destination;

    div    The 32 bit two's complement integer stored in rn and
           r(n+1) (where n is even) is divided by the source operand.
           The quotient is left in rn, and the remainder in r(n+1);

    inc    Add one to the contents of the destination;

    jmp    Jump to the destination;

    jsr    Jump to subroutine. Register values are shuffled as
           follows:

               pc, rn, -(sp) = dest., pc, rn

    mfpi   Push onto the current stack the value of the designated
           word in the "previous" address space;

    mov    Copy the source value to the destination;

    mtpi   Pop the current stack and store the value in the
           designated word in the "previous" address space;

    mul    Multiply the contents of rn and the source. If n is even,
           the product is left in rn and r(n+1);

    reset  Set the INIT line on the Unibus for 10 milliseconds. This
           will have the effect of reinitialising all the device
           controllers;

    ror    Rotate all bits of the destination one place to the right.
           Bit 0 is loaded into C, and the previous value of C is
           loaded into bit 15;

    rts    Return from subroutine. Reload pc from rn, and reload rn
           from the stack;

    rtt    Return from interrupt or trap. Reload both pc and ps from
           the stack;

    sbc    Subtract the carry bit from the destination;

    sob    Subtract one from the designated register. If the result
           is not zero, branch back "offset" words;

    sub    Subtract the source from the destination;

    swab   Exchange the high and low order bytes in the destination;

    tst    Set the condition codes, N and Z, according to the
           contents of the destination;

    wait   Idle the processor and release the Unibus until a hardware
           interrupt occurs.
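The way the shift and rotate instructions interact with the C bit is
perhaps easier to see in executable form. The following C sketch
models "asl", "asr" and "ror" on a sixteen bit word exactly as they
are described in the list above. The function names are invented for
this illustration, and the other condition codes (N, Z and V) are
deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Each helper returns the new word and stores the new C bit in *carry. */

/* asl: bit 0 becomes 0 and bit 15 is loaded into C. */
uint16_t pdp_asl(uint16_t w, unsigned *carry)
{
    *carry = (w >> 15) & 1u;
    return (uint16_t)(w << 1);
}

/* asr: bit 15 is replicated and bit 0 is loaded into C. */
uint16_t pdp_asr(uint16_t w, unsigned *carry)
{
    *carry = w & 1u;
    return (uint16_t)((w >> 1) | (w & 0100000u));
}

/* ror: bit 0 goes to C and the previous C goes to bit 15. */
uint16_t pdp_ror(uint16_t w, unsigned *carry)
{
    unsigned old_c = *carry;
    assert(old_c <= 1);
    *carry = w & 1u;
    return (uint16_t)((w >> 1) | (old_c << 15));
}
```

Note that "asr" preserves the sign of a two's complement value,
whereas "ror" treats the word and the C bit together as a seventeen
bit ring.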



The "byte" versions of the following instructions are used in the file
"m40.s", as well as the "word" versions described above:

    bis    inc
    clr    mov
    cmp    tst


Addressing Modes

Much of the novelty and complexity of the PDP11 instruction set lies
in the variety of addressing modes which may be used for defining the
source and destination operands.

The addressing modes which are used in "m40.s" are described below.

Register Mode. The operand resides in one of the general registers,
e.g.

    clr   r0
    mov   r1,r0
    add   r4,r2

In the following modes, the designated register contains an address
value which is used to locate the operand.

Register Deferred Mode. The register contains the address of the
operand, e.g.

    inc   (r1)
    asr   (sp)
    add   (r2),r1

Autoincrement Mode. The register contains the address of the operand.
As a side effect, the register is incremented after the operation,
e.g.

    clr   (r1)+
    mfpi  (r0)+
    mov   (r1)+,r0
    mov   r2,(r0)+
    cmp   (sp)+,(sp)+

Autodecrement Mode. The register is decremented and then used to
locate the operand, e.g.

    inc   -(r0)
    mov   -(r1),r2
    mov   (r0)+,-(sp)
    clr   -(sp)

Index Mode. The register contains a value which is added to a sixteen
bit word following the instruction to form the operand address, e.g.

    clr   2(r0)
    movb  6(sp),(sp)
    movb  _reloc(r0),r0
    mov   -10(r2),(r1)

Depending on your viewpoint, in this mode the register is either an
index register or a base register. The latter case actually
predominates in "m40.s". The third example above is actually one of
the few uses of a register as an index register. (Note that "_reloc"
is an acceptable variable name.)

There are two addressing modes whose use is limited to the following
two examples:

    jsr   pc,*(r0)+
    jmp   *0f(r0)

The first example involves the use of the "autoincrement deferred"
mode. (This occurs in the routine "call1" on lines 0785, 0799.) The
address of a routine intended for execution is to be found in the word
addressed by r0, i.e. two levels of indirection are involved. The fact
that r0 is incremented as a side effect is not relevant in this usage.

The second example (which occurs on lines 1055, 1066) is an instance
of the "index deferred" mode. The destination of the "jump" is the
content of the word whose address is labelled by "0f" plus the value
of r0 (a small positive integer). This is a standard way to implement
a multi-way switch.

The following two modes use the program counter as the designated
register to achieve certain special effects.

Immediate Mode. This is the pc autoincrement mode. The operand is thus
extracted from the program string, i.e. it becomes an immediate
operand, e.g.

    add   $2,r0
    add   $2,(r1)
    bic   $17,r0
    mov   $KISA0,r0
    mov   $77406,(r1)+

Relative Mode. This is the pc index mode. The address relative to the
current program counter value is extracted from the program string and
added to the pc value to form the absolute address of the operand,
e.g.

    bic   $340,PS
    bit   $1,SSR0
    inc   SSR0
    mov   (sp),KISA6

It may be noted that each of the modes "index", "index deferred",
"immediate" and "relative" extends the instruction size by one word.

The existence of the "autoincrement" and "autodecrement" modes,
together with the special attributes of r6, make it conveniently
possible to store many operands in a stack, or LIFO list, which grows
downwards in memory. There are a number of advantages which flow from
this: code string lengths are shorter and it is easier to write
position independent code.
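The downward growing stack described above maps directly onto the
pointer idioms of "C". The following sketch is only an illustration:
"struct word_stack", its size and the three function names are
invented here, and the pointer "sp" merely plays the role of r6. A
push corresponds to the autodecrement form "mov r0,-(sp)" and a pop to
the autoincrement form "mov (sp)+,r0".

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16            /* hypothetical size, for illustration */

struct word_stack {
    uint16_t mem[STACK_WORDS];
    uint16_t *sp;                 /* plays the role of r6 */
};

/* An empty stack has sp just beyond the high end of its area. */
void stack_init(struct word_stack *s)
{
    s->sp = s->mem + STACK_WORDS;
}

/* Like "mov r0,-(sp)": decrement first, then store. */
void stack_push(struct word_stack *s, uint16_t value)
{
    assert(s->sp > s->mem);
    *--s->sp = value;
}

/* Like "mov (sp)+,r0": fetch first, then increment. */
uint16_t stack_pop(struct word_stack *s)
{
    assert(s->sp < s->mem + STACK_WORDS);
    return *s->sp++;
}
```

Because the pointer is decremented before a store and incremented
after a fetch, "sp" always addresses the most recently pushed word.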



UNIX Assembler

The UNIX assembler is a two pass assembler without macro facilities. A
full description may be found in the "UNIX Assembler Reference Manual"
which is contained in the "UNIX Documents".

The following brief notes should be of some assistance:

(a) A string of digits may define a constant number. This is assumed
    to be an octal number unless the string is terminated by a period
    ("."), when it is interpreted as a decimal number;

(b) The character "/" is used to signify that the rest of the line is
    a comment;

(c) If two or more statements occur on the same line, they must be
    separated by semicolons;

(d) The character "." is used to denote the current location;

(e) UNIX assembler uses the characters "$" and "*" where the DEC
    assemblers use "#" and "@" respectively;

(f) An identifier consists of a set of alphanumeric characters
    (including the underscore). Only the first eight characters are
    significant and the first may not be numeric;

(g) Names which occur in "C" programs for variables which are to be
    known globally, are modified by the addition of a prefix
    consisting of a single underscore. Thus for example the variable
    "_regloc" which occurs on line 1025 in the assembly language file,
    "m40.s", refers to the same variable as "regloc" at line 2677 of
    the file, "trap.c";

(h) There are two kinds of statement labels: name labels and numeric
    labels. The latter consist of a single digit followed by a colon,
    and need not be unique. A reference to "nf" where "n" is a digit,
    refers to the first occurrence of the label "n:" found by
    searching forward. A reference to "nb" is similar except that the
    search is conducted in the backwards direction;

(i) An assignment statement of the form

        identifier = expression

    associates a value and type with the identifier. In the example

    the operator "^" delivers the value of the first operand and the
    type of the second operand (in this case, "location");

(j) The string quote symbols are "<" and ">";

(k) Statements of the form

        .globl x, y, z

    serve to make the names "x", "y" and "z" external;

(l) The names "edata" and "end" are loader pseudo variables which
    define the size of the data segment, and the data segment plus the
    bss segment respectively.


Memory Management

Programs running on the PDP11 may address directly up to 64K bytes
(32K words) of storage. This is consistent with an address size of
sixteen bits. Since it is economical and not unreasonable to do so the
larger PDP11 models may be equipped with larger amounts of memory (up
to 256K bytes for the PDP11/40) plus a mechanism for converting
sixteen bit virtual (program) addresses into physical addresses of
eighteen bits or more. The mechanism, which is known as the memory
management unit, is simpler on the PDP11/40 than on the 11/45 or the
11/70.

On the PDP11/40 the memory management unit consists of two sets of
registers for mapping virtual addresses to physical addresses. These
are known as "active page registers" or "segmentation registers". One
set is used when the processor is in user mode and the other set, in
kernel mode. Changing the contents of these registers changes the
details of these mappings. The ability to make these changes is a
privilege that the operating system keeps firmly to itself.


Segmentation Registers

Each set of segmentation registers is composed of eight pairs, each
consisting of a "page address register" (PAR) and a "page description
register" (PDR).

Each pair of registers controls the mapping of one page, i.e. one
eighth part of the virtual address space, which has a size of 8K bytes
(4K words).

Each page may be regarded as an aggregate of 128 blocks, each of 64
bytes (32 words). This latter size is the "grain size" for the memory
mapping function, and as a practical consequence, it is also the
"grain size" for memory allocation.

Any virtual address belongs to one page or other. The corresponding
physical address is generated by adding the relative address within
the page to the contents of the corresponding PAR to form an extended
address (18 bits on the PDP11/40 and 11/45; 22 bits on the 11/70).
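The address calculation just described can be sketched in C as
follows. This is a model only: the function "translate" and the array
"par" are invented for the illustration, and it is assumed that the
value held in a page address register is a count of 64 byte blocks
(the "grain size" mentioned above), so that it must be scaled before
the relative address within the page is added. The validity checks
made by the page description register are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   13   /* 8K byte pages: low 13 bits are the offset */
#define BLOCK_SHIFT  6    /* assumed: PAR contents count 64 byte blocks */

/* par[] stands in for one set of eight page address registers. */
uint32_t translate(const uint16_t par[8], uint16_t vaddr)
{
    unsigned page   = vaddr >> PAGE_SHIFT;               /* 0..7 */
    unsigned offset = vaddr & ((1u << PAGE_SHIFT) - 1);  /* within page */

    assert(page < 8);     /* always holds for a sixteen bit address */
    return ((uint32_t)par[page] << BLOCK_SHIFT) + offset;
}
```

For example, if the register for page 1 holds 01000 then virtual
address 020004 (page 1, relative address 4) is translated to physical
address 0100004.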



Thus each page address register acts as a relocation register for one
page.

Each page can be divided on a 32 word boundary into two parts, an
upper part and lower part. Each such part has a size which is a
multiple of 32 words. In particular one part may be null, in which
case the other part coincides with the whole page.

One of the two parts is deemed to contain valid virtual addresses.
Addresses in the remaining part are declared invalid. Any attempt to
reference an invalid address will be trapped by the hardware. The
advantage of this scheme is that space in the physical memory need
only be allocated for the valid part of a page.


Page Description Register

The page description register defines:

(a) the size of the lower part of the page. (The number stored is
    actually the number of 32 word blocks less one);

(b) a bit which is set when the upper part is the valid part. (Also
    known as the "expansion direction" bit);

(c) access mode bits defining "no access" or "read only access" or
    "read/write access".

Note that if the valid part is null, this fact must be shown by
setting the access bits to "no access".


Memory Allocation

The hardware does not dictate the way areas in physical memory which
correspond to the valid parts of pages should be allocated (except to
the extent that they must begin and end on a 32 word boundary). These
areas may be allocated in any order and may overlap to any extent.

In practice the allocation of areas of physical memory is much more
disciplined as we shall see in Chapter Seven. Areas for pages which
are related are most often allocated contiguously and in the order of
their page numbers, so that all the segment areas associated with a
single program are contained within one or at most two large areas of
physical memory.


Memory Management Status Registers

In addition to the segmentation registers, on the PDP11/40 there are
two memory management status registers:

    SR0 contains abort error flags and other essential information for
    the operating system. In particular memory management is enabled
    when bit 0 of SR0 is on;

    SR2 is loaded with the 16 bit virtual address at the beginning of
    each instruction fetch.


"i" and "d" Spaces

In the PDP11/45 and 11/70 systems, there are additional sets of
segmentation registers. Addresses created using the pc register (r7)
are said to belong to "i" space, and are translated by a different set
of segmentation registers from those used for the remaining addresses
which are said to belong to "d" space.

The advantage of this arrangement is that both "i" and "d" spaces may
occupy up to 32K words, thus allowing the maximum space which can be
allocated to a program to be increased to twice the space available on
the PDP11/40.


Initial Conditions

When the system is first started after all the devices on the Unibus
have been reinitialised, the memory management unit is disabled and
the processor is in kernel mode.

Under these circumstances, virtual (byte) addresses in the range 0 to
56K are mapped into identically valued physical addresses. However the
highest page of the virtual address space is mapped into the highest
page of the physical address space, i.e. on the PDP11/40 or 11/45,
addresses in the range

    0160000 to 0177777

are mapped into the range

    0760000 to 0777777


Special Device Registers

The high page of physical memory is reserved for various special
registers associated with the processor and the peripheral devices. By
sacrificing one page of memory space in this way, the PDP11 designers
have been able to make the various device registers accessible without
the need to provide special instruction types.

The method of assignment of addresses to registers in this page is a
black art: the values are hallowed by tradition and are not to be
questioned.

-000-
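The mapping which applies under the initial conditions described in
this chapter (memory management disabled, PDP11/40 or 11/45) is simple
enough to state as a small C function. The name "initial_map" and the
two constants are invented for this illustration; the address ranges
are those quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

#define HIGH_PAGE_VIRT  0160000u    /* start of the highest virtual page */
#define HIGH_PAGE_PHYS  0760000ul   /* start of the highest physical page */

/* Map a sixteen bit address while the memory management unit is off. */
uint32_t initial_map(uint16_t vaddr)
{
    uint32_t physical = vaddr;      /* 0 to 56K: identically valued */

    if (vaddr >= HIGH_PAGE_VIRT)    /* highest page: device registers */
        physical = HIGH_PAGE_PHYS + (vaddr - HIGH_PAGE_VIRT);
    assert(physical <= 0777777ul);  /* fits in eighteen bits */
    return physical;
}
```

The effect is that the special device registers remain addressable
from the very first instruction executed, before any segmentation
register has been loaded.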



CHAPTER THREE

Reading "C" Programs

Learning to read programs written in the "C" language is one of the
hurdles that must be overcome before you will be able to study the
source code of UNIX effectively.

As with natural languages, reading is an easier skill to acquire than
writing. Even so you will need to be careful lest some of the more
subtle points pass you by.

There are two of the "UNIX Documents" which relate directly to the "C"
language:

    "C Reference Manual", by Dennis Ritchie

    "Programming in C - A Tutorial", by Brian Kernighan

You should read them now, as far as you can, and return to reread them
from time to time with increasing comprehension.

Learning to write "C" programs is not required. However if you have
the opportunity, you should attempt to write at least a few small
programs. This does represent the accepted way to learn a programming
language, and your understanding of the proper use of such items as:

    semicolons;
    "=" and "==";
    "{" and "}";
    "++" and "--";
    declarations;
    register variables;
    "if" and "for" statements;
    etc.

will be quickly reinforced.

You will find that "C" is a very convenient language for accessing and
manipulating data structures and character strings, which is what a
large part of operating systems is about. As befits a terminal
oriented language, which requires concise, compact expression, "C"
uses a large character set and makes many symbols such as "*" and "&"
work hard. In this respect it invites comparison with APL.

There are many features of "C" which are reminiscent of PL/1, but it
goes well beyond the latter in the range of facilities provided for
structured programming.


Some Selected Examples

The examples which follow are taken directly from the source code.


Example 1

The simplest possible procedure, which does nothing, occurs twice(!)
in the source code as "nullsys" (2864) and "nulldev" (6577), sic.

    6577 nulldev ()
         {
         }

While there are no parameters, the parentheses, "(" and ")", are still
required. The brackets "{" and "}" delimit the procedure body, which
is empty.


Example 2

The next example is a little less trivial:

    6566 nodev ()
         {
             u.u_error = ENODEV;
         }

The additional statement is an assignment statement. It is terminated
by a semicolon which is part of the statement, not a statement
separator as in Algol-like languages.

"ENODEV" is a defined symbol, i.e. a symbol which is replaced by an
associated character string by the compiler preprocessor before actual
compilation. "ENODEV" is defined on line 0484 as 19. The UNIX
convention is that defined symbols are written in upper case, and all
other symbols in lower case.

"=" is the assignment operator, and "u.u_error" is an element of the
structure "u". (See line 0419.) Note the use of "." as the operator
which selects an

UNIX Operating System 3-1 Reading "C" Programs


element of a structure. The element name is "u_error" which may be
taken as a paradigm for the way names of structure elements are
constructed in the UNIX source code: a distinguishing letter is
followed by an underscore followed by a name.


Example 3

    6585 bcopy (from, to, count)
         int *from, *to;
         {
             register *a, *b, c;
             a = from;
             b = to;
             c = count;
             do
                 *b++ = *a++;
             while (--c);
         }

The function of this procedure is very simple: it copies a specified
number of words from one set of consecutive locations to another set.

There are three parameters. The second line

    int *from, *to;

specifies that the first two variables are pointers to integers. Since
no specification is supplied for the third parameter, it is assumed to
be an integer by default.

The three local variables, a, b, and c, have been assigned to
registers, because registers are more accessible and the object code
to reference them is shorter. "a" and "b" are pointers to integers and
"c" is an integer. The register declaration could have been written
more pedantically as

    register int *a, *b, c;

to emphasise the connection with integers.

The three lines beginning with "do" should be studied carefully. If
"b" is a "pointer to integer" type, then

    *b

denotes the integer pointed to. Thus to copy the value pointed to by
"a" to the location designated by "b", we could write

    *b = *a;

If we wrote instead

    b = a;

this would make the value of "b" the same as the value of "a", i.e.
"b" and "a" would point to the same place. Here at least, that is not
what is required.

Having copied the first word from source to destination, we need to
increase the values of "b" and "a" so that they point to the next
words of their respective sets. This can be done by writing

    b = b+1; a = a+1;

but "C" provides a shorter notation (which is more useful when the
variable names are longer) viz.

    b++; a++;

or alternatively

    ++b; ++a;

Now there is no difference between the statements "b++;" and "++b;"
here.

However "b++" and "++b" may be used as terms in an expression, in
which case they are different. In both cases the effect of
incrementing "b" is retained, but the value which enters the
expression is the initial value for "b++" and the final value for
"++b".

The "--" operator obeys the same rules as the "++" operator, except
that it decrements by one. Thus "--c" enters an expression as the
value after decrementation.

The "++" and "--" operators are very useful, and are used throughout
UNIX. Occasionally you will have to go back to first principles to
work out exactly what their use implies. Note also there is a
difference between

    *b++    and    (*b)++

These operators are applicable to pointers to structures as well as to
simple data types. When a pointer which has been declared with
reference to a particular type of structure is incremented, the actual
value of the pointer is incremented by the size of the structure.

We can now see the meaning of the line

    *b++ = *a++;

The word is copied and the pointers are incremented, all in one hit.

The line

    while (--c);

delimits the end of the set of statements which began after the "do".
The expression in parentheses "--c", is evaluated and tested (the
value tested is the value after decrementation). If the value is
non-zero, the loop is repeated, else it is terminated.

Obviously if the initial value for "count" were negative, the loop
would not terminate properly. If this were a serious possibility then
the routine would have to be modified.



Example 4

    6619 getf (f)
         {
             register *fp, rf;
             rf = f;
             if (rf < 0 || rf >= NOFILE)
                 goto bad;
             fp = u.u_ofile[rf];
             if (fp != NULL)
                 return (fp);
         bad:
             u.u_error = EBADF;
             return (NULL);
         }

The parameter "f" is a presumed integer, and is copied directly into
the register variable "rf". (This pattern will become so familiar that
we will now cease to remark upon it.)

The three simple relational expressions

    rf < 0      rf >= NOFILE      fp != NULL

are each accorded the value one if true, and the value zero if false.
The first tests if the value of "rf" is less than zero, the second, if
"rf" is greater than or equal to the value defined by "NOFILE" and the
third, if the value of "fp" is not equal to "NULL" (which is defined
to be zero).

The conditions tested by the "if" statements are the arithmetic
expressions contained within parentheses.

If the expression is non-zero, the test is successful and the
following statement is executed. Thus if for instance, "fp" had the
value 001375, then

    fp != NULL

is true, and as a term in an arithmetic expression, is accorded the
value one. This value is non-zero, and hence the statement

    return (fp);

would be executed, to terminate further execution of "getf", and to
return the value of "fp" to the calling procedure as the result of
"getf".

The expression

    rf < 0 || rf >= NOFILE

is the logical disjunction ("or") of the two simple relational
expressions.

An example of a "goto" statement and associated label will be noted.

"fp" is assigned a value, which is an address, from the "rf"-th
element of the array of integers "u_ofile", which is embedded in the
structure "u".

The procedure "getf" returns a value to its calling procedure. This is
either the value of "fp" (i.e. an address) or "NULL".


Example 5

    2113 wakeup (chan)
         {
             register struct proc *p;
             register c, i;
             c = chan;
             p = &proc[0];
             i = NPROC;
             do {
                 if (p->p_wchan == c) {
                     setrun (p);
                 }
                 p++;
             } while (--i);
         }

There are a number of similarities between this example and the
previous one. We have a new concept however, an array of structures.
To be just a little confusing, in this example it turns out that both
the array and the structure are called "proc" (yes, "C" allows this).
They are declared on Sheet 03 in the following form:

    0358 struct proc
         {
             . . .
         } proc [NPROC];

"p" is a register variable of type pointer to a structure of type
"proc".

    p = &proc[0];

assigns to "p" the address of the first element of the array "proc".
The operator "&" in this context means "the address of".

Note that if an array has n elements, the elements have subscripts 0,
1, .., (n-1). Also it is permissible to write the above statement more
simply as

    p = proc;

There are two statements in between the "do" and the "while".

The first of these could be rewritten more simply as

    if (p->p_wchan == c) setrun (p);

i.e. the brackets are superfluous in this case, and since "C" is a
free form language, the arrangement of text between lines is not
significant.

The statement

    setrun (p);

invokes the procedure "setrun" passing the value of "p" as a
parameter. (All parameters are passed by value.)

The relation

    p->p_wchan == c



tests the equality of the value of "c" and the value of the element
"p_wchan" of the structure pointed to by "p". Note that it would have
been wrong to have written

    p.p_wchan == c

because "p" is not the name of a structure.

The second statement, which cannot be combined with the first,
increments "p" by the size of the "proc" structure, whatever that is.
(The compiler can figure it out.)

In order to do this calculation correctly, the compiler needs to know
the kind of structure pointed at. When this is not a consideration,
you will notice that often in similar situations, "p" will be declared
simply as

    register *p;

because it was easier for the programmer, and the compiler does not
insist.

The latter part of this procedure could have been written equivalently
but less efficiently as

    i = 0;
    do
        if (proc[i].p_wchan == c)
            setrun (&proc[i]);
    while (++i < NPROC);


Example 6

    5336 geterror (abp)
         struct buf *abp;
         {
             register struct buf *bp;
             bp = abp;
             if (bp->b_flags&B_ERROR)
                 if ((u.u_error = bp->b_error)==0)
                     u.u_error = EIO;
         }

This procedure simply checks if there has been an error, and if the
error indicator "u.u_error" has not been set, sets it to a general
error indication ("EIO").

"B_ERROR" has the value 04 (see line 4575) so that, with only one bit
set, it can be used as a mask to isolate bit number 2. The operator
"&" as used in

    bp->b_flags&B_ERROR

is the bitwise logical conjunction ("and") applied to arithmetic
values.

The above expression is non-zero if bit 2 of the element "b_flags" of
the "buf" structure pointed to by "bp", is set.

Thus if there has been an error, the expression

    (u.u_error = bp->b_error)

is evaluated and compared with zero. Now this expression includes an
assignment operator "=". The value of the expression is the value of
"u.u_error" after the value of "bp->b_error" has been assigned to it.

This use of an assignment as part of an expression is useful and quite
common.


Example 7

    3428 stime ()
         {
             if (suser()) {
                 time[0] = u.u_ar0[R0];
                 time[1] = u.u_ar0[R1];
                 wakeup (tout);
             }
         }

In this example, you should note that the procedure "suser" returns a
value which is used for the "if" test. The three statements whose
execution depends on this value are enclosed in the brackets "{" and
"}".

Note that a call on a procedure with no parameters must still be
written with a set of empty parentheses, sic.

    suser ()


Example 8

"C" provides a conditional expression. Thus if "a" and "b" are integer
variables,

    (a > b ? a : b)

is an expression whose value is that of the larger of "a" and "b".

However this does not work if "a" and "b" are to be regarded as
unsigned integers. Hence there is a use for the procedure

    6326 max (a, b)
         char *a, *b;
         {
             if (a > b)
                 return(a);
             return (b);
         }

The trick here is that "a" and "b", having been declared as pointers
to characters are treated for comparison purposes as unsigned
integers.

The body of the procedure could have been written as

    if (a > b)
        return (a);
    else
        return (b);

but the nature of "return" is such that the "else" is not needed here!
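The distinction drawn in Example 8 can be demonstrated with a present
day compiler, where an "unsigned" type is available and the pointer
trick is unnecessary. The following sketch is not taken from the UNIX
source; the function names are invented for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Larger of two values regarded as signed integers. */
int max_signed(int a, int b)
{
    return a > b ? a : b;
}

/* Larger of two values regarded as unsigned integers. */
unsigned max_unsigned(unsigned a, unsigned b)
{
    if (a > b)
        return a;
    return b;
}

/* The same bit pattern compares differently under the two views. */
void max_demo(void)
{
    assert(max_signed(-1, 1) == 1);
    assert(max_unsigned((unsigned)-1, 1u) == UINT_MAX);
}
```

Regarded as a signed integer, the pattern with every bit set is minus
one and so loses the comparison; regarded as unsigned, it is the
largest value there is and so wins.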



Example 9

Here are two "quickies" which introduce some different and exotic
looking expressions. First:

    7679 schar()
         {
             return (*u.u_dirp++ & 0377);
         }

where the declaration

    char *u_dirp;

is part of the declaration of the structure "u".

"u.u_dirp" is a character pointer. Therefore the value of
"*u.u_dirp++" is a character. (Incrementation of the pointer occurs as
a side effect.)

When a character is loaded into a sixteen bit register, sign extension
may occur. By "and"ing the word with 0377 any extraneous high order
bits are eliminated. Thus the result returned is simply a character.

Note that any integer which begins with a zero (e.g. 0377) is
interpreted as an octal integer.

The second example is:

    1771 nseg(n)
         {
             return ((n+127)>>7);
         }

The value returned is "n divided by 128 and rounded up to the next
highest integer".

Note the use of the right shift operator ">>" in preference to the
division operator "/".


Example 10

Many of the points which have been introduced above are collected in
the following procedure:

    2134 setrun (p)
         {
             register struct proc *rp;
             rp = p;
             rp->p_wchan = 0;
             rp->p_stat = SRUN;
             if (rp->p_pri < curpri)
                 runrun++;
             if (runout != 0 &&
                 (rp->p_flag&SLOAD) == 0) {
                 runout = 0;
                 wakeup (&runout);
             }
         }

Check your understanding of "C" by figuring out what this one does.

There are two additional features you may need to know about:

"&&" is the logical conjunction ("and") for relational expressions.
(Cf. "||" introduced earlier.)

The last statement contains the expression

    &runout

which is syntactically an address variable but semantically just a
unique bit pattern.

This is an example of a device which is used throughout UNIX. The
programmer needed a unique bit pattern for a particular purpose. The
exact value did not matter as long as it was unique. An adequate
solution to the problem was to use the address of a suitable global
variable.


Example 11

    4856 bawrite (bp)
         struct buf *bp;
         {
             register struct buf *rbp;
             rbp = bp;
             rbp->b_flags =| B_ASYNC;
             bwrite (rbp);
         }

The second last statement is interesting because it could have been
written as

    rbp->b_flags = rbp->b_flags | B_ASYNC;

In this statement the bit mask "B_ASYNC" is "or"ed into
"rbp->b_flags". The symbol "|" is the logical disjunction for
arithmetic values.

This is an example of a very useful construction in UNIX, which can
save the programmer much labour. If "op" is any binary operator, then

    x = x op a;

where "a" is an expression, can be rewritten more succinctly as

    x =op a;

A programmer using this construction has to be careful about the
placement of blank characters. Since

    x =+ 1;

is different from

    x = +1;

what is to be the meaning of

    x =+1; ?
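The ambiguity just noted is avoided in present day "C" by writing the
operator before the "=" sign, so that "=|" becomes "|=" and "=+"
becomes "+=". The following sketch restates three of the expressions
from Examples 9 and 11 in that form. It is an illustration only: the
names "set_async" and "low_byte" are invented, and the value given
here to "B_ASYNC" is a placeholder rather than a quotation from the
source code.

```c
#include <assert.h>

#define B_ASYNC 0400      /* placeholder mask value, for illustration */

/* Present day spelling of "flags =| B_ASYNC;". */
int set_async(int flags)
{
    flags |= B_ASYNC;
    return flags;
}

/* Mask to eight bits so that sign extension cannot leak through. */
int low_byte(char ch)
{
    return ch & 0377;
}

/* n divided by 128 and rounded up to the next highest integer. */
int nseg(int n)
{
    assert(n >= 0);
    return (n + 127) >> 7;
}
```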



Example 12

6824 ufalloc ()
     {
       register i;
       for (i=0; i<NOFILE; i++)
         if (u.u_ofile[i]==NULL) {
           u.u_ar0[R0] = i;
           return (i);
         }
       u.u_error = EMFILE;
       return (-1);
     }

This example introduces the "for" statement, which has a very general
syntax making it both powerful and compact.

The structure of the "for" statement is adequately described on page 10
of the "C Tutorial", and that description is not repeated here.

The Algol equivalent of the above "for" statement would be

     for i:=0 step 1 until NOFILE-1 do

The power of the "for" statement in "C" derives from the great freedom
the programmer has in choosing what to include between the parentheses.
Certainly there is nothing which restricts the calculations to integers,
as the next example will demonstrate.

Example 13

3949 signal (tp, sig)
     {
       register struct proc *p;
       for (p=proc; p<&proc[NPROC]; p++)
         if (p->p_ttyp == tp)
           psignal (p, sig);
     }

In this example of the "for" statement, the pointer variable "p" is
stepped through each element of the array "proc" in turn.

Actually the original code had

     for (p=&proc[0]; p<&proc[NPROC]; p++)

but it wouldn't fit on the line! As noted earlier, the use of "proc" as
an alternative to the expression "&proc[0]" is acceptable in this
context.

This kind of "for" statement is almost a cliche in UNIX so you had
better learn to recognise it. Read it as

     for p = each process in turn

Note that "&proc[NPROC]" is the address of the (NPROC+1)-th element of
the array (which does not of course exist) i.e. it is the first location
beyond the end of the array.

At the risk of overkill we would point out again that whereas in the
previous example

     i++

meant "add one to the integer i", here

     p++

means "skip p to point to the next structure".

Example 14

8870 lpwrite ()
     {
       register int c;
       while ((c=cpass ()) >= 0)
         lpcanon (c);
     }

This is an example of the "while" statement, which should be compared
with the "do ... while" construction encountered earlier. (Cf. the
"while" and "repeat" statements of Pascal.)

The meaning of the procedure is

     Keep calling "cpass" while the result is not negative, and pass
     the result as a parameter to a call on "lpcanon".

Note the redundant "int" in the declaration for "c". It isn't always
omitted!

Example 15

The next example is abbreviated from the original:

5861 seek ()
     {
       int n[2];
       register *fp, t;
       fp = getf (u.u_ar0[R0]);

       t = u.u_arg[1];

       switch (t) {
       case 1:
       case 4:
         n[0] =+ fp->f_offset[0];
         dpadd (n, fp->f_offset[1]);
         break;

       default:
         n[0] =+ fp->f_inode->i_size0&0377;
         dpadd (n, fp->f_inode->i_size1);

       case 0:
       case 3:
         ;
       }
     }

Note the array declaration for the two word array "n", and the use of
"getf" (which appeared in Example 4).

The "switch" statement makes a multi-way branch depending on the value
of the expression in parentheses. The individual parts have "case
labels":



     If "t" is one or four, then one set of actions is in order.

     If "t" is zero or three, nothing is to be done at all.

     If "t" is anything else, then a set of actions labelled "default"
     is to be executed.

Note the use of "break" as an escape to the next statement after the end
of the "switch" statement. Without the "break", the normal execution
sequence would be followed within the "switch" statement.

Thus a "break" would normally be required at the end of the "default"
actions. It has been omitted safely here because the only remaining
cases actually have null actions associated with them.

The two non-trivial pairs of actions represent the addition of one 32
bit integer to another. The later versions of the "C" compiler will
support "long" variables and make this sort of code much easier to write
(and read).

Note also that in the expression

     fp->f_inode->i_size0

there are two levels of indirection.

Example 16

6672 closei (ip, rw)
     int *ip;
     {
       register *rip;
       register dev, maj;
       rip = ip;
       dev = rip->i_addr[0];
       maj = rip->i_addr[0].d_major;
       switch (rip->i_mode&IFMT) {

       case IFCHR:
         (*cdevsw[maj].d_close) (dev,rw);
         break;

       case IFBLK:
         (*bdevsw[maj].d_close) (dev,rw);
       }
       iput (rip);
     }

This example has a number of interesting features.

The declaration for "d_major" is

     struct {
       char d_minor;
       char d_major;
     }

so that the value assigned to "maj" is the high order byte of the value
assigned to "dev".

In this example, the "switch" statement has only two non-null cases, and
no "default". The actions for the recognised cases, e.g.

     (*bdevsw[maj].d_close) (dev,rw);

look formidable at first glance.

First it should be noted that this is a procedure call, with parameters
"dev" and "rw".

Second "bdevsw" (and "cdevsw") are arrays of structures, whose "d_close"
element is a pointer to a function, i.e.

     bdevsw[maj]

is the name of a structure, and

     bdevsw[maj].d_close

is an element of that structure which happens to be a pointer to a
function, so that

     *bdevsw[maj].d_close

is the name of a function. The first pair of parentheses is "syntactical
sugar" to put the compiler in the right frame of mind!

Example 17

We offer the following as a final example:

4043 psig ()
     {
       register n, p;

       switch (n) {
       case SIGQIT:
       case SIGINS:
       case SIGTRC:
       case SIGIOT:
       case SIGEMT:
       case SIGFPT:
       case SIGBUS:
       case SIGSEG:
       case SIGSYS:
         u.u_arg[0] = n;
         if (core ())
           n =+ 0200;
       }
       u.u_arg[0] = (u.u_ar0[R0]<<8) | n;
       exit ();
     }

Here the "switch" selects certain values for "n" for which the one set
of actions should be carried out.

An alternative would have been to write a "monster" "if" statement such
as

     if (n==SIGQIT || n==SIGINS ||
         ... || n==SIGSYS)

but that would not have been either transparent or efficient.

Note the addition of an octal constant to "n" and the method of
composing a 16 bit value from two eight bit values.

                                 -o0o-
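
The call through a device switch, as in Example 16, may be easier to follow in a small sketch in present-day C. Everything here other than the names "bdevsw", "d_open" and "d_close" is an assumption made for illustration: the single demonstration driver, the bounds check, and the helper names "dev_major" and "close_device" do not appear in the UNIX source.

```c
#include <stddef.h>

/* One row of a device switch: pointers to a driver's entry points. */
struct devsw {
    int (*d_open)(int dev, int rw);
    int (*d_close)(int dev, int rw);
};

int last_closed_minor = -1;     /* records what demo_close was asked to do */

/* Hypothetical driver routines standing in for a real disk driver. */
int demo_open(int dev, int rw)  { (void)dev; (void)rw; return 0; }
int demo_close(int dev, int rw) { (void)rw; last_closed_minor = dev & 0377; return 0; }

/* Hypothetical table holding a single driver at major number 0. */
struct devsw bdevsw[] = {
    { demo_open, demo_close },
};
#define NBLKDEV (sizeof bdevsw / sizeof bdevsw[0])

/* The major number is the high order byte of the device word. */
int dev_major(int dev)
{
    return (dev >> 8) & 0377;
}

/* Select the driver by major number and call through its pointer. */
int close_device(int dev, int rw)
{
    size_t maj = (size_t)dev_major(dev);
    if (maj >= NBLKDEV)
        return -1;              /* no such driver in the table */
    return (*bdevsw[maj].d_close)(dev, rw);
}
```

A call such as close_device(5, 0) selects row 0 of the table and runs "demo_close"; the caller never names the driver routine directly, which is the point of the switch.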



                             CHAPTER FOUR

                              An Overview

The purpose of this chapter is to survey the source code as a whole i.e.
to present the "wood" before the "trees".

Examination of the source code will reveal that it consists of some 44
distinct files, of which:

     two are in assembly language, and have names ending in ".s";

     28 are in the "C" language and have names ending in ".c";

     14 are in the "C" language, but are not intended for independent
     compilation, and have names ending in ".h".

The files and their contents were arranged by the programmers presumably
to suit their convenience and not for ours. In many ways the divisions
between files are irrelevant to the present discussion and might well be
abolished entirely.

As mentioned already in Chapter One, the files have been organised into
five sections. As far as was possible, the sections were chosen to be of
roughly equal size, to cluster files which are strongly associated and
to separate files which are only weakly associated.

Variable Allocation

The PDP11 architecture allows efficient access to variables whose
absolute address is known, or whose address relative to the stack
pointer can be determined exactly at compile time.

There is no hardware support for multiple lexical levels for variable
declarations such as are available in block structured languages such as
Algol or Pascal. Thus "C" as implemented on the PDP11 supports only two
lexical levels: global and local.

Global variables are allocated statically; local variables are allocated
dynamically within the current stack area or in the general registers
(r2, r3 and r4 are used in this way).

Global Variables

In UNIX with very few exceptions, the declarations for global variables
have been all gathered into the set of ".h" files. The exceptions are:

(a)  the static variable "p" (2180) declared in "swtch" which is stored
     globally, but is accessible only from within the procedure "swtch".
     (Actually "p" is a very popular name for local variables in UNIX.);

(b)  a number of variables such as "swbuf" (4721) which are referenced
     only by procedures within a single file, and are declared at the
     beginning of that file.

Global variables may be declared separately within each file in which
they are referenced. It is then the job of the loader, which links the
compiled versions of the program files together, to match up the
different declarations for the same variable.

The "C" Preprocessor

If global declarations must be repeated in full in each file (as is
required by Fortran, for instance) then the bulk of the program is
increased, and modifying a declaration is at best a nuisance, and at
worst, highly error-prone.

These difficulties are avoided in UNIX by use of the preprocessor
facility of the "C" compiler. This allows declarations for most global
variables to be recorded once only in one of the few ".h" files.

Whenever the declaration for a particular global variable is required
the appropriate ".h" file can then be "included" in the file being
compiled.

UNIX also uses the ".h" files as vehicles for lists of standard
definitions for many symbolic names which represent constants and
adjustable parameters, and for declaration of some structure types.

For example, if the file "bottle.c" contains a procedure "glug" which
references a global variable called "gin" which is declared in the file
"box.h", then a statement:

     #include "box.h"



must be inserted at the beginning of the file "bottle.c". When the file
"bottle.c" is compiled, all declarations in "box.h" are compiled, and
since they are found before the beginning of any procedure in "bottle.c"
they are flagged as external in the relocatable module which is
produced.

When all the object modules are linked together, a reference to "gin"
will be found in every file for which the source included "box.h". All
these references will be consistent and the loader will allocate a
single space for "gin" and adjust all the references accordingly.

Section One

Section One contains many of the ".h" files and the assembly language
files.

It also contains a number of files concerned with system initialisation
and process management.

param.h [Sheet 01] contains no variable declarations, but many
     definitions for operating system constants and parameters, and the
     declarations for three simple structures. The convention will be
     noted of using "upper case only" for defined constants.

systm.h [Sheet 02; Chapter 19] consists entirely of declarations (with
     definitions of the structures "callout" and "mount" as
     side-effects). Note that none of the variables is initialised
     explicitly, and hence all are initialised to zero.

     The dimensions for the first three arrays are parameters defined in
     "param.h". Hence any file which "includes" "systm.h" must have
     previously "included" "param.h".

seg.h [Sheet 03] contains a few definitions and one declaration, which
     are used for referencing the segmentation registers. This file
     could be absorbed into "param.h" and "systm.h" without any real
     loss;

proc.h [Sheet 03; Chapter 7] contains the important declaration for
     "proc", which is both a structure type and an array of such
     structures. Each element of the "proc" structure has a name which
     begins with "p_", and no other variable is so named. Similar
     conventions are used for naming the elements of the other
     structures.

     The sets of values for the first two elements, "p_stat" and
     "p_flag", have individual names which are defined.

user.h [Sheet 04; Chapter 7] contains the declaration for the very
     important "user" structure, plus a set of defined values for
     "u_error".

     Only one instance of the "user" structure is ever accessible at one
     time. This is referenced under the name "u" and is in the low
     address part of a 1024 byte area known as the "per process data
     area".

In general the complete ".h" files are not analysed in detail later in
this text. It is expected that the reader will refer to them from time
to time (with increasing familiarity and understanding).

Assembly Language Files

There are two files in assembly language which comprise about 10% of the
source code. A reasonable acquaintance with these files is necessary.

low.s [Sheet 05; Chapter 9] contains information, including the trap
     vector, for initialising the low address part of main memory. This
     file is generated by a utility program called "mkconf" to suit the
     set of peripheral devices present at a particular installation;

m40.s [Sheets 06..14; Chapters 6, 8, 9, 10, 22] contains a set of
     routines appropriate to the PDP11/40, to carry out a variety of
     specialised functions which cannot be implemented directly in "C".

     Sections of this file are introduced into the discussion as and
     where appropriate. (The largest of the assembler procedures,
     "backup", has been left to the reader to survey as an exercise.)

There is an alternative to "m40.s", which is not presented here, namely
"m45.s", which is used on PDP11/45's and 70's.

Other Files in Section One

main.c [Sheets 15..17; Chapters 6, 7] contains "main" which performs
     various initialisation tasks to get UNIX running. It also contains
     "sureg" and "estabur" which set the user segmentation registers.

slp.c [Sheets 18..22; Chapters 6, 7, 8, 14] contains the major
     procedures required for process management including "newproc",
     "sched", "sleep" and "swtch".

prf.c [Sheets 23, 24; Chapter 5] contains "panic" and a number of other
     procedures which provide a simple mechanism for displaying
     initialisation messages and error messages to the operator.



malloc.c [Sheet 25; Chapter 5] contains "malloc" and "mfree" which are
     used to manage memory resources.

Section Two

Section Two is concerned with traps, hardware interrupts and software
interrupts.

Traps and hardware interrupts introduce sudden switches into the CPU's
normal instruction execution sequence. This provides a mechanism for
handling special conditions which occur outside the CPU's immediate
control.

Use is made of this facility as part of another mechanism called the
"system call", whereby a user program may execute a "trap" instruction
to cause a trap deliberately and so obtain the operating system's
attention and assistance.

The software interrupt (or "signal") is a mechanism for communication
between processes, particularly when there is "bad news".

reg.h [Sheet 26; Chapter 10] defines a set of constants which are used
     in referencing the previous user mode register values when they are
     stored in the kernel stack.

trap.c [Sheets 26..28; Chapter 12] contains the "C" procedure "trap"
     which recognises and handles traps of various kinds.

sysent.c [Sheet 29; Chapter 12] contains the declaration and
     initialisation of the array "sysent" which is used by "trap" to
     associate the appropriate kernel mode routine with each system call
     type.

sys1.c [Sheets 30..33; Chapters 12, 13] contains various routines
     associated with system calls, including "exec", "exit", "wait" and
     "fork".

sys4.c [Sheets 34..36; Chapters 12, 13, 19] contains routines for
     "unlink", "kill" and various other minor system calls.

clock.c [Sheets 37, 38; Chapter 11] contains "clock" which is the
     handler for clock interrupts, and which does much of the incidental
     housekeeping and basic accounting.

sig.c [Sheets 39..42; Chapter 13] contains the procedures which handle
     "signals" or "software interrupts". These provide facilities for
     inter-process communication and tracing.

Section Three

Section Three is concerned with basic input/output operations between
the main memory and disk storage.

These operations are fundamental to the activities of program swapping
and the creation and referencing of disk files.

This section also introduces procedures for the use and manipulation of
the large (512 byte) buffers.

text.h [Sheet 43; Chapter 14] defines the "text" structure and array.
     One "text" structure is used to define the status of a shared text
     segment.

text.c [Sheets 43, 44; Chapter 14] contains the procedures which manage
     the shared text segments.

buf.h [Sheet 45; Chapter 15] defines the "buf" structure and array, the
     structure "devtab", and names for the values of "b_error". All
     these are needed for the management of the large (512 byte)
     buffers.

conf.h [Sheet 46; Chapter 15] defines the arrays of structures "bdevsw"
     and "cdevsw", which specify the device oriented procedures needed
     to carry out logical file operations.

conf.c [Sheet 46; Chapter 15] is generated, like "low.s", by the
     "mkconf" utility to suit the set of peripheral devices present at a
     particular installation. It contains the initialisation for the
     arrays "bdevsw" and "cdevsw", which control the basic i/o
     operations.

bio.c [Sheets 47..53; Chapters 15, 16, 17] is the largest file after
     "m40.s". It contains the procedures for manipulation of the large
     buffers, and for basic block oriented i/o.

rk.c [Sheets 53, 54; Chapter 16] is the device driver for the RK11/RK05
     disk controller.

Section Four

Section Four is concerned with files and file systems.

A file system is a set of files and associated tables and directories
organised onto a single storage device such as a disk pack.

This section covers the means of

     creating and accessing files;
     locating files via directories;
     organising and maintaining file systems.



It also includes the code for an exotic breed of file called a "pipe".

file.h [Sheet 55; Chapter 18] defines the "file" structure and array.

filsys.h [Sheet 55; Chapter 20] defines the "filsys" structure which is
     copied to and from the "super block" on "mounted" file systems.

ino.h [Sheet 56] describes the structure of "inodes" as recorded on the
     "mounted" devices. Since this file is not "included" in any other,
     it really exists for information only.

inode.h [Sheet 56; Chapter 18] defines the "inode" structure and array.
     "inodes" are of fundamental importance in managing the accesses of
     processes to files.

sys2.c [Sheets 57..59; Chapters 18, 19] contains a set of routines
     associated with system calls including "read", "write", "creat",
     "open" and "close".

sys3.c [Sheets 60, 61; Chapters 19, ...] contains a set of routines
     associated with various minor system calls.

rdwri.c [Sheets 62, 63; Chapter 18] contains intermediate level routines
     involved with reading and writing files.

subr.c [Sheets 64, 65; Chapter 18] contains more intermediate level
     routines for i/o, especially "bmap" which translates logical file
     pointers into physical disk addresses.

fio.c [Sheets 66..68; Chapters 18, 19] contains intermediate level
     routines for file opening, closing and control of access.

alloc.c [Sheets 69..72; Chapter 20] contains procedures which manage the
     allocation of entries in the "inode" array and of blocks of disk
     storage.

iget.c [Sheets 72..74; Chapters 18, 19, 20] contains procedures
     concerned with referencing and updating "inodes".

nami.c [Sheets 75, 76; Chapter 19] contains the procedure "namei" which
     searches the file directories.

pipe.c [Sheets 77, 78; Chapter 21] is the "device driver" for "pipes",
     which are a special form of short disk file used to transmit
     information from one process to another.

Section Five

Section Five is the final section. It is concerned with input/output for
the slower, character oriented peripheral devices.

Such devices share a common buffer pool, which is manipulated by a set
of standard procedures.

The set of character oriented peripheral devices is exemplified by the
following:

     KL/DL11   interactive terminal
     PC11      paper tape reader/punch
     LP11      line printer.

tty.h [Sheet 79; Chapters 23, 24] defines the "clist" structure (used as
     a list head for character buffer queues), the "tty" structure
     (stores relevant data for controlling an individual terminal),
     declares the "partab" table (used to control transmission of
     individual characters to terminals) and defines names for many
     associated parameters.

kl.c [Sheet 80; Chapters 24, 25] is the device driver for terminals
     connected via KL11 or DL11 interfaces.

tty.c [Sheets 81..85; Chapters 23, 24, 25] contains common procedures
     which are independent of the attaching interfaces, for controlling
     transmission to or from terminals, and which take into account
     various terminal idiosyncrasies.

pc.c [Sheets 86, 87; Chapter 22] is the device handler for the PC11
     paper tape reader/punch controller.

lp.c [Sheets 88, 89; Chapter 22] is the device handler for the LP11 line
     printer controller.

mem.c [Sheet 90] contains procedures which provide access to main memory
     as though it were an ordinary file. This code has been left to the
     reader to survey as an exercise.

                                 -o0o-



Section One contains many of the global declaration files and the
assembly language files.

It also contains a number of files concerned with system initialisation
and process management.

                             CHAPTER FIVE

                               Two Files

This chapter is intended to provide a gentle introduction to the source
code by looking at two files in Section One which can be isolated
reasonably well from the rest.

The discussion of these files supplements the discussion of Chapter
Three and includes a number of additional comments regarding the syntax
and semantics of the "C" language.

The File 'malloc.c'

This file is found on Sheet 25 of the Source Code, and consists of just
two procedures:

     malloc (2528)        mfree (2556)

These are concerned with the allocation and subsequent release of two
kinds of memory resources, namely:

     main memory in units of 32 words (64 bytes);

     disk swap area in units of 256 words (512 bytes).

For each of these two kinds of resource, a list of available areas is
maintained within a resource "map" (either "coremap" or "swapmap"). A
pointer to the appropriate resource "map" is always passed to "malloc"
and "mfree" so that the routines themselves do not have to know the kind
of resource with which they are dealing.

Each of "coremap" and "swapmap" is an array of structures of the type
"map" as declared at line 2515. This structure consists of two character
pointers i.e. two unsigned integers.

The declarations of "coremap" and "swapmap" are on lines 0203, 0204.
Here the "map" structure is completely ignored, a regrettable
programming short-cut which is possible because it is not detected by
the loader. Thus the actual numbers of list elements in "coremap" and
"swapmap" are "CMAPSIZ/2" and "SMAPSIZ/2" respectively.

Rules for List Maintenance

(A)  Each available area is defined by its size and relative address
     (reckoned in the units appropriate to the resource);

(B)  The elements of each list are arranged at all times in order of
     increasing relative address. Care is taken that no two list
     elements represent contiguous areas - the alternative course, to
     merge the two areas into a single larger area, is always taken;



(C)  The whole list can be scanned by looking at successive elements of
     the array, starting with the first, until an element with a zero
     size is encountered. This last element is a "sentinel" which is not
     part of the list proper.

The above rules provide a complete specification for "mfree", and a
specification for "malloc" which is complete except in one respect:

     We need to specify how the resource allocation is actually made
     when there exists more than one way of performing it.

The method adopted in "malloc" is one known as "First Fit" for reasons
which should become obvious.

As an illustration of how the resource "map" is maintained, suppose the
following three resource areas were available:

     an area of size 15 beginning at location 47 and ending at location
     61;

     an area of size 13 spanning addresses 27 to 39 inclusive;

     an area of size 7 beginning at location 65.

Then the "map" would contain:

     Entry    Size    Address
       0       13       27
       1       15       47
       2        7       65
       3        0       ??
       4       ??       ??

If a request for a space of size 7 were received, the area would be
allocated starting at location 27, and the "map" would become:

     Entry    Size    Address
       0        6       34
       1       15       47
       2        7       65
       3        0       ??
       4       ??       ??

If the area spanning addresses 40 to 46 inclusive is returned to the
available list, the "map" would become

     Entry    Size    Address
       0       28       34
       1        7       65
       2        0       ??
       3       ??       ??

Note how the number of elements has actually decreased by one because of
amalgamation though the total available resources have of course
increased.

Let us now turn to a consideration of the actual source code.

malloc (2528)

The body of this procedure consists of a "for" loop to search the "map"
array until either:

(a)  the end of the list of available resources is encountered; or

(b)  an area large enough to honour the current request is found;

2534: The "for" statement initialises "bp" to point to the first element
      of the resource map. At each succeeding iteration "bp" is
      incremented to point to the next "map" structure.

      Note that the continuation condition "bp->m_size" is an
      expression, which becomes zero when the sentinel is referenced.
      This expression could have been written equivalently but more
      transparently as "bp->m_size>0".

      Note also that no explicit test for the end of the array is made.
      (It can be shown that this latter is not necessary provided
      CMAPSIZ, SMAPSIZ >= 2*NPROC !)

2535: If the list element defines an area at least as large as that
      requested, then ...

2536: Remember the address of the first unit of the area;

2537: Increment the address stored in the array element;

2538: Decrement the size stored in the element and compare the result
      with zero (i.e. was it an exact fit?);

2539: In the case of an exact fit, move all the remaining list elements
      (up to and including the sentinel) down one place.

      Note that "(bp-1)" points to the structure before the one
      referenced by "bp";

2542: The "while" continuation condition does not test the equality of
      "(bp-1)->m_size" and "bp->m_size"!

      The value tested is the value assigned to "(bp-1)->m_size", copied
      from "bp->m_size".

      (You are forgiven for not recognising this at once.);

2543: Return the address of the area. This represents the end of the
      procedure and hence very definitely the end of the "for" loop.

      Note that a value of zero returned means "no luck". This is based
      on the assumption that no valid area can ever begin at location
      zero.
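
The list rules (A) to (C) and the "First Fit" search can be tried out with the following sketch in present-day C. It is not the code on Sheet 25: the names "map_alloc" and "map_free", the unsigned field types, the explicit sentinel write when a new element is inserted, and the explicit test that the next element is not the sentinel are all assumptions made for clarity.

```c
struct map {
    unsigned m_size;    /* size of the free area; 0 marks the sentinel */
    unsigned m_addr;    /* address of the first unit of the free area  */
};

/* Close the gap at bp by moving later elements, sentinel included, down. */
static void shift_down(struct map *bp)
{
    do {
        bp++;
        *(bp - 1) = *bp;
    } while ((bp - 1)->m_size != 0);
}

/* First fit: return the address of "size" units, or 0 for "no luck". */
unsigned map_alloc(struct map *mp, unsigned size)
{
    struct map *bp;

    for (bp = mp; bp->m_size != 0; bp++) {
        if (bp->m_size >= size) {
            unsigned a = bp->m_addr;
            bp->m_addr += size;
            bp->m_size -= size;
            if (bp->m_size == 0)        /* exact fit: drop the element */
                shift_down(bp);
            return a;
        }
    }
    return 0;
}

/* Return "size" units at address "a"; the caller ensures there is room. */
void map_free(struct map *mp, unsigned size, unsigned a)
{
    struct map *bp;

    if (size == 0)
        return;
    /* Find the first element beyond "a", or the sentinel. */
    for (bp = mp; bp->m_addr <= a && bp->m_size != 0; bp++)
        ;
    if (bp > mp && (bp - 1)->m_addr + (bp - 1)->m_size == a) {
        (bp - 1)->m_size += size;               /* abuts previous */
        if (bp->m_size != 0 && a + size == bp->m_addr) {
            (bp - 1)->m_size += bp->m_size;     /* and the next one */
            shift_down(bp);
        }
    } else if (bp->m_size != 0 && a + size == bp->m_addr) {
        bp->m_addr = a;                         /* abuts next only */
        bp->m_size += size;
    } else {
        /* Insert a new element, pushing the rest up one place. */
        struct map item = { size, a };
        struct map saved;
        do {
            saved = *bp;
            *bp = item;
            item = saved;
            bp++;
        } while (item.m_size != 0);
        *bp = item;                             /* rewrite the sentinel */
    }
}
```

Starting from the three areas of the illustration above, a request for 7 units should return address 27, and returning the 7 units at address 40 should then leave two elements, of sizes 28 and 7.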



mfree (2556)

This procedure returns the area of size "size" at address "aa" to the
"resource map" designated by "mp". The body of the procedure consists of
a one line "for" statement, followed by a multi-line "if" statement.

2564: The semicolon at the end of this line is extremely significant,
      terminating as it does the empty statement. (It would aid
      legibility if this character were moved to a line on its own, as
      is done on line 2394.)

      Depending on your point of view, this statement demonstrates
      either the power or the obscurity of the "C" language. Try writing
      equivalent code to this statement in another language such as
      Pascal or PL/1.

      Step "bp" through the list until an element is encountered either
      with an address greater than the address of the area being
      returned,

           i.e. not "bp->m_addr <= a"

      or which indicates the end of the list

           i.e. not "bp->m_size != 0";

2565: We have now located the element in front of which we should insert
      the new list element. The question is: will the list grow larger
      by one element or will amalgamation keep the number of elements
      the same or even reduce it by one?

      If "bp > mp" we are not trying to insert at the beginning of the
      list. If

           (bp-1)->m_addr+(bp-1)->m_size==a

      then the area being returned abuts the previous element in the
      list;

2566: Increase the size of the previous list element by the size of the
      area being returned;

2567: Does the area being returned also abut the next element of the
      list? If so ...

2568: Add the size of the next element of the list to the size of the
      previous element;

2569: Move all the remaining list elements (up to the one containing the
      final zero size) down one place.

      Note that if the test on line 2567 fortuitously gives a true
      result when "bp->m_size" is zero no harm is done;

2576: This statement is reached if the test on line 2565 failed i.e. the
      area being returned cannot be amalgamated with the previous
      element on the list.

      Can it be amalgamated with the next element? Note the check that
      the next element is not null;

2579: Provided the area being returned is genuinely non-null (perhaps
      this test should have been made sooner?) add a new element to the
      list and push all the remaining elements up one place.

In conclusion ...

The code for these two procedures has been written very tightly. There
is little, if any, "fat" which could be removed to improve run time
efficiency. However it would be possible to write these procedures in a
more transparent fashion.

If you feel strongly on this point, then as an exercise, you should
rewrite "mfree" to make its function more easily discernible.

Note also that the correct functioning of "malloc" and "mfree" depends
on correct initialisation of "coremap" and "swapmap". The code to do
this occurs in the procedure "main" at lines 1568, 1583.

The File 'prf.c'

This file is found on Sheets 23 and 24, and contains the following
procedures:

     printf  (2340)        panic    (2416)
     printn  (2369)        prdev    (2433)
     putchar (2386)        deverror (2447)

The calling relationship between these procedures is illustrated below:

     panic        deverror
       |             |
       |           prdev
       |             |
        \           /
           printf
             |
           printn
             |
           putchar

printf (2340)

The procedure "printf" provides a direct, unsophisticated, low-level,
unbuffered way for the operating system to send messages to the system
console terminal. It is used during initialisation and to report
hardware errors or the imminent collapse of the system.

(These versions of "printf" and "putchar" run in kernel mode and are
similar to, but not the same as, the versions invoked by a "C" program
which runs in user mode. The latter versions of "printf" and "putchar"
live in the library "/lib/libc.a". You may still find it useful to read
the sections "PRINTF(III)" and "PUTCHAR(III)" of the UPM at this point.)



2340: The programmer must have been carried away when he declared all
      the parameters for this procedure. In fact the procedure body only
      contains references to "x1" and "fmt".

This serves to reveal one of the facts of "C" programming. The rules for
matching parameters in procedure calls and procedure declarations are
not enforced, not even with respect to the numbers of parameters.

Parameters are placed on the stack in reverse order. Thus when "printf"
is called "fmt" will be nearer to the "top of stack" than "x1", etc.

       stack          ...
       grows          x2
       down           x1
                      fmt
       top of         ...
       stack

"x1" has a higher address than "fmt" but a lower address than "x2",
because stacks grow downwards on the PDP11.

2341: "fmt" may be interpreted as a constant character pointer. This
      declaration is (almost) equivalent to
           "char *fmt;"
      The difference is that here the value of "fmt" cannot be changed;

2346: "adx" is set to point to "x1". The expression "&x1" is the address
      of "x1". Note that since "x1" is a stack location, this expression
      cannot be evaluated at compile time.

      (Many of the expressions you will find elsewhere involving the
      addresses of variables or arrays are effective because they can be
      evaluated at compile or load time.);

2348: Extract into the register "c" successive characters from the
      format string;

2349: If "c" is not a '%' then ...

2350: If "c" is a null character ('\0'), this indicates the end of the
      format string in the normal way, and "printf" terminates;

2351: Otherwise call "putchar" to send the character to the system
      console terminal;

2353: A '%' character has been seen. Get the next character (it had
      better not be the '\0'!);

2354: If this character is a 'd' or 'l' or 'o', call "printn" passing as
      parameters the value referenced by "adx" and either the value "8"
      or "10" depending on whether "c" is 'o' or not. (The 'd' and 'l'
      codes are clearly equivalent.)

      "printn" expresses the binary numbers as a set of digit characters
      according to the radix supplied as the second parameter;

2356: If the editing character is 's', then all but the last character
      of a null terminated string is to be sent to the terminal. "adx"
      should point to a character pointer in this case;

2361: Increment "adx" to point to the next word in the stack i.e. to the
      next parameter passed to "printf";

2362: Go back to line 2347 and continue scanning the format string.
      Enthusiasts for structured programming will prefer to replace
      lines 2347 and this by
           "while (1) {" and "}"
      respectively.

printn (2369)

This procedure calls itself recursively in order to generate the
required digits in the required order. It might be possible to code this
procedure more efficiently but not more completely. (Anyway, in view of
the implementation of "putchar", efficiency is hardly a consideration
here.)

Suppose n = A*b + B where A = ldiv(n,b) and where B = lrem(n,b)
satisfies 0<=B<b. Then in order to display the value for n, we need to
display the value for A followed by the value for B.

The latter is easy for b = 8 or 10: it consists of a single character.
The former is easy if A = 0. It is also easy if "printn" is called
recursively. Since A < n, the chain of recursive calls must terminate.

2375: Arithmetic values corresponding to digits are conveniently
      converted to their corresponding character representations by the
      addition of the character '0'.

The procedures "ldiv" and "lrem" treat their first parameter as an
unsigned integer (i.e. no sign extension, when a 16 bit value is
extended to a 32 bit value before the actual division operation). They
may be found beginning on lines 1392 and 1400 respectively.

putchar (2386)

This procedure transmits to the system console the character which was
passed as a parameter.

It illustrates in a small way the basic features of i/o operations on
the PDP11 computer.

2391: "SW" is defined on line 0166 as the value "0177570". This is the

UNIX Operating System 5-4 Two Files


kernel address of a read only 2393: While bit 7 of the transmitter and compactness not to mention
processor register which stores status register ("XST") is off, clarity seem to have suffered.);
the setting of the console switch keep doing nothing, because the
register. interface is not ready to accept 2406: Restore the contents of the
another character. transmitter status register. In
The meaning of the statement is particular if bit 6 was formerly
clear: get the contents at loca- set to enable interrupts then
tion 0177570 and see if they are This is a classic case of "busy wait- this resets it.
zero. The problem is to express ing" where the processor is allowed to
this in "CU. The code cycle uselessly through a set of
instructions until some externally
if (SW == 0) defined event occurs. Such waste of panic (2419)
processing power cannot normally be
would not have conveyed this tolerated but this procedure is only This procedure is called from a number
meaning. Clearly "SW" is a used in unusual situations. of locations in the operating system.
pointer value which should be (e.g. line 1605). When circumstances
dereferenced. The compiler might exist under which continued operation
have been changed to accept 2395: The need for this statement is of the system seems undesirable.
tied up with the statement on
i f (SW -> == 0) line 2405;
UNIX does not profess to be a "fault
but as it stands, this is syntac- 2397: Save the current contents of the tolerant" or "fail soft" system, and in
tically incorrect. By inventing a transmitter status register; many cases the calIon "panic" can be
dummy structure, with an element interpreted as a fairly unsophisticated
"integ" (see line 0175), the pro- 2398: Clear the transmitter status response to a straightforward problem.
grammer has found a satisfactory register preparatory to sending
solution to his problem. the next character; However more complicated responses
require additional code, lots of it,
2399: With bit 7 of the control status and this is contrary to the general
Several other examples of this program- register reset, move the next UNIX philosophy of "keep it simple".
ming device will be found in this pro- character to be transmitted to
cedure and elsewhere. the transmitter buffer register.
This initiates the next output 2419: The reason for this statement is
operation; given in the comment beginning at
In hardware terms, the system console line 2323;
terminal interface consists of four 16 2400: A "new line" character needs to
bit control registers which are given be accompanied by a "car r iage 2420: "update" causes all the large
consecutive addresses on the Unibus return" character and this is block buffers to be written out.
beginning at kernel address 0177560 accomplished by a recursive call See Chapter Twenty;
(see the declaration for "KL" on line on "putchar".
0165.) For a description of the formats 2421: "printf" is called with a format
and usage of these registers, see A couple of extra "delete" char- string and one parameter, which
Chapter Twenty-Four or the "PDPll Peri- acters are thrown in also, to was passed to "panic";
pherals Handbook". allow for any delays in complet-
ing the carriage return operation 2422: This "for" statement defines an
at the terminal; infinite loop in which the only
In software terms, this interface is action is a calIon the assembly
the unnamed structure which is defined 2405: This calIon "putchar" with an language procedure "idle" (1284).
beginning on line 2313, with four ele- argument of zero effectively
ments which name the four control results in a re-execution of "idle" drops the processor prior-
registers. It does not matter that the lines 2391 to 2394. ity to zero, and performs a
structure is unnamed because it is not "wait". This is a "do nothing"
necessary to allocate any instances of (It is very hard to see why the instruction of indefinite dura-
it (the one we are interested in is proqrammer chose to use a recur- tion. It terminates when a
essentially predefined, at the address sive call here in preference to hardware interrupt occurs.
given by "KL"). simply repeating lines 2393 and
2394, since both code efficiency
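
The two devices used in "putchar", the dummy structure which lets "SW"
be dereferenced and the busy wait on bit 7 of the transmitter status
register, can be tried out with a minimal sketch such as the following.
It is written in present-day C rather than the dialect of the listing,
and it is not the code of lines 2386 to 2407. Ordinary variables
("fake_sw", "fake_kl") stand in for the fixed Unibus addresses, and the
helpers "device_poll" and "device_accept" are hypothetical stand-ins for
the terminal hardware, invented here so that the loop can terminate on
an ordinary machine.

```c
#include <stdint.h>

/* Dummy structure: its only job is to give "->" something to select. */
struct integ_s { uint16_t integ; };

/* The four console interface registers, consecutive as on the Unibus. */
struct kl_regs { uint16_t rsr, rbr, xsr, xbr; };

/* ASSUMPTION: ordinary variables replace the hardware addresses
 * 0177570 (switch register) and 0177560 (console interface). */
static struct integ_s fake_sw = { 1 };
static struct kl_regs fake_kl;
#define SW (&fake_sw)
#define KL (&fake_kl)

#define XRDY 0200              /* bit 7 of the transmitter status register */

static unsigned char out[64];  /* characters the simulated terminal received */
static unsigned nout;
static unsigned spins;         /* number of turns round the busy wait loop */

/* Hypothetical hardware model: the interface becomes ready again. */
static void device_poll(void) { fake_kl.xsr |= XRDY; }

/* Hypothetical hardware model: the terminal takes the buffered character. */
static void device_accept(void)
{
    if (nout < sizeof out)
        out[nout++] = (unsigned char)fake_kl.xbr;
}

static void sketch_putchar(int c)
{
    uint16_t saved;

    if (SW->integ == 0)                /* switches all zero: print nothing */
        return;
    while ((KL->xsr & XRDY) == 0) {    /* busy wait until the interface is ready */
        device_poll();
        spins++;
    }
    if (c == 0)                        /* zero argument: the wait was the point */
        return;
    saved = KL->xsr;                   /* remember e.g. the interrupt enable bit */
    KL->xsr = 0;
    KL->xbr = (uint16_t)c;             /* hand the character to the interface */
    device_accept();
    if (c == '\n') {                   /* new line needs a carriage return */
        sketch_putchar('\r');
        sketch_putchar(0177);          /* two "delete" characters as padding */
        sketch_putchar(0177);
    }
    sketch_putchar(0);                 /* wait for the output to complete */
    KL->xsr = saved;                   /* restore the status register */
}
```

Calling sketch_putchar('A') and then sketch_putchar('\n') leaves five
characters in "out": the 'A', the new line, a carriage return and two
deletes. Setting "fake_sw.integ" to zero suppresses all further output,
which is the effect of the test on line 2391.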

UNIX Operating System 5-5 Two Files


An infinite set of calls on "idle" is better than the execution of a
"halt" instruction, since any i/o activities which were under way can
be allowed to complete and the system clock can keep ticking.

The only way for the operator to recover from a "panic" is to
reinitialise the system (after taking a core dump, if desired).


prdev (2433)
deverror (2447)

These procedures provide warning messages when errors are occurring in
i/o operations. At this stage, their only interest is as examples of
the use of "printf".


Included Files

It will be noted that whereas the file "malloc.c" contains no request
to include other files, requests to include four separate files are
included at the beginning of "prf.c".

(The observant reader will note that these files are presumed to reside
one level higher in the file hierarchy than "prf.c" itself.)

The statement on line 2304 is to be understood as if it were replaced
by the entire contents of the file "param.h". This then supplies
definitions for the identifiers "SW", "KL" and "integ" which occur in
"putchar". We noted earlier that declarations for "KL", "SW" and
"integ" occurred on lines 0165, 0166 and 0175 respectively, but this
would have been meaningless, if the file "param.h" had not been
"included" in "prf.c".

The files "buf.h" and "conf.h" have been included to provide
declarations for "d_major", "d_minor", "b_dev" and "b_blkno", which are
used in "prdev" and "deverror".

The reason for the inclusion of the fourth file, "seg.h", is a little
harder to find. In fact it is not necessary as the code stands, and the
author owes his readers an apology. In editing the source code, it
seemed like a good idea to move the declaration for "integ" from
"seg.h" to "param.h". Q.E.D.

Note that the variable "panicstr" (2328) is also global but since it is
not referenced outside "prf.c", its declaration has not been placed in
any ".h" file.

                              -000-

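
As a supplement to the discussion of "printn" in this chapter, the
recursive scheme (display the value of A = n/b first, then the single
digit B = n mod b) can be sketched in present-day C as follows. This is
an illustration only, not the listing at line 2369: the ordinary "/"
and "%" operators on an unsigned value are assumed to play the parts of
"ldiv" and "lrem", and the helper names "show_char" and "show_reset"
are hypothetical, collecting the digits in a buffer instead of sending
them to the console.

```c
#include <stddef.h>

static char shown[40];          /* digits produced so far, null terminated */
static size_t nshown;

/* Hypothetical helper: start a new number. */
static void show_reset(void)
{
    nshown = 0;
    shown[0] = '\0';
}

/* Hypothetical helper standing in for "putchar". */
static void show_char(char c)
{
    if (nshown + 1 < sizeof shown) {
        shown[nshown++] = c;
        shown[nshown] = '\0';
    }
}

/* Display n in radix b, where b is 8 or 10. */
static void sketch_printn(unsigned n, unsigned b)
{
    unsigned a = n / b;             /* A: everything except the last digit */

    if (a != 0)
        sketch_printn(a, b);        /* A < n, so the recursion must stop */
    show_char((char)('0' + n % b)); /* B: one digit, made printable by adding '0' */
}
```

With n = 1977 and b = 10 the calls nest as 1977, 197, 19, 1 and the
digits 1, 9, 7, 7 are produced as the calls return, i.e. most
significant digit first.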


CHAPTER SIX

Getting Started


This chapter considers the sequence of events which occur when UNIX is
"rebooted" i.e. it is loaded and initiated in an idle machine.

A study of the initialisation process is of interest in itself, but
more importantly, it allows a number of important features of the
system to be presented in an orderly manner.

The operating system may have to be restarted in the aftermath of a
system crash. It will also have to be restarted frequently for quite
ordinary, operational reasons, e.g. after an overnight shutdown. If we
assume the latter case, then we can assume that all the disk files are
intact and that no special circumstance needs to be recognised or dealt
with.

In particular, we can assume there is a file in the root directory
called "/unix", which is the object code for the operating system.

This file began life as a set of source files such as we are
investigating. These were compiled and linked together in the normal
way to form a single object program file, and stored in the root
directory.


Operator Actions

Reinitialisation requires operator action at the processor console. The
operator must:

   stop the processor by setting the "enable/halt" switch to "halt";

   set the switch register with the address of the hardware bootstrap
   loader program;

   depress and release the "load address" switch;

   move the "enable/halt" switch to "enable";

   depress and release the "start" switch.

This activates the bootstrap program which is permanently recorded in a
ROM in the processor.

The bootstrap loader program loads a larger loader program (from block
#0 of the system disk), which looks for and loads a file called "/unix"
into the low part of memory.

It then transfers control to the instruction loaded at address zero.

Address zero is occupied by a branch instruction (line 0508), which
branches to location 000040, which contains a jump instruction (line
0522), which jumps to the instruction labelled "start" in the file
"m40.s" (line 0612).


start (0612)

0613: The "enabled" bit of the memory management status register, SR0,
      is tested. If this is set, the processor will dwell forever in a
      two instruction loop. This register will normally be cleared when
      the operator activates the "clear" button on the console before
      starting the system.

      A number of reasons have been suggested for the necessity for
      this loop. The most likely is that in the case of a double bus
      timeout error, the processor will branch to location zero, and in
      this situation it should not be allowed to go further.

0615: "reset" clears and initialises all the peripheral device control
      and status registers;

The system will now be running in kernel mode with memory management
disabled.

0619: KISA0 and KISD0 are the high core addresses of the first pair of
      kernel mode segmentation registers. The first six kernel
      descriptor registers are initialised to 077406, which is the
      description of a full size, 4K word, read/write segment.

      The first six kernel address registers are initialised to 0,
      0200, 0400, 0600, 01000 and 01200 respectively.

      As a result the first six kernel segments are initialised
      (without any reference to the actual size of UNIX) to point to
      the first six 4K word segments of physical memory. Thus the
      "kernel to physical" address translation is trivial for kernel
      addresses in the range 0 to 0137777;

0632: "_end" is a loader pseudo variable which defines the extent of
      the program code and data area. This value is rounded up to the
      next multiple of 64 bytes and is stored in the address register
      for the seventh segment (segment #6).

      Note that the address of this register is stored in "ka6", so
      that the content of this register is accessible as "*ka6";

0634: The corresponding descriptor register is loaded with a value
      which (since "USIZE" is equal to 16) is the description of a
      read/write segment which is 16 x 32 = 512 words long.

      The value 007406 is obtained by shifting the octal value 017
      eight places to the left and then "or-ing" in the value 6;

0641: The eighth segment is mapped into the highest 4K word segment of
      the physical address space.

      It should be noted that with memory management disabled, the
      same translation is already in force i.e. addresses in the
      highest 4K word segment of the 32K program address space are
      automatically mapped into the highest 4K word segment of the
      physical address space.

We may note that from this point on, all the kernel mode segmentation
registers will remain unchanged with the single exception of the
seventh kernel segmentation address register.

This register is explicitly manipulated by UNIX to point to a variety
of locations in physical memory. Each such location is the beginning of
an area 512 words long, known as a "per process data area".

The seventh kernel address register is now set to point to the segment
which will become the per process data area for process #0.

0646: The stack pointer is set to point to the highest word of the per
      process data area;

0647: By incrementing the value of SR0 from zero to one, the "memory
      management enabled" bit is conveniently set.

From this point, all program addresses are translated to physical
addresses by the memory management hardware.

0649: "bss" refers to the second part of the program data area, which
      is not initialised by the loader (see "A.OUT(V)" in the UPM). The
      lower and upper limits of this area are defined by the loader
      pseudo variables, "_edata" and "_end" respectively;

0668: The processor status word (PS) is changed to indicate that the
      "previous mode" was "user mode".

      This prepares the way for the investigation and initialisation
      of the areas of physical memory which are not part of the kernel
      address space. (This involves use of the special instructions
      "mtpi" and "mfpi" (Move To/From Previous Instruction space)
      together with some manipulation of the user mode segmentation
      registers.);

0669: A call is then made to the procedure "main" (1550).

It will be seen later that "main" calls "sched" which never terminates.
The need for or use of the last three instructions of "start" (lines
0670, 0671 and 0672) is therefore somewhat enigmatic. The reason will
come later. In the meantime you might like to ponder "why?". What do
these lines do anyway?


main (1550)

Upon entry to this procedure:

(a) the processor is running at priority zero, in kernel mode and with
    the previous mode shown as user mode;

(b) the kernel mode segmentation registers have been set and the
    memory management unit has been enabled;

(c) all the data areas used by the operating system have been
    initialised;

(d) the stack pointer (SP or r6) points to a word which contains a
    return address in "start".

1559: The first action of "main" would appear to be redundant, since
      "updlock" should have already been set to zero as part of the
      initialisation performed by "start";

1560: "i" is initialised to the ordinal of the first 32 word block
      beyond the "per process data area" for process #0;

1562: The first pair of user mode segmentation registers are used to
      provide a "moving window" into higher areas of the physical
      memory.

      At each position of the window an attempt is made (using
      "fuibyte") to read the first accessible word in the window. If
      this is not successful, it is assumed that the end of the
      physical memory has been reached.



      Otherwise the next 32 word block is initialised to zero (using
      "clearseg" (0676)) and added to the list of available memory, and
      the window is advanced by 32 words.

"fuibyte" and "clearseg" are both to be found in "m40.s". "fuibyte"
will normally return a positive value in the range 0 to 255. However,
in the exceptional case where the memory location referenced does not
respond, the value -1 is returned. (The way this is brought about is a
little obscure, and will be explained later in Chapter Ten.)

1582: "maxmem" defines the maximum amount of main memory which may be
      used by a user program. This is the minimum of:

         the physically available memory ("maxmem");

         an installation definable parameter ("MAXMEM") (0135);

         the ultimate limit imposed by the PDP11 architecture;

1583: "swapmap" defines available space on the swapping disk which may
      be used when user programs are swapped out of main memory. It is
      initialised to a single area of size "nswap", starting at
      relative address "swplo". Note that "nswap" and "swplo" are
      initialised in "conf.c" (lines 4697, 4698);

1589: The significance of this and the next four lines will be
      discussed shortly;

1599: The design of UNIX assumes the existence of a system clock which
      interrupts the processor at line frequency (i.e. 50 Hz or 60 Hz).

      There are two possible clock types available: a line frequency
      clock (KW11-L) which has a control register on the Unibus at
      address 777546, or a programmable, real-time clock (KW11-P)
      located at address 772540 (lines 1509, 1510).

      UNIX does not presume which clock will be present. It attempts
      to read the status word for the line frequency clock first. If
      successful, that clock is initialised and the other (if present)
      remains unused. If the first attempt is unsuccessful, then the
      other clock is tried. If both attempts are unsuccessful, there is
      a call on "panic" which effectively halts the system with an
      error message to the operator.

Since the absence of a clock will be indicated by a bus timeout error,
it is convenient to make the reference via "fuiword", preceded by the
setting of a user mode segmentation register pair (1599, 1600).

1607: Either type of clock is initialised by the statement

          *lks = 0115;

      As a consequence of this action, the clock will interrupt the
      processor within the next 20 milliseconds. This interrupt may
      occur at any time, but it will be convenient for this discussion
      to assume that no interrupt will occur before initialisation is
      complete;

1613: "cinit" (8234) initialises the pool of character buffers. See
      Chapter 23;

1614: "binit" (5055) initialises the pool of large buffers. See
      Chapter 17;

1615: "iinit" (6922) initialises table entries for the root device.
      See Chapter Twenty.


Processes

"process" is a term which has occurred more than once already. A
definition which will suit our purposes reasonably well at present is
simply "a program in execution".

Details of the representation of processes in UNIX will be discussed in
the next chapter. For now we just note that each process involves a
"proc" structure from the array called "proc" and a "per process data
area" which includes one copy of the structure "u".


Initialisation of proc[0]

The explicit initialisation of the structure "proc[0]" is performed
starting at line 1589. Only four elements are changed from the overall
initial value of zero:

(a) "p_stat" is set to "SRUN" which implies that process #0 is "ready
    to run";

(b) "p_flag" is set to show both "SLOAD" and "SSYS". The former
    implies that the process is to be found in core (it has not been
    swapped out onto the disk), and the second, that it should never be
    swapped out;

(c) "p_size" is set to "USIZE";

(d) "p_addr" is set to the contents of the kernel segmentation address
    register #6.

It will be seen that process #0 has acquired an area of "USIZE" blocks
(exactly the size of a "per process data area") which begins
immediately after the official end ("_end") of the operating system
data area.

The ordinal number of the first block of this area has been stored for
future reference in "p_addr". This area, which was cleared to zero in
"start" (0661), contains a single copy of the "user" structure called
"u".

On line 1593, the address of "proc[0]" is stored in "u.u_procp", i.e.
the "proc" structure and the "u" structure are mutually linked.


The story continues ...

1627: "newproc" (1826) will be discussed in detail in the next
      chapter.

      In brief this initialises a second "proc" structure viz.
      "proc[1]", and allocates a second "per process data area" in
      core. This is a copy of the "per process data area" for process
      #0, exact in all but one respect: the value of "u.u_procp" in the
      second area is "&proc[1]".

      We should note here that at line 1889, there is a call on "savu"
      (0725) which saves the current values of the environment and the
      stack pointers in "u.u_rsav" before the copy is made.

      Also from line 1918 we can see that the value returned by
      "newproc" will be zero, so that the statements on lines 1628 to
      1635 will not be executed;

1637: A call is made to "sched" (1940) which, it may be observed,
      contains an infinite loop, so that it never returns!


sched (1940)

At this stage we are only interested in what happens when "sched" is
entered for the first time.

1958: "spl6" is an assembler routine (1292) which sets the processor
      priority level to six. (Cf. also "spl0", "spl4", "spl5" and
      "spl7" in "m40.s").

When the processor is at level six, only devices with priority seven
can interrupt it. The clock whose priority level is six is thus
inhibited from interrupting the processor between this point and the
subsequent call on "spl0" at line 1976.

1960: A search is made through "proc" for a process whose status is
      "SRUN" and which is not "loaded".

(Processes #0 and #1 have status "SRUN" and are loaded. All remaining
processes have a status of zero, which is equivalent to "undefined" or
"NULL").

1966: The search fails ("n" is still -1). The flag "runout" is made
      non-zero, indicating that there are no processes which are both
      ready to run and "swapped out" onto disk;

1968: "sleep" is called (to wait for such an event) with a priority
      "PSWP" (== -100) for when it wakes up, which is in the category
      of "very urgent".


sleep (2066)

2070: "PS" is the address of the processor status word. The processor
      status is stored in the register "s" (0164, 0175);

2071: "rp" is set to the address of the entry in the array "proc" of
      the current process (still "proc[0]" at this stage!);

2072: "pri" is negative, so the "else" branch is taken, setting the
      status of the current process (#0) to "SSLEEP". The reason for
      "going to sleep" and the "awakening priority" are noted.

2093: "swtch" is then called.


swtch (2178)

2184: "p" is a static variable (2180), which means that its value is
      initialised to zero (1566) and is preserved between calls. For
      the very first call on "swtch", "p" is set to point to
      "proc[0]";

2189: "savu" is called to save the stack pointer and the environment
      pointer for the current process in "u.u_rsav";

2193: "retu" is called:

      (a) to reset the kernel address register for segment #6 to the
          value passed as an argument (this causes a change in the
          current process!);

      (b) to reset the stack and environment pointers to values
          appropriate to the revised current process, whose execution
          is about to be resumed.

The combination of successive calls on "savu" and "retu" at this point
constitutes a so-called "coroutine jump" (Cf. "exchange jump" on the
Cyber or "Load PSW" on the /360 or "Move Stack" on the B6700).

This time however the coroutine jump is from process #0 to process #0
(not very interesting!).

2201: The set of processes is searched to find the process whose state
      is "SRUN" and which is loaded and for which "p_pri" is a
      maximum.

      The search is successful and process #1 is found. (N.B. The
      state of process #0 was just changed from "SRUN" to "SSLEEP" in
      "sleep" so it no longer satisfies the search criterion);

2218: Since "p" is not "NULL", the idle loop is not entered;

2228: "retu" (0740) causes a coroutine jump to process #1 which
      becomes the current process.

      What is process #1 ? It is a copy of process #0, made at a
      previous stage of the latter's existence.

This call on "retu" was not preceded by a call on "savu" because the
necessary information has in fact been saved already. (Where?)

2229: "sureg" is a routine (1738) which copies into the user mode
      segmentation registers, the values appropriate for the current
      process. These have been stored earlier in the arrays "u.u_uisa"
      and "u.u_uisd".

The very first call on "sureg" copies zeros and serves no real purpose.

2240: The "SSWAP" flag is not set, so that this enigmatic (2239)
      section can be ignored for now;

2247: Finally "swtch" returns with a value of "1". But where does the
      "return" return to? Not to "sleep"!

The "return" follows values set by the stack pointer and the
environment pointer. These (just before the return) have values equal
to those in force when the most recent "savu(u.u_rsav)" was performed.

Now process #1, which is only just starting, has never performed a
"savu", but values were stored in "u.u_rsav" before the copy of process
#0 was made by "newproc", which had been called from "main".

Thus in this case, the return from "swtch" is made to "main", with a
value of one. (Look over this again, to be sure you understand!)


main revisited

The story so far: process #0, having created a copy of itself in the
form of process #1, has gone to sleep. As a result process #1 has
become the current process and has returned to "main" with a value of
one. Now read on ...

1627: The statements in "main" which are conditional on "newproc" are
      now executed;

1628: "expand" (2268) finds a new, larger area (from USIZE*32 to
      (USIZE+1)*32 words) for process #1, and copies the original data
      area into it.

      In this case, the original user data area consists only of a
      "per process data area", with zero length data and stack areas.
      The original area is released;

1629: "estabur" is used to set the "prototype" segmentation registers
      which are stored in "u.u_uisa" and "u.u_uisd" for later use by
      "sureg". "estabur" calls "sureg" as its last action. The
      parameters for "estabur" are the sizes of the text, data and
      stack areas plus an indicator to decide whether the text and data
      areas should be in separate address spaces. (Never true on the
      PDP11/40.) The sizes are all in units of 32 words;

1630: "copyout" (1252) is an assembler routine which copies an array
      in kernel space of specified size into a region in user space.
      Here the array "icode" is copied into an area starting at
      location zero in user space;

1635: The "return" is not special. From "main" it goes to "start"
      (0670) where the three last instructions have the effect of
      causing execution in user mode of the instruction at user mode
      address zero, i.e. the execution of a copy of the first
      instruction in "icode". The instructions subsequently executed
      are copies also of instructions in "icode".

AT THIS POINT, THE INITIALISATION OF THE SYSTEM IS COMPLETE.

Process #1 is running and to all intents and purposes, is a normal
process. Its initial form is (almost) that which would come from
compilation, loading and execution of the simple, but non-trivial "C"
program:

      char *init "/etc/init";
      main ( ) {
            execl (init, init, 0);
            while (1);
      }

The equivalent assembler program is

              sys     exec
              init
              initp
              br      .
      initp:  init
              0
      init:   </etc/init\0>

If the system call on "exec" fails (e.g. the file "/etc/init" cannot be
found) the process falls into a tight loop, and there the processor
will stay, except when the occasional clock interrupt occurs.

A description of the functions performed by "/etc/init" can be found in
the section "INIT (VIII)" of the UPM.

                              -000-
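
Looking back at "start", the octal constants quoted for the kernel
segmentation registers (077406 and 007406 for the descriptors; 0, 0200,
... 01200 for the addresses) can be checked with a few lines of
present-day C. This is a sketch under stated assumptions, not code from
"m40.s": the function names are hypothetical, and the formula for the
descriptor simply generalises the recipe given for line 0634 (shift the
block count less one eight places left, then "or" in 6 for read/write
access).

```c
/* One "block" is 32 words, i.e. 64 bytes; a full segment is 128 blocks. */
enum { BLOCK_BYTES = 64, FULL_SEGMENT_BLOCKS = 128, USIZE = 16 };

/* Descriptor for a read/write segment of nblocks blocks (1 to 128). */
static unsigned seg_descriptor(unsigned nblocks)
{
    return ((nblocks - 1) << 8) | 6;
}

/* Address register value mapping kernel segment i onto the i-th
 * 4K word segment of physical memory (values count 64 byte blocks). */
static unsigned seg_address(unsigned i)
{
    return i * FULL_SEGMENT_BLOCKS;
}

/* Round a byte count up to whole 64 byte blocks, as is done for "_end". */
static unsigned bytes_to_blocks(unsigned nbytes)
{
    return (nbytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
}
```

With these definitions seg_descriptor(128) is 077406, seg_descriptor(USIZE)
is 007406, seg_address(5) is 01200, and the last byte covered by the
first six segments is seg_address(6) * 64 - 1 = 0137777.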



CHAPTER SEVEN

Processes


The previous chapter traced the developments which occur after the
operating system has been "rebooted", and in so doing introduced a
number of significant features of the process concept. One of the aims
of this chapter is to go back and re-explore some of the same ground
more thoroughly.

There are a number of serious difficulties in providing a generally
acceptable definition of "process". These are akin to the difficulties
faced by the philosopher who would answer "what is life?" We will be in
good company if we brush the more subtle points lightly aside.

The definition for "process" already given, "a program in execution",
does reasonably well in suggesting what is intended. However it does
not fit the case of either process #0 throughout its life or process #1
during its first moments. All other processes in the system however are
clearly associated with the execution of some program file or other.

Processes can be introduced into discussions of operating systems at
two levels.

At the upper level, "process" is an important organising concept for
describing the activity of a computer system as a whole. It is often
expedient to view the latter as the combined activity of a number of
processes, each associated with a particular program such as the
"shell", or the "editor". A discussion of UNIX at this level is given
in the second half of Ritchie's and Thompson's paper, "The UNIX
Time-sharing System".

At this level the processes themselves are considered to be the active
entities in the system, while the identities of the true active
elements, the processor and the peripheral devices, are submerged: the
processes are born, live and die; they exist in varying numbers; they
may acquire and release resources; they may interact, cooperate,
conflict, share resources; etc.

At the lower level, "processes" are inactive entities which are acted
on by active entities such as the processor. By allowing the processor
to switch frequently from the execution of one process image to
another, the impression can be created that each of the process images
is developing continuously and this leads to the upper level
interpretation.

Our present concern is with the low level interpretation: with the
structure of the process image, with the details of execution and with
the means for switching the processor between processes.

The following observations may be made about processes in the UNIX
context:

(a) the existence of a process is implied by the existence of a
    non-null structure in the "proc" array, i.e. a "proc" structure for
    which the element "p_stat" is non-null;

(b) for each process there is a "per process data area" containing a
    copy of the "user" structure;

(c) the processor spends its entire life executing one process or
    another (except when it is resting between instructions);

(d) it is possible for one process to create or destroy another
    process;

(e) a process may acquire and possess resources of various kinds.


The Process Image

Ritchie and Thompson in their paper define a "process" as the execution
of an "image", where the "image" is the current state of a
pseudo-computer, i.e. an abstract data structure, which may be
represented in either main memory or on disk.

The process image involves two or three physically distinct areas of
memory:

(1) the "proc" structure, which is contained within the core resident
    "proc" array and is accessible at all times;

(2) the data segment, which consists of the "per process data area",
    combined with a segment containing the user program data,
    (possibly) program text, and stack;

(3) the text segment, which is not always present, consists of a
    segment containing only pure program text i.e. re-entrant code and
    constant data.

Many programs do not have a separate text segment. Where one is
defined, a single copy will be shared among all processes which are
executions of the same particular program.


The proc Structure (0358)

This structure, which is permanently resident in main memory, contains
fifteen elements, of which eight are characters, six are integers, and
one a pointer to an integer. Each element represents information that
must be accessible at any time, especially when the main part of the
process image has been swapped out to disk:

   "p_stat" may take one of seven values which define seven mutually
   exclusive states. See lines 0381 to 0387;

   "p_flag" is an amalgam of six one bit flags which may be set
   independently. See lines 0391 to 0396;

   "p_addr" is the address of the data segment:

      If the data segment is in main memory this is a block number;

      otherwise, if the data segment has been swapped out, this is a
      disk record number;

   "p_size" is the size of the data segment, measured in blocks;

   "p_pri" is the current process priority. This may be recalculated
   from time to time as a function of "p_nice", "p_cpu" and "p_time";

   "p_pid", "p_ppid" are numbers which uniquely identify a process and
   its parent;

   "p_sig", "p_uid", "p_ttyp" are involved with external communication
   i.e. with messages or "signals" from outside the process's normal
   domain;

   "p_wchan" identifies, for a "sleeping" process ("p_stat" equals
   either "SSLEEP" or "SWAIT"), the reason for sleeping;

   "p_textp" is either null or a pointer to an entry in the "text"
   array (4306), which contains vital statistics regarding the text
   segment.


The user Structure (0413)

One copy of the "user" structure is an essential ingredient of each
"per process data area". At any one time there is exactly one copy of
the "user" structure which is accessible. This goes under the name "u"
and is always to be found at kernel address 0140000 i.e. at the
beginning of the seventh page of the kernel address space.

The "user" structure has more elements than can be conveniently or
usefully introduced here. The comment accompanying each declaration on
Sheet 04 succinctly suggests the function of each element.

For the moment you should notice:

(a) "u_rsav", "u_qsav", "u_ssav" which are two word arrays used to
    store values for r5, r6;

(b) "u_procp" which gives the address of the corresponding "proc"
    structure in the "proc" array;

(c) "u_uisa[16]", "u_uisd[16]" which store prototypes for the page
    address and description registers;

(d) "u_tsize", "u_dsize", "u_ssize" which are the size of the text
    segment and two parameters defining the size of the data segment,
    measured in 32 word blocks.

The remaining elements are concerned with:

   - saving floating point registers (not for the PDP11/40);

   - user identification;

   - parameters for input/output operations;

   - file access control;

   - system call parameters;

   - accounting information.


The Per Process Data Area

The "per process data area" corresponds to the valid part (lower part)
of the seventh page of the kernel address space. It is 1024 bytes long.
The lower 289 bytes are occupied by an instance of the "user"
structure, leaving 367 words to be used as a kernel mode stack area.
(Obviously there will be as many kernel mode stacks as there are
processes.)

While the processor is in kernel mode, the values of r5 and r6, the
environment and stack pointers, should remain within the range
          0140441 to 0141777.
Transition beyond the upper limit would be trapped as a segmentation
violation, but the lower limit is protected only by the integrity of
the software. (It may be noted that the hardware stack limit option is
not used by UNIX.)



The Segments

The data segment is allocated as one single area of physical memory
but consists of three distinct parts:

(a) a "per process data area";

(b) a data area for the user program. This may be further divided
    into areas for program text, initialised data and uninitialised
    data;

(c) a stack for the user program.

The size of (a) is always "USIZE" blocks. The sizes of (b) and (c)
are given in blocks by "u.u_dsize" and "u.u_ssize". (It may be noted
in passing that the latter two may change during the life of a
process.)

A separate text segment containing only pure text is allocated as one
single area of physical memory. The internal structure of the segment
is not important here.


Execution of an Image

The image currently being executed (and hence the identity of the
current process) is determined by the setting of the seventh kernel
segmentation address register. If process #i is the current process,
then the register has the value "proc[i].p_addr".

It is often desirable to distinguish between a process being executed
in kernel mode and the same one being executed in user mode. We will
use the terms "kernel process #i" and "user process #i" to denote
"process #i executing in kernel mode" and "process #i executing in
user mode" respectively.

If we chose to associate processes with particular execution stacks
rather than with an entry in the "proc" array, then we would consider
kernel process #i and user process #i to be separate processes,
rather than different aspects of a single process #i.


Kernel Mode Execution

The seventh kernel segmentation address register must be set
appropriately. None of the other kernel segmentation registers is
ever disturbed and so their values are assumed. As was seen earlier,
the first six kernel pages are mapped to the first six pages of
physical memory, while the eighth is mapped into the highest page of
physical memory. The size of the seventh segment is always the same.

In kernel mode the setting of the user mode segmentation registers is
in general irrelevant. However they are normally set correctly for
the user process.

The environment and stack pointers point into the kernel stack area
in the seventh page, above the "user" structure.


User Mode Execution

Each activation of a user process is preceded and succeeded by an
activation of the corresponding kernel process. Accordingly both the
user mode and kernel mode registers will be properly set whenever a
process image is being executed in user mode.

The environment and stack pointers point into the user stack area.
This begins as the upper part of the eighth user page, but may be
extended downwards, e.g. to occupy the whole of eighth page and part
or all of the seventh page, etc.

Whereas the setting of the kernel segmentation registers is fairly
trivial, setting the user segmentation registers is much less so.


An Example

Consider a program on the PDP11/40 which uses 1.7 pages of text, 3.3
pages of data, and 0.7 pages of stack area. (Our use of fractions in
this example is admittedly a little crude.) The set of virtual
addresses would be divided as shown in the following diagram:

        888  ///  s1    stack
        888  ///  s1    area
        888
        777
        777
        777
        666
        666
        666  \\\  d4
        555  \\\  d3
        555  \\\  d3
        555  \\\  d3
        444  \\\  d2    data
        444  \\\  d2
        444  \\\  d2    area
        333  \\\  d1
        333  \\\  d1
        333  \\\  d1
        222
        222  ///  t2
        222  ///  t2    text
        111  ///  t1
        111  ///  t1    area
        111  ///  t1

        Virtual Address Space

Two whole pages in the virtual address space must be allocated to the
text segment, even though the physical area required is only 1.7
pages.

        222  ///  t2
        222  ///  t2    text
        111  ///  t1
        111  ///  t1    area
        111  ///  t1

        Text Segment



The data and stack areas require the dedication of four and one pages
of virtual address space, and 3.3 and 0.7 pages of physical memory
respectively.

The whole data segment requires four and one eighth pages of physical
memory. The extra eighth is for the "per process data area" which
corresponds (from time to time) to the seventh kernel address page.

        888  ///  s1    stack
        888  ///  s1    area
        666  \\\  d4
        555  \\\  d3
        555  \\\  d3
        555  \\\  d3
        444  \\\  d2    data
        444  \\\  d2
        444  \\\  d2    area
        333  \\\  d1
        333  \\\  d1
        333  \\\  d1
        ppda

        Data Segment

Note the order of the components of the data segment, and that there
is no embedded unused space.

The user mode segmentation registers need to be set to reflect the
values in the following table, where "t", "d" denote the block
numbers of beginning of the text and data segments respectively:

        Page   Address   Size   Comment
        ====   =======   ====   =======
         1     t+0       1.0    read only
         2     t+128     0.7    read only
         3     d+16      1.0
         4     d+144     1.0
         5     d+272     1.0
         6     d+400     0.3
         7     ?         0.0    not used
         8     d+400     0.7    grows downwards

Note the setting of the eighth address register. The address
prototypes stored in the array "u.u_uisa" are obtained by setting "t"
and "d" to zero.


Setting the Segmentation Registers

Prototypes for the user segmentation registers are set up by
"estabur" which is called when a program is first launched into
execution, and again whenever a significant change in memory
allocation requires it. The prototypes are stored in the arrays
"u.u_uisa", "u.u_uisd".

Whenever process #i is about to be reactivated, the procedure "sureg"
is called to copy the prototypes into the appropriate registers. The
description registers are copied directly, but the address registers
must be adjusted to reflect the actual location in physical memory of
the area used.


estabur (1650)

1654: Various checks on consistency are performed, to ensure that the
      requested sizes for the text, data and stack are reasonable.

      Note that a non-zero value for "sep" implies separate mappings
      for the text area ("i" space) and the data area ("d" space).
      This is never possible on the PDP11/40;

1664: "a" defines the address of a segment relative to an arbitrary
      base of zero. "ap" and "dp" point to the set of prototype
      segmentation address and descriptor registers respectively.

      The first eight of each of these sets are intended to refer to
      "i" space, and the second eight, to "d" space.

1667: "nt" measures the number of 32 word blocks needed for the text
      segment. If "nt" is non-zero, one or more pages must be
      allocated for this purpose.

      Where more than one page is allocated, all but the last will
      consist of 128 blocks (4096 words), and will be read only, and
      will have relative addresses starting at zero and increasing
      successively by 128.

1672: If some fraction of a page of text is still to be assigned,
      allocate the appropriate part of the next page;

1677: if "i" and "d" spaces are being used separately, mark the
      segmentation registers for the remaining "i" pages as null;

1682: "a" is reset because all remaining addresses refer to the data
      area (not the text area) and are relative to the beginning of
      this area. The first "USIZE" blocks of this area are reserved
      for the "per process data area";

1703: The stack area is allocated from the top of the address space
      towards the lower addresses ("downwards");

1711: If a partial page must be allocated for the stack area, it is
      the high address part of the page which is valid. (For text and
      data areas, which grow "upwards", it is the lower part of a
      partial page which is valid.) This requires an extra bit in the
      descriptor, hence "ED" ("expansion downwards");

1714: If separate "i" and "d" spaces are not used, only the first
      eight of the sixteen prototype register pairs will have been
      initialised by this point. In this case, the second eight are
      copied from the first eight.


sureg (1739)

This routine is called by "estabur" (1724), "swtch" (2229) and
"expand" (2295), to copy the prototype
segmentation registers into the actual hardware segmentation
registers.

1743: Get the base address for the data area from the appropriate
      element of the "proc" array;

1744: The prototype address registers (of which there are only eight
      for the PDP11/40) are modified by the addition of "a" and
      stored in the hardware segmentation address registers;

1752: Test if a separate text area has been allocated, and if so,
      reset "a" to the relative address of the text area to the data
      area. (Note this value may be negative! Fortunately at this
      point, addresses are in terms of 32 word blocks.);

1754: The pattern of code now followed is similar to the beginning of
      the routine, except ...

1762: a rather obscure piece of code adjusts the setting of the
      address register for segments which are not "writable" i.e.
      which presumably are text segments.

The code in "estabur" and "sureg" shows evidence of having been
developed in several stages and is not as elegant as could be
desired.


newproc (1826)

It is now time to take a good look at the procedure which creates new
processes as (almost exact) replicas of their creators.

1841: "mpid" is an integer which is stepped through the values 0 to
      32767. As each new process is created, a new value for "mpid"
      is created to provide a unique distinguishing number for the
      process. Since the cycle of values may eventually repeat, a
      check is made that the number is not still in use; if so a new
      value is tried;

1846: A search is made through the "proc" array for a null "proc"
      structure (indicated by "p_stat" having a null value);

1860: At this point, the address of the new entry in the "proc" array
      is stored as both "p" and "rpp", and the address of "proc"
      entry for the current process is stored both as "up" and "rip";

1861: The attributes of the new process are stored in the new "proc"
      entry. Many of these are copied from the current process;

1876: The new process inherits the open files of its parent.
      Increment the reference count for each of these;

1879: If there is a separate text segment increment the associated
      reference counts. Notice that "rip", "rpp" are used for
      temporary reference here;

1883: Increment the reference count for the parent's current
      directory;

1889: Save the current values of the environment and stack pointers
      in "u.u_rsav". "savu" is an assembler routine defined at line
      0725;

1890: Restore the values of "rip" and "rpp". Temporarily change the
      value of "u.u_procp" from the value appropriate to the current
      process to the value appropriate to the new process;

1896: Try to find an area in main memory in which to create the new
      data segment;

1902: If there is no suitable area in main memory, the new copy will
      have to be made on disk. The next section of code should be
      analysed carefully because of the inconsistency introduced at
      line 1891 i.e.

          u.u_procp->p_addr != *ka6

1903: Mark the current process as "SIDL" to head off temporarily any
      further attempt to swap it out (i.e. initiated by "sched"
      (1940));

1904: Make the new "proc" entry consistent, i.e. set

          rpp->p_addr = *ka6;

1905: Save the current values of the environment and stack pointers
      in "u.u_ssav";

1906: Call "xswap" (4368) to copy the data segment into the disk swap
      area. Because the second parameter is zero, the main memory
      area will not be released;

1907: Mark the new process as "swapped out";

1908: Return the current process to its normal state;

1913: There was room in main memory, so store the address of the new
      "proc" entry and copy the data segment a block at a time;

1917: Restore the current process's "per process data area" to its
      previous state;

1918: Return with a value of zero.

Obviously "newproc" on its own is not sufficient to produce an
interesting and varied set of processes. The procedure "exec" (3020)
which is discussed in Chapter Twelve provides the necessary
additional facility: the means for a process to change its character,
to be reincarnated.

                                -000-



CHAPTER EIGHT

Process Management


Process management is concerned with the sharing of the processor and
the main memory amongst the various processes, which can be seen as
competitors for these resources.

Decisions to reallocate resources are made from time to time, either
on the initiative of the process which holds the resource, or for
some other reason.


Process Switching

An active process may suspend itself i.e. relinquish the processor,
by calling "swtch" (2178) which calls "retu" (0740).

This may be done for example if a process has reached a point beyond
which it cannot proceed immediately. The process calls "sleep" (2066)
which calls "swtch".

Alternatively a kernel process which is ready to revert to user mode
will test the variable "runrun" and if this is non-zero, implying
that a process with a higher precedence is ready to run, the kernel
process will call "swtch".

"swtch" searches the "proc" table, for entries for which "p_stat"
equals "SRUN" and the "SLOAD" bit is set in "p_flag". From these it
selects the process for which the value of "p_pri" is a minimum, and
transfers control to it.

Values for "p_pri" are recalculated for each process from time to
time by use of the procedure "setpri" (2156). Obviously the algorithm
used by "setpri" has a significant influence.

A process which has called "sleep" and suspended itself may be
returned to the "ready to run" state by another process. This often
occurs during the handling of interrupts when the process handling
the interrupt calls "setrun" (2134) either directly or indirectly via
a call on "wakeup" (2113).


Interrupts

It should be noted that a hardware interrupt (see Chapter Nine) does
not directly cause a call on "swtch" or its equivalent. A hardware
interrupt will cause a user process to revert to a kernel process,
which as just noted, may call "swtch" as an alternative to reverting
to user mode after the interrupt handling is complete.

If a kernel process is interrupted, then after the interrupt has been
handled, the kernel process resumes where it had left off regardless.
This point is important for understanding how UNIX avoids many of the
pitfalls associated with "critical sections" of code, which are
discussed at the end of this chapter.


Program Swapping

In general there will be insufficient main memory for all the process
images at once, and the data segments for some of these will have to
be "swapped out" i.e. written to disk in a special area designated as
the swap area.

While on disk the process images are relatively inaccessible and
certainly unexecutable. The set of process images in main memory must
therefore be changed regularly by swapping images in and out. Most
decisions regarding swapping are made by the procedure "sched" (1940)
which is considered in detail in Chapter Fourteen.

"sched" is executed by process #0, which after completing its initial
tasks, spends its time in a double role: openly as the "scheduler"
i.e. a normal kernel process; and surreptitiously as the intermediate
process of "swtch" (discussed in Chapter Seven). Since the procedure
"sched" never terminates, kernel process #0 never completes its task,
and so the question of a user process #0 does not arise.


Jobs

There is no concept of "job" in UNIX, at least in the sense in which
this term is understood in more conventional, batch processing
oriented systems.

Any process may "fork" a new copy of itself at any time, essentially
without delay, and hence create the equivalent of a new job. Hence
job scheduling, job classes, etc. are non-events here.

UNIX Operating System 8-1 Process Management


Assembler Procedures

The next three procedures are written in assembler and run with the
processor priority level set to seven. These procedures do not
observe the normal procedure entry conventions so that r5 and r6, the
environment and stack pointers, are not disturbed during procedure
entry and exit.

As has already been noted, "savu" and "retu" can combine to produce
the effect of a coroutine jump. The third procedure, "aretu", when
followed by a "return" statement produces the effect of a non-local
"goto".


savu (0725)

This procedure is called by "newproc" (1889, 1905), "swtch" (2189,
2281), "expand" (2284), "trap1" (2846) and "xswap" (4476, 4477).

The values of r5 and r6 are stored in the array whose address is
passed as a parameter.


retu (0740)

This procedure is called by "swtch" (2193, 2228) and "expand" (2294).

It resets the seventh kernel segmentation address register, and then
resets r6 and r5 from the newly accessible copy of "u.u_rsav" (which
it may be noted, is at the beginning of "u").


aretu (0734)

This procedure is called by "sleep" (2106) and "swtch" (2242).

It reloads r6 and r5 from the address passed as a parameter.


swtch (2178)

"swtch" is called by "trap" (0770, 0791), "sleep" (2084, 2093),
"expand" (2287), "exit" (3256), "stop" (4027) and "xalloc" (4480).

This procedure is unique in that its execution is in three phases
which in general involve three separate kernel processes. The first
and third of these processes will be called the "retiring" and the
"arising" processes respectively. Process #0 is always the
intermediate process; it may be the "retiring" or the "arising"
process as well.

Note that the only variables used by "swtch" are either registers, or
global or static (stored globally).

2184: The static structure pointer, "p", defines a starting point for
      searching through the "proc" array to locate the next process
      to activate. Its use reduces the bias shown to processes
      entered early in the "proc" array. If "p" is null, set its
      value to the beginning of the "proc" array. This should only
      occur upon the very first call on "swtch";

2189: A call on "savu" (0725) saves the current values of the
      environment and stack pointers (r5 and r6);

2193: "retu" (0740) resets r5 and r6, and, most importantly, resets
      the kernel address register #6 to address the "scheduler's"
      data segment;

2195: Phase Two begins:

      The code from this line to line 2224 is only ever executed by
      kernel process #0. There are two nested loops, from which there
      is no exit until a runnable process can be found.

      At slack periods, the processor spends most of its time
      executing line 2220. It is only disturbed thence by an
      interrupt (e.g. from the clock);

2196: The flag "runrun" is reset. (It is used to indicate that a
      higher priority process than the current process is ready to
      run. "swtch" is about to look for the highest priority
      process.);

2224: The priority of the "arising" process is noted in "curpri" (a
      global variable) for future reference and comparison;

2228: Another call on "retu" resets r5, r6 and the seventh kernel
      address register to values appropriate for the "arising"
      process;

2229: Phase Three begins:

      "sureg" (1739) resets the user mode hardware segmentation
      registers using the stored prototypes for the arising process;

2230: The comment which begins here is not encouraging. We will
      return to this point again towards the end of this chapter;

2247: If you check, you will find that none of the procedures which
      call "swtch" directly examines the value returned here.

      Only the procedures which call "newproc" are interested in this
      value, because of the way the child process is first activated!


setpri (2156)

2161: Process priorities are calculated according to the formula:

          priority = min {127, (time used + PUSER + p_nice)}

      where


      (1) time used = accumulated central processor time (usually
          since the process was last swapped in), measured in clock
          ticks divided by 16 i.e. thirds of a second. (More on this
          later when we discuss the clock interrupt.);

      (2) PUSER == 100;

      (3) "p_nice" is a parameter used to bias the process priority.
          It is normally positive and hence reduces the process's
          effective precedence.

Note the somewhat confusing convention in UNIX that the lower the
priority, the higher the precedence. Thus a priority of -10 beats a
priority of 100 every time.

2165: Set the rescheduling flag if the process, whose priority has
      just been recalculated, has less precedence than the current
      process.

The sense of the test on line 2165 is surprising, especially when it
is compared with line 2141. We leave it to the reader to satisfy
himself that this is not an error. (Hint: look at the parameters for
the calls on "setpri".)


sleep (2066)

This procedure is called (from nearly 30 different places in the
code) when a kernel process chooses to suspend itself. There are two
parameters:

- the reason for sleeping;

- a priority with which the process will run after being awakened.

If this priority is negative the process cannot be aroused from its
sleep by the arrival of a "signal". "signals" are discussed in
Chapter Thirteen.

2070: The current processor status is saved to preserve the incoming
      processor priority and previous mode information;

2072: If the priority is non-negative, a test is made for "waiting
      signals";

2075: A small critical section begins here, wherein the process
      status is changed and the parameters are stored in generally
      accessible locations (viz. within the array "proc").

      This code is critical because the same information fields may
      be interrogated and changed by "wakeup" (2113) which is
      frequently called by interrupt handlers;

2080: When "runin" is non-zero, the scheduler (process #0) is waiting
      to swap another process into main memory;

2084: The call on "swtch" represents a delay of unknown extent during
      which a relevant external event may have occurred. Hence the
      second test on "issig" (2085) is not irrelevant;

2087: For negative priority "sleeps", where the process typically
      waits for freeing of system table space, the occurrence of a
      "signal" is not allowed to deflect the course of the activity.


wakeup (2113)

This procedure complements "sleep". It simply searches the set of all
processes, looking for any processes which are "sleeping" for a
specified reason (given as the parameter "chan"), and reactivating
these individually by a call on "setrun".


setrun (2134)

2140: The process status is set to "SRUN". The process will now be
      considered by "swtch" and "sched" as a candidate for execution
      again;

2141: If the aroused process is more important (lower priority!) than
      the current process, the rescheduling flag, "runrun" is set for
      later reference;

2143: If "sched" is sleeping, waiting for a process to "swap in", and
      if the newly aroused process is on disk, wake up "sched".

Since it turns out that "sched" is the only procedure which calls
"sleep" with "chan" equal to "&runout", line 2145 could be replaced
by the recursive call

        setrun (&proc[0]);

or better still, by just

        rp = &proc[0];
        goto sr;

where "sr" is a label to be inserted at the beginning of line 2139.


expand (2268)

The comment at the beginning of this procedure (2251) says most of
what needs to be said about the procedure, except for the question of
"swapping out" when not enough core is available.

Note that "expand" takes no particular notice of the contents of the
user data area or stack area.

2277: If the expansion is actually a contraction, then trim off the
      excess from the high address end;

2281: "savu" stores the values of r5 and r6 in "u.u_rsav";

2283: If sufficient main memory is not available ...

2284: The environment pointer and stack pointer are recorded again in
      "u.u_ssav". But note that since no new procedures have been
      entered, and since there has been no cumulative stack growth,
      the values recorded are the same as at line 2281;

2285: "xswap" (4368) copies the core image for the process designated
      by its first parameter to disk.

      Since the second parameter is non-zero the main memory area
      occupied by the data segment is returned to the list of
      available space.

      However the computation continues using the same area in main
      memory until the next call on "retu" (2193) in "swtch".

Note also that the call on "savu" at line 2189 in "swtch" stores new
values in "u.u_rsav" after the disk image has been made (and
therefore serves no useful purpose since the core image has already
been officially "abandoned");

2286: The "SSWAP" flag is set in the process's "proc" array element.
      (This is not swapped out, so the effect is not lost!);

2287: "swtch" is called, and the process, still running in its old
      area suspends itself. Since the call on "xswap" will have
      resulted in the "SLOAD" flag being switched off, there is no
      way that "swtch" will choose the process for immediate
      reactivation.

      Only after the disk image has been copied back into core again
      can the process be activated again. The "return" executed by
      "swtch" is a return to the procedure which called "expand".


swtch revisited

What happens to the process when it is reactivated i.e. it becomes
the "arising" process in "swtch"?

2228: The stack and environment pointers are restored from
      "u.u_rsav" (Note that a pointer to "u" is also a pointer to
      "u.u_rsav" (0415)) but ...

2240: If the core image was "swapped out" e.g. by "expand" ...

2242: No reliance is placed on the values of the stack and
      environment pointers, and they are reset from "u.u_ssav".

The question is "if the values stored in "u.u_ssav" at line 2284 are
the same as values stored in "u.u_rsav" at line 2281, how did they
get to be different?"

Presumably this is what "you are not expected to understand" (line
2238) ... clearly "xswap" should be investigated ... the trail
finally ends at Chapter Fifteen ... in the meantime you may wish to
investigate for yourself so that you may join the "2238" club that
much sooner.


Critical Sections

If two or more processes operate on the same set of data, then the
combined output of the set of processes may depend on the relative
synchronisation of the various processes.

This is usually considered to be highly undesirable and to be avoided
at all costs. The solution is usually to define "critical sections"
(it is the programmer's responsibility to recognise these) in the
code which is executed by each process. The programmer must then
ensure that at any time no more than one process is executing a
section of code which is critical with respect to a particular set of
data.

In UNIX user processes do not share data and so do not conflict in
this way. Kernel processes however have shared access to various
system data and can conflict.

In UNIX an interrupt does not cause a change in process as a direct
side effect. Only where kernel processes may suspend themselves in
the middle of a critical section by an explicit call on "sleep", does
an explicit lock variable (which may be observed by a group of
processes) need to be introduced. Even then the actions of testing
and setting the locks do not usually have to be made inseparable.

Some critical sections of code are executed by interrupt handlers. To
protect other sections of code whose outcome may be affected by the
handling of certain interrupts, the processor priority is raised
temporarily high enough before the critical section is entered to
delay such interrupts until it is safe, when the processor priority
is reduced again. There are of course a number of conventions which
interrupt handling code should observe, as will be discussed later in
Chapter Nine.

In passing it may be noted that the strategy adopted by UNIX works
only for a single processor system and would be totally inappropriate
in a multi-processor system.

                                -000-



Section Two is concerned with traps, hardware interrupts and software
interrupts.

Traps and hardware interrupts introduce sudden switches into the
CPU's normal instruction execution sequence. This provides a
mechanism for handling special conditions which occur outside the
CPU's immediate control.

Use is made of this facility as part of another mechanism called the
"system call", whereby a user program may execute a "trap"
instruction to cause a trap deliberately and so obtain the operating
system's attention and assistance.

The software interrupt (or "signal") is a mechanism for communication
between processes, particularly when there is "bad news".


CHAPTER NINE

Hardware Interrupts and Traps


In the PDP11 computer, as in many other computers, there is an
"interrupt" mechanism, which allows the controllers of peripheral
devices (which are devices external to the CPU) to interrupt the CPU
at appropriate times, with requests for operating system service.

The same mechanism has been usefully and conveniently applied to
"traps" which are events internal to the CPU, which relate to
hardware and software errors, and to requests for service from user
programs.


Hardware Interrupts

The effect of an interrupt is to divert the CPU from whatever it was
doing and to redirect it to execute another program.

During a hardware interrupt:

    The CPU saves the current processor status word (PS) and the
    current program count (PC) in its internal registers;

    the PC and PS are then reloaded from two consecutive words
    located in the low area of main memory. The address of the first
    of these two words is known as the "vector location" of the
    interrupt;

    finally the original PC and PS values are stored into the newly
    current stack. (Whether this is the kernel or user stack depends
    on the new value of the PS.)

Different peripheral devices may have different vector locations. The
actual vector location for a particular device is determined by hard
wiring, and can only be changed with difficulty. Moreover there are
well entrenched conventions for choosing vector locations for the
various devices.

Thus after the interrupt has occurred, because the PC has been
reloaded, the source of instructions executed by the CPU has been
changed. The new source should be a procedure associated with the
peripheral device controller which caused the interrupt.

Also since the PS has also been changed, the processor mode may have
changed. In UNIX, the initial mode may be either "user" or "kernel",
but after the interrupt, the mode is always "kernel". Recall also
that a change in mode implies:

(a) a change in memory mappings. (Note that to avoid any confusion,
    vector locations are always interpreted as kernel mode
    addresses.);

(b) a change in stack pointers. (Recall that the stack pointer,

UNIX Operating System 9-1 Hardware Interrupts and Traps


    SP or r6, is the only special register which is replicated for each
    mode. This implies that after a mode change, the stack pointer value
    will have changed even though it has not been reloaded!)


The Interrupt Vector

For our sample system, the representative peripheral devices chosen are
listed in Table 9.1, along with their conventional hardware defined
vector locations and priorities.

   vector    peripheral           interrupt  process
   location  device               priority   priority
   ========  ==========           =========  ========

     060     teletype input           4         4
     064     teletype output          4         4
     070     paper tape input         4         4
     074     paper tape output        4         4
     100     line clock               6         6
     104     programmable clock       6         6
     200     line printer             4         4
     220     RK disk drive            5         5

   Table 9.1  Interrupt Vector Locations and Priorities


Interrupt Handlers

Within this selection of UNIX source code, there are seven procedures
known as "interrupt handlers", i.e. which are executed as the result of,
and only as the result of, interrupts:

     clock   (3725)      pcrint  (8719)
     rkintr  (5451)      pcpint  (8739)
     klxint  (8070)      lpint   (8976)
     klrint  (8078)

"clock" will be examined in detail in Chapter 11. The others are
discussed with the code for their associated devices.


Priorities

An interrupt does not necessarily occur immediately the peripheral
device controller requests it, but only when the CPU is ready to accept
it. It is usually desirable that a request for a low priority service
should not be allowed to interrupt an activity with a higher priority.

Bits 7 to 5 of the PS determine the processor priority at one of eight
levels (labelled zero to seven). Each interrupt also has an associated
priority level determined by hardware wiring. An interrupt will be
inhibited as long as the processor priority is greater than or equal to
the interrupt priority.

After the interrupt the processor priority will be determined from the
PS stored in the vector location and this does not have to be the same
as the interrupt priority. Whereas the interrupt priority is determined
by hardware, it is possible for the operating system to change the
contents of the vector location at any time.

As a matter of curiosity, it may be noted that the PDP11 hardware
restricts the possible interrupt priorities to 4, 5, 6 and 7 i.e. levels
1, 2 and 3 are not supported by the Unibus.


Interrupt Priorities

In UNIX, interrupt handling routines are initiated at the same priority
as the interrupt priority.

This means that during the handling of the interrupt, a second interrupt
from a device of the same priority class will be delayed until the
processor priority is reduced, either by the execution of one of the
"spl" procedures, which are intended for just this purpose (see lines
1293 to 1315), or by reloading the processor status word e.g. upon
returning from the interrupt.

During interrupt handling, the processor priority may be raised
temporarily to protect the integrity of certain operations. For
instance, character oriented devices such as the paper tape reader/punch
or the line printer interrupt at level four. Their interrupt handlers
call "getc" (0930) or "putc" (0967), which raise the processor priority
temporarily to level five, while the character buffer queues are
manipulated.

The interrupt handler for the console teletype makes use of a "timeout"
facility. This involves a queue which is also manipulated by the clock
interrupt handler, which runs at level six. To prevent possible
interference, the "timeout" procedure (3845) runs at level seven (the
highest possible level).

Usually it does not make sense to run an interrupt handler at a
processor priority lower than the interrupt priority, for this would
then risk a second interrupt of the same type, even from the same
device, before completion of the processing of the first interrupt. This
is likely to be at best inconvenient and at worst disastrous. However
the clock interrupt handler, which once per second has a lot of extra
work to do, does exactly this.


Rules for Interrupt Handlers

As discussed above, interrupt handlers need to be careful about the
manipulation of the processor priority to avoid allowing other
interrupts to happen "too soon". Likewise care needs to be taken that
the other interrupts are not delayed excessively, lest the performance
of the whole system be degraded.

UNIX Operating System 9-2 Hardware Interrupts and Traps
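The priority rule stated in this chapter (bits 7 to 5 of the PS hold the processor priority, and an interrupt is inhibited while that priority is greater than or equal to the interrupt priority) may be easier to absorb as a few lines of C. What follows is only an illustrative sketch in present-day C, not code from the UNIX source: the names "ps_priority", "ps_with_priority" and "interrupt_accepted" are invented for the purpose, and the PS is modelled as a plain 16 bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 7 to 5 of the PS hold the processor priority (0..7). */
#define PS_PRIORITY_SHIFT 5
#define PS_PRIORITY_MASK  0340          /* octal mask for bits 7-5 */

/* Extract the processor priority from a processor status word. */
int ps_priority(uint16_t ps)
{
    return (ps & PS_PRIORITY_MASK) >> PS_PRIORITY_SHIFT;
}

/* Return a copy of ps with the priority field replaced.
   Hypothetical helper, loosely in the spirit of the "spl" procedures. */
uint16_t ps_with_priority(uint16_t ps, int level)
{
    assert(level >= 0 && level <= 7);
    return (uint16_t)((ps & ~PS_PRIORITY_MASK) | (level << PS_PRIORITY_SHIFT));
}

/* An interrupt is inhibited while the processor priority is greater
   than or equal to the interrupt priority. */
int interrupt_accepted(uint16_t ps, int interrupt_priority)
{
    return ps_priority(ps) < interrupt_priority;
}
```

With the processor at level four (say, inside a teletype handler), "interrupt_accepted" is false for another level four device but true for the level five disk and the level six clock. Once the priority has been raised to five, as "getc" and "putc" do temporarily, the disk is held off as well and only the clock can still get through.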


It is important to note that when an interrupt occurs, the process which
is currently active will very likely not be the process which is
interested in the occurrence. Consider the following scenario:

User process #m is active and initiates an i/o operation. It executes a
trap instruction and transfers to kernel mode. Kernel process #m
initiates the required operation and then calls "sleep" to suspend
itself to await completion of the operation ...

Some time later, when some other process, user process #n say, is
active, the operation is completed and an interrupt occurs. Process #n
reverts to kernel mode, and kernel process #n deals with the interrupt,
even though it may have no interest in or prior knowledge of the
operation.

Usually kernel process #n will include waking process #m as part of its
activity. This will not always be the case though, e.g. where an error
has occurred and the operation is retried.

Clearly, the interrupt handler for a peripheral device should not make
references to the current "u" structure for this is not likely to be the
appropriate "u" structure. (The appropriate "u" structure could quite
possibly be inaccessible, if it has been temporarily swapped out to the
disk.)

Likewise the interrupt handler should not call "sleep" because the
process thus suspended will most likely be some innocent process.

"Traps" are like "interrupts" in that they are events which are handled
by the same hardware mechanism, and hence by similar software
mechanisms.

"Traps" are unlike "interrupts" in that they occur as the result of
events internal to the CPU, rather than externally. (In other systems
the terminology "internal interrupt" and "external interrupt" is used to
draw this distinction more forcefully.) Traps may occur unexpectedly as
the result of hardware or power failures, or predictably and
reproducibly, e.g. as the result of executing an illegal instruction or
a "trap" instruction.

"Traps" are always recognised by the CPU immediately. They cannot be
delayed in the way low priority interrupts may be. If you like, "traps"
have an "interrupt priority" of eight.

"Trap" instructions may be deliberately inserted in user mode programs
to catch the attention of the operating system with a request to perform
a specified service. This mechanism is used as part of the facility
known as "system calls".

Like interrupts, traps result in the reloading of the PC and PS from a
vector location, and the saving of the old values of the PC and PS in
the current stack. Table 9.2 lists the vector locations for the various
"trap" types.

   vector    trap type                    process
   location                               priority
   ========  =========                    ========

     004     bus timeout                     7
     010     illegal instruction             7
     014     bpt-trace                       7
     020     iot                             7
     024     power failure                   7
     030     emulator trap instruction       7
     034     trap instruction                7
     114     11/70 parity                    7
     240     programmed interrupt            7
     244     floating point error            7
     250     segmentation violation          7

   Table 9.2  Trap Vector Locations and Priorities

The contents of Tables 9.1 and 9.2 should be compared with the file
"low.s" on Sheet 05. As noted earlier, this file is generated at each
installation (along with the file "conf.c" (Sheet 46)), as the product
of the utility program "mkconf", so as to reflect the actual set of
peripherals installed.


Assembly Language 'trap'

From "low.s" it appears that traps and interrupts are handled separately
by the software. However closer examination reveals that "call" and
"trap" are different entry points to a single code sequence in the file
"m40.s" (see lines 0755, 0776). This sequence is examined in detail in
the next chapter.

During the execution of this sequence, a call is made on a "C" language
procedure to carry out further specific processing. In the case of an
interrupt, the "C" procedure is the interrupt handler specific to the
particular device controller.

In the case of a trap, the "C" procedure is another procedure called
"trap" (yes, the word "trap" is definitely overworked!), which in the
case of a system error will most likely call "panic" and in the case of
a "system call", will invoke (indirectly via "trap1" (2841)) the
appropriate system call procedure.


Return

Upon completion of the handling of an interrupt or trap, the code
follows a common path ending in an "rtt" instruction (0805). This
reloads both the PC and PS from the current stack, i.e. the kernel
stack, in order to restore the processor environment that existed before
the interrupt or trap.

-000-

UNIX Operating System 9-3 Hardware Interrupts and Traps
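As a summary of the hardware mechanism that this chapter describes (PC and PS reloaded from a vector location, old values saved in the current stack, and "rtt" reloading both from the kernel stack), here is a toy model in C. It is a sketch under stated assumptions rather than a description of the real hardware: the structure and function names are invented, the kernel stack is a small array, the order in which the two words are pushed follows the usual PDP11 convention (PS first, so that the old PC is on top), and the bit positions of the mode fields in the PS (bits 15-14 for the current mode, bits 13-12 for the previous mode) are taken from the PDP11 architecture, not from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PS_CUR_MODE   0140000  /* bits 15-14: current mode (00 kernel, 11 user) */
#define PS_PREV_MODE  0030000  /* bits 13-12: previous mode */

struct vector { uint16_t pc, ps; };   /* two consecutive words in low memory */

struct cpu {
    uint16_t pc, ps;
    uint16_t kstack[16];              /* toy kernel stack, grows downward */
    int      ksp;                     /* index of the current top of stack */
};

void cpu_init(struct cpu *c, uint16_t pc, uint16_t ps)
{
    c->pc = pc;
    c->ps = ps;
    c->ksp = 16;                      /* empty stack */
}

/* Interrupt or trap: reload PC and PS from the vector, save the old pair. */
void take_vector(struct cpu *c, const struct vector *v)
{
    uint16_t old_pc = c->pc, old_ps = c->ps;

    /* The new PS comes from the vector, except that the previous mode
       field records the mode that really was current. */
    c->ps = (uint16_t)((v->ps & ~PS_PREV_MODE) | ((old_ps & PS_CUR_MODE) >> 2));
    c->pc = v->pc;
    assert(c->ksp >= 2);
    c->kstack[--c->ksp] = old_ps;     /* PS is pushed first ... */
    c->kstack[--c->ksp] = old_pc;     /* ... so the old PC ends up on top */
}

/* "rtt": reload both PC and PS from the kernel stack. */
void rtt(struct cpu *c)
{
    assert(c->ksp <= 14);
    c->pc = c->kstack[c->ksp++];
    c->ps = c->kstack[c->ksp++];
}
```

Calling "take_vector" with a vector whose second word is 0300 (kernel mode, priority six) on a CPU that was in user mode leaves the priority at six, the current mode as kernel and the previous mode as user. A subsequent "rtt" puts the original PC and PS back exactly as they were.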


CHAPTER TEN

The Assembler "Trap" Routine


The principal purpose of this chapter is to examine the assembly
language code in "m40.s" which is involved in the handling of interrupts
and traps.

This code is found between lines 0750 and 0805, and has two entry
points, "trap" (0755) and "call" (0776). There are several different and
relevant paths through this code and we shall trace some examples of
these.


Sources of Traps and Interrupts

The discussion in Section One introduced three places where the
occurrence of a trap or interrupt was expected:

(a) "main" (1564) calls "fuibyte" repeatedly until a negative value is
    returned. This will occur after a "bus timeout error" has been
    encountered with a subsequent trap to vector location 4 (line 0512);

(b) The clock has been set running and will generate an interrupt every
    clock tick i.e. 16.7 or 20 milliseconds;

(c) Process #1 is about to execute a "trap" instruction as part of the
    system call on "exec".


fuibyte (0814)

fuiword (0844)

"main" uses both "fuibyte" and "fuiword". Since the former is more
complicated in a non-essential way, we leave it to the reader, and
concentrate on the latter.

"fuiword" is called (1602) when the system is running in kernel mode
with one argument which is an address in user address space. The
function of the routine is to fetch the value of the corresponding word
and to return it as a result (left in r0). However if an error occurs,
the value -1 is to be returned.

Note that with "fuiword", there is an ambiguity which does not occur
with "fuibyte", namely a returned value of -1 may not necessarily be an
error indication but the actual value in the user space. Convince
yourself that for the way it is used in "main", this does not matter.

Also the code does not distinguish between a "bus timeout error" and a
"segmentation error".

The routine proceeds as follows:

0846: The argument is moved to r1;

0848: "gword" is called;

0852: The current PS is stored on the stack;

0853: The priority level is raised to 7 (to disable interrupts);

0854: The contents of the location "nofault" (1466) are saved in the
      stack;

0855: "nofault" is loaded with the address of the routine "err";

0856: An "mfpi" instruction is used to fetch the word from user space.

      If nothing goes wrong this value will be left on the kernel stack.

0857: The value is transferred from the stack to r0;

0876: The previous values of "nofault" and PS are restored;

0878: Return via line 0849.

Now suppose something does go wrong with the "mfpi" instruction, and a
bus time-out does occur.

0856: The "mfpi" instruction will be aborted. PC will point to the next
      instruction (0857) and a trap via vector location 4 will occur;

0512: The new PC will have the value of "trap". The new PS will
      indicate:

          present mode     kernel mode
          previous mode    kernel mode
          priority         7;

0756: The next instruction executed is the first instruction of "trap".
      This saves the processor status word two words beyond the current
      "top of stack". (This is not relevant here.);

UNIX Operating System 10-1 The Assembler "Trap" Routine


0757: "nofault" contains the address of "err" and is non-zero;

0765: Moving 1 to SR0 reinitialises the memory management unit;

0766: The contents of "nofault" are moved on top of the stack,
      overwriting the previous contents, which was the return address in
      "gword";

0767: The "rtt" returns, not to "gword" but to the first word of "err";

0880: "err" restores "nofault" and PS, skips the return to "fuiword",
      places -1 in r0, and returns directly to the calling routine.


Interrupts

Suppose the clock has interrupted the processor.

Both clock vector locations, 100 and 104, have the same information. PC
is set to the address of the location labelled "kwlp" (0568) and PS is
set to show:

          present mode     kernel mode
          previous mode    kernel or user mode
          priority         6

Note. The PS will contain the true previous mode, regardless of the
value picked up from the vector location.

0570: The vector location contains a new PC value which is the address
      of the statement labelled "kwlp". This instruction is a subroutine
      call on "call" via r0.

      After the execution of this instruction, r0 is left with the
      address of the code word after the instruction which contains
      "_clock", i.e. r0 contains the address of the address of the
      "clock" routine in the file "clock.c" (3725).


call (0776)

0777: Copy PS onto the stack;

0779: Copy r1 onto the stack;

0780: Copy the stack pointer for the previous address space onto the
      stack. (This is only significant if the previous mode was user
      mode).

      This represents a special case of the "mfpi" instruction. See the
      "PDP11 Processor Handbook", page 6-20;

0781: Copy the copy of PS onto the stack and mask out all but the lower
      five bits. The resulting value designates the cause of the
      interrupt (or trap). The original value of the PS had to be
      captured quickly;

0783: Test if the previous mode is kernel or user.

      If the previous mode is kernel mode the branch is taken (0784). PS
      is changed to show the previous mode as user mode (0798);

0799: The specialised interrupt handling routine pointed to by r0 is
      entered. (In this case it is the routine "clock", which is
      discussed in detail in the next chapter.)

0800: When the "clock" routine (or some other interrupt handler)
      returns, the top two words of the stack are deleted. These are the
      masked copy of the PS and the copy of the stack pointer;

0802: r1 is restored from the stack;

0803: Delete the copy of PS from the stack;

0804: Restore the value of r0 from the stack;

0805: Finally the "rtt" instruction returns to the "kernel" mode routine
      that was interrupted;

      If the previous mode was user mode it is not certain that the
      interrupted routine will be resumed immediately;

0788: After the specialised interrupt routine (in this case "clock")
      returns, a check ("runrun > 0") is made to see if any process of
      higher priority than the current process is ready to run. If the
      decision is to allow the current process to continue, then it is
      important that it be not interrupted as it restores its registers
      prior to the "return from interrupt" instruction. Hence before the
      test, the processor priority is raised to seven (line 0787), thus
      ensuring that no more interrupts occur until user mode is resumed.
      (Another interrupt may occur immediately thereafter, however.)

If "runrun > 0", then another, higher priority, process is waiting. The
processor priority is reset to 0, allowing any pending interrupt to be
taken. A call is then made to "swtch" (2178), to allow the higher
priority process to proceed. When the process returns from "swtch", the
program loops back to repeat the test.

The above discussion obviously extends to all interrupts. The only part
which relates specifically to the clock interrupt is the call on the
specialised routine "clock".


User Program Traps

The "system call" mechanism which enables user mode programs to call on
the operating system for assistance, involves the execution by the user
mode program of one of 256 versions of the "trap" instruction. (The
"version" is the value of the low order byte of the instruction word.)

UNIX Operating System 10-2 The Assembler "Trap" Routine
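The remark that the "version" of a "trap" instruction is the low order byte of the instruction word can be illustrated with two small C functions. This is a sketch only: the function and macro names are invented, and the opcode range used for the "trap" instruction (octal 104400 to 104777) is taken from the PDP11 instruction set rather than from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define TRAP_BASE   0104400u   /* assumed: "trap" opcodes are 104400..104777 */
#define TRAP_FIELD  0177400u   /* the high order byte selects the instruction */
#define TRAP_LOW    0000377u   /* the low order byte is the "version" */

/* Does this instruction word encode a "trap" instruction? */
int is_trap_instruction(uint16_t word)
{
    return (word & TRAP_FIELD) == TRAP_BASE;
}

/* The "version" of a "trap" instruction, in the range 0 to 255. */
int trap_version(uint16_t word)
{
    return (int)(word & TRAP_LOW);
}
```

Since the low order byte has eight bits, there are exactly 256 possible versions, which is where the figure quoted above comes from. The operating system can recover the version by fetching the instruction word from user space and masking it in this way.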


0518: Execution of the "trap" instruction in a user mode program causes
      a trap to occur to vector location 34 which causes the PC to be
      loaded with the value of the label "trap" (lines 0512, 0755). A
      new PS is set which indicates

          present mode     kernel mode
          previous mode    user mode
          priority         7

0756: The next instruction executed is the first instruction of "trap".
      This saves the processor status word in the stack two words beyond
      the current "top of stack".

      It is important to save the PS as soon as possible, before it can
      be changed, since it contains information defining the type of
      trap that occurred. The somewhat unconventional destination of the
      "move" is to provide compatibility with the handling of
      interrupts, so that the same code can be used further on;

0757: "nofault" will be zero so the branch is not taken;

0759: The memory management status registers are stored just in case
      they will be needed, and the memory management unit is
      reinitialised;

0762: A subroutine entry is made to "call1" using r0. (This neatly
      stores the old value of r0 in the stack, but not a return address.
      The new value is the address of the address of the routine to be
      entered next (in this case the "trap" routine in the file "trap.c"
      (2693));

0772: The stack pointer is adjusted to point to the location which
      already contains the copy of PS;

0773: The CPU priority is set to zero;

0774: A branch is taken to the second instruction of "call".

From here the same path as for an interrupt is followed.


The Kernel Stack

The state of the kernel stack at the time that the "trap" procedure ("C"
version) or one of the specialised interrupt handling routines is
entered, is shown in Figure 10.1.

                           previous top of stack

   (rps   2)     7      ps      old PS
   (r7    1)     6      pc      old PC (r7)
   (r0    0)     5 ->   r0      old r0
                 4      nps     new PS after trap
   (r1   -2)     3      r1      old r1
   (r6   -3)     2      sp      old SP for previous mode
                 1      dev     masked new PS
                 0 ->   tpc     return address in "call"
   =======================================================
   (r5   -6)    -1      (r5)    old r5
   (r4   -7)    -2      (r4)    old r4
   (r3   -8)    -3      (r3)    old r3
   (r2   -9)    -4      (r2)    old r2

    (1)  (2)    (3)     (4)     (5)

                 stack

              Figure 10.1

Columns (2) and (3) give the positions of stack words relative to the
positions in the stack of the words labelled "r0" and "tpc"
respectively.

Columns (1) and (2) define (or explain) the contents of the file "reg.h"
(Sheet 26).

"dev", "sp", "r1", "nps", "r0", "pc" and "ps" in that order are the
names of the parameters used in the declaration of the procedures "trap"
(2693) and "clock" (3725).

Note that just before entry to "trap" ("C" version) or the other
interrupt handling routines, the values for the registers r2, r3, r4 and
r5 have not yet been saved in the stack. This is performed by a call on
"csv" (1420) which is automatically included by the "C" compiler at the
beginning of every compiled procedure. The form of the call on "csv" is
equivalent to the assembler instruction

          jsr r5,csv

This saves the current value of r5 on the stack and replaces it by the
address of the next instruction in the "C" procedure.

1421: This value of r5 is copied into r0;

1422: the current value of the stack pointer is copied into r5.

Note that at this point, r5 points to a stack location containing the
previous value of r5 i.e. it points to the beginning of a chain of
pointers, one per procedure, which "thread" the stack. When a "C"
procedure exits, it actually returns to "cret" (1430) where the value of
r5 is used to restore the stack and r2, r3 and r4 to their earlier
condition (i.e. as they were immediately prior to entering the
procedure). For this reason r5 is often called the environment pointer.

-000-

UNIX Operating System 10-3 The Assembler "Trap" Routine
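The relationship between columns (2) and (3) of Figure 10.1 can be checked mechanically. The following C fragment is a sketch only: it models the kernel stack as an ordinary array of integers, takes the register offsets from column (2) of the figure, and uses a pointer in the role that the text assigns to the address of the stored r0. The names "stack", "u_ar0" and "saved_register", and the position constants "TPC" to "PS", are invented here for illustration.

```c
#include <assert.h>

/* Offsets from the word holding r0: column (2) of Figure 10.1. */
enum { R0 = 0, R1 = -2, R2 = -9, R3 = -8, R4 = -7, R5 = -6,
       R6 = -3, R7 = 1, RPS = 2 };

/* Positions relative to "tpc": column (3) of Figure 10.1. */
enum { TPC = 0, DEV = 1, SP = 2, R1POS = 3, NPS = 4, R0POS = 5,
       PC = 6, PS = 7 };

#define SAVED 4        /* r5, r4, r3 and r2, pushed below "tpc" by "csv" */

/* Toy stack: index SAVED corresponds to "tpc", index 0 to the saved r2. */
int stack[SAVED + 8];

/* Stands in for the noted address of the stack word where r0 is stored. */
int *u_ar0 = &stack[SAVED + R0POS];

/* Fetch a saved register value by its offset from r0. */
int saved_register(int reg)
{
    return u_ar0[reg];
}
```

For example, "saved_register(RPS)" reads the word at position 7 (the old PS), and "saved_register(R5)" reads position -1, the first word saved by "csv". Negative offsets are legitimate here only because the pointer refers to the middle of the array.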


CHAPTER ELEVEN

Clock Interrupts


The procedure "clock" (3725) handles interrupts from either the line
frequency time clock (type KW11-L, interrupt vector address 100) or the
programmable real-time clock (type KW11-P, interrupt vector address
104).

UNIX requires that at least one of these should be available. (If both
are present, only the line time clock is used.)

Whichever clock is used, interrupts are generated at line frequency
(i.e. with a 50 Hz power supply, every 20 milliseconds). The clock
interrupt priority level is six, higher than for any other peripheral
device on our typical system, so that there will usually be very little
delay in the initiation of "clock" once the interrupt has been requested
by the clock controller.


clock (3725)

The function of "clock" is one of general housekeeping:

      the display register is updated (PDP11/45 and 11/70 only);

      various accounting values such as the time of day, accumulated
      processing times and execution profiles are maintained;

      processes sleeping for a fixed time interval are awakened as per
      schedule;

      core swapping activity is initiated once per second.

"clock" breaks most of the rules for peripheral device handlers: it does
reference the current "u" structure, and it also runs at a low priority
for some of the time. It abbreviates its activity if a previous
execution has not yet completed.

3740: "display" is a no-op on the PDP11/40;

3743: The array "callout" (0265) is an array of "NCALL" (0143)
      structures of type "callo" (0260). The "callo" structure contains
      three elements: an incremental time, an argument and the address
      of a function. When the function element is not null, the function
      is to be executed with the supplied argument after a specified
      time.

      (For the systems under study, the only function ever executed in
      this way is "ttrstrt" (8486), which is part of the teletype
      handler. (See Chapter 25.));

3748: If the first element of the list is null, the whole list is null;

3750: The "callout" list is arranged in the desired order of execution.
      The time recorded is the number of clock ticks between events.
      Unless the first time (the time before the next event) is already
      zero, (meaning that the execution is already due) this time should
      be decremented by one.

      If this time has already been counted to zero, decrement the next
      time unless it is already zero also, etc. i.e. decrement the first
      non-zero time in the list. All the leading entries with zero times
      represent operations which are already due. (The operations are
      actually carried out a little later.);

3759: Examine the previous processor status word, and if the priority
      was non-zero, bypass the next section, which executes those
      operations which are due;

3766: Reduce the processor priority to five (other level six interrupts
      may now occur);

3767: Search the "callout" array looking for operations which are due
      and execute them;

3773: Move the entries for operations which are still not yet due, to
      the beginning of the array;

3787: The code from here until line 3797 is executed, whatever the
      previous processor priority, at either priority level five or six;

3788: If the previous mode was "user mode", then increment the user time
      counter, and if an execution profile is being accumulated, call
      "incupc" (0895) to make an entry in a histogram for the user mode
      program counter (PC).

      "incupc" is written in Assembler, presumably for efficiency and
      convenience. A description of what it does may be found in the
      section "PROFIL(II)" of the UPM. See also the procedure "profil"
      (3667);

3792: If the previous mode was not user mode, increment the system
      (kernel) time counter for the process.

UNIX Operating System 11-1 Clock Interrupts


The code just described performs the basic time accounting for the
system. Every clock tick results in the incrementing of either
"u.u_utime" or "u.u_stime" for some process. Both "u.u_utime" and
"u.u_stime" are initialised to zero in "fork" (3322). Their values are
interrogated in "wait" (3270). The values will go negative after 32K
ticks (about 10 minutes)!

3795: "p_cpu" is used in determining process priorities. It is a
      character value which is always interpreted as a positive integer
      (0 to 255). When it is moved to a special register, sign extension
      occurs so that 255, for instance, becomes like -1. Adding one then
      leaves a zero result. In this case the value is reduced to -1
      again, and stored as 255 unsigned. Note that in the other places
      where "p_cpu" is referenced (2161, 3814), the top eight bits are
      masked off after the value has been transferred to a special
      register;

3797: Increment "lbolt" and if it exceeds "HZ", i.e. a second or more
      has elapsed ...

3798: Then provided the processor was not previously running at a
      non-zero priority, do a whole lot of housekeeping;

3800: Decrement "lbolt" by "HZ";

3801: Increment the time of day accumulator;

3803: The events which follow may take some time, but they may
      reasonably be interrupted to service other peripherals. So the
      processor priority is dropped below all the device priority levels
      i.e. below four.

      However there is now a possibility of another clock interrupt
      before this activation of the "clock" procedure is completed. By
      setting the processor priority to one rather than to zero, a
      second activation of "clock" will not attempt to execute the code
      from line 3804 on also. Note however that to the hardware,
      priority one is functionally the same as priority zero;

3804: If the current time (measured in seconds) is equal to the value
      stored in "tout", wake all processes which have elected to suspend
      themselves for a period of time via the "sleep" system call i.e.
      via the procedure "sslep" (5979).

      "tout" stores the time at which the next process is to be
      awakened. If there is more than one such process, then the
      remainder, which will have been disturbed, must reset "tout"
      between them. This mechanism, while quite effective, will not be
      efficient if the number of such processes ever becomes large.

      In this situation, a mechanism similar to the "callout" array (see
      3767) would need to be provided. (In fact, how difficult would it
      be to merge the two mechanisms? What would be the
      disadvantages??);

3806: When the last two bits of "time[1]" are zero i.e. every four
      seconds, reset the scheduling flag "runrun" and wake up everything
      waiting for a "lightning bolt". ("lbolt" represents a general
      event which is caused every four seconds, to initiate
      miscellaneous housekeeping. It is used by "pcopen" (8648).);

3810: For all currently defined processes:

      increment "p_time" up to a maximum of 127 (it is only a character
      variable);

      decrement "p_cpu" by "SCHMAG" (3707) but do not allow it to go
      negative. Note that as discussed earlier (line 3795) "p_cpu" is
      treated as a positive integer in the range 0 to 255;

      if the processor priority is currently set at a depressed level,
      recalculate it.

Note that "p_cpu" enters into the calculation of process priorities,
"p_pri", by "setpri" (2156). "p_pri" is used by "swtch" (2209) in
choosing which process, from among those which are in core ("SLOAD") and
ready to run ("SRUN"), should next receive the CPU's attention.

"p_time" is used to measure how long (in seconds) a process has been
either in core or swapped out to disk. "p_time" is set to zero by
"newproc" (1869), by "sched" (2047) and by "xswap" (4386). It is used by
"sched" (1962, 2009) to determine which processes to swap in or out.

3820: If the scheduler is waiting to rearrange things, wake it up. Thus
      the normal rate for scheduling decisions is once per second;

3824: If the previous mode before the interrupt was "user mode", store
      the address of "r0" in a standard place, and if a "signal" has
      been received for the process, call "psig" (4043) for the
      appropriate action.


timeout (3845)

This procedure makes new entries in the "callout" array. In this system
it is only called from the routine "ttstart" (8505), passing the
procedure "ttrstrt" (8486). Note that "ttrstrt" calls "ttstart", which
may call "timeout", for a thoroughly incestuous relationship!

Note also that most of "timeout" runs at priority level seven, to avoid
clock interrupts.
to one rather than to zero, a earlier (line 3795) "p_cpu" is

UNIX Operating System 11-2 Clock Interrupts
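The "callout" array described in this chapter is a list of incremental times: each entry records the number of clock ticks between it and its predecessor, so that a clock tick need only decrement the first non-zero time. The following is a sketch of that idea in present-day C, under stated assumptions. It is not the code of "clock" or "timeout": the function names ending in "_sketch", the helper "record" and its counter, and the value chosen for "NCALL" are all invented for illustration, and the raising and lowering of the processor priority is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NCALL 8                      /* table size; this value is an assumption */

struct callo {
    int   c_time;                    /* incremental time, in clock ticks */
    int   c_arg;                     /* argument for the function */
    void (*c_func)(int);             /* function to call; NULL ends the list */
};

struct callo callout[NCALL + 1];     /* the spare entry holds the terminator */

int fired_total;                     /* hypothetical aid: sum of arguments seen */
void record(int arg) { fired_total += arg; }

/* Add an entry due "ticks" from now, keeping all times incremental. */
void timeout_sketch(void (*func)(int), int arg, int ticks)
{
    int i = 0, end, t = ticks;

    while (callout[i].c_func != NULL && callout[i].c_time <= t) {
        t -= callout[i].c_time;      /* pass entries due no later than this one */
        i++;
    }
    for (end = i; callout[end].c_func != NULL; end++)
        ;                            /* find the end of the list */
    assert(end < NCALL);             /* table full */
    if (callout[i].c_func != NULL)
        callout[i].c_time -= t;      /* the successor becomes relative to us */
    for (; end >= i; end--)
        callout[end + 1] = callout[end];   /* shift the tail up one place */
    callout[i].c_time = t;
    callout[i].c_arg = arg;
    callout[i].c_func = func;
}

/* One clock tick: decrement the first non-zero time in the list. */
void tick_sketch(void)
{
    int i = 0;

    while (callout[i].c_func != NULL && callout[i].c_time <= 0)
        i++;
    if (callout[i].c_func != NULL)
        callout[i].c_time--;
}

/* Execute the leading entries which are due, then move the remaining
   entries to the beginning of the array. Returns the number executed. */
int run_due_sketch(void)
{
    int due = 0, i = 0;

    while (callout[due].c_func != NULL && callout[due].c_time <= 0) {
        callout[due].c_func(callout[due].c_arg);
        due++;
    }
    if (due > 0) {
        do {
            callout[i] = callout[i + due];
        } while (callout[i++].c_func != NULL);
    }
    return due;
}
```

If three requests are made for 5, 2 and 5 ticks in that order, the stored times become 2, 3 and 0: the second and third events fall due on the same tick, so the last of them has an incremental time of zero. The cost of a tick is then independent of the length of the list, which is the attraction of the scheme.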


CHAPTER TWELVE

Traps and System Calls


This chapter is concerned with the way the system handles traps in
general and system calls in particular.

There are quite a number of conditions which can cause the processor to
"trap". Many of these are quite clearly error conditions, such as
hardware or power failures, and UNIX does not attempt any sophisticated
recovery procedures for these.

The initial focus for our attention is the principal procedure in the
file "trap.c".

The way that this procedure is invoked was explored in Chapter Ten. The
assembler "trap" routine carries out certain fundamental housekeeping
tasks to set up the kernel stack, so that when this procedure is called,
everything appears to be kosher.

The "trap" procedure can operate as though it had been called by another
"C" procedure in the normal way with seven parameters

          dev, sp, r1, nps, r0, pc, ps.

(There is a special consideration which should be mentioned here in
passing. Normally all parameters passed to "C" procedures are passed by
value. If the procedure subsequently changes the values of the
parameters, this will not affect the calling procedure directly.

However if "trap" or the interrupt handlers change the values of their
parameters, the new values will be picked up and reflected back when the
"previous mode" registers are restored.)

The value of "dev" was obtained by capturing the value of the processor
status word immediately after the trap and masking out all but the lower
five bits. Immediately before this, the processor status word had been
set using the prototype contained in the appropriate vector location.

Thus if the second word of the vector location was "br7+n." (e.g. line
0516) then the value of "dev" will be n.

2698: "savfp" saves the floating point registers (for the PDP11/40, this
      is a no-op!);

2700: If the previous mode is "user mode", the value of "dev" is
      modified by the addition of the octal value 020 (2662);

2701: The stack address where r0 is stored is noted in "u.u_ar0" for
      future reference. (Subsequently the various register values can be
      referenced as "u.u_ar0[Rn]".);

2702: There is now a multi-way "switch" depending on the value of "dev".

At this point we can observe that UNIX divides traps into three classes,
depending on the prior processor mode and the source of the trap:

(A) kernel mode;

(B) user mode, not due to a "trap" instruction;

(C) user mode, due to a "trap" instruction.


Kernel Mode Traps

The trap is unexpected and with one exception, the reaction is to
"panic". The code executed is the "default" of the "switch" statement:

2716: Print:

      the current value of the seventh kernel segment address register
      (i.e. the address of the current per process data area);

      the address of "ps" (which is in the kernel stack); and

      the trap type number;

2719: "panic", with no return.

Floating point operations are only used by programs, and not by the
operating system. Since such operations on the PDP11/45 and 11/70 are
handled asynchronously, it is possible that when a floating point
exception occurs, the processor may have already switched to kernel mode
to handle an interrupt.

Thus a kernel mode floating point exception trap can be expected
occasionally and is the concern of the current user program.

UNIX Operating System 12-1 Traps and System Calls


2793: Call "psignal" (3963) to set a 2810: This represents a case where Since there are many possible "ver-
flag to show that a floating operating system assistance is sions" of the "trap" instruction, the
point exception has occurred; required to extend the user mode type of assistance requested can be and
stack area. is encoded as part of the "trap"
2794: Return. instruction.
The assembler routine "backup"
This raises an interesting ques- (1012) is used to reconstruct the
tion: "Why are the kernel mode situation that existed before Parameters which are part of a system
and user mode floating point execution of the instruction that call may be passed from the user pro-
exceptions handled slightly dif- caused the trap. gram in different ways:
ferently?"
"grow" (4136) is used to do the (a) via the special register r0;
actual extension.
(b) as a set of words embedded in
User Mode Traps the program string following the
The procedure "backup" is non-trivial "trap" instruction;
Consider first of all a trap which is and its comprehension involves a careful
not generated as the result of the execution consideration of various aspects of (c) as a set of words in the
of a "trap" instruction. This the PDP11 architecture. It has been program's data area. (This is
is regarded as a probable error for left for the interested reader to pur- the "indirect" call.)
which the operating system makes no sue privately.
provision apart from the possibility of
a "core dump". However the user program As noted for the PDP11/40, "backup" may Indirect calls have a higher overhead
itself may have anticipated it and pro- not always succeed because the proces- than direct system calls. Indirect
vided for it. sor does not save enough information to calls are needed when the parameters
resolve all possibilities. are data dependent and cannot be deter-
The way this provision is made and mined at compile time.
implemented is the subject of the next 2818: Call "psignal" (3963) to set the
chapter. At this stage, the principal appropriate "signal". (Note that
requirement is to "signal" that the this statement is only reached Indirect calls may sometimes be avoided
trap has occurred. from those cases of the "switch" if there is only one data dependent
which included a "break" state- parameter, which is passed via r0. In
2721: A bus error has occurred while ment.) ; choosing which parameters should be
the system is in user mode. Set passed via r0, the system designers
"i" to the value "SIGBUS" (0123); 2821: "issig" checks if a "signal" has
been sent to the user program, own experience, since the pattern
2723: The "break" causes a branch out either just now or at some ear- doesn't satisfy the law of least aston-
of the "switch" statement to line lier time and has not yet been ishment.
2818; attended to;

2733: Apart from the one special case 2822: "psig" performs the appropriate The "C" compiler does not give special
noted, the treatment of illegal actions. (Both "issig" and "psig" recognition to system calls, but treats
instructions is the same at this are discussed in detail in the them in the same way as other pro-
level as for bus errors; next chapter.); cedures. When the loader comes to
resolve undetermined references, it
2739: 2823: Recalculate the priority for the satisfies these with library routines
2743: current process. which contain the actual "trap"
2747: instructions.
2796: Cf. the comment for line 2721.
2752: The error indicators are reset;
System Calls
Note that cases "4+USER" (power fail) 2754: The user mode instruction which
and "7+USER" (programmed interrupt) are User mode programs use "trap" instructions caused the trap is retrieved and
handled by the "default" case (line as part of the "system call" all but the least significant six
2715). mechanism to call upon the operating bits are masked off. The result
system for assistance. is used to select an entry from
the array of structures,

UNIX Operating System 12-2 Traps and System Calls


"sysent". The address of the System Call Handlers 3040: "getblk(NODEV)" results in the
selected entry is stored in allocation of a 512 byte buffer
"callp"; The full set of system calls may be from the pool of buffers. This
reviewed in the file "sysent.c" on buffer is used temporarily to
2755: The "zeroeth" system call is the Sheet 29, but more relevantly, these store in core, that information
"indirect" system call, in which are discussed in full detail in Section which is currently in the user
the parameter passed is actually II of the UPM. data area, and which is needed to
the address in the user program start the new program. Note that
data space of a system call the second argument in "u.u_arg"
parameter sequence. The procedures which handle the system is a pointer to this information;
calls are found mostly in the files
"sys1.c", "sys2.c", "sys3.c" and 3041: "access" returns a non-zero
Note the separate uses of "fuword" and "sys4.c". result if the file is not execut-
"fuiword". The distinction between able. The second condition exam-
these is unimportant on the PDPll/40, ines whether the file is a direc-
but is most important on machines with Two important "trivial" procedures are tory or a special character file.
separate "i" and "d" address spaces; "nullsys" (2864) and "nosys" (2855) (It would seem that by making
which are found in the file "trap.c". this test earlier, e.g. just
2760: "i=077" simulates a call on the after line 3036, the efficiency
very last system call (2975), of the code could be improved.);
which results in a call on
"nosys" (2855), which results in The File 'sys1.c' 3052: Copy the set of arguments from
an error condition which will the user space into the temporary
usually be fatal for the user This file contains the procedures for buffer;
mode program; five system calls, of which three will
be considered now, and two ("rexit" and 3064: If the argument string is too
2762: "wait") will be deferred to the next large to fit in the buffer, take
2765: The number of arguments specified chapter. an error exit;
in "sysent" is the actual number
provided by the user programmer, The first procedure in this file, and 3071: If the number of characters in
or that number less one if one also the first system call we have the argument string is odd, add
argument is transferred via r0. encountered, is "exec". an extra, null character;
The arguments are copied from the
user data or instruction area 3090: The first four words (8 bytes) of
into the five element array the named file are read into
"u.u_arg". (From "sysent" (Sheet exec (3020) "u.u_arg". The interpretation of
29) it would seem that four elements these words is indicated in the
would have been sufficient This system call, #11, changes a process comment beginning on line 3076
for "u_arg[ ]" ... is this an executing one program into a process and, more fully, in the section
allowance for future inflation?); executing a different program. "A.OUT(V)" of the UPM.
See Section "EXEC(II)" of the UPM.
2770: The value of the first argument This is the longest and one of the most Note the setting of "u.u_base",
is copied into "u.u_dirp", which important system calls. "u.u_count", "u.u_offset" and
seems to function mainly as a "u.u_segflg" preparatory to the
convenient temporary storage 3034: "namei" (6618) (which is dis- read-operation;
location; cussed in detail in Chapter 19)
converts the first argument 3095: If the text segment is not to be
2771: "trap1" is called with the (which is a pointer to a character protected, add the text area size
address of the desired system string defining the name of to the data area size, and set
routine. Note the comment beginning the new program) into an "inode" the former to zero;
on line 2828; reference. ("inodes" are essential
parts of the file referencing 3105: Check whether the program has a
2776: When an error occurs, the "c-bit" mechanism.); "pure" text area, but the program
in the old processor status word file has already been opened by
is set (see line 2658) and the 3037: Wait if the number of "exec"s some other program as a data
error number is returned via r0. currently under way is too large. file. If so, take the error exit;
(See the comment on line 3011.);

UNIX Operating System 12-3 Traps and System Calls


3127: When this point is reached, the 3335: For the new process, "fork" the areas which were formerly
decision to execute the new program returns the value of the parent's occupied by the stack.
is irrevocable i.e. there is process identification, and initialises
no longer the opportunity to various accounting
return to the original program parameters; The following procedures which are also
with an error flag set; contained in "sysl.c" are described in
3344: For the parent process, "fork" Chapter 13:
3129: "expand" here actually implies a returns the value of the child's
major contraction, to the "per process identification, and skips rexit (3205) wait (3270)
process data" area only; the user mode program counter by exit (3219)
one word.
3130: "xalloc" takes care of allocating
(if necessary) and linking to the
text area; Note that the values finally returned
to a "C" program are slightly different
3158: The information stored in the from the above. Refer to the section "sys2.c" and "sys3.c" are mainly con-
buffer area is copied into the "FORK(II)" of the UPM. cerned with the file system and
stack in the user data area of input/output, and they have been
the new program; relegated to Section Four of the
operating system source code.
3186: The locations in the kernel stack sbreak (3354)
which contain copies of the "pre-
vious" values of the registers in This procedure implements system call
user mode are set to zero, except #17 which is described in the Section The File 'sys4.c'
for r6, the stack pointer, which "BREAK (II)" of the UPM. The comment at
was set at line 3155; the head of the procedure has confused All the procedures in this file implement
more than one reader: clearly the identifier system calls. The following procedures
3194: Decrement the reference count for "break" is used in "C" programs are described in Chapter 13:
the "inode" structure; (leave an enclosing program loop) in an
entirely different way from that ssig (3614) kill (3630)
3195: Release the temporary buffer; intended here (change the size of the
program data area).
3196: Wake up any other process waiting The following procedures are straight-
at line 3037. forward and have been left for the
"sbreak" has clear similarities with amusement and edification of the
the procedure "grow" (4136) but unlike reader:
the latter, it is only invoked explicitly
and may in fact cause a contraction getswit (3413) sync (3486)
A call on "exec" is frequently preceded of the data area as well as an gtime (3420) getgid (3472)
by a call on "fork". Most of the work expansion (depending on the new desired stime (3428) getpid (3480)
for "fork" is done by "newproc" (1826), size). setuid (3439) nice (3493)
but before the latter is called, "fork" getuid (3452) times (3656)
makes an independent search for a slot 3364: Calculate the new size for the setgid (3460) profil (3667)
in the "proc" array, and remembers the data area (in 32 word blocks);
place as "p2" (3327).
3371: Check that the new size is consistent
with the memory segmentation The following procedures which are concerned
"newproc" also searches "proc" but constraints; with file systems, are described
independently. Presumably it always later:
locates the same empty slot as "fork", 3376: The area is shrinking. Copy the
since it does not report the value stack area down into the former unlink (3510) chown (3575)
back. (Why is there no confusion on data area. Call "expand" to trim chdir (3538) smdate (3595)
this point?) off the excess; chmod (3560)

3386: Call "expand" to increase the -000-


total area. Copy the stack area
up into the new part, and clear

UNIX Operating System 12-4 Traps and System Calls
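
The selection of a "sysent" entry described for lines 2754 to 2760 can be sketched in present-day "C". This is an illustration only and not the code of "trap.c": the names "SYS_TRAP", "call_number", "indirect_index" and "lookup" are invented here, the structure "entry" is a hypothetical stand-in for one element of "sysent", and the opcode value 0104400 is assumed from the PDP11 instruction set.

```c
#include <assert.h>
#include <stdint.h>

#define NSYSENT  64         /* six bits of call number give entries 0..077 */
#define SYS_TRAP 0104400u   /* assumed PDP11 "trap" opcode with call number zero */

/* Hypothetical stand-in for one "sysent" entry. */
struct entry {
    int nargs;              /* number of argument words to copy into u.u_arg */
    void (*handler)(void);  /* procedure which implements the call */
};

/* Keep the least significant six bits of the trapping instruction. */
unsigned call_number(uint16_t insn)
{
    return insn & 077u;
}

/* For the indirect call: a word which is not a "trap" instruction is
 * treated as the very last system call (077), which leads to "nosys". */
unsigned indirect_index(uint16_t word)
{
    if ((word & ~077u) != SYS_TRAP)
        return 077u;
    return word & 077u;
}

/* Select the table entry, in the manner of "callp" in the text. */
const struct entry *lookup(const struct entry *table, unsigned n)
{
    assert(n < NSYSENT);
    return &table[n];
}
```

Under these assumptions the instruction word 0104413 selects entry 11 (013 octal), which the text identifies as "exec", while an indirect parameter word that is not a "trap" instruction selects entry 077.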


UNIX recognises 20 ("NSIG", line 0113) Thus if for example the programmer
different types of software interrupts, wishes to ignore software interrupts of
of which (as the reader may discover type #2 (which result if the user hits
for himself by perusal of the Section the "delete" key on his terminal), he
"SIGNAL (II)" of the UPM) thirteen should set "u.u_signal[2]" to one by
have standard names and associations. executing the system call
Interrupt type #0 is interpreted as "no
interrupt". "signal (2,1);"
from his "C" program.
Within the "per process data area" of
each process is an array, "u.u_signal",
of "NSIG" words. Each word corresponds
to a different software interrupt type Causation
and defines the action which should be
taken if the process encounters that An interrupt is "caused" for a process
kind of software interrupt: quite simply by setting the value of
"p_sig" (0363) in the process's "proc"
entry, to the type number appropriate
to the interrupt (i.e. a value in the
range 1 to "NSIG"-1).

"p_sig" is always directly accessible,
even when the affected process and its
"per process data area" have been
swapped out to disk. Obviously this
mechanism only allows one interrupt per
process to be outstanding at any one
time. The outstanding interrupt will
always be the most recent one, unless
one of the interrupts was of type #9,
which always prevails.

u_signal[n]       when interrupt #n occurs
zero              the process will terminate itself;
odd, non-zero     the software interrupt is ignored;
even, non-zero    the value is taken as the address in user space of
                  a procedure which should be executed forthwith.

CHAPTER THIRTEEN
Software Interrupts
Interrupt type #9 ("SIGKIL") is espe-
cially distinguished because UNIX
ensures that "u.u_signal[9]" remains Effect
zero until the very end of a process's
The principal concern of this chapter existence, so that if a process is ever The effect of a software interrupt
is the content of the file "sig.c", interrupted for that reason, it will never takes place immediately. It may
which appears on Sheets 39 to 42. This always terminate itself. occur after only some slight delay if
file introduces a facility for communi- the affected process is currently run-
cation between processes. In particular ning, or possibly after a considerable
it provides for the course of one "user delay if the affected process is
mode" process to be interrupted, Anticipation suspended and has been swapped out.
diverted or terminated by the action of
another process or as the result of an Each process can set the contents of
error or operator action. the array "u.u_signal[]" (with the The action dictated by the interrupt is
exception of "u.u_signal[9]" as just always inflicted on the affected process
noted) in anticipation of future interrupts by itself, and hence can only
In this discussion the term "software so that the appropriate action is occur when the affected process is
interrupt" has been deliberately used taken. The user programmer does this active.
in place of the term "signal". This via the "signal" system call (see
latter has been eschewed because it has "SIGNAL (II)" of the UPM).
obtained connotations in the UNIX Where the effect is to execute a user
milieu which are rather different from defined procedure, the kernel mode pro-
the usage of ordinary English. cess adjusts the user mode stack to

UNIX Operating System 13-1 Software Interrupts


make it appear that the procedure had "psignal" (3963) is called by "kill"
been entered and immediately interrupted (3649) and "signal" (3955) (also "trap"
(hardware style) before executing (2793, 2818) and "pipe" (7828)) to do This procedure implements the "signal"
the first instruction. The system the actual setting of "p_sig". system call.
then returns from kernel mode to user
mode in the usual manner. The result 3619: If the interrupt reason is out of
of all this is that the next user mode range or is equal to "SIGKIL"
instruction which is executed is the C. Effect (9), take an error exit;
first instruction of the designated
procedure. "issig" (3991) is called by "sleep" 3623: Capture the initial value in
(2073, 2085), "trap" (2821) and "clock" "u.u_signal[a]" for return as the
(3826) to enquire whether there is an result of the system call;
outstanding non-ignorable software
Tracing interrupt for the active process "just 3624: Set the element of "u.u_signal"
waiting to happen". to the desired value
The software interrupt facility has
been extended to provide a powerful but 3625: If an interrupt for the current
somewhat inefficient mechanism whereby "psig" (4043) is called whenever reason is pending, cancel it. (It
a parent process may monitor the pro- "issig" returns a non-zero result is not clear why this step should
gress of one or more child processes. (except in "sleep" where things are a be necessary or even desirable.
little more complex) to implement the Any suggestions??)
action triggered by the interrupt.

"core" (4094) is called by "psig" if a kill (3630)


core dump is indicated for a terminat-
Procedures ing process. This procedure implements the "kill"
system call to cause a specified type
Since the interrelationships of the of software interrupt to another desig-
procedures associated with software "grow" (4136) is called by "psig" to nated process.
interrupts are somewhat confusing at enlarge the user mode stack area if
first sight, it is worthwhile introduc- necessary. 3637: If "a" is non-zero, it is the
ing the procedures briefly before process identifying number of a
plunging in with both feet .... process to be interrupted. If
"exit" (3219) terminates the currently "a" is zero, then all processes
active process. originating from the same termi-
nal as the current process are to
A. Anticipation be interrupted;

"ssig" (3614) implements system call D. Tracing 3639: Consider each entry in the "proc"
#48 ("signal") to set the value in one table in turn and reject it if:
element of the array "u.u_signal". "ptrace" (4164) implements the "ptrace" it is the current process (3640);
system call #26. it is not the designated process
(3642) ;
no particular process was desig-
B. Causation "stop" (4016) is called by "issig" nated ("a" == 0) but it does not
(3999) for a process which is being have the same controlling terminal,
"kill" (3630) implements system call traced to allow the supervising parent or it is one of the two initial
#37 ("kill") to cause a specified processes (3644);
interrupt to a process defined by its the user is not the "super user"
process identifying number. and the user identities do not
"procxmt" (4204) is a procedure called match (3646);
from "stop" (4028) whereby the child
"signal" (3949) causes a specified carries out certain operations related 3649: For any process that survives the
interrupt to be caused for all to tracing, at the behest of the above tests, call "psignal" to
processes controlled and/or initiated parent. change "p_sig".
from a specified terminal.

UNIX Operating System 13-2 Software Interrupts


signal (3949) 4006: Otherwise return a zero value. 4066: If "u.u_signal[n]" is zero, then
for the interrupt types listed,
For every process, if it is controlled generate a core image via the
by the specified terminal (denoted by The comment regarding the frequency of procedure "core";
"tp"), hit it with "psignal". calls on "issig" which occurs on lines
3983 to 3985 needs some clarification. 4079: Store a value in "u.u_arg[0]"
At least one call on "issig" is a part composed of the low order byte of
of every execution of "trap" but only the remembered value of r0, and
psignal (3963) of one interrupt routine ("clock", of "n", which records the interrupt
which calls "issig" only once per type and whether a core
3966: Reject the call if "sig" is too second). In cases where "pri" is positive, image was successfully created;
large (but why not if negative?? "sleep" (2073, 2085) calls
"kill" does not check this parameter "issig" before and after calling 4080: Call "exit" for the process to
before passing it to "psignal". "swtch". terminate itself.
Admittedly the "kill" command
could only result in a positive
value for "sig" ...);
core (4094)
3971: If the current value of "p_sig"
is NOT set to "SIGKIL", then This procedure is only called if This procedure copies the swappable
overwrite it (i.e. once a process "u.u_signal[n]" was found by "issig" to program image into a file called "core"
has been "killed outright" there have an even value. If this value is in the user's current directory. A
is no way to revive it.); found (4051) to be non-zero, it is detailed explanation of this procedure
taken as the address of a user mode must wait until the material of Sections
3973: Seems to be an error here ... for function which has to be executed. Three and Four, which deal with
"p_stat" read "p_pri" ... improve input/output and file systems, have
the priority of the process if it 4054: Reset "u.u_signal[n]" except in been covered.
is not too good; the case where the interrupt is
for an illegal instruction or a
3975: If the process is waiting for a trace trap;
non-kernel event i.e. it called grow (4136)
"sleep" (2066) with a positive 4055: Calculate the user space
priority, then set it running addresses of the lower of two The parameter, "sp", of this procedure
again. words which are to be inserted defines the address of a word which
into the user mode stack should be included in the user mode
stack.
4056: Call "grow" to check the current
issig (3991) user mode stack size, and to 4141: If the stack already extends far
extend it (downwards!) if necessary; enough, simply return with a zero
3997: If "p_sig" is non-zero, then ... value.
3998: If the "tracing" flag is on, call 4057: Put the values of the processor Note that this test relies on the
"stop" (this topic will be status register and the program idiosyncrasies of 2's complement
resumed later); counter which were captured at arithmetic, and if both
the time of the "trap" or |sp| > 2^{15}
4000: Return a zero value if "p_sig" is hardware interrupt (in the case and
zero. (This apparently redundant of a "clock" interrupt) into the |u.u_ssize * 64| > 2^{15}
test is necessary because "stop" user stack, and update the the decision to extend the stack
may reset "p_sig" as a side "remembered" values of r6, r7 and may be taken wrongly at this
effect.); the processor status word. Upon juncture;
returning to user mode, execution
4003: If the value in the corresponding will resume at the beginning of 4143: Calculate the stack size increment
element of "u.u_signal" is even the designated procedure. When needed to include the new
(may be zero) return a non-zero this procedure returns, the procedure stack point plus a 20*32 word
value; which was originally margin;
interrupted will be resumed;

UNIX Operating System 13-3 Software Interrupts


4144: Check that this value is in fact 3237: Find a suitable buffer (256 which is in the lower half of the "u"
positive (i.e. we are not dealing words) and ... structure i.e. the part that is written
with a failure of the test on to the "swap area" as a "zombie".
line 4141.); 3238: Copy the lower half of the "u"
structure into the buffer area;
4146: Check that the new stack size
does not conflict with the memory 3239: Write the buffer into the swap
segmentation constraints ("estabur" area;
sets "u.u_error" if they do) For every call on "exit", there should
and reset the segmentation register 3241: Enter the core space occupied by be a matching call on "wait" by an anxious
prototypes; the process into the free list. parent or ancestor. The principal
(This space is of course still in function of the latter procedure, which
4148: Get a new, enlarged data area, use, but the use will terminate implements the "wait" system call, is
copy the stack segments (32 words before any other process gets to for the parent or ancestor to find and
at a time) into the high end of dip into the free list again. dispose of a "zombie" child.
the new data area, and clear the This could not be done any
segments which now become the sooner, because, as will be seen
stack expansion; later, both "getblk" and "bwrite" "wait" also has a secondary function,
can call "sleep", during which to look for children which have
4156: Update the stack size, all sorts of things might happen. "stopped" for tracing (which is the
"u.u_ssize" and return a "successful" In view of all this, it might be next major topic).
result. reasonable if the statement
"expand (USIZE);" 3277: Search the whole "proc" array
were inserted after line 3226.); looking for child processes. (If
none exist, take an error exit
exit (3219) 3243: Set the process state to "zombie" (line 3317»;
(i.e. "a corpse said to be
This procedure is called when a process revived by witchcraft" (O.E.D.));
is to terminate itself.
3245: The remaining code searches the save the child's process identifying
3224: Reset the "tracing" flag; "proc" array to find the parent number, to report back to
process and to wake it up, to the parent;
3225: Set all of the values in the make any children "wards of the
array "u.u_signal" (including state", and, if they have read the 256 word record back
"u.u_signal[SIGKIL]") to one so "stopped" for tracing, to release from the disk swap area, and
that no future execution of them. Finally the code includes release the swap space;
"issig" will ever be followed by (for this process) a last call on
execution of "psig"; "swtch". reinitialise the "proc" array
entry;
3227: Call "closef" (6643) to close all
the files which the process has accumulate the various time
open. (For the most part, "clos- accounting entries;
ing" simply involves decrementing
a reference count.); save the "u arg[0j" value also to
Before going on to consider tracing, report back-to the parent;
3232: Reduce the reference count for there are two routines which are
the current directory; closely associated with "exit", which 3298: Finally, release the buffer area;
can be conveniently disposed of now.
3233: Sever the process's connection 3300: Is the child in a "stopped"
with any text segment; state? (If so, wait for the dis-
cussion on tracing);
3234: A place is needed to store "per rexit (3205)
process" information until the 3313: If one or more children were
parent process can look at it. A This procedure implements the "exit" found but none were "zombies" or
block (256 words) in the swap system call, #1. It simply salvages the "stopped", "sleep" and then look
area of the disk is a convenient low order byte of the user supplied again.
place; parameter and saves it in "u.u_arg[0]"

UNIX Operating System 13-4 Software Interrupts


Tracing into "ipc.ip_data"); 4028: If the tracing flag has been
reset, or the result of the procedure
(6) the child then goes to "sleep" "procxmt" is true, return
The tracing facilities are provided while the parent "wakes up"; to "issig";
through a modification and extension of
the software interrupt facilities. (7) the parent inspects the result, 4029: Otherwise start again.
Briefly, if a parent process is tracing as recorded in "ipc", of the
the progress of a child process, every operation;
time the child process encounters a
software interrupt, the parent process (8) steps (3) to (7) may be repeated wait (3270) (continued)
is given the opportunity to intervene several times in succession.
as part of the total response to the 3301: If the child process has
interrupt. "stopped" and ...
Finally the parent may allow the child
to continue its normal execution, possibly 3302: If the "SWTED" flag is not set
The parent's intervention may involve without ever knowing that a (i.e. the parent hasn't noticed
interrogation of values within the software interrupt had occurred. this child lately) ...
child process's data areas, including
the "per process data area". Subject to 3303: As an "aide-memoire" set the
certain constraints, the parent process A discussion of the tracing facility is "SWTED" flag. Set "u.u_ar0[R0]",
may also change values within these contained in the Section "PTRACE (II)" "u.u_ar0[R1]" so that the child
data areas. of the UPM. To the list of functional process status word is returned
limitations noted in the "Bugs" paragraph, to the parent;
we can add the following comments
The source of the software interrupts on efficiency: 3309: The "SWTED" flag was set. This
may be the parent process, the user means that the parent, by performing
himself (e.g. by entering "kill" commands There should be a mechanism for at least two "waits" in
or "delete"s through his terminal) transferring large blocks (e.g. succession without any intervening
or the child process itself (e.g. up to 256 words at a time) of call on "ptrace", is not very
if it is prone to executing illegal information from the child to interested in the child. So
instructions or other maladies). the parent (though not necessarily reset both the "STRC" and the
in the reverse direction); "SWTED" flags and release the
child. (Note the use of "setrun"
The communication between child and (not "wakeup") to complement the
parent processes is a kind of ritual There should be a proper coroutine call on "swtch" (4027)).
dance: procedure (analogous to "swtch")
to allow rapid transfer of control
(1) the child experiences a software between child and parent.
interrupt and "stops"; ptrace (4164)
(2) the waiting parent discovers the This procedure implements the "ptrace"
"stopped" child (line 3301), and stop (4016) system call, #26.
revives. Subsequently ...
This procedure is called by "issig" 4168: "u.u_arg[2]" corresponds to the
(3) the parent may execute the (3999) if the tracing flag ("STRC", first parameter in the "C" program
"ptrace" system call which has 0395) is set. calling sequence. If this is
the effect of leaving a request zero, a child process is asking
message in the system defined 4022: If your parent is process #1 to be traced by its parent, so
structure "ipc" (3939) for the (i.e. "/etc/init"), then call set the "STRC" flag and return.
child process; "exit" (line 4032);

(4) the parent then goes to "sleep" 4023: Otherwise look through "proc" for Note that this code handles the only
while the child "wakes up"; your parent ... wake him up explicit action the child process is
declare yourself "stopped" and asked to take with respect to tracing.
(5) the child reads the message in call "swtch" (Note do NOT There is no real reason why even this
"ipc" and acts upon it (e.g. call "sleep". Why?); action should be taken by the child
copying one of its own values process and not by the parent process.

UNIX Operating System 13-5 Software Interrupts


From a security point of view it is 4209: If "ipc.ip_lock" is set wrongly
most probably desirable that a child for the current process, then
process should only be traceable if it certainly the rest of "ipc"
gives its permission. On the other should be ignored.
hand, if the child asks to be traced
and is then ignored by the parent, the
child process may be blocked indefin- After "stop" (4027) calls "swtch", the
itely. Perhaps the best solution would child process is restarted by one of
be for the "STRC" flag to be set only three calls on "setrun" which leave the
after explicit action by both the "STRC" and "SWTED" flags in the state
parent and the child. indicated:

                 STRC    SWTED   ipc.ip_lock
exit (3254)      set     set     arbitrary
wait (3310)      reset   reset   arbitrary
ptrace (4188)    set     reset   properly set

4172: Search the "proc" table looking
for a process which:
  is stopped;
  matches the given process identifying number;
  is a child of the current process;
4181: Wait for the "ipc" structure to In the third case "ptrace" will always
become available if it is set "ipc.ip_lock" properly, before the
currently in use; child is restarted, so that there is
then no chance of the test on 4209
4183: Copy the parameters into "ipc" failing.

4187: reset the "SWTED" flag, and ... In the second case, where the parent
has ignored the child, "procxmt" will
4188: return the child to a "ready to never in fact be called.
run" state;
4189: Sleep until "ipc.ip_req" is non-positive By executing the statement "return
(4212); (0);" on line 4210, "procxmt" forces
"stop" to loop back to line 4020. In
4191: Extract a value that is to be the case where the parent has already
returned to the parent process, died, the test on line 4022 will then
check for errors, unlock "ipc" fail, and a call on "exit" (4032) will
and "wake up" any processes waiting result.
for "ipc".
4211: Store the value of "ipc.ip_req"
before resetting the latter,
Note that the "sleeps" on lines 4182, "wake up" the parent, and select
4190 are for essentially different rea- the next action as indicated.
sons, and could be differentiated to
good effect by replacing "&ipc" by
"&ipc.ip_req" on lines 4190 and 4213. The various actions are adequately
explained in Section "PTRACE (II)" of
the UPM, with the one qualification
that cases 1, 2 and 4, 5 are documented
procxmt (4204) the wrong way around (i.e. "I" and "0"
spaces respectively, not "0" and "I"!).
This procedure is executed by the child
process under the influence of data
left by the parent in the "ipc" struc-
ture. -000-
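
The first two steps of "procxmt" (lines 4209 and 4211) can be pictured with a small model. What follows is a sketch only, not the text of the procedure: the structure "ipc_model" and the function "child_take_request" are names invented here, and only the lock test and the consumption of the request are represented.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared "ipc" structure. */
struct ipc_model {
    int ip_lock;   /* pid of the child the request is meant for */
    int ip_req;    /* request left by the parent, 0 once consumed */
};

/*
 * Model of the opening of "procxmt" as described above.
 * Returns the request to act on, or 0 if "ipc" is not meant for
 * this child (the case in which "stop" loops back and re-tests).
 */
int child_take_request(struct ipc_model *ipc, int my_pid)
{
    int req;

    assert(ipc != 0);
    if (ipc->ip_lock != my_pid)   /* cf. 4209: lock set for another process */
        return 0;
    req = ipc->ip_req;            /* cf. 4211: remember the request ... */
    ipc->ip_req = 0;              /* ... then reset it, so the parent's
                                     wait for a non-positive value ends */
    return req;
}
```

A child whose identifying number does not match the lock leaves "ipc" untouched, which is why a wrongly set lock is harmless to the rest of the structure.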

Section Three is concerned with basic input/output operations between
the main memory and disk storage.

These operations are fundamental to the activities of program swapping
and the creation and referencing of disk files.

This section also introduces procedures for the use and manipulation
of the large (512 byte) buffers.

CHAPTER FOURTEEN
Program Swapping

UNIX, like all time-sharing systems, and some multiprogramming
systems, uses "program swapping" (also called "roll-in/roll-out") to
share the limited resource of the main physical memory among several
processes.

Processes which are suspended may be selectively "swapped out" by
writing their data segments (including the "per process data") into a
"swap area" on disk.

The main memory area which was occupied can then be reassigned to
other processes, which quite probably will be "swapped in" from the
"swap area".

Most of the decisions regarding "swapping out", and all the decisions
regarding "swapping in", are made by the procedure "sched". "Swapping
in" is handled by a direct call (2034) on the procedure "swap" (5196),
whereas "swapping out" is handled by a call (2024) on "xswap" (4368).

For those archaeologists who like to ponder the "bones" of earlier
versions of operating systems, it seems that originally "sched" called
"swap" directly to "swap out" processes, rather than via "xswap". The
extra procedure (one of several to be found in the file "text.c") has
been necessitated by the implementation of the sharable "text
segments".

It is instructive to estimate how much extra code has been
necessitated by the text segment feature: in "text.c" are four
procedures "xswap", "xalloc", "xfree" and "xccdec", which manipulate
an array of structures called "text", which is declared in the file
"text.h". Additional code has also been added to "sys1.c" and "slp.c".

Text Segments

Text segments are segments which contain only "pure" code and data
i.e. code and data which remain unaltered throughout the program
execution, so that they may be shared amongst several processes
executing the same program.

The resulting economies in space can be quite substantial when many
users of the system are executing the same program simultaneously e.g.
the editor or the "shell".

Information about text segments must be stored in a central location,
and hence the existence of the "text" array. Each program which shares
a text segment keeps a pointer to the corresponding text array element
in "u.u_textp".



The text segment is stored at the beginning of the code file. The
first program to begin execution causes a copy of the text segment to
be made in the "swap" area.

When subsequently no programs are left which reference the text
segment, the resources absorbed by the text segment are released. The
main memory resource is released whenever there are no programs which
reference the text segment currently in main memory; the "swap" area
is released in general whenever there are no programs left running
which reference the text segment.

The numbers in each of these states are denoted by "x_ccount" and
"x_count" respectively. Decrementing these numbers is handled by the
routines "xccdec" and "xfree" which also take care of releasing
resources when the counts reach zero. ("xccdec" is called whenever a
program is swapped out or terminates. "xfree" is called by "exit"
whenever a program terminates.)

sched (1940)

Process #0 executes "sched". When it is not waiting for the completion
of an input/output operation that it has initiated, it spends most of
its time waiting in one of the following situations:

A. (runout)
   None of the processes which are swapped out is ready to run, so
   that there is nothing to do. The situation may be changed by a call
   to "wakeup", or to "xswap" called by either "newproc" or "expand".

B. (runin)
   There is at least one process swapped out and ready to run, but it
   hasn't been out more than 3 seconds and/or none of the processes
   presently in main memory is inactive or has been there more than 2
   seconds. The situation may be changed by the effluxion of time as
   measured by "clock" or by a call to "sleep".

When either of these situations terminates:

1958: With the processor running at priority six, so that the clock
      can't interrupt and change values of "p_time", a search is made
      for the process which is ready to run and has been swapped out
      for the longest time;

1966: If there is no such process then situation A holds;

1976: Search for a main memory area of adequate size to hold the data
      segment. If an associated text segment must be present also but
      is not currently in main memory, the area is increased by the
      size of the text segment;

1982: If an area of adequate size is available the program branches to
      "found2" (2031). (Note that the program does not handle the case
      where there is sufficient space for both text and data segments
      but in distinct areas of main memory. Would it be worth while to
      extend the code to cover this possibility?);

1990: Search for a process which is in main memory, but which is not
      the scheduler or locked (i.e. already being swapped out), and
      whose state is "SWAIT" or "SSTOP" (but not "SSLEEP") (i.e. the
      process is waiting for an event of low precedence, or has
      stopped during tracing (see Chapter Thirteen)). If such a
      process is found, go to line 2021, to swap the image out.

      Note that there seems to be a bias here against processes whose
      "proc" entries are early in the "proc" array;

2003: If the image to be swapped in has been out less than 3 seconds,
      then situation B holds;

2005: Search for the process which is loaded, but is not the scheduler
      or locked, whose state is "SRUN" or "SSLEEP" (i.e. ready to run,
      or waiting for an event of high precedence) and which has been
      in main memory for the longest time;

2013: If the process image to be swapped out has been in main memory
      for less than 2 seconds, then situation B holds.

      The constant "2" here (also the "3" on line 2003) is somewhat
      arbitrary. For some reason the programmer has departed from his
      usual practice of naming such constants to emphasise their
      origins;

2022: The process image is flagged as not loaded and is swapped out
      using "xswap" (4368).

      Note that the "SSWAP" flag is not set here because the process
      swapped out is not the current process. (Cf. lines 1907, 2286);

2032: Read the text segment into main memory if necessary. Note that
      the arguments for the "swap" procedure are:

         an address within the swap area of the disk;

         a main memory address (ordinal number of a 32 word block);

         a size (number of 32 word blocks to be transferred);

         a direction indicator ("B_READ==1" denotes "disk to main
         memory");

2042: Swap in the data segment and ...

2044: Release the disk swap area to the available list, record the
      main memory address, set the "SLOAD" flag and reset the
      accumulated time indicator.
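
As noted for line 2013, the constants "3" and "2" are not named in the source. The sketch below shows how the two tests described for lines 2003 and 2013 might read with named constants. It is an illustration under stated assumptions, not the code of "sched": the names "MIN_SECS_OUT", "MIN_SECS_IN" and "may_displace" are invented here.

```c
#include <assert.h>

/* Hypothetical names; the source uses the bare constants 3 and 2. */
enum { MIN_SECS_OUT = 3, MIN_SECS_IN = 2 };

/*
 * Decide whether an active process that has been in main memory for
 * secs_in seconds may be swapped out to make room for a process that
 * has been swapped out for secs_out seconds.
 * Returns 1 if so; 0 corresponds to situation B ("runin").
 */
int may_displace(int secs_out, int secs_in)
{
    assert(secs_out >= 0 && secs_in >= 0);
    if (secs_out < MIN_SECS_OUT)   /* cf. 2003: candidate not out long enough */
        return 0;
    if (secs_in < MIN_SECS_IN)     /* cf. 2013: victim not in long enough */
        return 0;
    return 1;
}
```

Both conditions must hold before an active process is displaced, which is one reading of the "and/or" in the description of situation B.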



xswap (4368)

4373: If "oldsize" data was not supplied, use the current size of the
      data segment stored in "u";

4375: Find a space in the disk swap area for the process's data
      segment. (Note that the disk swap area is allocated in terms of
      512 character blocks);

4378: "xccdec" (4490) is called (unconditionally!) to decrease the
      count, associated with the text segment, of the number of "in
      main memory" processes which reference that text segment. If the
      count becomes zero, the main memory area occupied by the text
      segment is simply returned to the available space. (There is no
      need to copy it out, since, as we shall see, there will be a
      copy already in the disk swap area);

4379: The "SLOCK" flag is set while the process is being swapped out.
      This is to prevent "sched" from attempting to "swap out" a
      process which is already in the process of being "swapped out".
      (This can only happen if "swapping out" was started initially by
      some routine other than "sched" e.g. by "expand");

4382: The main memory image is released except when "xswap" is called
      by "newproc";

4388: If "runout" is set, "sched" is waiting for something to "swap
      in", so wake it up.

xalloc (4433)

"xalloc" is called by "exec" (3130), when a new program is being
initiated, to handle the allocation of, or linking to, the text
segment. The argument, "ip", is a pointer to the "inode" of the code
file. At the time of this call, "u.u_arg[1]" contains the text segment
size in bytes.

4439: If there is no text segment, return immediately;

4441: Look through the "text" array for both an unused entry and an
      entry for the text segment. If the latter can be found, do the
      bookkeeping and go to "out" (4474);

4452: Arrange to copy the text segment into the disk swap area.
      Initialise the unused text entry, and get space in the disk swap
      area;

4459: Change the space allocated to the process to one large enough to
      contain the "per process data" area and the text segment;

4460: The call on "estabur" is necessary to set the user mode
      segmentation registers before reading the code file;

4461: A UNIX process can only initiate one input/output operation at a
      time. Hence it is possible to store i/o parameters at standard
      locations in the "u" structure, viz. "u.u_count",
      "u.u_offset[]" and "u.u_base";

4462: The octal value 020 (decimal 16) is an offset into the code
      file;

4463: Information is to be read into the area beginning at location
      zero in the user address space;

4464: Read the text segment part of the code file into the current
      data segment;

4467: "Swap out" the data segment (minus the "per process data") into
      the disk swap area reserved for the text segment;

4473: "Shrink" the data segment - it is about to be swapped out;

4475: "sched" always "swaps in" the text segment before the data
      segment i.e. there is no mechanism for bringing the text segment
      into main memory once the data segment is present. If the text
      segment is not in main memory, get back into step by "swapping
      out" the data segment to disk.

It will be noted that the code to handle text segments is very
conservative whenever the situation starts to get complicated. For
example, the "panic" (4451) when no more text entries are available
would seem to be a rather extreme reaction. However the strategy of
being generous with "text" array space is quite likely to be less
expensive than the code needed to do "better". What do you think?

xfree (4398)

"xfree" is called by "exit" (3233), when a process is being
terminated, and by "exec" (3128), when a process is being
transmogrified.

4402: Set the text pointer in the "proc" entry to "NULL";

4403: Decrement the main memory count and if it is now zero ...

4406: and if the text segment has not been flagged to be saved, ...

4408: Abandon the image of the text segment in the disk swap area;

4411: Call "iput" (7344) to decrement the "inode" reference count and
      if necessary delete it.

"ISVTX" (5695) is a mask which defines the "sticky bit" mentioned in
section "CHMOD(I)" of the UPM. If this bit is set, the disk copy of
the text segment is allowed to remain in the disk swap area even when
no programs are running which reference it, in the expectation that it
will be required again shortly. This is an efficient device for
commonly used programs such as the "shell" or the editor.

-000-
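
The interplay of the two counts kept for each text segment may be easier to follow in a small model. The sketch below is an illustration under stated assumptions, not the code of "text.c": the structure "text_model", its "in_core", "on_swap" and "sticky" members, and the two "model_" functions are invented here. Only the names "x_count" and "x_ccount" are taken from the commentary.

```c
#include <assert.h>

/* Hypothetical model of one "text" entry. */
struct text_model {
    int x_count;    /* processes referencing the text segment */
    int x_ccount;   /* of those, the number loaded in main memory */
    int in_core;    /* 1 while a main memory copy exists */
    int on_swap;    /* 1 while a copy exists in the disk swap area */
    int sticky;     /* 1 if the code file has the "sticky bit" set */
};

/* Model of "xccdec": one fewer loaded process references the segment. */
void model_xccdec(struct text_model *xp)
{
    if (xp->x_ccount != 0 && --xp->x_ccount == 0)
        xp->in_core = 0;          /* main memory area is handed back */
}

/* Model of "xfree": a process gives up its reference altogether. */
void model_xfree(struct text_model *xp)
{
    assert(xp->x_count > 0);
    model_xccdec(xp);
    if (--xp->x_count == 0 && !xp->sticky)
        xp->on_swap = 0;          /* swap area copy is abandoned */
}
```

With two sharers, swapping one out leaves the main memory copy in place. The copy in the swap area survives until the last sharer terminates, and survives even that if the sticky bit is set.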



CHAPTER FIFTEEN
Introduction to Basic I/O

There are three files whose contents need to be thoroughly absorbed
before the subject of UNIX input/output is broached in detail.

The File 'buf.h'

This file declares two structures called "buf" (4520) and "devtab"
(4551). Instances of the structure "buf" are declared as "bfreelist"
(4567) and as the array "buf" (!) (4535) with "NBUF" elements.

The structure "buf" is possibly misnamed because it is in fact a
buffer header (or buffer control block). The buffer areas proper are
allocated separately and declared (4720) as

      "char buffers [NBUF] [514];"

Pointers from the "buf" array to the "buffers" array are set up by the
procedure "binit".

Other instances of the structure "buf" are declared as "swbuf" (4721)
and "rrkbuf" (5387). No 514 character buffer areas are associated with
"bfreelist" or "swbuf" or "rrkbuf".

The "buf" structure may be divided into three parts:

(a) flags: These convey status information and are contained within a
    single word. Masks for setting these flags are defined as
    "B_WRITE", "B_READ" etc. in lines 4572 to 4586.

(b) list pointers: Forward and backward pointers for two doubly linked
    lists, which we shall refer to as the "b"-list and the "av"-list.

(c) i/o parameters: A set of values associated with the actual data
    transfer.

devtab (4551)

The "devtab" structure has five words, the last four of which are
forward and backward pointers.

One instance of "devtab" is declared within the device handler for
each block type of peripheral device. For our model system the only
block device is the RK05 disk, and "rktab" is declared as a "devtab"
structure at line 5386.

The "devtab" structure contains some status information for the device
and serves as a list head for:

(a) the list of buffers associated with the device, and simultaneously
    on the "av"-list;

(b) the list of outstanding i/o requests for the device.

The File 'conf.h'

The file "conf.h" declares:

   yet another way to dissect an integer into two parts ("d_minor" and
   "d_major"). Note that "d_major" corresponds to "hibyte" (0180);

   two arrays of structures;

   two integer variables, "nblkdev" and "nchrdev".

The two arrays of structures, "bdevsw" and "cdevsw", are declared but
not dimensioned or initialised in "conf.h". The initialisation of
these arrays is performed in the file "conf.c".

The File 'conf.c'

This file, along with "low.s", is generated individually at each
installation (to reflect the set of peripherals actually installed) by
the program "mkconf". (In our case, "conf.c" reflects the
representative devices for our model system.)

This file initialises the following:

   bdevsw  (4656)      swapdev (4696)
   cdevsw  (4669)      swplo   (4697)
   rootdev (4695)      nswap   (4698)
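
The dissection of a device number into "d_major" and "d_minor", and the use of the major part to select an entry in a switch table, can be sketched as follows. This is an illustration only: it uses shifts and masks rather than the structure overlay declared in "conf.h", and the table "bdevsw_model", its two entries and the function "dispatch" are invented for the purpose.

```c
#include <assert.h>

/* Split a 16 bit device number into its high and low bytes. */
int dev_major(unsigned dev) { return (dev >> 8) & 0377; }
int dev_minor(unsigned dev) { return dev & 0377; }

/* Hypothetical two-entry switch table indexed by major device number. */
typedef int (*strategy_fn)(int blkno);

static int rk_model_strategy(int blkno)   { return 1000 + blkno; }
static int null_model_strategy(int blkno) { (void)blkno; return -1; }

static const strategy_fn bdevsw_model[] = {
    rk_model_strategy,      /* major device 0 */
    null_model_strategy     /* major device 1 */
};
enum { NBLKDEV_MODEL = 2 };

/* Select the routine for the device and call it. */
int dispatch(unsigned dev, int blkno)
{
    int maj = dev_major(dev);

    assert(maj < NBLKDEV_MODEL);
    return bdevsw_model[maj](blkno);
}
```

The minor part plays no role in choosing the routine. It is left for the selected driver to interpret, for example as a drive number.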



System generation

System generation at a UNIX installation consists mainly of:

   running "mkconf" with appropriate input;

   recompiling the output files (created as "c.c" and "l.s");

   reloading the system with the revised object files.

This process only takes a few minutes (not the several hours of some
other operating systems). Note that "bdevsw" and "cdevsw" are defined
differently in "conf.c" from elsewhere, namely as a one dimensional
array of pointers to functions which return integer values. This
quietly ignores the fact that, for example, "rktab" is not a function,
and relies on the linking program not to enquire too closely into the
nature of the work which it is performing.

swap (5196)

Before plunging into all the detail of the file "bio.c", it will be
instructive as well as convenient to examine one routine which was
introduced earlier, namely "swap".

The buffer head "swbuf" was declared to control swapping input/output,
which must share access to the disk with other activity. No element of
"buffers" is associated with "swbuf". Instead the core area occupied
(or to be occupied) by the program serves as the data buffer.

5200: The address of the flags in "swbuf" is transferred to the
      register variable "fp" for convenience and economy;

5202: The "B_BUSY" flag is tested, and if it is on, a swap operation
      is already under way, so that the "B_WANTED" flag is set and the
      process must wait via a call on "sleep".

      Note that the code loop on lines 5202 to 5205 runs at priority
      level six, i.e. one higher than the disk interrupt priority. Can
      you see why this is necessary? Under what conditions will the
      "B_BUSY" flag be set?

5206: The flags are set to reflect:

         "swbuf" is in use ("B_BUSY");

         physical i/o implying a large transfer direct to/from the
         user data segment ("B_PHYS");

         whether the operation is read or write. ("rdflg" is a
         parameter to "swap");

5207: The "b_dev" field is initialised. (Presumably this could have
      been performed once during initialisation rather than every time
      "swbuf" is used, i.e. in "binit".);

5208: "b_wcount" is initialised. Note the negative value and the
      effective multiplication by 32;

5210: The hardware device controller requires a full physical address
      (18 bits on the PDP11/40). The block number of a 32 word block
      must be converted into two parts: the low order ten bits are
      shifted left six places and stored as "b_addr", and the
      remaining six high order bits as "b_xmem". (On the PDP11/40 and
      11/45 only two of these bits are significant.);

5212: A mouthful at first glance! Shift "swapdev" eight places to the
      right to obtain the major device number. Use the result to index
      "bdevsw". From the structure thus selected, extract the strategy
      routine and execute it with the address of "swbuf" passed as a
      parameter;

5213: Explain why this call on "spl6" is necessary;

5214: Wait until the i/o operation is complete. Note that the first
      parameter to "sleep" is in effect the address of "swbuf";

5216: Wakeup those processes (if any) which are waiting for "swbuf";

5218: Reset the processor priority to zero, thus allowing any pending
      interrupts to "happen";

5219: Reset both the "B_BUSY" and "B_WANTED" flags.

Race Conditions

The code for "swap" has a number of interesting features. In
particular it displays in microcosm the problems of race conditions
when several processes are running together.

Consider the following scenario:

No swapping is taking place when process A initiates a swapping
operation. Denoting "swbuf.b_flags" by simply "flags", we have
initially

      flags == null

Process A is not delayed at line 5204, initiates its i/o operation and
goes to sleep at line 5215. We now have

      flags == B_BUSY | B_PHYS | rdflg

which was set at line 5206.

Suppose now while the i/o operation is proceeding, process B also
initiates a swapping operation. It too begins to execute "swap", but
finds the "B_BUSY" flag set, so it sets the "B_WANTED" flag (5203) and
goes to sleep also (5204). We now have



      flags == B_BUSY | B_PHYS | rdflg | B_WANTED

At last the i/o operation completes. Process C takes the interrupt and
executes "rkintr", which calls (5471) "iodone" which calls (5031)
"wakeup" to awaken process A and process B. "iodone" also sets the
"B_DONE" flag and resets the "B_WANTED" flag so that

      flags == B_BUSY | B_PHYS | rdflg | B_DONE

What happens next depends on the order in which process A and process
B are reactivated. (Since they both have the same priority, "PSWP", it
is a toss-up which goes first.)

Case (a): Process A goes first. "B_DONE" is set so no more sleeping is
needed. "B_WANTED" is reset so there is no one to "wakeup". Process A
tidies up (5219), and leaves "swap" with

      flags == B_PHYS | rdflg | B_DONE

Process B now runs and is able to initiate its i/o operation without
further delay.

Case (b): Process B goes first. It finds "B_BUSY" on, so it turns the
"B_WANTED" flag back on, and goes to sleep again, leaving

      flags == B_BUSY | B_PHYS | rdflg | B_DONE | B_WANTED

Process A starts again as in Case (a), but this time finds "B_WANTED"
on so it must call "wakeup" (5217) in addition to its other chores.
Process B finally wakes again and the whole chain completes.

Case (b) is obviously much less efficient than case (a). It would seem
that a simple change to line 5215 to read

      "sleep (fp, PSWP-1);"

would cost virtually nothing and ensure that Case (b) never occurred!

The necessity for the raising of processor priority at various points
should be studied: for example if line 5201 was omitted and if process
B had just completed line 5203 when the "i/o complete" interrupt
occurred for Process A's operation, then "iodone" would turn off
"B_WANTED" and perform "wakeup" before process B went to sleep ...
forever! A bad scene.

Reentrancy

Note also the assumption made above, that both process A and process B
could execute "swap" simultaneously. All UNIX procedures are in
general "re-entrant" (which means multiple simultaneous executions are
possible). How would UNIX have to change if re-entrancy were not
allowed?

For the Uninitiated

We can now return to complete an investigation started in Chapter
Eight concerning "aretu" and "u.u_ssav":

After setting "u.u_ssav" (2284), "expand" calls (2285) "xswap",
      which calls (4380) "swap",
      which calls (5215) "sleep",
      which calls (2084) "swtch",
      which resets "u.u_rsav" (2189).

Thus in fact "u.u_rsav" finally gets reset to a value appropriate to
four procedure calls deeper than that for "u.u_ssav".

Additional Reading

The article "The UNIX I/O System" by Dennis Ritchie is highly
pertinent.

-000-
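
The evolution of "flags" in the scenario of processes A and B can be replayed mechanically. The sketch below is a model only: the numerical flag values are assumed for the purpose of illustration, and the five functions are invented names for the steps described in the text rather than anything found in "bio.c".

```c
#include <assert.h>

/* Flag values assumed for illustration (octal). */
enum {
    B_READ   = 01,
    B_DONE   = 02,
    B_BUSY   = 010,
    B_PHYS   = 020,
    B_WANTED = 0100
};

/* cf. 5206: a process claims "swbuf" and starts its transfer. */
int start_io(int rdflg) { return B_BUSY | B_PHYS | rdflg; }

/* cf. 5203: a process finds "swbuf" busy and registers its interest. */
int find_busy(int flags)
{
    assert(flags & B_BUSY);
    return flags | B_WANTED;
}

/* Model of "iodone" for a synchronous transfer. */
int io_done(int flags) { return (flags | B_DONE) & ~B_WANTED; }

/* cf. 5216: is there anybody to "wakeup"? */
int must_wakeup(int flags) { return (flags & B_WANTED) != 0; }

/* cf. 5219: the owner releases "swbuf". */
int tidy_up(int flags) { return flags & ~(B_BUSY | B_WANTED); }
```

Applying "find_busy" a second time before "tidy_up" reproduces Case (b). Omitting it reproduces Case (a). Either way the owner leaves the same final value behind, but Case (b) costs an extra "wakeup" and an extra pass through "sleep".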



CHAPTER SIXTEEN
The RK Disk Driver

The RK disk storage system employs a removable disk cartridge
containing a single disk, which is mounted inside a drive with moving
read/write heads.

The device designated RK11-D consists of a disk controller together
with a single drive. Additional drives, designated RK05, up to a total
of seven, may be added to a single RK11-D.

A requirement for more than eight drives would require an additional
controller with a different set of UNIBUS addresses. Also the code in
the file "rk.c" would have to be modified to handle the case of two or
more controllers. This case is most unlikely because requirements for
large amounts of on-line disk storage will be more economically
provided otherwise e.g. by the RP04 disk system.

RK Vital Statistics

   Cartridge capacity:        1,228,800 words
                              (4800 512 byte records)
   Surfaces/cartridge:        2
   Tracks/surface:            200 (plus 3 spare)
   Sectors/Track:             12
   Words/Sector:              256
   Recording density:         2040 bpi maximum
   Rotation speed:            1500 rpm
   Half revolution:           20 msecs
   Track positioning:         10 msecs (one track)
                              50 msecs (average)
                              85 msecs (worst case)
   Interrupt Vector Address:  220
   Priority Level:            5

   Unibus Register Addresses
      Drive Status            RKDS   777400
      Error                   RKER   777402
      Control Status          RKCS   777404
      Word Count              RKWC   777406
      Current bus address     RKBA   777410
      Disk address            RKDA   777412
      Data Buffer             RKDB   777416

The average total access time is 70 milliseconds. With multi-drive
subsystems, seeking by one drive may be overlapped with reading or
writing by another drive. However this feature is not used by UNIX
because of bugs which existed at one time in the hardware controller.

In initiating a data transfer, RKDA, RKBA and RKWC are set, and then
RKCS is set. Upon completion, status information is available in RKCS,
RKER and RKDS. When an error occurs, UNIX simply calls "deverror"
(2447) to display RKER and RKDS on the system console, without any
attempt at analysis. An operation is repeated up to ten times before
an error is reported by the device driver.

The register formats which are described fully in the "PDP11
Peripherals Handbook" are reflected in the program code at several
points. The following summaries suffice to describe the features used
by UNIX:

Control Status Register (RKCS)

   bit   description

   15    Set when any bit of RKER (the Error Register) is set;

   7     Set when the control is no longer engaged in actively
         executing a function and is ready to accept a command;

   6     When set, the control will issue an interrupt to vector
         address 220 upon operation completion or error;

   5-4   Memory Extension. The two most significant bits of the 18 bit
         physical bus address. (The other 16 bits are recorded in
         RKBA.);

   3-1   Function to be performed:
            CONTROL RESET   000
            WRITE           001
            READ            010
            etc.;

   0     Initiate the function designated by bits 1 to 3 when set.
         (write only);

Word Count Register (RKWC)

Contains the twos complement of the number of words to be transferred.


Disk Address Register (RKDA)

   bit     description

   15-13   Drive number (0 to 7)
   12-5    Cylinder number (0 to 199)
   4       Surface number (0,1)
   3-0     Sector address (0 to 11)

The File 'rk.c'

This file contains the code which is specific to the RK disk system,
i.e. which is the RK "device driver".

rkstrategy (5389)

The strategy routine is called, e.g. from "swap" (5212), to handle
both read and write requests.

5397: The test and call on "mapalloc" here is a "no-op" except on the
      PDP11/70 system;

5399: The code from here to line 5402 appears to be unnecessarily
      devious! See the discussion of "rkaddr" below. If the block
      number is too large, set the "B_ERROR" flag and report
      "completion";

5407: Link the buffer into a FIFO list for the controller. The list is
      singly linked, uses the "av_forw" pointer of the "buf"
      structures, and has head and tail pointers in "rktab".
      Interrupts from disk devices may not be allowed after the first
      step;

5414: If the RK controller is not currently active, wake it up via a
      call on "rkstart" (5440), which checks that there is something
      to do (5444), flags the controller as busy (5446) and calls
      "devstart" (5447), passing as parameters:

         a pointer to the first enqueued buffer header;

         the address of the RKDA disk address register. (The value
         passed is in effect 0177412. See lines 5363, 5382.);

         a "disk address" computed by "rkaddr";

         zero (not really important in our discussion, and may be
         ignored).

rkaddr (5420)

The code in this procedure incorporates a special feature for files
which extend over more than one disk drive. This feature is described
in the UPM Section "RK(IV)". Its usefulness seems to be restricted.

The value returned by "rkaddr" is formatted for direct transmission to
the control register, RKDA.

devstart (5096)

This procedure when called for the RK disk loads appropriate values
into the registers RKDA, RKBA, RKWC and RKCS in succession. Only the
last value needs to be computed at this stage.

The calculation, though messy in appearance, is straightforward. Note
that "hbcom" is zero and "rbp->b_xmem" contains the two high order
bits of the physical core address. The loading of RKCS initialises the
disk controller i.e. the operation is now entirely under the control
of the hardware.

"devstart" returns to "rkstart" (5448), which returns to "rkstrategy"
(5416), which resets the processor priority and returns to "swap"
(5213), which ...

rkintr (5451)

This procedure is invoked to handle the interrupts which occur when RK
disk operations are completed.

5455: Check for a false alarm!

5459: Inspect the error bit; if set ...

5460: Call "deverror" (2447) to display a message on the system
      console terminal;

5461: Clear the internal registers of the disk controller and ...

5462: wait till this is completed (usually a few microseconds);

5463: If the operation has been retried less than ten times, call
      "rkstart" to try again. Otherwise give up and report an error;

5469: Set the "retry" (!) count back to zero, remove the current
      operation from the "actf" list, and complete the operation by
      calling "iodone";

5472: "rkstart" is called unconditionally here. If the call is not
      necessary (because the "actf" list is empty) "rkstart" will
      return immediately (5444).

iodone (5018)

This routine is primarily concerned with the return of resources when
a block i/o operation has completed. It:

   frees up the Unibus map (for 11/70's, if appropriate);

   sets the "B_DONE" flag;

   releases the buffer if the i/o was asynchronous, or else resets the
   "B_WANTED" flag and wakes up any process waiting for the i/o
   operation to complete.

-000-
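
The RKDA layout tabulated in this chapter suggests how a block number might be turned into a disk address. The sketch below is an illustration under a stated assumption, namely that consecutive blocks fill the twelve sectors of a track, then the other surface of the same cylinder, then the next cylinder. It ignores the multi-drive feature of "rkaddr" altogether, and the function names are invented here.

```c
#include <assert.h>

enum { RK_SECTORS_PER_TRACK = 12, RK_BLOCKS_PER_DRIVE = 4800 };

/*
 * Pack a drive number and a block number within that drive into the
 * RKDA layout: drive in bits 15-13, cylinder in bits 12-5, surface in
 * bit 4 and sector in bits 3-0.
 */
unsigned rkda_pack(unsigned drive, unsigned blkno)
{
    unsigned track, sector;

    assert(drive < 8 && blkno < RK_BLOCKS_PER_DRIVE);
    track  = blkno / RK_SECTORS_PER_TRACK;   /* cylinder * 2 + surface */
    sector = blkno % RK_SECTORS_PER_TRACK;
    return (drive << 13) | (track << 4) | sector;
}

/* Accessors for the individual fields of a packed disk address. */
unsigned rkda_drive(unsigned rkda)    { return (rkda >> 13) & 07; }
unsigned rkda_cylinder(unsigned rkda) { return (rkda >> 5) & 0377; }
unsigned rkda_surface(unsigned rkda)  { return (rkda >> 4) & 01; }
unsigned rkda_sector(unsigned rkda)   { return rkda & 017; }
```

Because the surface bit sits immediately below the cylinder field, a single shift of the track number by four places fills both fields at once.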



CHAPTER SEVENTEEN
Buffer Manipulation

In this chapter we look at the file "bio.c" in detail. It contains
most of the basic routines used to manipulate buffer headers and
buffers (4535, 4720).

Individual buffer headers are tagged by a device number "b_dev" (4527)
and a block number "b_blkno" (4531). (Note the way in which the latter
is declared as an unsigned integer.)

Buffer headers may be linked simultaneously into two lists:

   the "b"-lists are lists, one per device controller, which link
   together buffers associated with that device type;

   the "av"-list is a list of buffers which may be detached from their
   current use and converted to an alternate use.

Both the "av"-list and the various "b"-lists are doubly linked to
facilitate insertion and deletion at any point.

If a buffer is withdrawn temporarily from the "av"-list, then its
"B_BUSY" flag is raised.

If the contents of a buffer correctly reflect the information that is
or should be stored on disk, then the "B_DONE" flag is raised.

If the "B_DELWRI" flag is raised, the contents of the buffer are more
up to date than the contents of the corresponding disk block, and
hence the buffer must be written out before it can be reassigned.

A Cache-like Memory

It will be seen that the large buffers in UNIX are manipulated in a
way which is analogous to the operation of a hardware cache attached
to the main memory of a computer e.g. the PDP11/70.

Buffers are not assigned to any particular program or file, except for
very short intervals at a time. In this way a relatively small number
of buffers can be shared effectively amongst a large number of
programs and files.

Information is left in the buffers until the buffer is needed i.e.
immediate "write through" is avoided if only part of the buffer has
recently been changed. Programs which read or write records which are
small compared with the buffer size are then not penalised unduly.

Finally when programs are terminated and files are closed, the
problems of ensuring that the program's buffers are flushed properly
(problems which have plagued other operating systems) have largely
disappeared.

There is one area of practical concern: if the decision "when to
write" is left to the operating system alone, then some buffers may
not be written out for a very long time. Accordingly there is a
utility program which runs twice per minute and forces all such
buffers to be written out unconditionally. This limits the likely
amount of damage that a sudden system crash may cause.

clrbuf (5038)

This routine zeros out the first 256 words (512 bytes) of the buffer.
Note that the parameter passed to "clrbuf" is the address of the
buffer header. "clrbuf" is called by "alloc" (6982).

incore (4899)

This routine searches for a buffer that is already assigned to a
particular (device, block number) pair. It searches the circular
"b"-list whose head is the "devtab" structure for the device type. If
a buffer is found, the address of the buffer header is returned.
"incore" is called by "breada" (4780, 4788).

getblk (4921)

This routine performs the same search as "incore" but goes further in
that if the initial search is unsuccessful, a buffer is allocated from
the "av"-list (available list).

By a call on "notavail" (4999), the buffer is removed from the
"av"-list and flagged as "B_BUSY".
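
The search of a circular "b"-list for a (device, block number) pair, as performed by "incore", can be sketched in a few lines. This is a cut-down model and not the code of "bio.c": the structure "hdr" carries only a forward link (the real lists are doubly linked), and the names "blist_insert" and "model_incore" are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down buffer header: only what the search needs. */
struct hdr {
    struct hdr *b_forw;   /* next header on the circular "b"-list */
    int b_dev;            /* device number tag */
    unsigned b_blkno;     /* block number tag */
};

/* Insert bp at the front of the circular list whose head is dp. */
void blist_insert(struct hdr *dp, struct hdr *bp)
{
    bp->b_forw = dp->b_forw;
    dp->b_forw = bp;
}

/* Return the header tagged (dev, blkno), or NULL if none is listed. */
struct hdr *model_incore(struct hdr *dp, int dev, unsigned blkno)
{
    struct hdr *bp;

    assert(dp != NULL && dp->b_forw != NULL);
    for (bp = dp->b_forw; bp != dp; bp = bp->b_forw)
        if (bp->b_dev == dev && bp->b_blkno == blkno)
            return bp;
    return NULL;
}
```

An empty list is represented by a head which points to itself, so that the loop terminates at once without any special test.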



"getblk" is more suspicious of its parameters than "incore". It is
called by

    exec   (3040)           writei (6304)
    exit   (3237)           iinit  (6928)
    bread  (4758)           alloc  (6981)
    breada (4781, 4789)     free   (7016)
    smount (6123)           update (7216)

4940: At this point the required buffer has been located by searching
      the "b"-list. Either it is "B_BUSY" in which case a "sleep" must
      be taken (4943), or else it is appropriated (4948);

4953: If the required buffer has not been located, and if the
      "av"-list is empty, set the "B_WANTED" flag for the "av"-list
      and go to "sleep" (4955);

4960: If the "av"-list is not empty, select the first member, and if
      it represents a "delayed write" arrange to have it written out
      asynchronously (4962);

4966: "B_RELOC" is a relic! (See 4583);

4967: The code from here until 4973 unconditionally removes the buffer
      from the "b"-list for its current device type and reinserts it
      into the "b"-list for the new device type. Since this will
      frequently be a "no-op" i.e. the new and old device type will be
      the same, it would seem desirable to insert a test
            if (bp->b_dev == dev)
      before executing lines 4967 to 4974.

      Note the special handling for calls where "dev == NODEV" (-1).
      (Such calls incidentally are made without a second parameter -
      tut! tut! See e.g. 3040).

"bfreelist" serves as the "devtab" structure for the "b"-list for
"NODEV".

brelse (4869)

This procedure takes the buffer passed as a parameter and links it back
into the "av"-list.

Any process which is either waiting for the particular buffer or any
available buffer is woken up.

Note however that since both "sleeps" (4943, 4955) are at the same
priority, if two processes are waiting - one for the particular buffer
and one for any buffer - it will be a toss-up which will get it.

By giving the first priority over the second (e.g. by biasing by one)
the race should be resolved more satisfactorily. The disadvantage of
such a change might be that it could lead to a deadlock situation in
certain rather peculiar circumstances.

If an error has occurred e.g. upon reading information into the buffer,
the information in the buffer may be incorrect. The assignment on line
4883 ensures that the information in the buffer will not be mistakenly
retrieved subsequently. The "B_ERROR" flag is set e.g. by "rkstrategy"
(5403) and "rkintr" (5467).

To see how this could occur, consider what happens to a buffer when a
disk i/o operation is completed:

    5471  "rkintr" calls "iodone";
    5026  "iodone" sets the "B_DONE" flag;
    5028  "iodone" calls "brelse";
    4887  "brelse" resets the "B_WANTED", "B_BUSY" and "B_ASYNC"
          flags but not the "B_DONE" flag;

    4948  "getblk" finds the buffer and calls "notavail";
    5010  "notavail" sets the "B_BUSY" flag;
    4759  "bread" (which called "getblk") finds the "B_DONE" flag
          set and exits.

Note that buffer headers are removed from the "av"-list by "notavail"
and are returned by "brelse". Buffer headers are moved from one
"b"-list to another by "getblk".

binit (5055)

This procedure is called by "main" (1614) to initialise the buffer
pool. Empty, doubly linked circular lists are set up:

    for the "av"-list ("bfreelist" is head);

    the "b"-list for null devices ("dev == NODEV") ("bfreelist" is
    again the head);

    a "b"-list for each major device type.

For each buffer:

    the buffer header is linked into the "b"-list for the device
    "NODEV" (-1);

    the address of the buffer is set in the header (5067);

    the buffer flags are set as "B_BUSY" (this doesn't seem to be
    really necessary) (5072);

    the buffer header is linked into the "av"-list by a call on
    "brelse" (5073);

The number of block devices is recorded as "nblkdev". This is used for
checking values for "dev" in "getblk" (4927), "getmdev" (6192) and
"openi" (6720). Inspection of "bdevsw" (4656) shows that "nblkdev"
will be set to eight whereas the value one is what is really required.

This result could be obtained by "editing" as follows:

    /5084/m/5081/   "nblkdev=i;
    /5083/m/5077/   "i++

UNIX Operating System 17-2 Buffer Manipulation
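
The traffic on the "av"-list described on this page (removal by "notavail", return by "brelse", and "getblk" taking the first member) can be sketched as follows. This is an illustration only, not the "bio.c" code: the names "av_init", "take", "give_back" and "take_first" are hypothetical, the value of "B_BUSY" is chosen for the sketch, and returning a buffer at the tail of the list is an assumption (it gives least recently used order). A NULL return stands where the kernel would "sleep".

```c
#include <assert.h>
#include <stddef.h>

#define B_BUSY 01   /* flag value chosen for this sketch only */

struct buf {
    struct buf *av_forw;  /* next header on the "av"-list */
    struct buf *av_back;  /* previous header on the "av"-list */
    int         b_flags;
};

/* Set up an empty, doubly linked circular list. */
static void av_init(struct buf *head)
{
    head->av_forw = head;
    head->av_back = head;
}

/* In the manner of "notavail": unlink the buffer and flag it busy. */
static void take(struct buf *bp)
{
    bp->av_back->av_forw = bp->av_forw;
    bp->av_forw->av_back = bp->av_back;
    bp->b_flags |= B_BUSY;
}

/* In the manner of "brelse": clear busy and link back in at the tail
 * (tail insertion is an assumption of this sketch). */
static void give_back(struct buf *head, struct buf *bp)
{
    bp->b_flags &= ~B_BUSY;
    bp->av_forw = head;
    bp->av_back = head->av_back;
    head->av_back->av_forw = bp;
    head->av_back = bp;
}

/* Select the first member, or return NULL when the list is empty. */
static struct buf *take_first(struct buf *head)
{
    struct buf *bp = head->av_forw;

    if (bp == head)
        return NULL;
    take(bp);
    return bp;
}
```

With tail insertion and head removal, the buffer handed out is always the one that has been idle longest, which is what lets recently used blocks linger in the pool.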


bread (4754)

This is the standard procedure for reading from block devices. It is
called by:

    wait   (3282)           iinit  (6927)
    breada (4799)           alloc  (6973)
    stat1  (6051)           ialloc (7097)
    smount (6116)           iget   (7319)
    readi  (6258)           iupdat (7386)
    writei (6305)           itrunc (7426, 7431)
    bmap   (6472, 6488)     namei  (7625)

"getblk" finds a buffer. If the "B_DONE" flag is set no i/o is needed.

breada (4773)

This procedure has an additional parameter, as compared with "bread".
It is called only by "readi" (6256).

4780: Check if the desired block has already been assigned to a
      buffer. (It may not yet be available, but at least is it
      there?);

4781: If not initiate the necessary read operation but don't wait for
      it to finish;

4788: Look around for the "read ahead" block. If it is not there,
      allocate a buffer (4789) but release it (4791) if the buffer is
      already ready;

4793: The "read ahead" block is not ready, so initiate an asynchronous
      read operation;

4798: If a buffer was assigned to the current block call "bread" to
      wrap it up, else ...

4800: Wait for the completion of the operation which was started at
      line 4785.

bwrite (4809)

This is the standard procedure for writing to block devices. It is
called by "exit" (3239), "bawrite" (4863), "getblk" (4963), "bflush"
(5241), "free" (7021), "update" (7221) and "iupdat" (7400). N.B.
"writei" calls "bawrite" (6310)!

4820: If the "B_ASYNC" flag is not set, the procedure does not return
      until the i/o operation is completed;

4823: If the "B_ASYNC" flag is set, but "B_DELWRI" was not set (note
      "flag" is set at line 4816) call "geterror" (5336) to check on
      the error flag. (If "B_DELWRI" was set, and there is an error,
      sending the error indication to the right process is "too
      hard."). The call (4824) on "geterror" will only report errors
      related to the initiation of the write operation.

bawrite (4856)

This procedure is called by "writei" (6310) and "bdwrite" (4845).
"writei" calls either "bawrite" or "bdwrite" depending on whether the
block to be written has been wholly or partially filled.

bdwrite (4836)

This procedure is called by "writei" (6311) and "bmap" (6443, 6449,
6485, 6500 and 6501!).

4844: Don't delay the write if the device is a magnetic tape drive ...
      keep everything in order;

4847: Set the "B_DONE", "B_DELWRI" flags and call "brelse" to link the
      buffer into the "av"-list.

bflush (5229)

This procedure is called by "update" (7201), which is called by
"panic" (2420), "sync" (3489) and "sumount" (6150).

"bflush" searches the "av"-list for "delayed write" blocks and forces
them to be written out asynchronously.

Note that as "notavail" adjusts the links of the "av"-list, the search
(which runs at processor priority six) is reinitiated after each
"delayed write" block is encountered.

Note also that since it happens that "bflush" is only called by
"update" with "dev" equal to "NODEV", line 5238, in particular, could
be simplified.

physio (5259)

This routine is called to handle "raw" input/output i.e. operations
which ignore the normal 512 character block size.

"physio" is called by "rkread" (5476) and "rkwrite" (5483) which appear
as entries in the array "cdevsw" (4684) i.e. as entries for a character
device.

"Raw i/o" is not an essential feature of UNIX. For disk devices it is
used mainly for copying whole disks and checking the integrity of the
file system as a whole (see e.g. ICHECK (VIII) in the UPM), where it is
convenient to read whole tracks, rather than single blocks, at a time.

Note the declaration of "strat" (5261). Since the actual parameter used
e.g. "rkstrategy" (5389) does not return any value, is this form of
declaration really necessary?

                              -oOo-

UNIX Operating System 17-3 Buffer Manipulation
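
The choice "writei" makes between "bawrite" and "bdwrite" can be illustrated with a short sketch. It assumes, as a reading of the description above rather than a quotation of the source, that a block counts as wholly filled when the file offset after the transfer falls exactly on a 512 character boundary. The names "choose_write", "WRITE_ASYNC" and "WRITE_DELAYED" are hypothetical.

```c
#include <assert.h>

#define BLOCK_SIZE 512u   /* characters per block */

enum write_kind {
    WRITE_ASYNC,    /* start the write now, do not wait ("bawrite") */
    WRITE_DELAYED   /* leave the block in the pool for now ("bdwrite") */
};

/* offset_after is the file offset just past the last character that
 * was copied into the buffer. */
static enum write_kind choose_write(unsigned long offset_after)
{
    if (offset_after % BLOCK_SIZE == 0)
        return WRITE_ASYNC;     /* the final characters were written */
    return WRITE_DELAYED;       /* more of this block may follow */
}
```

The delayed case is what spares programs that write short records from causing one disk transfer per record.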


Section Four is concerned with files and file systems.

A file system is a set of files and associated tables and directories
organised onto a single storage device such as a disk pack.

This section covers the means of

    creating and accessing files;
    locating files via directories;
    organising and maintaining file systems.

It also includes the code for an exotic breed of file called a "pipe".


CHAPTER EIGHTEEN

File Access and Control

A large part of every operating system seems to be concerned with data
management and file management, and UNIX turns out to be no exception.

Section Four

Section Four of the source code contains thirteen files.

The first four contain common declarations needed by various of the
other routines:

    "file.h" describes the structure of the "file" array;

    "filsys.h" describes the structure of the "super block" for
    "mounted" file systems;

    "ino.h" describes the structure of "inodes" recorded on "mounted"
    devices;

    "inode.h" describes the structure of the "inode" array;

The next two files, "sys2.c" and "sys3.c", contain code for system
calls. ("sys1.c" and "sys4.c" were presented in Section Two).

The next five files, "rdwri.c", "subr.c", "fio.c", "alloc.c" and
"iget.c", together present the principal routines for file management,
and provide a link between the i/o oriented system calls and the basic
i/o routines.

The file "nami.c" is concerned with searching directories to convert
file pathnames into "inode" references.

Finally, "pipe.c" is the "device driver" for pipes.

File Characteristics

A UNIX file is conceptually a named character string, stored on one of
a variety of peripheral devices (or in the main memory), and accessible
via mechanisms appropriate to the usual peripheral devices.

It will be noted that there is no record structure associated with UNIX
files. However "newline" characters may be inserted into the file to
define substrings analogous to records.

UNIX Operating System 18-1 File Access and Control


UNIX carries the ideas of device independence to their logical extreme
by allowing the file name in effect to determine uniquely all relevant
attributes of the file.

System Calls

The following system calls are provided expressly for file
manipulation:

     #  Name    Line         #  Name    Line

     3  read    5711        14  mknod   5952
     4  write   5720        15  chmod   3560
     5  open    5765        16  chown   3575
     6  close   5846        19  seek    5861
     8  creat   5781        21  mount   6086
     9  link    5909        22  umount  6144
    10  unlink  3510        41  dup     6069
    12  chdir   3538        42  pipe    7723

Control Tables

The arrays "file" and "inode" are essential components of the file
access mechanism.

The array "file" is defined as an array of structures (also named
"file").

An element of the "file" array is considered to be unallocated if
"f_count" is zero.

Each "open" or "creat" system call results in the allocation of an
element of the "file" array. The address of this element is stored in
an element of the calling process's array "u.u_ofile". It is the index
of the newly allocated element of the latter array which is passed back
to the user process. Descendants of a process created by "newproc"
inherit the contents of the parent's "u.u_ofile" array.

Each element of "file" includes a counter, "f_count", to determine the
number of current processes which reference it.

"f_count" is incremented by "newproc" (1878), "dup" (6079) and "falloc"
(6857); it is decremented by "closef" (6657) and (if the file can't be
opened) by "open1" (5836).

The "f_flag" (5509) of the "file" element notes whether the file is
open for reading and/or writing or whether it is a "pipe" or not.
(Further discussion of "pipes" will be deferred till Chapter
Twenty-One.)

The "file" structure also contains a pointer, "f_inode" (5511), to an
entry in the "inode" table, and a 32 bit integer, "f_offset" (5512),
which is a logical pointer to a character within the file.

inode (5659)

"inode" is defined as an array of structures (also named "inode").

An element of the "inode" array is considered to be unallocated if the
reference count, "i_count", is zero.

At each point in time, "inode" contains a single entry for each file
which may be referenced for normal i/o operations, or which is being
executed or which has been executed and has the "sticky" bit set, or
which is the working directory for some process.

Several "file" table entries may point to a single "inode" entry. The
inode entry describes the general disposition of the file.

Resources Required

Each file requires the dedication of certain system resources. When a
file exists, but is not being referenced in any way, it requires:

    (a) a directory entry (16 characters in a directory file);

    (b) a disk "inode" entry (32 characters in a table stored on the
        disk);

    (c) zero, one or more blocks of disk storage (512 characters
        each).

In addition if the file is being referenced for some purpose, it
requires

    (d) a core "inode" entry (32 characters in the "inode" array);

Finally if a user program has "opened" the file for reading or writing,
a number of resources are required:

    (e) a "file" array entry (8 characters);

    (f) an entry in the user program's "u.u_ofile" array (one word per
        file, pointing to a "file" array entry);

Mechanisms have to be set up for allocating and deallocating each of
these resources in an orderly manner. The following table gives the
names of the principal procedures involved:

    resource                obtain      free
    ========                ======      ====

    directory entry         namei       namei
    disk "inode" entry      ialloc      ifree
    disk storage block      alloc       free
    core "inode" entry      iget        iput
    "file" table entry      falloc      closef
    "u_ofile" entry         ufalloc     close

UNIX Operating System 18-2 File Access and Control
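
The relationship between a process's "u.u_ofile" array, the "file" array and the reference count "f_count" can be sketched as below. This is a toy model under stated assumptions, not the kernel code: the table sizes are arbitrary, an integer stands in for the "inode" pointer, there is a single process, and the function names with the "_sketch" suffix are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NFILE  4   /* size of the "file" array (chosen for the sketch) */
#define NOFILE 3   /* open files per process (chosen for the sketch) */

struct file {
    int f_count;   /* zero means the element is unallocated */
    int f_inode;   /* stands in for the pointer to an "inode" entry */
};

static struct file  file_tab[NFILE];
static struct file *u_ofile[NOFILE];   /* one process's open file slots */

/* Find a free slot in the per-process array; -1 if there is none. */
static int ufalloc_sketch(void)
{
    int i;

    for (i = 0; i < NOFILE; i++)
        if (u_ofile[i] == NULL)
            return i;
    return -1;
}

/* Allocate a "file" element and return the index of its slot. */
static int falloc_sketch(int inode)
{
    int i, fd = ufalloc_sketch();

    if (fd < 0)
        return -1;
    for (i = 0; i < NFILE; i++)
        if (file_tab[i].f_count == 0) {
            file_tab[i].f_count = 1;
            file_tab[i].f_inode = inode;
            u_ofile[fd] = &file_tab[i];
            return fd;
        }
    return -1;
}

/* A second slot refers to the same "file" element. */
static int dup_sketch(int fd)
{
    int nfd;

    if (fd < 0 || fd >= NOFILE || u_ofile[fd] == NULL)
        return -1;
    nfd = ufalloc_sketch();
    if (nfd < 0)
        return -1;
    u_ofile[nfd] = u_ofile[fd];
    u_ofile[nfd]->f_count++;
    return nfd;
}

/* Erase the slot and drop one reference to the "file" element. */
static int close_sketch(int fd)
{
    struct file *fp;

    if (fd < 0 || fd >= NOFILE || (fp = u_ofile[fd]) == NULL)
        return -1;
    u_ofile[fd] = NULL;
    fp->f_count--;
    return 0;
}
```

Because the offset lives in the "file" element rather than in the per-process slot, two slots that share an element also share a position in the file.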


Opening a File

When a program wishes to reference a file which already exists, it must
"open" the file to create a "bridge" to the file. (Note that in UNIX,
processes usually inherit the open files of their parents or
predecessors, so that often all needed files are already implicitly
open.) If the file does not already exist, it must be "created".

This second case will be investigated first:

creat (5781)

5786: "namei" (7518) converts a pathname into an "inode" pointer.
      "uchar" is the name of a procedure which recovers the pathname,
      character by character, from the user program data area;

5787: A null "inode" pointer indicates either an error or that no file
      of that name already exists;

5788: For error conditions, see "CREAT (II)" in the UPM;

5790: "maknode" (7455) creates a core "inode" via a call on "ialloc"
      and then initialises it and enters it into the appropriate
      directory. Note the explicit resetting of the "sticky" bit
      ("ISVTX").

open1 (5804)

This procedure is called by "open" (5774) and "creat" (5793, 5795),
passing values of the third parameter, "trf", of 0, 2 and 1
respectively. The value 2 represents the case where no file of the
desired name already exists.

5812: The second parameter, "mode", can take the values 01 ("FREAD"),
      02 ("FWRITE") or 03 ("FREAD|FWRITE") when "trf" is 0, but only
      02 otherwise;

5813: Where a file of the desired name already exists, check the
      access permissions for the desired mode(s) of activity via calls
      on "access" (6746), which may set "u.u_error" as a side-effect;

5824: If the file is being "created", eliminate its previous contents
      via a call on "itrunc" (7414). The code here could be improved
      by changing the test to "(trf == 1)". Verify that this would be
      so.

5826: "prele" (7882) is used to "unlock" "inodes". Where, you may ask,
      did the "inode" get "locked", and why?

5827: Note that "falloc" (6847) calls "ufalloc" (6824) as the first
      thing it does;

5831: "ufalloc" leaves the user file identifying number in
      "u.u_ar0[R0]". Why does this statement occur where it does,
      instead of after line 5834?

5832: "openi" (6702) is called to call handlers for special files, in
      case any device specific actions are required (for disk files
      there is no action);

5839: In the case of an error while making the "file" array entry, the
      "inode" entry is released by a call on "iput".

It will be seen that responsibility is quite widely distributed. The
"file" table entry is initialised by "falloc" and "open1"; the "inode"
table entry, by "iget", "ialloc" and "maknode".

Note that "ialloc" clears out the "i_addr" array of a newly allocated
"inode" and "itrunc" does the same for a pre-existing "inode", so that
after the "creat" system call, there are no disk blocks associated with
the file, now classed as "small".

open (5763)

We now turn to consider the case where a program wishes to reference a
file which already exists.

"namei" is called (5770) with a second parameter of zero to locate the
named file. ("u.u_arg[0]" contains the address in the user space of a
character string which defines a file path name.)

"u.u_arg[1]" has to be incremented by one, because there is a mismatch
between the user programming conventions and the internal data
representations.

open1 revisited

"trf" is now zero, so access permissions are checked (5813) but the
existing file (if any) is not deallocated (5824).

What is a little disconcerting here is that, apart from the call on
"falloc" (5827), there is no direct call on any of the "resource
allocation" routines. Of course, for an existing file, neither
directory entry nor disk "inode" entry nor disk blocks need be
allocated. The core "inode" entry is allocated (if necessary) as a
side-effect of the call on "namei", but ... where is it initialised?

close (5846)

The "close" system call is used to sever explicitly the connection
between a user program and a file and thus can be regarded as the
inverse of "open".

UNIX Operating System 18-3 File Access and Control
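
The "mismatch" that forces "u.u_arg[1]" to be incremented by one can be made concrete. The user program's second argument to "open" is 0 for reading, 1 for writing and 2 for both, whereas internally the mode is a pair of flag bits, 01 ("FREAD") and 02 ("FWRITE"). Adding one maps the first convention onto the second. The function name "open_mode_to_flags" in this sketch is hypothetical, and the range check is an addition for the sake of the example.

```c
#include <assert.h>

#define FREAD  01
#define FWRITE 02

/* Convert the user's open mode (0, 1 or 2) to internal flag bits.
 * Returns -1 for any other value (a check added for this sketch). */
static int open_mode_to_flags(int user_mode)
{
    if (user_mode < 0 || user_mode > 2)
        return -1;
    return user_mode + 1;   /* 0 -> 01, 1 -> 02, 2 -> 03 */
}
```

The trick works only because the three legal user values are consecutive and the two flags occupy the two low order bits.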


The user program's file identification is passed via r0. The value is
validated by "getf" (6619), the "u.u_ofile" entry is erased, and a call
is made on "closef".

closef (6643)

"closef" is called by "close" (5854) and by "exit" (3230). (The latter
is more common since most files do not get closed explicitly but only
implicitly when the user program terminates.)

6649: If the file is a pipe, reset the mode of the pipe and "wakeup"
      any process which is waiting for the pipe, either for
      information or for space;

6655: If this is the last process to reference the file, call "closei"
      (6672) to handle any special end of file processing for special
      files and then call "iput";

6657: Decrement the "file" entry reference count. If this is now zero,
      the entry is no longer allocated.

"closei", as its last action calls "iput". This routine is in fact
called from many places, whenever a connection to a core "inode" is to
be severed and the reference count decremented.

7350: If the reference count is one at this point, the "inode" is to
      be released. While this is happening, it should be locked.

7352: If the number of "links" to the file is zero (or less) the file
      is to be deallocated (see below);

7357: "iupdat" (7374) updates the accessed and update times as
      recorded on the disk "inode";

7358: "prele" unlocks the "inode". Why should it be called here as
      well as at line 7363?

Deletion of Files

New files are automatically entered into the file directory as
permanent files as soon as they are "opened". Subsequent "closing" of a
file does not automatically cause its deletion. As was seen at line
7352, deletion will occur when the field "i_nlink" of the core "inode"
entry is zero. This field is set to one initially by "maknode" (7464)
when the file is first created. It may be incremented by the system
call "link" (5941) and decremented by the system call "unlink" (3529).

Programs which create temporary "work files" should remove these files
before terminating, by executing an "unlink" system call. Note that the
"unlink" call does not of itself remove the file. This can only happen
when the reference count ("i_count") is about to be decremented to zero
(7350, 7362).

To minimise the problems associated with "temporary" files which
survive program or system crashes, programmers should observe the
conventions that:

    (a) temporary files should be "unlinked" immediately after they
        are opened;

    (b) temporary files should always be placed in the "tmp"
        directory. Unique file names can be generated by incorporating
        the process's identifying number into the file name (See
        "getpid" (3480)).

Reading and Writing

It is of interest to work through an abbreviated summary of the code
which is invoked when a user process performs a "read" system call
before examining the code in detail.

          read (f, b, n);      /*user program*/

            {trap occurs}

    2693  trap

            {system call #3}

    5711  read ( );
    5713  rdwr (FREAD);

Execution of the system call by the user process results in the
activation of "trap" running in kernel mode. "trap" recognises system
call #3, and calls (via "trap1") the routine "read", which calls
"rdwr".

    5731  rdwr

    5736  fp = getf (u.u_ar0[R0]);
    5743  u.u_base = u.u_arg[0];
    5744  u.u_count = u.u_arg[1];
    5745  u.u_segflg = 0;
    5751  u.u_offset[1] = fp->f_offset[1];
    5752  u.u_offset[0] = fp->f_offset[0];
    5754  readi(fp->f_inode);
    5756  dpadd(fp->f_offset, u.u_arg[1]-u.u_count);

"rdwr" includes much code which is common to both "read" and "write"
operations. It converts, via "getf" (6619), the file identification
supplied by the user process into the address of an entry in the "file"
array.

Note that the first parameter of the system call is passed in a
different way from the remaining two parameters.

"u.u_segflg" is set to zero to indicate that the operation destination
is in the user address space. After "readi"

UNIX Operating System 18-4 File Access and Control


is called with a parameter which is an "inode" pointer, the final
accounting is performed by adding the number of characters requested
for transfer less the residual number not transferred (left in
"u.u_count") to the file offset.

    6221  readi

    6239  lbn = lshift (u.u_offset, -9);
    6240  on = u.u_offset[1] & 0777;
    6241  n = min (512 - on, u.u_count);
    6248  bn = bmap(ip, lbn);
    6250  dn = ip->i_dev;
    6258  bp = bread (dn, bn);
    6260  iomove (bp, on, n, B_READ);
    6261  brelse (bp);

"readi" converts the file offset into two parts: a logical block
number, "lbn", and an index into the block, "on". The number of
characters to be transferred is the minimum of "u.u_count" and the
number of characters left in the block (in which case additional
block(s) must be read (not shown)) (and the number of characters
remaining in the file (this case is not shown)).

"dn" is the device number which is stored within the "inode". "bn" is
the actual block number on the device (disk), which is computed by
"bmap" (6415) using "lbn".

The call on "bread" finds the required block, copying it into core from
disk if necessary. "iomove" (6364) transfers the appropriate characters
to their destination, and performs accounting chores.

"read" and "write" perform similar operations and share much code. The
two system calls, "read" (5711) and "write" (5720), call "rdwr"
immediately to:

5736: Convert the user program file identification to a pointer in the
      file table;

5739: Check that the operation (read or write) is in accordance with
      the mode with which the file was opened;

5743: Set up various standard locations in "u" with the appropriate
      parameters;

5746: "pipes" get special treatment right from the start!

5755: Call "readi" or "writei" as appropriate;

5756: Update the file offset by, and set the value returned to the
      user program to, the number of characters actually transferred.

readi (6221)

6230: If no characters are to be transferred, do nothing;

6232: Set the "inode" flag to indicate that the "inode" has been
      accessed;

6233: If the file is a character special file, call the appropriate
      device "read" procedure, passing the device identification as
      parameter;

6238: Begin a loop to transfer data in amounts up to 512 characters at
      a time until (6262) either an irrecoverable error condition has
      been encountered or the requested number of characters has been
      transferred;

6239: "lshift" (1410) concatenates the two words of the array
      "u.u_offset", shifts right by nine places, and truncates to 16
      bits. This defines the "logical block number" of the file which
      is to be referenced;

6240: "on" is a character offset within the block;

6241: "n" is determined initially as the minimum of the number of
      characters beyond "on" in the block, and the number requested
      for transfer. (Note that "min" (6339) treats its arguments as
      unsigned integers.)

6242: If the file is not a special block file then ...

6243: Compare the file offset with the current file size;

6246: Reset "n" as the minimum of the characters requested and the
      remaining characters in the file;

6248: Call "bmap" to convert the logical block number for the file to
      a physical block number for its host device. There will be more
      on "bmap" shortly. For now, note that "bmap" sets "rablock" as a
      side effect;

6250: Set "dn" as the device identification from the "inode";

6251: If the file is a special block file then ...

6252: Set "dn" from the "i_addr" field of the "inode" entry;
      (Presumably this will nearly always be the same as the "i_dev"
      field, so why the distinction?)

6253: Set the "read ahead block" to the next physical block;

6255: If the blocks of the file are apparently being read sequentially
      then ...

6256: Call "breada" to read the desired block and to initiate reading
      of the "read ahead block";

6258: else just read the desired block;

UNIX Operating System 18-5 File Access and Control


6260: Call "iomove" to transfer information from the buffer to the
      user area;

6261: Return the buffer to the "av"-list.

writei

6303: If less than a full block is being written the previous contents
      of the buffer must be read so that the appropriate part can be
      preserved, otherwise just get any available buffer;

6311: There is no "write ahead" facility, but there is a "delayed
      write" for buffers whose final characters have not been changed;

6312: If the file offset now points beyond the recorded end of file
      character, the file has obviously grown bigger!

6318: Why is it necessary/desirable to set the "IUPD" flag again? (See
      line 6285.)

iomove (6364)

The comment at the beginning of this procedure says most of what needs
to be said. "copyin", "copyout", "cpass" and "passc" may be found at
lines 1244, 1252, 6542 and 6517 respectively.

A general description of the function of "bmap" may be found on Page 2
of "FILE SYSTEM (V)" of the UPM.

6423: Files of more than 2**15 blocks (2**24 characters) are not
      supported;

6427: Start with the "small" file algorithm (file is not greater than
      eight blocks i.e. 4096 characters);

6431: If the block number is 8 or more, the "small" file must be
      converted into a large file. Note this is a side effect of
      "bmap", and should occur only when "bmap" has been called by
      "writei" (and never by "readi" - see line 6245). Thus all files
      start life as "small" files and are never explicitly changed to
      "large" files. Note also that the change is irreversible!

6435: "alloc" (6956) allocates a block on device "d" from the device's
      free list. It then assigns a buffer to this block and returns a
      pointer to the buffer header;

6438: The eight buffer addresses in the "i_addr" array for the "inode"
      are copied into the buffer area and then erased;

6442: "i_addr[0]" is set to point to the buffer which is set up for a
      "delayed" write;

6448: The file is still small. Get the next block if necessary;

6456: Note the setting of "rablock";

Leftovers

You should investigate the following procedures for yourself:

    seek  (5861)        stat1 (6045)
    sslep (5979)        dup   (6069)
    fstat (6014)        owner (6791)
    stat  (6028)        suser (6811)

                              -oOo-

UNIX Operating System 18-6 File Access and Control
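
The arithmetic at lines 6239 to 6241 of "readi", and the "small" file limit tested in "bmap", can be tried out in isolation. The sketch below assumes a single 32 bit offset in place of the two word array "u.u_offset"; the names "split_offset", "min_u", "fits_small_file" and the structure "xfer" are hypothetical.

```c
#include <assert.h>

#define BSIZE 512u   /* characters per block */

struct xfer {
    unsigned lbn;   /* logical block number within the file */
    unsigned on;    /* character offset within that block */
    unsigned n;     /* characters to move in this pass of the loop */
};

/* Minimum of two unsigned values (cf. the remark on "min" at 6241). */
static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Split a file offset as "readi" does before calling "bmap". */
static struct xfer split_offset(unsigned long offset, unsigned count)
{
    struct xfer x;

    x.lbn = (unsigned)(offset >> 9) & 0xFFFFu;  /* shift by nine places, keep 16 bits */
    x.on  = (unsigned)(offset & 0777);          /* low order nine bits */
    x.n   = min_u(BSIZE - x.on, count);         /* never run past the block end */
    return x;
}

/* A "small" file addresses its blocks directly; that works for block
 * numbers 0 to 7 only. */
static int fits_small_file(unsigned lbn)
{
    return lbn < 8;
}
```

For example, an offset of 1000 with 100 characters requested gives logical block 1, offset 488 within the block, and 24 characters for the first pass; the remaining 76 come from block 2 on the next pass of the loop.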


CHAPTER NINETEEN

File Directories and Directory Files

As we have seen, much important information about individual files is
contained in the "inode" tables. If the file is currently accessible,
or being accessed, the relevant information is held in the core "inode"
table. If a file is on disk (more generally, on some "file system
volume") and is not currently accessible, then the relevant "inode"
table is the one recorded on the disk (file system volume).

File Names

Notably absent from the "inode" table is any information regarding the
"name" of the file. This is stored in the directory files.

Each file must have at least one name. A file may have more than one
distinct name, but the same name may not be shared by two distinct
files, i.e. each name must define a unique file.

A name may be multipart. When written, the parts or components of the
name are separated by slashes ("/"). The order of components within a
name is significant i.e. "a/b/c" is different from "a/c/b".

If file names are divided into two parts: an initial part or "stem" and
a final part or "ending", then two files whose names have identical
stems are usually related in some way. They may reside on the same
disk, they may belong to the same user, etc.

The Directory Data Structure

Users make initial reference to files by quoting the file name, e.g. in
the "open" system call. An important operating system function is to
decode the name into the corresponding "inode" entry. To do this, UNIX
creates and maintains a directory data structure. This structure is
equivalent to a directed graph with named edges.

In its purest form, the graph is a tree i.e. it has a single root node,
with exactly one path between the root and any node. More commonly in
UNIX (but not so commonly in other operating systems) the graph is a
lattice which may be obtained from a tree by coalescing one or more
groups of leaves.

In this case, while there is still only one path between the root and
any interior node, there may be more than one path between the root and
a leaf. Leaves are nodes without successors and correspond to data
files. Interior nodes are nodes with successors and correspond to
directory files.

The name for a file is obtained from the names of the edges of the path
between the root and the node corresponding to the file. (For this
reason, the name is often referred to as a "pathname".) If there are
several paths, then the file has several names.

Directory Files

A directory file is in many respects indistinguishable from a
non-directory file. However it contains information which is used in
locating other files and hence its contents are carefully protected,
and are manipulated by the operating system alone.

In every file, the information is stored as one or more 512 character
blocks. Each block of a directory file is divided into 32 * 16
character structures. Each structure consists of a 16 bit "inode" table
pointer and a 14 character name. The "inode" pointer is to the "inode"
table on the same disk or file system volume as the files which the
directory references. (More on this later.) An "inode" value of zero
defines a null entry in the directory.

The procedures which reference directories are:

    namei  (7518)   search directory
    link   (5909)   create alternate name
    wdir   (7477)   write directory entry
    unlink (3510)   delete name

namei (7518)

7531: "u.u_cdir" defines the "inode" of a process's current directory.
      A process inherits its parent's current directory at birth
      ("newproc", 1883). The current directory may be changed using
      the "chdir" (3538) system call;

UNIX Operating System 19-1 File Directories and Directory Files


7532: Note that "func" is a parameter to "namei" and is always either
      "uchar" (7689) or "schar" (7679);

7534: "iget" (7276) is called to:

        wait until such time as the "inode" corresponding to "dp" is
        no longer locked;

        check that the associated file system is still mounted;

        increment the reference count;

        lock the "inode";

7535: Multiple slashes are acceptable! (i.e. "////a///b/" is the same
      as "/a/b");

7537: Any attempt to replace or delete the current working directory
      or the root directory is bounced immediately!

7542: The label "cloop" marks the beginning of a program loop that
      extends to line 7667. Each cycle analyses a component of the
      pathname (i.e. a string terminated by a null character or one
      or more slashes). Note that a name may be constructed from many
      different characters (7571);

7550: The end of the pathname has been reached (successfully). Return
      the current value of "dp";

7563: "search" permission for directories is coded in the same way as
      "execute" permission for other files;

7570: Copy the name into a more accessible location before attempting
      to match it with a directory entry. Note that a name of greater
      than "DIRSIZ" characters is truncated;

7589: "u.u_count" is set to the number of entries in the directory;

7592: The label "eloop" marks the beginning of a program loop which
      extends to line 7647. Each cycle of the loop handles a single
      directory entry;

7600: If the directory has been searched (linearly!) without matching
      the supplied pathname component, then there must be an error
      unless:

        (a) this is the last component of the pathname, i.e.
            "c=='\0'";

        (b) the file is to be created, i.e. "flag == 1"; and

        (c) the user program has "write" permission for the
            directory;

7606: Record the "inode" address for the directory for the new file in
      "u.u_pdir";

7607: If a suitable slot for a new directory entry has previously been
      encountered (7642), store the value in "u.u_offset[1]"; else set
      the "IUPD" flag for the "dp" designated "inode" (but why?);

7622: When appropriate, read a new block from the directory file (note
      the use of "bread") (why not "breada"?), after carefully
      releasing any previously held buffer;

7636: Copy the eight words of the directory entry into the array
      "u.u_dent". The reason for copying before comparing is obscure!
      Can this actually be more efficient? (The reason for copying the
      whole directory at all is rather perplexing to the author of
      these notes.);

7645: This comparison makes efficient use of a single character
      pointer register variable, "cp". The loop would be even more
      efficient if word by word comparison were used;

7647: The "eloop" cycle is terminated by one of:

        "return (NULL);" (7610)

        "goto out;" (7605, 7613)

        a successful match so that the branch to "eloop" (7647) is
        not taken;

7657: If the name is to be deleted ("flag==2"), if the pathname has
      been completed, and if the user program has "write" access to
      the directory, then return a pointer to the directory "inode";

7662: Save the device identity temporarily (why not in the register
      "c"?) and call "iput" (7344) to unlock "dp", to decrement the
      reference count on "dp" and to perform any consequent
      processing;

7664: Revalidate "dp" to point to the "inode" for the next level file;

7665: "dp==NULL" shouldn't happen, since the directory says the file
      exists! However "inode" table overflows and i/o errors can
      occur, and sometimes the file system may be left in an
      inconsistent state after a system crash.


Some Comments

"namei" is a key procedure which would seem to have been written very
early, to have been thoroughly debugged and then to have been left
essentially unchanged. The interface between "namei" and the rest of
the system is rather complex, and for that reason alone, it would not
win the prize for "Procedure of the Year".

"namei" is called thirteen times by twelve different procedures:


        line    routine    parameters

        3034    exec       uchar  0
        3543    chdir      uchar  0
        5770    open       uchar  0
        5914    link       uchar  0
        6033    stat       uchar  0
        6097    smount     uchar  0
        6186    getmdev    uchar  0
        6976    owner      uchar  0

        5786    creat      uchar  1
        5928    link       uchar  1
        5958    mknod      uchar  1

        3515    unlink     uchar  2

        4101    core       schar  1

It will be seen that:

  (a) there are two calls from "link";

  (b) the calls can be divided into four categories, of which the
      first is by far the largest;

  (c) the last two categories have only one representative each;

  (d) in particular, there is only one call involving the routine
      "schar", which is always for a file called "core". (If this
      case were handled as a special case e.g. where the second
      parameter had the value "3", then the "uchar"s and "schar"
      could be eliminated.)

"namei" may terminate in a variety of ways:

  (a) if there has been an error, then a "NULL" value is returned and
      the variable "u.u_error" is set.

      (Most errors result in a branch to the label "out" (7669) so
      that reference counts for the "inode"s are properly maintained
      (7670). This is not necessary if the failure occurs in "iget"
      (7664).);

  (b) if "flag==2" (i.e. the call is from "unlink"), the value
      returned (in normal circumstances) is an "inode" pointer for
      the parent directory of the named file (7660);

  (c) if "flag==1" (i.e. the call is from "creat" or "link" or
      "mknod", and a file is to be created if it does not already
      exist) and if the named file does not exist, then a "NULL"
      value is returned (7610). In this case a pointer to the "inode"
      for the directory which will point to the new file, is left in
      "u.u_pdir" (7606). (Note also that in this case, "u.u_offset"
      is left pointing either at an empty directory entry or at the
      end of the directory file.);

  (d) if in the remaining cases, the file exists, an "inode" pointer
      for the file is returned (7551). The "inode" is locked and the
      reference count has been incremented. A call to "iput" is
      needed subsequently to undo both these side effects.


link (5909)

This procedure implements a system call which enters a new name for
an existing file into the directory structure. Arguments to the
procedure are the existing and the new names of the file;

5914: Look up the existing file name;

5917: If the file already has 127 different names, quit in disgust;

5921: If the existing file turns out to be a directory, then only the
      super-user may rename it;

5926: Unlock the existing file "inode". This is locked when the first
      call on "namei" does an "iget" (7534, 7664). Under what
      conditions would the failure to unlock the "inode" here be
      disastrous? The chances that the existing file would be a
      directory encountered in the search for the new name would seem
      slight, if not impossible. Most probably the relevant
      circumstance is where the system is attempting to recreate an
      alternative file name or alias, which already exists;

5927: Search the directory for the second name, with the intention of
      creating a new entry;

5930: There is an existing file with the second name;

5935: "u.u_pdir" is set as a side effect of the call on "namei"
      (5928). Check that the directory resides on the same device as
      the file;

5940: Write a new directory entry (see below);

5941: Increase the "link" count for the file.


wdir (7477)

This procedure enters a new name into a directory. It is called by
"link" (5940) and "maknode" (7467) with a pointer to a (core) "inode"
as parameter.

The sixteen characters of the directory entry are copied into the
structure "u.u_dent", and written from there into the directory file.
(Note that the previous content of "u.u_dent" will have been the name
of the last entry in the directory file.)

The procedure assumes that the directory file has already been
searched, that the "inode" for the directory file has already been
allocated and that the values of "u.u_offset" have been set
appropriately.
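The component-by-component walk and the linear directory search that "namei" performs can be illustrated with a small sketch in present-day C. This is not the UNIX code: the names "next_component", "dir_lookup" and "struct dirent16" are invented for the illustration, and the value 14 for "DIRSIZ" is assumed from the sixteen character entry (two characters of "inode" number plus the name).

```c
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14                 /* assumed: 16 character entry less a 2 character inode number */

struct dirent16 {                 /* hypothetical name for a directory entry */
    unsigned short d_ino;         /* zero marks an unused slot */
    char d_name[DIRSIZ];          /* null padded, not necessarily terminated */
};

/* Skip leading slashes, copy one pathname component into "name"
 * (null padded, truncated to DIRSIZ characters) and return a pointer
 * to the unread remainder of the pathname. */
const char *next_component(const char *path, char name[DIRSIZ])
{
    size_t n = 0;

    memset(name, 0, DIRSIZ);
    while (*path == '/')
        path++;
    while (*path != '\0' && *path != '/') {
        if (n < DIRSIZ)
            name[n++] = *path;
        path++;
    }
    return path;
}

/* Search a directory image linearly. Returns the inode number of the
 * matching entry, or 0 if there is none. *free_slot receives the index
 * of the first unused entry, or nent if the directory has no hole. */
unsigned short dir_lookup(const struct dirent16 *dir, size_t nent,
                          const char name[DIRSIZ], size_t *free_slot)
{
    size_t i;

    *free_slot = nent;
    for (i = 0; i < nent; i++) {
        if (dir[i].d_ino == 0) {
            if (*free_slot == nent)
                *free_slot = i;   /* remember where a new name could go */
            continue;
        }
        if (memcmp(dir[i].d_name, name, DIRSIZ) == 0)
            return dir[i].d_ino;
    }
    return 0;
}
```

Calling "next_component" repeatedly on "////a///b/" yields "a", then "b", then an empty component, which corresponds to the remark that multiple slashes are acceptable. The "free_slot" result plays the part of the offset that is left behind for "wdir" when a new entry is to be written.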


maknode (7455)

This procedure is called from "core" (4105), "creat" (5790) and
"mknod" (5966), after a previous call on "namei" with a second
parameter of one, has revealed that no file of the specified name
existed.


unlink (3510)

This procedure implements a system call which deletes a file name
from the directory structure. (When all references to a file are
deleted, the file itself will be deleted.)

3515: Search for a file with the specified name, and if it exists,
      return a pointer to the "inode" of the immediate parent
      directory;

3518: Unlock the parent directory;

3519: Get an "inode" pointer to the file itself;

3522: Unlinking directories is forbidden, except for super-users;

3528: Rewrite the directory entry with the "inode" value set to zero;

3529: Decrement the "link" count.

Note that there is no attempt to reduce the size of a directory below
its "high water" mark.


mknod (5952)

This procedure, which implements a system call of the same name, is
only executable by the super-user. As explained in the Section
"MKNOD(II)" of the UPM, this system call is used to create "inodes"
for special files.

"mknod" also solves the problem of "where do directories come from"?
The second parameter passed to "mknod" is used, without modification
or restriction to set "i_mode". (Compare "creat" (5790) and "chmod"
(3569)). This is the only way an "inode" can get flagged as a
directory, for instance.

In such cases, the third parameter passed to "mknod" must be zero.
This value is copied into "i_addr[0]" (as is appropriate for special
files), and, if non-zero, will be accepted uncritically by "bmap"
(6447). It might be prudent to insert a test

        if (ip->i_mode & (IFCHR & IFBLK) != 0)

before line 5969, rather than rely indefinitely on the infallibility
of the super-user.


access (6746)

This procedure is called by "exec" (3041), "chdir" (3552), "core"
(4109), "open1" (5815, 5817), "namei" (7563, 7664, 7658) to check
access permission to a file. The second parameter, "mode", is equal
to one of "IEXEC", "IWRITE" and "IREAD", with octal values of 0100,
0200 and 0400 respectively.

6753: "write" permission is denied if the file is on a file system
      volume which has been mounted as "read only" or if the file is
      functioning as the text segment for an executing program;

6763: the super-user may not execute a file unless it is "executable"
      in at least one of the three "permission" groups. In any other
      situation he is always allowed access;

6769: If the user is not the owner of the file, shift "m" three
      places to the right so that group permissions will be
      operative ... If the groups don't match, shift "m" again;

6774: Compare "m" and the access permissions.

Note that there is an anomaly here in that if a file has a "mode" of
0077, the owner cannot reference it at all, but everyone else can.
This situation could be changed satisfactorily by inserting a
statement

        m =| (m | (m >> 3)) >> 3;

after line 6752, and replacing lines 6764, 6765 by

        if (m & IEXEC && (m & ip->i_mode) == 0)


                              -000-
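The anomaly noted for a "mode" of 0077 can be checked with a short sketch. It models only the owner/group/other shifting described for lines 6769 to 6774, leaving out the super-user and the "read only" tests, and it writes the old "=|" operator in its modern form "|=". The function names and the "is_owner" and "in_group" parameters are invented for the illustration.

```c
#define IREAD   0400
#define IWRITE  0200
#define IEXEC   0100

/* Permission test as described: select the owner, group or other
 * bits by shifting the requested bit. Returns 1 if access is allowed. */
int access_as_described(int file_mode, int want, int is_owner, int in_group)
{
    int m = want;

    if (!is_owner) {
        m >>= 3;                  /* use the group bits */
        if (!in_group)
            m >>= 3;              /* use the bits for everyone else */
    }
    return (file_mode & m) != 0;
}

/* The same test with the suggested extra statement, so that the owner
 * also benefits from the group and other bits, and a group member from
 * the other bits. */
int access_with_suggestion(int file_mode, int want, int is_owner, int in_group)
{
    int m = want;

    m |= (m | (m >> 3)) >> 3;     /* e.g. 0400 becomes 0444 */
    if (!is_owner) {
        m >>= 3;
        if (!in_group)
            m >>= 3;
    }
    return (file_mode & m) != 0;
}
```

With a mode of 0077 the first function refuses the owner but admits everyone else, while the second admits the owner as well. A stranger is still refused a file of mode 0700 in both versions.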


CHAPTER TWENTY

File Systems


In most computer systems more than one peripheral storage device is
used for the storage of files. It is now necessary to discuss a
number of matters pertaining to the management by UNIX of the whole
set of files and file storage devices. First, some definitions:

    file system: an integrated collection of files with a
    hierarchical system of directories recorded on a single block
    oriented storage device;

    storage device: a device which can store information (especially
    disk pack or DECtape, etc.);

    access device: a mechanism for transferring information to or
    from a storage device;

    a storage device is only accessible if it is inserted in an
    access device. In this situation, reference to the storage device
    is made via a reference to the access device;

    a storage device is acceptable as a file system volume if:

      (a) information is recorded as addressable blocks of 512
          characters each, which can be independently read or
          written.

          (Note IBM compatible magnetic tape does not satisfy this
          condition.);

      (b) the information recorded on the device satisfies certain
          consistency criteria:

            block #1 is formatted as a "super block" (see below);

            blocks #2 to #(n+1) (where n is recorded in the "super
            block") contain an "inode" table which references all
            files recorded on the storage device, and does not
            reference any other files;

            directory files recorded on the storage device reference
            all, and only, files on the same storage device, i.e. a
            file system volume constitutes a self-contained set of
            files, directories and "inode" table;

    a file system volume is mounted if the presence of the storage
    device in an access device has been formally recognised by the
    operating system.


The 'Super Block' (5561)

The "super block" is always recorded as block #1 on the storage
device. (Block #0 is always ignored and is available for
miscellaneous uses not necessarily concerned with UNIX.)

The "super block" contains information used in allocating resources,
viz. the storage blocks and the entries in the "inode" table recorded
on the file system. While the file system volume is mounted a copy of
the "super block" is maintained in core and updated there. To prevent
the storage device copy becoming too far out of date, its contents
are written out at regular intervals.


The 'mount' table (0272)

The "mount" table contains an entry for each mounted file system
volume. Each entry defines the device on which the file system volume
is mounted, a pointer to the buffer which stores the "super block"
for the device, and an "inode" pointer. The table is referenced as
follows:

    iinit (6922) which is called by "main" (1615), makes an entry for
    the root device;

    smount (6086) is a system call which makes entries for additional
    devices;

    iget (7276) searches the "mount" table if it encounters an
    "inode" with the 'IMOUNT' flag set;

    getfs (7167) searches the "mount" table to find and return a
    pointer to the "super block" for a particular device;

    update (7201) is called periodically and searches the "mount"
    table to locate information which should be written from core
    tables into the tables maintained on the file system volumes;

    sumount (6144) is a system call which deletes entries from the
    table.
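The layout rules for a file system volume (block #0 ignored, block #1 the "super block", blocks #2 to #(n+1) the "inode" table) can be put into a small sketch. The function and constant names are invented, and the figure of 32 characters for one "inode" entry on the volume is an assumption of this sketch, not something stated in the text above.

```c
#define BLOCK_SIZE        512
#define DINODE_SIZE       32     /* assumed size of one "inode" entry on the volume */
#define INODES_PER_BLOCK  (BLOCK_SIZE / DINODE_SIZE)

enum region { UNUSED_BLOCK, SUPER_BLOCK, INODE_TABLE, DATA_AREA };

/* Classify a block number, given n, the "inode" table size in blocks
 * as recorded in the "super block". */
enum region classify_block(unsigned blkno, unsigned n)
{
    if (blkno == 0)
        return UNUSED_BLOCK;     /* ignored by UNIX */
    if (blkno == 1)
        return SUPER_BLOCK;
    if (blkno <= n + 1)
        return INODE_TABLE;      /* blocks #2 to #(n+1) */
    return DATA_AREA;
}

/* Valid "inode" numbers begin at one and the table begins in block #2. */
unsigned inode_block(unsigned ino)
{
    return (ino - 1) / INODES_PER_BLOCK + 2;
}

unsigned inode_offset(unsigned ino)
{
    return ((ino - 1) % INODES_PER_BLOCK) * DINODE_SIZE;
}
```

Under these assumptions "inode" number 1 is found at the start of block #2, and number 17 at the start of block #3.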

UNIX Operating System 20-1 File Systems


iinit (6922)

This routine is called by "main" (1615) to initialise the "mount"
table entry for the root device.

6926: Call the "open" routine for the root device. Note that
      "rootdev" is defined in "conf.c" (4695);

6931: Copy the contents of the root device "super block" into a
      buffer area not associated with any particular device;

6933: The zeroeth entry in the "mount" table is assigned to the root
      device. Only two of the three elements are explicitly
      initialised. The third, the "inode" pointer, will never be
      referenced;

6936: The "locks" stored in the "super block" are explicitly reset.
      (These locks may have been set when the "super block" was last
      written onto the file system volume);

6938: The root device is mounted in a "writable" state;

6939: The system sets its idea of the current time and date from the
      time recorded in the "super block". (If the system has been
      stopped for an appreciable period, the computer operator will
      need to reset the contents of "time".)


From an operational view point, "mounting" a file system volume
involves placing it in a suitable access device, readying the device,
and then entering a command such as

        "/etc/mount /dev/rk2 /rk2"

to the "shell", which forks a program to perform a "mount" system
call, passing pointers to the two file names as parameters.


smount (6086)

6093: "getmdev" decodes the first argument to locate a block oriented
      access device;

6096: "u.u_dirp" is reset preparatory to calling "namei" to decode
      the second file name. (Note that "u.u_dirp" is set by "trap" to
      "u.u_arg[0]" (2770).);

6100: Check that the file named by the second parameter is in a
      satisfactory condition, i.e. no one else is currently accessing
      the file, and that the file is not a special file (block or
      character);

6103: Search the "mount" table looking for an empty entry
      ("mp->m_bufp==NULL") or an entry already made for the device.
      (The "mount" data structure is defined at line 272.);

6111: "smp" should point to a suitable entry in the "mount" table;

6113: Perform the appropriate "open" routine, with the device name
      and a read/write flag as arguments. (As was seen earlier, for
      the RK05 disk the "open" routine is a "no-op");

6116: Read block #1 from the device. This block is the "super block";

6124: Copy the "super block" into a buffer associated with "NODEV",
      from the buffer associated with "d". The second buffer will not
      be released again until the device is unmounted;

6130: "ip" points to the "inode" for the second named file. This
      "inode" is now flagged as "IMOUNT". The effect of this is to
      force "iget" (7292) to ignore the normal contents of the file,
      while the file system volume is mounted. (In practice, the
      second file is an empty file created especially for this
      purpose.)


Notes

1. The "read/write" status of a mounted device depends only on the
parameters provided to "smount". No attempt is made to sense the
hardware "read/write" status. Thus if a disk is readied with "write
protect" on, but is not mounted "read only", then the system will
complain vigorously.

2. The "mount" procedure does not carry out any kind of label
checking on the "mounted" file system volume. This is reasonable in a
situation where file system volumes are rarely rearranged. However in
situations where volumes are mounted and remounted frequently, some
means of verifying that the correct volume has been mounted would
seem desirable. (Further, if a file system volume contains sensitive
information, it may be desirable to include some form of password
protection as well. There is room in the "super block" (5575) for the
storage of a name and an encrypted password.)


iget (7276)

This procedure is called by "main" (1616, 1618), "unlink" (3519),
"ialloc" (7078) and "namei" (7534, 7664) with two parameters which
together uniquely identify a file: a device, and the "inode" number
of a file on the device. "iget" returns a reference to an entry in
the core "inode" table.

When "iget" is called, the core "inode" table is searched first to
see if an entry already exists for the file in the core "inode"
table. If not, then "iget" creates one.
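The search of the "mount" table made by "smount" (6103), and the lookup that "getfs" makes for a particular device, might be sketched as follows. This is an illustration only: the structure mirrors the three elements described for a "mount" table entry, the function names are invented, and the table size of five entries is an assumption.

```c
#include <stddef.h>

#define NMOUNT 5                  /* assumed table size for this sketch */

struct mount_entry {
    int   m_dev;                  /* device on which the volume is mounted */
    void *m_bufp;                 /* buffer holding the "super block"; NULL if the entry is empty */
    void *m_inodp;                /* "inode" on which the volume is mounted */
};

/* Look for an empty entry, as "smount" does. Returns NULL if the
 * device already has an entry or if the table is full. */
struct mount_entry *find_free_slot(struct mount_entry *tab, int dev)
{
    struct mount_entry *mp, *smp = NULL;

    for (mp = &tab[0]; mp < &tab[NMOUNT]; mp++) {
        if (mp->m_bufp != NULL) {
            if (mp->m_dev == dev)
                return NULL;      /* an entry was already made for the device */
        } else if (smp == NULL) {
            smp = mp;             /* first empty entry seen */
        }
    }
    return smp;
}

/* Return the "super block" buffer for a mounted device, or NULL. */
void *find_super_block(struct mount_entry *tab, int dev)
{
    struct mount_entry *mp;

    for (mp = &tab[0]; mp < &tab[NMOUNT]; mp++)
        if (mp->m_bufp != NULL && mp->m_dev == dev)
            return mp->m_bufp;
    return NULL;
}
```

Using "m_bufp" both as the buffer pointer and as the "entry in use" indicator follows the test "mp->m_bufp==NULL" quoted for line 6103.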


7285: Search the core "inode" table ...

7286: If an entry for the designated file already exists ...

7287: Then if it is locked go to sleep;

7290: Try again. (Note the whole table needs to be searched again
      from the beginning, because the entry may have vanished!);

7292: If the "IMOUNT" flag is on this is an important possibility for
      which we will delay the discussion;

7302: If the "IMOUNT" flag is not set, increase the "inode" reference
      count, set the "ILOCK" flag and return a pointer to the
      "inode";

7306: Make a note of the first empty slot in the "inode" table;

7309: If the "inode" table is full, send a message to the operator,
      and take an error exit;

7314: At this point, a new entry is to be made in the "inode" table;

7319: Read the block which contains the file system volume "inode".
      Note the use of "bread" instead of "readi", the assumption that
      "inode" information begins in block #2 and the convention that
      valid "inode" numbers begin at one (not zero);

7326: A read error at this point isn't very well reported to the rest
      of the system;

7328: Copy the relevant "inode" information. This code makes implicit
      use of the contents of the file "ino.h" (Sheet 56), which isn't
      referenced explicitly anywhere.

Let us now return to unfinished business:

7292: The "IMOUNT" flag is found to be set. This flag was set by
      "smount", when a file system volume was mounted;

7293: Search the "mount" table to find the entry which points to the
      current "inode". (Although searching this table is not a
      horrendous overhead, it does seem possible that a "back
      pointer" could be conveniently stored in the "inode" e.g. in
      the "i_lastr" field. This would save both time and code
      space.);

7296: Reset "dev" and "ino" to the mounted device number and the
      "inode" number of the root directory on the mounted file system
      volume. Start again.

Clearly, since "iget" is called by "namei" (7534, 7664), this
technique allows the whole directory structure on the mounted file
system volume to be integrated into the pre-existing directory
structure. If we momentarily ignore the possible deviations of
directory structures away from tree structures, we have the situation
where a leaf of the existing tree is being replaced by an entire
subtree.


getfs (7167)

There is little that needs to be said about this procedure in
addition to the author's comment. This procedure is called by

        "access" (6754)         "ialloc" (7072)
        "alloc"  (6961)         "ifree"  (7138)
        "free"   (7004)         "iupdat" (7383)

Note the cunning use of "n1", "n2" which are declared as character
pointers i.e. as unsigned integers. This allows only one sided tests
on the two variables at line 7177.


update (7201)

The function of this procedure, in its broadest terms, is to ensure
that information on the file system volumes is kept up to date. The
comment for this procedure (beginning on line 7190) describes the
three main sub-functions, (in the reverse order!).

"update" is the whole business of the "sync" system call (3486). This
may be invoked via the "sync" shell command. Alternatively there is a
standard system program which runs continuously and whose only
function is to call "sync" every 30 seconds. (See "UPDATE (VIII)" in
the UPM.)

"update" is called by "sumount" (6150) before a file system volume is
unmounted, and by "panic" (2420) as the last action of the system
before activity ceases.

7207: If another execution of "update" is under way, then just
      return;

7210: Search the "mount" table;

7211: For each mounted volume,

7213: Unless the file system has not been recently modified or the
      "super block" is locked or the volume has been mounted "read
      only" ...

7217: Update the "super block", copy it into a buffer and write the
      buffer out onto the volume;

7223: Search the "inode" table, and for each non-null entry, lock the
      entry and call "iupdat" to update the "inode" entry on the
      volume if appropriate;

7229: Allow additional executions of "update" to commence;

7230: "bflush" (5229) forces out any "delayed write" blocks.
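The way "iget" steps from an "inode" flagged "IMOUNT" onto the root directory of the mounted volume can be pictured with a toy model. Everything here is an assumption made for the illustration: the two tables are plain arrays, the names "cross_mounts", "F_MOUNT" and "ROOT_INO" are invented, and the root directory is taken to be "inode" number one. Locking, reference counts and reading from the volume are left out.

```c
#include <stddef.h>

#define F_MOUNT   01              /* stands in for the "IMOUNT" flag */
#define ROOT_INO  1               /* assumed "inode" number of a root directory */

struct cinode {                   /* toy core "inode" table entry */
    int dev;
    unsigned ino;
    int flags;
    int in_use;
};

struct mnt {                      /* toy "mount" table entry */
    int dev;                      /* device of the mounted volume */
    const struct cinode *inodp;   /* "inode" it is mounted on */
    int in_use;
};

/* On return, *dev and *ino identify the "inode" that should really be
 * used: a mounted-on "inode" is replaced by the root of the volume. */
void cross_mounts(const struct cinode *itab, size_t ni,
                  const struct mnt *mtab, size_t nm,
                  int *dev, unsigned *ino)
{
    size_t i, j;

restart:
    for (i = 0; i < ni; i++) {
        const struct cinode *p = &itab[i];

        if (!p->in_use || p->dev != *dev || p->ino != *ino)
            continue;
        if (!(p->flags & F_MOUNT))
            return;               /* ordinary entry: nothing to do */
        for (j = 0; j < nm; j++) {
            if (mtab[j].in_use && mtab[j].inodp == p) {
                *dev = mtab[j].dev;
                *ino = ROOT_INO;
                goto restart;     /* search again from the beginning */
            }
        }
        return;                   /* flagged, but no "mount" entry found */
    }
}
```

The "goto restart" corresponds to the "Start again" of line 7296: the whole core table is searched afresh with the new device and "inode" number.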


sumount (6144)

This system call deletes an entry for a mounted device from the
"mount" table. The purpose of this call is to ensure that traffic to
and from the device is terminated properly, before the storage device
is physically removed from the access device.

6154: Search the "mount" table for the appropriate entry;

6161: Search the "inode" table for any outstanding entries for files
      on the device. If any such exist, take an error exit, and do
      not change the "mount" table entry;

6168: Clear the "IMOUNT" flag.


Resource Allocation

Our attention now turns to the management of the resources of an
individual FSV (file system volume).

Storage blocks are allocated from the free list by "alloc" at the
request of "bmap". Storage blocks are returned to the free list by
"free" at the behest of "itrunc" (which is called by "core", "open1"
and "iput").

Entries in the FSV "inode" tables are made by "ialloc", which is
called by "maknode" and "pipe". Entries in this table are cancelled
by "ifree", which is called by "iput".

The "super block" for the FSV is central to the resource management
procedures. The "super block" (5561) contains:

    size information (total resources available);

    list of up to 100 available storage blocks;

    list of up to 100 available "inode" entries;

    locks to control manipulation of the above lists;

    flags;

    current date of last update.

If the list in core of available "inode" entries for the file system
volume ever becomes exhausted, then the entire table on the FSV is
read and searched to rebuild the list. Conversely if the available
"inode" table overflows, additional entries are simply forgotten to
be rediscovered later.

A different strategy is used for the list of available storage
blocks. These blocks are arranged in groups of up to one hundred
blocks. The first block in each group (except the very first) is used
to store the addresses of the blocks belonging to the previous group.
Addresses of blocks in the last incomplete group are stored in the
"super block".

The first entry in the first list of block numbers is zero, which
acts as a sentinel. Since the whole list is subject to a LIFO
discipline, discovery of a block number of zero in the list signifies
that the list is in fact empty.


alloc (6956)

This is called by "bmap" (6435, 6448, 6468, 6480, 6497) whenever a
new storage block is needed to store part of a file.

6961: Convert knowledge of the device name into a pointer to the
      "super block";

6962: If "s_flock" is set, the list of available blocks is currently
      being updated by another process;

6967: Obtain the block number of the next available storage block;

6968: If the last block number on the list is zero, the entire list
      is now empty;

6970: "badblock" (7040) is used to check that the block number
      obtained from the list seems reasonable;

6971: If the list of available blocks in the "super block" is now
      empty, then the block just located will contain the addresses
      of the next group of 100 free blocks;

6972: Set "s_flock" to delay any other process from getting a "no
      space" indication before the list of available blocks in the
      "super block" can be replenished;

6975: Determine the number of valid entries in the list to be copied;

6978: Reset "s_flock", and "wakeup" anyone waiting;

6982: Clear the buffer so that any information recorded in the file
      by default will be all zeros;

6983: Set the "modified" flag to ensure that the "super block" will
      be written out by "update" (7213).


itrunc (7414)

This procedure is called by "core" (4112), "open1" (5825) and "iput"
(7353). In the first two cases, the contents of the "file" are about
to be replaced. In the third case, the file is about to be abandoned.

7421: If the file is a character or block special file then there is
      nothing to do;

7423: Search backwards the list of block numbers stored in the
      "inode";


7425: If the file is "large", then an indirect fetch is needed. (A
      double indirect fetch is needed for blocks numbered seven and
      higher.);

7427: Reference all 257 elements of the buffer in reverse order.
      (Note this seems to be the only place where characters #512,
      #513 of the buffer area are referenced. Since they will
      presumably contain zero, they will contribute nothing to the
      calculation. Hence if "510" were substituted for "512" here,
      and again on line 7432, a general improvement all round would
      result (?));

7438: "free" returns an individual block to the available list;

7439: This is the end of the "for" statement commencing on line 7427.
      (Likewise the statement which begins at 7432 ends at 7435.);

7443: Clear the entry in "i_addr[ ]";

7445: Reset size information, and flag the "inode" as "updated".


free (7000)

This procedure is called by "itrunc" (7435, 7438, 7442) to reinsert a
simple storage block into the available list for a device.

7005: It is not clear why the "s_fmod" flag is set here as well as at
      the end of the procedure (line 7026). Any suggestions?

7006: Observe the locking protocol;

7010: If no free blocks previously existed for the device, restore
      the situation by setting up a one element list containing an
      entry for block #0. This value will subsequently be interpreted
      as an "end of list" sentinel;

7014: If the available list in the "super block" is already full, it
      is time to write it out onto the FSV. Set "s_flock";

7016: Get a buffer, associated with the block now being entered in
      the free list;

7019: Copy the contents of the super block list, preceded by a count
      of the number of valid blocks, into the buffer; write the
      buffer; unset the lock and "wakeup" anybody waiting;

7025: Add the returned block to the available list.


iput (7344)

This procedure is one of the most popular in UNIX (called from nearly
thirty different places) and its use will have already been
frequently observed.

In essence it simply decrements the reference count for the "inode"
passed as a parameter, and then calls "prele" (7882) to reset the
"inode" lock and to perform any necessary "wakeup"s.

"iput" has an important side effect. If the reference count is going
to be reduced to zero, then a release of resources is indicated. This
may be simply the core "inode", or both that and the file itself, if
the number of links is also zero.


ifree (7134)

This procedure is called by "iput" (7355) to return a FSV "inode" to
the available list maintained in the "super block". If this list is
already full (as noted above) or if the list is locked (using
"s_ilock") the information is simply discarded.


iupdat (7374)

This procedure is called by "stat1" (6050), "update" (7226) and
"iput" (7357) to revise a particular "inode" entry on a FSV. It does
nothing if the corresponding core "inode" is not flagged ("IUPD" or
"IACC");

The "IUPD" flag may be set by one of

        unlink (3530)           bmap    (6452, 6467)
        chmod  (3570)           itrunc  (7448)
        chown  (3583)           maknode (7462)
        link   (5942)           namei   (7609)
        writei (6285, 6318)     pipe    (7751)

The "IACC" flag may be set by one of

        readi  (6232)           maknode (7462)
        writei (6285)           pipe    (7751)

The flags are reset by "iput" (7359).

7383: Forget it, if the FSV has been mounted as "read only";

7386: Read the appropriate block containing the FSV "inode" entry. As
      observed earlier with respect to "iget", note the use of
      "bread" instead of "readi", the assumption that the "inode"
      table begins at block #2 and the convention that valid "inode"
      numbers begin at one;

7389: Copy the relevant information from the core "inode";

7391: If appropriate, update the time of last access;

7396: If appropriate, update the time of last modification;

7400: Write the updated block back to the FSV.


                              -000-
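The grouped free list handled by "alloc" and "free" lends itself to a small in-memory simulation. The sketch below is a simplification under stated assumptions: the "volume" is an array of blocks of 256 words each, there is no locking (the role of "s_flock" is omitted), no "badblock" check, and the names "struct volume", "vol_free" and "vol_alloc" are invented. Block number zero serves as the "end of list" sentinel, as described.

```c
#include <string.h>

#define NICFREE          100      /* size of the list kept in the "super block" */
#define NBLOCKS          400      /* size of the toy volume (an assumption) */
#define WORDS_PER_BLOCK  256

struct volume {
    unsigned short nfree;                        /* valid entries in free_list */
    unsigned short free_list[NICFREE];           /* list held in the "super block" */
    unsigned short disk[NBLOCKS][WORDS_PER_BLOCK];
};

/* Return block bno to the available list. */
void vol_free(struct volume *v, unsigned short bno)
{
    if (v->nfree == 0) {                         /* no list at all: plant the sentinel */
        v->nfree = 1;
        v->free_list[0] = 0;
    }
    if (v->nfree >= NICFREE) {                   /* list full: write it into block bno */
        v->disk[bno][0] = v->nfree;              /* count first, then the addresses */
        memcpy(&v->disk[bno][1], v->free_list, sizeof v->free_list);
        v->nfree = 0;
    }
    v->free_list[v->nfree++] = bno;
}

/* Obtain a block; returns 0 when no space remains. */
unsigned short vol_alloc(struct volume *v)
{
    unsigned short bno;

    if (v->nfree == 0)
        return 0;
    bno = v->free_list[--v->nfree];
    if (bno == 0)
        return 0;                                /* sentinel: the list is empty */
    if (v->nfree == 0) {                         /* bno holds the next group */
        v->nfree = v->disk[bno][0];
        memcpy(v->free_list, &v->disk[bno][1], sizeof v->free_list);
    }
    memset(v->disk[bno], 0, sizeof v->disk[bno]); /* hand out a block of zeros */
    return bno;
}
```

Freeing 240 blocks and then allocating until "no space" is reported returns each of the 240 blocks exactly once, the most recently freed first, which is the LIFO discipline mentioned in the text.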


CHAPTER TWENTY-ONE

Pipes


A "pipe" is a FIFO character list, which is managed by UNIX as yet
another variety of file.

One group of processes may "write" into a "pipe" and another group
may "read" from the same "pipe". Hence "pipe"s may be, and are used,
primarily for inter-process communication.

By exploiting the concept of a "filter", which is a program which
reads an input file and transforms it into an output file, and by
using "pipes" to link two or more programs of this type together,
UNIX offers its users a surprisingly comprehensive and sophisticated
set of facilities.


pipe (7723)

A "pipe" is created as the result of a system call on the "pipe"
procedure.

7728: Allocate an "inode" for the root device;

7731: Allocate a "file" table entry;

7736: Remember the "file" table entry as "r" and allocate a second
      "file" table entry;

7744: Return user file identifications in R0 and R1;

7746: Complete the entries in the "file" array and the "inode" entry.


readp (7758)

"pipes" are different from other files in that two separate offsets
into the file are kept - one for "read" operations and one for
"write" operations. The "write" offset is actually the same as the
file size.

7763: the parameter passed to "readp" is a pointer to a "file" array
      entry, from which an "inode" pointer can be extracted;

7768: "plock" (7862) ensures that only one operation takes place at a
      time: either "read" or "write";

7776: If a process wishing to write to a "pipe" has been blocked
      because the pipe was "full" (or rather because the valid part
      of the file had reached the file limit), it will have signified
      its predicament by setting the "IWRITE" flag in "ip->i_mode";

7786: Release the lock before going to sleep;

7787: "i_count" is the number of file table entries pointing at the
      "inode". If this is less than two, then the group of "writers"
      must be extinct;

7789: A process waiting for input will raise the "IREAD" flag. Since
      a pipe cannot be full and empty simultaneously, no more than
      one of the flags "IWRITE" or "IREAD" should be set at one time;

7799: "prele" unlocks the file and "wakes up" any process waiting for
      the pipe.


writep (7805)

The structure of this procedure echoes that of "readp" in many
respects.

7828: Note that a "writer", which finds that there are no more
      "readers" left, receives a "signal" just in case he is not
      monitoring the result of his "write" operation. (A "reader" in
      the analogous situation receives a zero character count as the
      result of the read, and this is the standard end-of-file
      indication.)

7835: The "pipe" size is not allowed to grow beyond "PIPSIZ"
      characters. As long as "PIPSIZ" (7715) is no greater than 4096,
      the file will not be converted to a "large" file. This is
      highly desirable from the viewpoint of access efficiency.

      (Note that "PIPSIZ" limits the "write" offset pointer value. If
      the "read" offset pointer is not far behind, the true content
      of the "pipe" may be quite small).


plock (7862)

Lock the "inode" after waiting if necessary. This procedure is called
by "readp" (7768) and "writep" (7815).


prele (7882)

Unlock the "inode" and "wake" any waiting processes. This procedure
is called by several others (especially "iput"), in addition to
"readp" and "writep".


                              -000-
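The two-offset arrangement of a "pipe" can be tried out in a small, single-process sketch. It is not the UNIX code: there is no sleeping, locking or signalling, the names "struct pipe_buf", "pipe_write" and "pipe_read" are invented, and the rewinding of both offsets once the reader has caught up is an assumption of this sketch about how space is recovered. "PIPSIZ" is given the limiting value of 4096 mentioned above.

```c
#include <stddef.h>
#include <string.h>

#define PIPSIZ 4096

struct pipe_buf {
    char   data[PIPSIZ];
    size_t roff;                  /* "read" offset */
    size_t woff;                  /* "write" offset, equal to the file size */
};

/* Accept as many characters as fit below PIPSIZ; returns the number
 * accepted (a real writer would sleep instead of returning short). */
size_t pipe_write(struct pipe_buf *p, const char *src, size_t n)
{
    size_t room = PIPSIZ - p->woff;

    if (n > room)
        n = room;
    memcpy(p->data + p->woff, src, n);
    p->woff += n;
    return n;
}

/* Deliver up to n characters; returns 0 when the pipe is empty. */
size_t pipe_read(struct pipe_buf *p, char *dst, size_t n)
{
    size_t avail;

    if (p->roff == p->woff) {     /* reader has caught up: rewind both offsets */
        p->roff = 0;
        p->woff = 0;
        return 0;
    }
    avail = p->woff - p->roff;
    if (n > avail)
        n = avail;
    memcpy(dst, p->data + p->roff, n);
    p->roff += n;
    return n;
}
```

Because the limit applies to the "write" offset and not to the amount of unread data, a pipe in this sketch can refuse a write while holding only a few unread characters, which is the situation described in the note on line 7835.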

UNIX Operating System 21-1 Pipes


Section Five is the final section: last but not least. It is concerned with input/output for the slower, character oriented peripheral devices.

Such devices share a common buffer pool, which is manipulated by a set of standard procedures.

The set of character oriented peripheral devices is exemplified by the following:

KL/DL11 interactive terminal
PC11 paper tape reader/punch
LP11 line printer.


CHAPTER TWENTY-TWO

Character Oriented Special Files

Character oriented peripheral devices are relatively slow ( < 1000 characters per second) and involve character by character transmission of variable length, usually short, records.

A device handler (as its name suggests) is the software part of the interface between a device and the general system. In general, the device handler is the only part of the software which recognises the idiosyncrasies of a particular device.

As far as possible or reasonable, a single device driver is written to serve many separate devices of similar types, and, where appropriate, several such devices simultaneously. The group of "interactive terminals" (with keyboard input and a serial printer or visual display output) can just be coerced with difficulty into a single device driver, as the reader may judge during his perusal of the file "tty.c".

The standard UNIX device handlers for character devices make use of the procedures "putc" and "getc" which store and retrieve characters into and from a standard buffer pool. This will be described in more detail in Chapter Twenty-Three.

The "PDP11 Peripherals Handbook" should be consulted for more complete information on the device controller hardware and the devices themselves.


LP11 Line Printer Driver

This driver is to be found in the file "lp.c" (Sheets 88, 89). Much of the complexity of this driver is contained in the procedure "lpcanon" (8879). This procedure is involved in the proper handling of special characters and this is a separate issue from the one we wish to study first.

Initially one may ignore "lpcanon" by assuming that all calls upon it (lines 8859, 8865, 8875) are simply replaced by similar calls upon "lpoutput" (8986). "lpcanon" acts as a "final filter" for characters going to the line printer: handling code conversions, special format characters, etc.


lpopen (8850)

When a line printer file is opened, the normal calling sequence is followed:


"open" (5774) calls "open1", which (5832) calls "openi", which (6716) calls, in the case of a character special file, "cdevsw[..].d_open". In the case of the line printer, this latter translates (4675) to "lpopen".

8853: Take the error exit if either another line printer file is already open, or if the line printer is not ready (e.g. the power is off, or there is no paper, or the printer drum gate is open, or the temperature is too high, or the operator has switched the printer off-line.)

8857: Set the "lp11.flag" to indicate that the file is open, the printer has a "form feed" capability and lines are to be indented by eight characters.


Notes

(A). "lp11" is a seven word structure defined beginning at line 8829. The first three words of the structure in fact constitute a structure of type "clist" (7908). Only the first element is explicitly manipulated in "lp.c". The next two are used implicitly by "putc" and "getc".

(B). "flag" is the fourth element of the structure. The remaining three elements are

"mcc"   maximum character count
"ccc"   current character count
"mlc"   maximum line count

(C). The line printer controller has two registers on the UNIBUS.

Line Printer Status Register ("lpsr")

bit 15  Set when an error condition exists (see above);
bit 7   "DONE" Set when the printer controller is ready to receive the next character;
bit 6   "IENABLE" Set to allow "DONE" or "Error" to cause an interrupt;

Line Printer Data Buffer Register ("lpbuf")

Bits 6 through 0 hold the seven bit ASCII code for the character to be printed. This register is "write only".

8858: Set the "enable interrupts" bit in the line printer status register.

8859: Send a "form feed" (or "new page") character to the printer, to ensure that characters which follow will start on a new page. (As already noted above, at this stage we are ignoring "lpcanon" and assuming line 8859 to be simply "lpoutput (FORM)". "lpcanon" does things like suppressing all but the first "form feed" in a string of "form feed"s and "new line"s, to avoid wasting paper.);


lpoutput (8986)

This procedure is called with a character to be printed, as a parameter.

8988: "lp11.cc" is a count of the number of characters waiting to be sent to the line printer. If this is already large enough ("LPHWAT", 8819), "sleep" for a while (so as not to flood the character buffer pool);

8990: Call "putc" (0967) to store the character in a safe place. (The function of "putc" and its companion "getc" is a major topic to be discussed in Chapter Twenty-Three.) It should be noted that no check is made that "putc" was successful in storing the character. (There may have been no space in the character buffers.) In practice there seems to be no real problem here, but one can wonder.

8991: Raise the processor priority sufficiently to inhibit the interrupts from the line printer, call "lpstart" and then drop the priority again.


lpstart (8967)

While the line printer is ready, and while there are still characters stored away in the "safe place", keep sending characters to the printer controller.

The presumption is that while the controller is building up a set of characters for a complete line, the "DONE" bit will reset faster than the CPU can feed characters to the controller.

However once a print cycle has been initiated, the "DONE" bit will not be reset again for a period of the order of 100 milliseconds (depending on the speed of the printer).

Note that during this series of data transfers, interrupts will be inhibited and so "lpint" will not be getting into the act whenever the "DONE" bit is set, except possibly once at the very end when the processor priority is reduced again.


lpint (8976)

This procedure is called to handle interrupts from the line printer. As mentioned above, most potential interrupts are ignored by the processor. Those interrupts which are accepted by the CPU will be associated with either

(a) completion of a print cycle; or

(b) the printer going ready after a period during which the "Error" bit was set; or


(c) the last transfer in a series of character transfers;

8980: Start transferring characters into the printer buffer again;

8981: Wakeup the process waiting to feed characters to the printer if the number of characters waiting to be sent is either zero or exactly "LPLWAT" (8818).

This latter condition is somewhat puzzling in that it will only occasionally be satisfied. The intention surely is "if the number of characters in the list is getting low, start refilling". However if "lpstart" carries out a series of transfers without interruption (at least by "lpint") the number of characters could go from a value greater than "LPLWAT" to one less than this without this test ever being made. Accordingly the waiting process will not be awakened until the list is completely empty. The result could be frequently to delay the initiation of the next print cycle, and hence to allow the printer to run below its rated capacity.

One solution to this problem is to change entirely the buffering strategy for line printers. A less drastic change would involve inventing a new flag, "lp11.wflag" say, replacing lines 8981, 8982 by something like

    if (lp11.cc <= LPLWAT && lp11.wflag)
        { wakeup (&lp11);
          lp11.wflag = 0;
        }

and replacing line 8989 by

    lp11.wflag++;
    sleep (&lp11, LPPRI);


lpwrite (8870)

This is the procedure which is invoked as a result of the "write" system call:

"write" (5722) calls "rdwr", which (5755) calls "writei", which (6287) calls "cdevsw[..].d_write", which translates (4675) to "lpwrite".

"lpwrite" takes the non-null characters of a null terminated string recorded in the user area, and passes them to "lpoutput" (via "lpcanon") one at a time.


lpclose (8863)

The list of procedure calls which leads to the invocation of this procedure is similar to that for "lpopen". A "form feed" character is output to clear the current page, and the "open" flag is reset.


Discussion

"lpwrite" is called one or more times to send a string of characters to the printer. In turn it calls "lpcanon" which calls "lpoutput". If at any point too many characters are stored away, the process will "sleep" in "lpoutput". Sooner or later "lpoutput" will continue, will store the character in a buffer area, and will then call "lpstart" to send, if possible, a string of characters to the printer controller.

"lpstart" is called both when more characters are available to be sent, and when an interrupt from the printer is taken.

The majority of calls on "lpstart" will in fact achieve nothing. Occasionally (usually when the printer has just completed a print cycle) "lpstart" will be able to send a whole string of characters to the printer controller.


lpcanon (8879)

This procedure interprets characters being sent to the line printer and makes various modifications, insertions and deletions. It thus functions as a filter.

8884: The section of code from here to line 8913 is concerned with character translation when the full 96 character set is not available, and a 64 character set is in use.
Since the capabilities of a printer do not usually change with time, the defined variable "CAP" (8840) must be set once and for all (at a particular installation).
The run-time test on
    (lp11.flag & CAP)
could be replaced by a compile-time test on
    (CAP)
and if the compiler has its "druthers", if CAP turns out to be zero, the whole section of code to line 8913 could be compiled down to nothing.
The present code could be said to plan ahead for a situation where an installation may have two or more printers of different types. Even so there is a basic inconsistency here in the use of "CAP", "IND" and "EJECT" on the one hand, and "EJLINE" and "MAXCOL" on the other. In fact since forms of different sizes are not uncommonly used on a single printer, the last two should not be constants at all, but should be dynamically settable.

8885: Lower case alphabetics are translated by the addition of a constant, which is conveniently defined as "'A' - 'a'";

8887: Certain of the remaining characters are special characters which are printed as a similar character with an overprinted minus sign, e.g. "{" (8889) is printed as "(" overstruck by "-";


8909: The "similar" character is output via a recursive call on "lpcanon", which will increment "lp11.ccc" by one as a side effect;

8910: Decrement the current character count (for the same effect as a "back space" character) and ...

8911: prepare to output a minus sign;

8915: The "switch" statement beginning here extends to line 8963. Certain characters involved in vertical and horizontal spacing are given special interpretations with delayed actions;

8917: For a horizontal tab character, round the current character count up to the next multiple of eight. Do not output any blank characters immediately;

8921: For a "form feed" or "new line" character, if:

(a) the printer does not have a "page restore" capability; or

(b) the current line is not empty; or

(c) some lines have been completed since the last "form feed" character.

then ...

8925: reset "lp11.mcc" to zero;

8926: Increment the completed line count;

8927: Convert a "new line" character to a "form feed" if sufficient lines have been completed on the current page, and the printer has a "form feed" capability;

8929: Output the character, and if it was a "form feed", reset the number of completed lines to zero;

Examination of this code will show that:

(a) Any string of "form feed"s or "new line"s which begins with a "form feed", will, if sent to a printer with "form feed" capability, be reduced to a single "form feed";

(b) A "form feed" character sent to a printer without the "form feed" capability, will cause a new line to be started but will be passed on otherwise without comment.

8934: For "carriage return"s, and, note, "form feed"s and "new line"s, reset the current character count to zero or eight, depending on "IND", and return;

8949: For all other characters ...

8950: If a string of "backspace"s (real or contrived) and/or "carriage return"s has been received, output a single "carriage return" and reset the maximum character count to zero;

8954: Provided the current character count does not exceed the maximum line length, output blank characters to bring the maximum character count to the current character count. (Perhaps these two variables would be more accurately called the "actual character count" and the "logical character count".);

8959: Output the actual character.


For idle readers: A suggestion

It will be observed that backspaces for overprinting or underscoring characters introduce separate print cycles, and where these features are in heavy use, the effective output rate of the printer may be drastically reduced. If this is considered a serious problem, "lpcanon" could be rewritten to ensure that no more than two print cycles are used per line in such cases.


PC-11 Paper Tape Reader/Punch Driver

This driver is to be found in the file "pc.c" on Sheets 86, 87. It is simpler than the line printer driver in that there is no routine analogous to "lpcanon". However it is more complicated in that there is both an input and an output device which can be simultaneously and independently active.

A description of the operation of this device is included in the document "The UNIX I/O System" by D. Ritchie. Certain special features may be noted:

(1). Only one process may open the file for reading at a time, but there is no limit on the number of writers;

(2). This routine pays a little more attention to error conditions than the line printer driver, but the treatment is still not exhaustive;

(3). "passc" (8695) knows how many characters are required and returns a negative value when "enough" is reached;

(4). "pcclose" is careful to flush out any remaining characters in the input queue if and only if it believes the device was opened for input.

-000-
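The objection raised against the wake-up test in "lpint" can be checked with a small model. The sketch below is not taken from "lp.c": the value given to "LPLWAT" and the names "wake_exact", "wake_low" and "woken_at" are assumptions made for the illustration. Each pass of the loop stands for one uninterrupted run of "lpstart" which removes a burst of characters, after which the chosen test is applied once.

```c
#include <assert.h>

#define LPLWAT 50   /* low water mark; the value is assumed for this sketch */

/* The test as described for lines 8981, 8982: wake the writer only when
 * the queue is empty or holds exactly LPLWAT characters. */
int wake_exact(int cc)
{
    return cc == 0 || cc == LPLWAT;
}

/* The suggested replacement: wake whenever the queue is at or below the
 * mark and a writer has recorded (in wflag) that it is asleep. */
int wake_low(int cc, int wflag)
{
    return cc <= LPLWAT && wflag;
}

/* Drain a queue of cc characters in bursts of `burst`, applying the chosen
 * test after each burst. Returns the queue length at which the sleeping
 * writer is finally woken. */
int woken_at(int cc, int burst, int use_low)
{
    int wflag = 1;          /* the writer is asleep throughout */

    assert(cc >= 0 && burst > 0);
    for (;;) {
        cc = cc > burst ? cc - burst : 0;
        if (use_low ? wake_low(cc, wflag) : wake_exact(cc))
            return cc;
    }
}
```

With a queue of 100 characters drained 30 at a time, the count passes from 70 to 40 without ever equalling the mark, so the exact test wakes the writer only when the queue is empty, whereas the modified test wakes it at 40.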


CHAPTER TWENTY-THREE

Character Handling

Buffering for character special devices is provided via a set of four word blocks, each of which provides storage for six characters. The prototype storage block is "cblock" (8140) which incorporates a word pointer (to a similar structure) along with the six characters.

Structures of type "clist" (7908) which contain a character counter plus a head and tail pointer are used as "headers" for lists of blocks of type "cblock".

"cblock"s which are not in current use are linked via their head pointers into a list whose head is the pointer "cfreelist" (8149). The head pointer for the last element of the list has the value "NULL".

A list of "cblock"s provides storage for a list of characters. The procedure "putc" may be used to add a character to the tail of such a list, and "getc", to remove a character from the head of such a list.

Figures 23.1 through 23.4 illustrate the development of a list as characters are deleted and added.

[Figure 23.1: a "clist" header (character count 14, head and tail pointers) and its chain of "cblock"s]

[Figure 23.2: the same list after one character has been removed (character count 13)]

Initially the list is assumed to contain the fourteen characters "efghijklmnopqr". Note that the head and tail pointers point to characters. If the first character, "e", is removed by "getc", the situation portrayed in Figure 23.1 changes to that of Figure 23.2. The character count has been decremented and the head pointer has been advanced by one character position.

If a further character, "f", is removed from the head of the list, the situation becomes as in Figure 23.3. The character count has been decremented; the first "cblock" no longer contains any useful information and has been returned to "cfreelist"; and the head pointer now points to the first character in the second "cblock".

[Figure 23.3: the list after a second character has been removed (character count 12)]

The question now poses itself: "how is the difference between the first and second situations detected so that the action taken is always appropriate?"

The answer (if you have not already guessed) involves looking at the value of the pointer address modulo 8. Since division by eight is easily performed on a binary computer, the reason for the choice of six characters per "cblock" should now also be apparent.

The addition of a character to the list is illustrated in the change between Figure 23.3 and Figure 23.4.

[Figure 23.4: the list after one character has been added (character count 13)]

Since the last "cblock" in Figure 23.3 was full, a new one has been obtained from "cfreelist" and linked into the list of "cblock"s. The character count


and tail pointer have been adjusted appropriately.


cinit (8234)

This procedure, which is called once by "main" (1613), links the set of character buffers into the free list, "cfreelist", and counts the number of character device types.

8239: "ccp" is the address of the first word in the array "cfree" (8146);

8240: Round "ccp" up to the next highest multiple of eight, and mark out "cblock" sized pieces, taking care not to exceed the boundary of "cfree".
Note. In general there will be "NCLIST - 1" (rather than "NCLIST") blocks so defined;

8241: Set the first word of the "cblock" to point to the current head of the free list.
Note that "c_next" is defined on line 8141, and that the initial value of "cfreelist" is "NULL".

8242: Update "cfreelist" to point to the new head of the list;

8244: Count the number of character device types. Upon reference to "cdevsw" on Sheet 46, it will be seen that "nchrdev" will be set to 16, whereas a more appropriate value would be 10.


getc (0930)

This procedure is called by

flushtty (8258, 8259, 8264)
canon (8292)
ttstart (8520)
ttread (8544)
pcclose (8673)
pcread (8688)
pcstart (8714)
lpstart (8971)

with a single argument which is the address of a "clist" structure.

0931: Copy the parameter to r1 and save the initial processor status word and value of r2 on the stack;

0934: Set the processor priority to five (higher than the interrupt priority of a character device);

0936: r1 points to the first word of a "clist" structure (i.e. a character count). Move the second word of this structure (i.e. a pointer to the head character) to r2;

0937: If the list is empty (head pointer is "NULL") go to line 0961;

0938: Move the head character to r0 and increment r2 as a side effect;

0939: Mask r0 to get rid of any extended negative sign;

0940: Store the updated head pointer back in the "clist" structure. (This may have to be altered later.);

0941: Decrement the character count and if this is still positive, go to line 0947;

0942: The list is now empty, so reset the head and tail character pointers to "NULL". Go to line 0952;

0947: Look at the three least significant bits of r2. If these are non-zero, branch to line 0957 (and return to the calling routine forthwith);

0949: At this point, r2 is pointing at the next character position beyond the "cblock". Move the value stored in the first word of the "cblock" (i.e. at r2 - 8), which is the address of the next "cblock" in the list, to the head pointer in the "clist". (Note that r1 was incremented as a side effect at line 0941);

0950: The last value stored needs to be incremented by two (Consult Figures 23.2 and 23.3);

0952: At this point, a "cblock" determined by r2 is to be returned to "cfreelist". Either r2 points into the "cblock" or just beyond it. Decrement r2 so that r2 will point into the "cblock";

0953: Reset the three least significant bits of r2, leaving a pointer to the "cblock";

0954: Link the "cblock" into "cfreelist";

0957: Restore the values of r2 and PS from the stack and return;

0961: At this point the list is known to be empty because a "NULL" head pointer was encountered. Make sure that the tail pointer is "NULL" also;

0962: Move -1 to r0 as the result to be returned when the list is empty.


putc (0967)

This procedure is called by

canon (8323)
ttyinput (8355, 8358)
ttyoutput (8414, 8478)
pcrint (8730)
pcoutput (8756)
lpoutput (8990)

with two arguments: a character and the address of a "clist" structure.

"getc" and "putc" have related functions and the codes for the two procedures are similar in many respects. For this reason the code for "putc" will not be examined in detail, but is left for the reader.

It should be noted that "putc" can fail if a new "cblock" is needed and "cfreelist" is empty. In this case a non-zero value (line 1002) is returned


rather than a zero value (line 0996).

Note. The procedures "getc" and "putc" discussed here are NOT directly related to the procedures discussed in the Sections "GETC(III)" and "PUTC(III)" of the UPM.


Character Sets

UNIX makes use of the full ASCII character set, which is displayed in Section "ASCII(V)" of the UPM. Since knowledge of this character set is often assumed without comment, not always justifiably, some comment here would seem to be in order.

"ASCII" is an acronym for "American Standard Code for Information Interchange".


Control Characters

The first 32 of the 128 ASCII characters are non-graphic and are intended for the control of some aspect of transmission or display. The control characters explicitly used or recognised by UNIX are

Numeric  Mnemonic  Description                        UNIX
Value                                                 Name

004      eot       end of transmission (control-D)    004
010      bs        back space                         010
011      ht        (horizontal) tab                   '\t'
012      nl        new line or line feed              '\n'
014      np        new page or form feed              FORM
015      cr        carriage return                    '\r'
034      fs        file separator or quit             CQUIT
040      sp        forward space or blank             ' '
0177     del       delete                             CINTR

It will be noted that the last two of these belong to the last 96 characters, or the graphic portion, of the code.


Graphic Characters

There are 96 graphic characters. Two of these, the space and the delete, are not "visible", and may be classified with the control characters.

The graphic characters may be divided into three groups of 32 characters, which may be roughly characterised as

I.   numeric and special characters
II.  upper case alphabetic characters
III. lower case alphabetic characters.

Of course, since there are only 26 alphabetic characters, the latter two groups include some special characters as well. In particular, the last group includes the following six non-alphabetic characters:

140  reverse apostrophe
173  left brace
174  vertical bar
175  right brace
176  tilde
177  delete


Graphic Character Sets

Devices such as line printers or terminals which support all the ASCII graphic symbols are often said to support the 96 ASCII character set (though there are only 94 graphics actually involved).

Devices which support all the ASCII graphic symbols except those in the last group of 32, are said to support the 64 ASCII character set. Such devices lack the lower case alphabetics and the symbols listed above, namely "`", "{", "|", "}" and "~". Note that "delete", since it is not a visible character, can still be supported.

Devices in this latter group may be referred to as "upper case only".

Sometimes some of the graphic symbols may be non-standard, and this can be inconvenient, though not usually fatal.


UNIX Conventions

UNIX prefers, as the reader is no doubt well aware, to view the world through "lower case" spectacles. Alphabetic characters received from an "upper case only" terminal are translated immediately upon receipt from upper case to lower case. A lower case alphabetic may subsequently be translated back to upper case if it is preceded by a single backslash. For output to such a terminal, both upper and lower case alphabetic characters are mapped to upper case.

Equivalences for the five "upper case" special characters are as follows:

character   line printer              terminal

`           ' overstruck by -         \'
{           ( overstruck by -         \(
|           ! overstruck by -         \!
}           ) overstruck by -         \)
~           ^ overstruck by -         \^

The conventions for line printers and terminals are different because:

(a) for line printers, horizontal alignment is usually important, and it is possible (without too much difficulty) to print composite, overstruck characters (using the minus sign in this case); and

(b) for terminals, horizontal alignment is not considered to be so important; backspacing to provide overstruck characters does not work on most VDUs; and, since the same graphic conventions are used for both input and output, the symbols should be as convenient to type as possible.
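The input conventions for an "upper case only" terminal can be summarised as a two step translation. The sketch below is an illustration only and is not the code of "tty.c": the function names are invented, the pairing of escapes with special characters follows the terminal column of the equivalence table as best it can be read, and the return value of zero for "no special meaning" is a convention adopted here.

```c
#include <assert.h>
#include <ctype.h>

/* Step 1: an "upper case only" terminal sends only capitals; fold them
 * to lower case on receipt. */
int fold_input(int c)
{
    assert(c >= 0 && c <= 0177);     /* seven bit ASCII codes only */
    return isupper(c) ? tolower(c) : c;
}

/* Step 2: interpret the character that follows a single backslash.
 * Returns the translated character, or 0 if the backslash has no
 * special effect on c (a convention invented for this sketch). */
int after_backslash(int c)
{
    assert(c >= 0 && c <= 0177);
    if (islower(c))
        return toupper(c);           /* "\a" stands for "A" */
    switch (c) {
    case '\'': return '`';
    case '(':  return '{';
    case '!':  return '|';
    case ')':  return '}';
    case '^':  return '~';
    default:   return 0;
    }
}
```

Because folding happens first, a capital typed after a backslash arrives at the second step as a lower case letter and is then restored to upper case.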


maptab (8117)

This array is used in the translation of character input from a terminal preceded by a single backslash, "\".

There are three characters, 004 (eot), '#' and '@', which always have special meanings and need to be asserted by a backslash whenever they are to be interpreted literally. These three characters occur in "maptab" in their "natural" locations (i.e. their locations in the ASCII table). Thus for example '#' has code 043 and

    maptab[043] == 043.

The other non-null characters in "maptab" are involved in the translation of input characters from "upper case only" devices and do not occur in their "natural" locations but in the location of their equivalent character, e.g. "{" occurs in the natural location for "(", since "\(" will be interpreted as "{", etc.

Note the situation regarding alphabetic characters. This is only explicable when it is remembered that the alphabetic characters are all translated to lower case before any backslash is recognised.


partab (7947)

This array consists of 256 characters, like "maptab". Unfortunately the initialisation of "partab" was omitted from the UNIX Operating System Source Code booklet. It is certainly needed, and so is given now:

char partab [] {
 0001,0201,0201,0001,0201,0001,0001,0201,
 0202,0004,0003,0205,0005,0206,0201,0001,
 0201,0001,0001,0201,0001,0201,0201,0001,
 0001,0201,0201,0001,0201,0001,0001,0201,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0000,0200,0200,0000,0200,0000,0000,0200,
 0000,0200,0200,0000,0200,0000,0000,0200,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0000,0200,0200,0000,0200,0000,0000,0200,
 0000,0200,0200,0000,0200,0000,0000,0200,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0000,0200,0200,0000,0200,0000,0000,0200,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0200,0000,0000,0200,0000,0200,0200,0000,
 0000,0200,0200,0000,0200,0000,0000,0201
};

Each element of "partab" is an eight bit character, which, with the use of appropriate bitmasks (0200 and 0177), can be interpreted as a two part structure:

bit 7      parity bit;
bits 3-6   not used. Always zero;
bits 0-2   code number.

The parity bit is appended to the seven bit ASCII code when a character is transmitted by the computer, to form an eight bit code with even parity.

The code number is used by "ttyoutput" (8426) to classify the character into one of seven categories for determining the delay which should ensue before the transmission of the next character. (This is particularly important for mechanical printers which require time for the carriage to return from the end of a line, etc.)
-000-
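The "modulo 8" test used by "getc" and "putc" can be tried out in a small model of the "cblock" storage described at the start of this chapter. The sketch below is an assumption-laden illustration, not a transcription of the assembler routines: sixteen bit offsets into a byte array stand in for PDP11 addresses, offset zero stands for "NULL", the tail offset marks the next free character position, and all names ending in "_sketch" are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLK 8                          /* number of blocks; arbitrary here */
static uint8_t arena[8 * (NBLK + 1)];   /* block 0 unused, so offset 0 can mean NULL */
static uint16_t freelist;               /* offset of the first free block, 0 if none */

struct clist_sketch { int cc; uint16_t cf, cl; };   /* count, head, tail */

static uint16_t next_of(uint16_t blk)
{
    uint16_t n;
    memcpy(&n, &arena[blk], 2);         /* first word of a block is the link */
    return n;
}

static void set_next(uint16_t blk, uint16_t n)
{
    memcpy(&arena[blk], &n, 2);
}

void cinit_sketch(void)
{
    uint16_t b;

    freelist = 0;
    for (b = 8; b <= 8 * NBLK; b += 8) {    /* every block starts on a multiple of 8 */
        set_next(b, freelist);
        freelist = b;
    }
}

/* Returns 0 on success, 1 if a new block was needed and none was free. */
int putc_sketch(int c, struct clist_sketch *p)
{
    uint16_t r2 = p->cl, blk;

    if (r2 == 0 || (r2 & 7) == 0) {     /* no block yet, or tail block full */
        blk = freelist;
        if (blk == 0)
            return 1;
        freelist = next_of(blk);
        set_next(blk, 0);
        if (r2 == 0)
            p->cf = blk + 2;            /* first character follows the link word */
        else
            set_next(r2 - 8, blk);      /* link from the old tail block */
        r2 = blk + 2;
    }
    arena[r2++] = (uint8_t)c;
    p->cl = r2;
    p->cc++;
    return 0;
}

/* Returns the head character, or -1 if the list is empty. */
int getc_sketch(struct clist_sketch *p)
{
    uint16_t r2 = p->cf;
    int c;

    if (r2 == 0) {
        p->cl = 0;
        return -1;
    }
    c = arena[r2++];
    p->cf = r2;
    if (--p->cc > 0) {
        if (r2 & 7)
            return c;                   /* still inside the same block */
        p->cf = next_of(r2 - 8) + 2;    /* first character of the next block */
    } else {
        p->cf = p->cl = 0;              /* list is now empty */
    }
    r2 = (r2 - 1) & ~7;                 /* start of the exhausted block */
    set_next(r2, freelist);
    freelist = r2;
    return c;
}
```

Since each block occupies eight bytes beginning at a multiple of eight, an offset whose three low order bits are zero can only be the position just beyond the sixth character of a block, which is what both routines rely upon.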



CHAPTER TWENTY-FOUR

Interactive Terminals

Our remaining task, to be completed in this and the following chapter, is to consider the code which controls interactive terminals (or "terminals", for short).

A wide variety of terminals is available and several different types may be simultaneously attached to a single computer. Distinguishing characteristics for different classes of terminal include (besides such non-essential features as shape, size and colour):

(a) transmission speed, e.g. 110 baud for an ASR teletype, 300 baud for a DECwriter, 2400 baud or 9600 baud for a Visual Display Unit ("VDU");

(b) graphic character set, notably the full ASCII graphic set and the 64 graphic subset;

(c) transmission parity: odd, even, none or inoperative;

(d) output technique: serial printer or visual display;

(e) miscellaneous: combined carriage return/line feed character; half duplex terminal (input characters do not need echoing); recognition of tab characters;

(f) characteristic delays for certain control functions, e.g. carriage returns may not be completed within a single character transmission time, etc.


Interfaces

As well as the wide variety of terminals which are available and in use, there is also a variety of hardware devices which may be used to interface a terminal to a PDP11 computer. For example:

DL11/KL11  single line, asynchronous interface; 13 standard transmission rates between 40 and 9600 baud;

DJ11       16 line, asynchronous, buffered serial line multiplexer; 11 speeds between 75 and 9600 baud, selectable in four line groups;

DH11       16 line, asynchronous, buffered, serial line multiplexer; 14 speeds, individually selectable; DMA transmission

Each of the above interfaces will work in full or half duplex mode; handle 5, 6, 7 or 8 level codes; generate odd, even or no parity; and generate a stop code of 1, 1.5 or 2 bits.

In addition to the above asynchronous interfaces, there are a number of synchronous interfaces, e.g. DQ11.

Each interface has its own control characteristics and it requires a separate operating system device driver. The common code which can be shared between these is gathered into a single file "tty.c", to be found on Sheets 81 to 85. A set of common definitions is gathered in the file "tty.h" on Sheet 79.

By way of example, Sheet 80 contains the file "kl.c", which constitutes the device driver for a set of DL11/KL11 interfaces. This device driver always needs to be present, since one KL11 interface is invariably included in a system for the operator's console terminal.


The "tty" Structure (7926)

An instance of "tty" is associated with every terminal port to the system (no matter what type of hardware interface is used). A "port" in this context is a place to attach a terminal line. Hence a DL11 supplies only one port, whereas a DJ11 supplies up to sixteen ports.

The "tty" structure consists of sixteen words and includes:

A.  t_dev, t_addr                        fixed for a particular terminal port;

B.  t_speeds, t_erase, t_kill, t_flags   fixed for a particular terminal. These values may be set by "stty" and interrogated by "gtty";

C.  t_rawq, t_canq, t_outq               list heads for three character queues: the so-called "raw" input, "cooked" input and the output queues;

D.  t_state, t_delct, t_col, t_char      status information which changes frequently during normal processing;

Table 24.1
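The count of sixteen words can be verified with a declaration along the following lines. This is a sketch only: the member names are those of Table 24.1, but their order, the choice of which members occupy a byte rather than a word, and the use of fixed width types to stand for PDP11 words are assumptions made here. Sheet 79 holds the actual declaration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t word;                  /* one PDP11 word */

struct clist_sketch { word c_cc, c_cf, c_cl; };   /* count, head, tail */

struct tty_sketch {
    struct clist_sketch t_rawq;         /* group C: "raw" input queue */
    struct clist_sketch t_canq;         /* group C: "cooked" input queue */
    struct clist_sketch t_outq;         /* group C: output queue */
    word    t_flags;                    /* group B */
    word    t_addr;                     /* group A: device register address */
    uint8_t t_delct;                    /* group D */
    uint8_t t_col;                      /* group D */
    uint8_t t_erase;                    /* group B */
    uint8_t t_kill;                     /* group B */
    uint8_t t_state;                    /* group D */
    uint8_t t_char;                     /* group D */
    word    t_speeds;                   /* group B */
    word    t_dev;                      /* group A */
};

/* Size of the sketch, measured in sixteen bit words. */
int tty_words(void)
{
    assert(sizeof(struct tty_sketch) % sizeof(word) == 0);
    return (int)(sizeof(struct tty_sketch) / sizeof(word));
}
```

The three queue headers account for nine words, the six byte sized members for three, and the remaining four members for one word each.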


Note "u.u_arg[..]" using the parameter sup- 8593: Reset the "flags" defining some
plied as a pointer, and then calls relevant terminal characteristics
The reader should study the information "sgtty". (see Sheet 79):
on Sheet 79 carefully. Certain items
listed below are not referenced in any flag bit if set ...
essential way in the selection of code
examined here. sgtty (8201) XTABS 1 the terminal will not inter-
pret horizontal tab characters
t char (7940) NLDELAY (7974) 8206: Get a validated pointer to a correctly;
t-speeds (7941) TBDELAY (7975) "file" array entry;
HUPCL (7966) CRDELAY (7976) LCASE 2 the terminal supports only the
ODDP (7972) WOPEN (7985) 8209: Check that the file is a "charac- 64 character ASCII subset;
EVENP (7973) ASLEEP (7993) ter special";
ECHO 3 the terminal is operating in
8213: Call the appropriate "d_sgtty" full duplex mode, and input
routine for the device type. (See characters must be echoed
Sheet 46.) back;
Initialisation
CRMOD 4 upon input, a "carriage
Initialisation of the "tty" structures Note that the "d_sgtty" routine is return" is replaced by a "line
is the responsibility of the various "nodev" for the line printer and paper feed"; upon output, a "line
"open" routines in the device drivers, tape reader/punch. feed" is replaced by a "car-
for example, "klopen" (8023). riage return" and a "line
feed";

The items in Group B of Table 24.1 may klsgtty (8090) RAW 5 input characters are to be
be changed by a "stty" system call. sent to the program exactly as
The current values may be interrogated This is an example of a "d sgtty" rou- received, without "erase" or
by a "gtty" system call. tine. It calls "ttystty" passing a "kill" processing, or adjust-
pointer to the appropriate "tty" struc- ment for backslash characters.
ture as a parameter.
A description of these is contained in
the sections, "STTY(II)" and "GTTY(II)"
of the UPM. These calls are invoked by In addition, the following bits are
the "stty" shell command which is ttystty (8577) interrogated by "ttyoutput" (8373) in
described in the section "STTY(I)". choosing the delay which should ensue
A call originating from "stty" will after the character indicated is sent,
have a second parameter of zero. before sending the next character:
Since the "stty" and "gtty" system
calls require a file descriptor as a 8589: Empty all the queues associated 8,9 line feed;
parameter, they can only be applied to with the terminal forthwith. They 10,11 horizontal tab;
an "open" character special file. quite likely contain nonsense; 12,13 carriage return;
14 vertical tab or form feed.
8591: Reset the speed information (use-
The two system calls share a good deal ful in the case of a DHII inter-
of common code. We will trace the pro- face, but of little interest for
gress of an execution of "stty" below the present selection of code);
and leave the tracing of a similar exe- The DLll/KLll Terminal Device Handler
cution of "gtty" to the reader. 8592: Reset the "erase" character and
the "kill" character. ("kill" The file "kl.c" constitutes the device
here denotes "throwaway the handler for terminals connected to the
current input line".) Note that system via DLll/KLll interfaces. This
if these characters are changed group always has at least one member -
away from their normal values of the operator's console terminal. Hence
This procedure implements the "stty" -.- and "@" respectively, no this device handler will always be
system call. It copies three words of corresponding changes are made to present.
user parameter information into "maptab". Nor should they!);
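
The bit assignments tabulated above for the "flags" word can be
written down as C masks. The following is a minimal sketch and not
the kernel's own header: the mask values follow directly from the bit
numbers in the table, while the shift names and the helper
"delay_class" are hypothetical, introduced here only for illustration.

```c
#include <assert.h>

/* Mode bits of the flags word; each value is 1 << bit, per the table. */
#define XTABS  (1 << 1)   /* terminal cannot interpret horizontal tabs    */
#define LCASE  (1 << 2)   /* 64 character ASCII subset only               */
#define ECHO   (1 << 3)   /* full duplex: input characters are echoed     */
#define CRMOD  (1 << 4)   /* CR becomes LF on input, LF becomes CR LF out */
#define RAW    (1 << 5)   /* no erase, kill or backslash processing       */

/* Hypothetical names for the positions of the delay fields. */
enum { NL_SHIFT = 8, TAB_SHIFT = 10, CR_SHIFT = 12, VT_BIT = 14 };

/* Extract one of the two-bit delay classes (0..3) from a flags word. */
int delay_class(unsigned flags, int shift)
{
    assert(shift == NL_SHIFT || shift == TAB_SHIFT || shift == CR_SHIFT);
    return (int)((flags >> shift) & 03);
}

/* Vertical tab and form feed have a single bit rather than a class. */
int vt_delay_wanted(unsigned flags)
{
    return (flags >> VT_BIT) & 01;
}
```

For example, a flags word with bits 8 and 9 holding the value 2 gives
"delay_class(flags, NL_SHIFT) == 2", which is the quantity "ttyoutput"
is described as consulting after a line feed.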



Each DL11/KL11 hardware controller provides an asynchronous, serial
interface to connect a single terminal to a PDP11 system. For more
complete details regarding this interface, the reader should consult
the "PDP11 Peripherals Handbook".

Device Registers

Each DL11/KL11 unit has a group of four registers occupying four
consecutive words on the UNIBUS. UNIX maps a structure of type
"klregs" (8016) onto each register group.

Receiver Status Register (klrcsr)

bit 7     Receiver Done. (A character has been transferred into the
          Receiver Data Buffer Register.);

bit 6     Receiver Interrupt Enable. (When set, an interrupt is
          caused every time bit 7 is set.);

bit 1     Data terminal ready;

bit 0     Reader Enable. Write only. (When set, bit 7 is cleared.).

Receiver Data Buffer Register (klrbuf)

bit 15    Error indication, when set.

bits 7-0  Received character. Read only.

Transmitter Status Register (kltcsr)

bit 7     Transmitter ready. This is cleared when data is loaded
          into the Transmitter Data Buffer, and is set when the
          latter is ready to receive another character;

bit 6     Transmitter Interrupt Enable. (When set, causes an
          interrupt to be generated whenever bit 7 is set.)

Transmitter Data Buffer Register (kltbuf)

bits 7-0  Transmitted data. Write only.

UNIBUS Addresses

The Receiver Status Register always has its lowest address starting
on a four word boundary. (The addresses which follow are all 18 bit
octal addresses.)

                        Receiver      Transmitter
                        Status        Data

    Operator's console  777560   ->   777566

    Group Two           776500   ->   776506
                        776510   ->   776516
                        ------        ------
                        776670   ->   776676

    Group Three         775610   ->   775616
                        775620   ->   775626
                        ------        ------
                        776170   ->   776176

Apart from the operator's console interface which has its own
standard UNIBUS location, the interfaces are gathered into two groups
(for reasons which are irrelevant here). Within each group, by
convention, registers are allocated in consecutive locations starting
at the lowest address.

Software Considerations

"NKL11" (8011) must be set to define, for a particular installation,
the number of interfaces in the first two groups, and "NDL11" (8012),
the number in the third group. Any hardware alterations which changed
the actual number of interfaces would have to be reflected in the
software by changing and recompiling "kl.c", and relinking the
operating system.

It will be seen that "klopen" calculates the correct kernel mode
address (16 bits) for the Receiver Status Register for each
interface, and this is stored (8044) into the "t_addr" element of the
appropriate "tty" structure.

Interrupt Vector Addresses

The vector addresses for the first interface are 060 and 064 (for
receiver and transmitter interrupts, respectively). Additional
DL11/KL11 interfaces have vector addresses which are always at least
0300, and which are assigned according to rules which take into
consideration other interfaces which may be present.

The second word of an interrupt doublet is the "new processor status"
word. The five low order bits of this word may be chosen arbitrarily,
and are in fact used to define the minor device number (cf. a similar
use to distinguish the various kinds of "traps", see Sheet 05). A
masked version of the new processor status word is provided to the
interrupt handling routines as the parameter "dev" (see e.g. line
8070).

Source Code

We can now turn to a detailed study of the code in the files "kl.c"
(Sheet 80) and "tty.c" (Sheets 81 to 85). We shall look first at
"opening" and "closing" terminals as character special files and the
handling of interrupts. Then in the next chapter we shall look at the
receipt of data from the terminal, and finally transmission of data
to the terminal.

"klread" (8062), "klwrite" (8066) and "klsgtty" (8090) have already
been discussed above.
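
The address table and the rule for the minor device number given in
this chapter can be summarised in a few lines of C. This is a sketch
under stated assumptions rather than the code of "klopen": it works
with the 18 bit addresses of the table (not the 16 bit kernel mode
addresses), it assumes that minor device 0 is the operator's console
and that "nkl11" counts the console together with the second group,
and the function names are hypothetical.

```c
#include <assert.h>

/* 18 bit UNIBUS address of the first register in each group (from the table). */
#define CONSOLE_CSR 0777560L
#define GROUP2_CSR  0776500L
#define GROUP3_CSR  0775610L

/*
 * Address of the Receiver Status Register for minor device "dev".
 * Each interface occupies four words, i.e. eight bytes, and the
 * interfaces of a group are allocated consecutively.
 */
long kl_rcsr_addr(int dev, int nkl11)
{
    assert(dev >= 0 && nkl11 >= 1);
    if (dev == 0)
        return CONSOLE_CSR;
    if (dev < nkl11)
        return GROUP2_CSR + 8L * (dev - 1);
    return GROUP3_CSR + 8L * (dev - nkl11);
}

/* The Transmitter Data Buffer is the fourth word of the register group. */
long kl_tbuf_addr(int dev, int nkl11)
{
    return kl_rcsr_addr(dev, nkl11) + 6;
}

/* The minor device number is held in the five low order bits of the
 * "new processor status" word of the interrupt vector. */
int minor_from_ps(int new_ps)
{
    return new_ps & 037;
}
```

With a hypothetical installation for which "nkl11" is 4, minor device
2 maps to 776510 and minor device 4 is the first interface of the
third group, at 775610.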



klopen (8023)

This procedure is called to "open" a terminal as a character special
file. This call is usually made by the program "/etc/init" for each
terminal which is to be active in the system. Since child processes
inherit the open files of their parents, it is not usually necessary
for other processes to "open" the device again. It will be noted that
there is no attempt to stop two unrelated processes having the
terminal as an open file simultaneously.

8026: Check the minor device number;

8030: Locate the appropriate "tty" structure;

8031: If the process opening the file has no associated controlling
      terminal, designate the current terminal for this role. (Note
      that the reference stored is the address of a "tty"
      structure.);

8033: Store the terminal device number in the "tty" structure;

8039: Calculate the address of the appropriate set of device
      registers for the terminal and store in "t_addr";

8045: If the terminal is not already "open", do some initialisation
      of the "tty" structure;

8046: "t_state" is set to show the file is "open", so that the next
      three lines will not be executed if the file is opened a
      second time, possibly undoing the effect of a "stty" system
      call;
      "t_state" is also set to show "CARR_ON" ("carrier on"). This
      is a software flag which shows that the terminal is logically
      enabled, regardless of the true hardware status of the
      terminal. If "CARR_ON" is reset for a terminal, the system
      should ignore all input from the terminal. (This does not seem
      to be entirely true, and this point will be taken up again
      later.);

8047: The standard terminal is assumed to be unable to interpret
      horizontal tabs, to support only the 64 character ASCII
      subset, to run in full duplex mode and to require both
      "carriage return" and "line feed" characters to provide normal
      "new line" processing. (Could this be a Model 33 teletype?);

8048: The "erase" and "kill" characters are set according to the
      UNIX convention;

8051: The Receiver Control Status register is initialised with the
      pattern "0103" so that the terminal is made ready, reading is
      enabled and receiver interrupts are enabled;

8052: The Transmitter Control Status register is initialised so that
      an interrupt will be generated whenever the interface is ready
      to receive another character.

Note that the "open" routine does not distinguish between the cases
where the file is opened for reading only, or writing only, or for
both reading and writing.

klclose (8055)

8057: Find the address of the appropriate "tty" structure in the
      array of such structures, "kl11" (8015). (This operation may
      be observed in all the procedures in the second column of
      Sheet 80, and its relevance should be noted.);

8058: "wflushtty" (8217) allows the output queue for the terminal to
      "drain" and then flushes the input queue;

8059: "t_state" is reset so that "ISOPEN" and "CARR_ON" are no
      longer true.

klxint (8070)

This procedure is executed in response to a transmitter interrupt.
It should be compared with "pcpint" (8739) and "lpint" (8976). Note
that the parameter "dev" is a masked version (low order five bits
preserved) of the "new processor status" word in the interrupt
vector. Provided the vector was properly initialised, the minor
device number will be properly identified.

The second part of the test on line 8074 will be discussed at the
end of the next chapter.

klrint (8078)

This procedure is executed in response to a receiver interrupt. It
is not so readily compared with "pcrint" (8719) although
similarities certainly exist.

8083: Read the input character from the Receiver Data Buffer
      register;

8084: Enable the receiver for the next character;

8085: The comment says "hardware botch". Better believe it;

8086: Pass the character to "ttyinput" to insert it into the
      appropriate "raw" input queue.

-000-



CHAPTER TWENTY-FIVE

The File "tty.c"

In this, the last chapter, the intricacies of interactive terminal
handlers are finally unveiled, including:

(a) the handling of the "erase" and "kill" characters;

(b) the conversion of characters during input and output for upper
    case only terminals;

(c) the insertion of delays after various special characters such
    as "carriage return".

The routines "gtty" (8165), "stty" (8183), "sgtty" (8201) and
"ttystty" (8577) were dealt with in the previous chapter.

flushtty (8252)

The purpose of this procedure is to "normalise" the queues
associated with a particular terminal. Its effect is to terminate
transmission to the terminal forthwith and to throw away any
accumulated input characters.

8258: Throw away everything in the "cooked" input queue;

8259: ditto for the output queue;

8260: Wakeup any process waiting to extract a character from the
      "raw" input queue;

8261: ditto for the output queue;

8263: Raise the processor priority to prevent an interrupt from the
      terminal while ...

8264: the "raw" input queue is flushed, and

8265: the "delimiter count" is properly set to zero.

"flushtty" is called by "wflushtty" (see below) and "ttyinput"
(8346, 8350) when either:

(a) the terminal is not operating in "raw" mode and a "quit" or
    "delete" character is received from the terminal; or

(b) the "raw" input queue has grown unreasonably large (presumably
    because no process is reading input from the terminal);

wflushtty (8217)

This procedure waits until the queue of characters for a terminal
is empty (because they've all been sent!) and then calls "flushtty"
to clean up the input queues.

"wflushtty" is called (8058) by "klclose". This does not happen very
often - in fact only when all files referencing the terminal are
closed, i.e. usually only when the user logs off.

It is also called by "ttystty" (8589) just before the terminal
environment parameters are adjusted.

Character Input

For a program requesting input from a terminal, there is a chain of
procedure calls which extends to "ttread" ...

ttread (8535)

8541: Check that the terminal is logically active;

8543: If there are characters in the "cooked" input queue or a call
      on "canon" (8274) is successful ...

8544: transfer characters from the "cooked" input queue until either
      it is empty or enough characters have been transferred to suit
      the user's requirements.

canon (8274)

This procedure is called by "ttread" (8543) to transfer characters
from the "raw" input queue to the "cooked" input queue (after
processing "erase" and "kill" characters and, in the case of upper
case only terminals, processing "escaped" characters, i.e.
characters preceded by the character '\'). "canon" returns a
non-zero value if the "cooked" input queue is no longer empty.

8284: If the number of delimiters in the "raw" input queue is zero
      then ...

UNIX Operating System 25-1 The File "tty.c"


8285: if the terminal is logically inactive, then just return;

8286: otherwise go to "sleep".

Note that delimiters in this context are characters of all ones
(octal value is 377) and are inserted by "ttyinput" (8358).

8291: Set "bp" to point to the third character of the work array,
      "canonb";

8292: Begin a loop (extending to line 8318) which removes one
      character from the "raw" queue per cycle;

8293: If the character is a delimiter, reduce the delimiter count by
      one and exit the loop, i.e. go to line 8319;

8297: If the terminal is not operating in "raw" mode ...

8298: If the previous character (note the "bp[-1]" notation!) was
      not a backslash, '\', execute the code from line 8299 to 8307,
      otherwise execute the code beginning at line 8309.

Previous character was not a backslash

8299: If the character is an "erase" and ...

8300: if there is at least one character to erase, back up the
      pointer "bp";

8302: Start on the next cycle of the loop beginning at line 8292;

8304: If the character is a "kill", throw away all the characters
      accumulated for the current line, by going back to line 8290;

8306: If the character is an "eot" (004) (usually generated at the
      terminal as "control-D"), ignore it (and do not put it into
      "canonb") and start on the next cycle; (If this character
      occurs at the beginning of a line, then subsequently "ttread"
      (8544) will find no characters in the "cooked" input queue,
      i.e. it will read a zero length record, which then leads to
      the program receiving the normal "end of file" indication.)

Previous character was a backslash

8309: If "maptab[c]" is non-zero, and either "maptab[c] == c" or the
      terminal is upper case only, then

8310: if the last character but one was not a backslash ('\'), then
      replace "c" by "maptab[c]" and back up "bp" (so that the
      backslash will be erased).

Character ready

8315: Move "c" into the next character in "canonb", and if this
      array is now full, leave the loop.

Line completed

8319: At this point, an input line has been assembled in the array
      "canonb";

8322: Shift the contents of "canonb" into the "cooked" input queue,
      and return a "successful" result.

Notes

(A) The reason why "bp" starts (8291) at the third character of
"canonb" can be found on line 8310.

(B) A number of subtleties in the handling of backslashes (which the
reader will no doubt have encountered in his practical use of UNIX)
are still not immediately apparent. Since "maptab[c]" is zero for
"c == '\'" (octal value of 134), all backslashes get copied into
"canonb". A single backslash will be subsequently overwritten if the
following character is to be asserted (as in the case of '#' or '@'
or eot (004)), or if the case of an alphabetic character is to be
changed for an upper case only terminal.

ttyinput (8333)

"canon" removes characters from the "raw" input queue. They are put
there in the first place by "ttyinput" which is called by "klrint"
(8087) whenever an input character is received from the hardware
controller.

The parameters passed to "ttyinput" are a character and a reference
to a "tty" structure.

8342: If the character is a "carriage return" and the terminal
      operates with a "carriage return" only (instead of a "carriage
      return" "line feed" pair) change the character to a "new
      line";

8344: If the terminal is not operating in "raw" mode and the
      character is a "quit" or "delete" (7958) then call "signal"
      (3949) to send a software interrupt to every process which has
      the terminal as its controlling terminal, flush all the queues
      associated with the terminal, and return;

8349: If the "raw" input queue has grown excessively large, flush
      all the queues for the terminal and return. (This may seem a
      trifle harsh at first sight but it will usually be what is
      required.);

8353: If the terminal has a limited character set, and the character
      is an upper case alphabetic, translate it into lower case;



8355: Insert the character into the "raw" input queue;

8356: If the terminal is operating in "raw" mode, or the character
      was a "new line" or "eot" then ...

8357: "wakeup" any process waiting for input from the terminal,
      place a delimiter character (all ones) also in the "raw" queue
      and increment the delimiter count. Note this is one point
      where possible failure of "putc" (when there is no buffer
      space) is explicitly recognised. A failure occurring here
      would explain why the test on line 8316 may sometimes succeed.

8361: Finally, if the input character is to be echoed, i.e. the
      terminal is running in full duplex mode, insert a copy of the
      character into the output queue, and arrange to have it
      transmitted ("ttstart") back to the terminal.

Character Output

ttwrite (8550)

This procedure is called via "klwrite" (8067) when output is to be
sent to the terminal.

8556: If the terminal is logically inactive, do nothing;

8558: Loop for each character to be transmitted

8560: While there are still an adequate number of characters queued
      for transmission to the terminal ...

8561: call "ttstart" just in case it is time to send another
      character to the terminal;

8562: Setting the "ASLEEP" flag here (also in "wflushtty" (8224)) is
      rather pointless since it is never interrogated and never
      reset until the file is closed;

8563: Go to sleep. In the meanwhile the interrupt handler will be
      draining characters from the output queue and sending them
      down the line to the terminal;

8566: Call "ttyoutput" to insert the character in the output queue
      and arrange to have it transmitted;

8568: Call "ttstart" again, for luck.

ttstart (8505)

This procedure is called whenever it seems reasonable to try and
send the next character to the terminal. It often achieves nothing
useful.

8514: See the comment on line 8499. This code is not relevant here;

8518: If the controller is not ready (i.e. bit 7 of the transmitter
      status register is not set) or the necessary delay following
      the previous character has not yet elapsed, do nothing;

8520: Remove a character from the output queue. If "c" is positive,
      the queue was not empty (as expected) ...

8521: If "c" is less than "0177" it is a character to be transmitted

8522: After setting the parity bit from the corresponding element of
      the array "partab", write "c" to the transmitter data buffer
      register to initiate the hardware operation;

8524: Otherwise ("c" > 0177) the character was inserted in the
      output queue to signal a delay. Call "timeout" (3845) to make
      an entry in the "callout" list. The result of this will be to
      initiate an execution of "ttrstrt" (8486) after "c & 0177"
      clock ticks. It will be seen that "ttrstrt" calls "ttstart"
      again, and that the manipulation of the "TIMEOUT" flag (8524,
      8491) will ensure that if another execution of "ttstart" is
      initiated in the interim, on behalf of the same terminal, it
      will (8518) return without doing anything.

ttrstrt (8486)

See the comment above for line 8524.

ttyoutput (8373)

This procedure has more comments in the source code and hence
requires less explanation than some others. Note the use of
recursion (8392) to generate a string of blanks in place of a tab
character. Other recursive calls are on lines 8403 and 8413.

Terminals with a restricted character set

8400: "colp" points to a string of pairs of characters. If the
      character to be output matches the second character of any of
      these pairs, the character is replaced by a backslash followed
      by the first character of the pair.

8407: Lower case alphabetics are converted to upper case alphabetics
      by the addition of a constant.

Note. The conversion here should be compared with the handling of
the reverse problem on input. Here we have an algorithm which
clearly trades space (no table analogous to "maptab") for time (a
serial search through the string on line 8400). A space conserving
approach could be adopted in "canon" but the problem is rather more
complicated there.
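
For comparison with the output conversion, the "erase", "kill" and
backslash handling which "canon" performs on input (lines 8297 to
8315) can be sketched in a few lines. This is a simplified
illustration and not the kernel code: it works on a plain array
instead of the character queues, it ignores delimiters and upper
case only terminals (so there is no "maptab"), and the name
"canon_sketch" is hypothetical. The "erase" and "kill" characters are
taken to be '#' and '@'.

```c
#include <assert.h>
#include <stddef.h>

#define ERASE '#'
#define KILL  '@'
#define EOT   004

/*
 * Assemble one input line from "raw" (n characters) into "line"
 * (capacity "cap"). Returns the number of characters assembled.
 */
size_t canon_sketch(const char *raw, size_t n, char *line, size_t cap)
{
    size_t len = 0;

    assert(raw != NULL && line != NULL);
    for (size_t i = 0; i < n && len < cap; i++) {
        char c = raw[i];

        if (len > 0 && line[len - 1] == '\\') {
            /* Previous character was a backslash: a special character
             * is taken literally and overwrites the backslash. */
            if (c == ERASE || c == KILL || c == EOT)
                len--;
        } else if (c == ERASE) {
            if (len > 0)
                len--;          /* back up over one character */
            continue;
        } else if (c == KILL) {
            len = 0;            /* throw away the current line */
            continue;
        } else if (c == EOT) {
            continue;           /* never stored in the line */
        }
        line[len++] = c;
    }
    return len;
}
```

Feeding the five characters "abc#d" through this sketch leaves "abd",
and "ab@cd" leaves "cd". An "eot" on its own leaves an empty line,
which corresponds to the zero length record described above for line
8306.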



8414: Insert the character into the output queue. If perchance,
      "putc" fails for lack of buffer space, don't worry about
      inserting any subsequent delay, or updating the system's idea
      of the current printing column;

8423: Set "colp" to point to the "t_col" character of the "tty"
      structure, i.e. "*colp" has a value which is the ordinal
      number of the column which has just been printed;

8424: Set "ctype" to the element of "partab" corresponding to the
      output character "c";

8425: Clear "c";

8426: Mask out the significant bits of "ctype" and use the result as
      the "switch" index;

8428: (Case 0) The common situation! Increment "t_col";

8431: (Case 1) Non-printing characters. This group consists of the
      first, third and fourth octet of the ASCII character set, plus
      "so" (016), "si" (017) and "del" (0177). Don't increment
      "t_col";

8434: (Case 2) Backspace. Decrement "t_col" unless it is already
      zero;

8439: (Case 3) Newline. Obviously "t_col" should be set to zero. The
      main problem is to calculate the delay which should ensue
      before another character is sent. For a Model 37 teletype,
      this depends on how far the print mechanism has progressed
      across the page. The value chosen is at least a tenth of a
      second (six clock ticks) and may be as much as
      ((132/16) + 3)/60 = 0.19 seconds.

      For a VT05, the delay is 0.1 second. For a DECwriter it is
      zero because the terminal incorporates buffer storage and has
      a double speed "catch up" print mode;

8451: (Case 4) Horizontal tab. Assign the value of bits 10, 11 of
      "t_flags" to "ctype";

8453: For the only non-trivial case recognised ("c" == 1 or Model 37
      teletype), calculate the number of positions to the next tab
      stop (via the obscure calculation of line 8454). If this turns
      out to be four columns or less, take it as zero;

8458: Round "*colp" (i.e. the value pointed to by "colp"!) to the
      next multiple of 8 less one;

8459: Increment "*colp" to be an exact multiple of eight;

8462: (Case 5) Vertical Motion. If bit 14 is set in "t_flags", make
      the delay as long as possible, i.e. 0177 or 127 clock ticks,
      i.e. just over two seconds;

8467: (Case 6) Carriage Return. Assign the value of bits 12, 13 of
      "t_flags" to "ctype";

8469: For the first class, allow a delay of five clock ticks;

8472: For the second class, allow a delay of ten clock ticks;

8475: Set "*colp" (the last column printed) to zero.

Before leaving the file "tty.c", there are two matters which deserve
further examination.

A. The test for "TTLOWAT" (Line 8074)

On line 8074 in "klxint", a test is made whether to restart any
processes waiting to send output to the terminal. The test is
successful if the number of characters is zero or if it is equal to
"TTLOWAT".

If the number of characters is between these values, no "wakeup" is
performed until the queue is completely empty, with the strong
likelihood that there will then be a hiatus in the flow of output to
the terminal. Since temporary interruptions to the flow of output
are quite frequently observed in practice and represent a source of
occasional irritation if nothing more, one may reasonably enquire
"is there any way the character count can get from being greater
than "TTLOWAT" to below it, without this being detected at line
8074?"

Quite clearly there is, since each call on "ttstart" can decrement
the queue size, and only one such call is followed by the test. Thus
if the call on "ttstart" from one of "ttrstrt" (8492) or "ttwrite"
(8568) happens to cross the boundary, a delay will result. The
probability that this will happen is small, but finite and hence the
event is likely to be observed in any reasonably long output
sequence.

There are two other situations in which "ttstart" is called which
seem to be satisfactory. At "ttwrite" (8561) the queue is at its
maximum extent; and at "ttyinput" (8363) there is a preceding call
on "ttyoutput" which usually (but not invariably!) will have added a
character to the output queue.
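
The weakness of the equality test discussed under A can be made
concrete with a small model. The sketch below is illustrative only:
the value 50 for "TTLOWAT" is an assumption, delay entries in the
queue are ignored, and the function names are hypothetical.
"wakeup_at_or_below" is one conceivable alternative test, not
something taken from the source code.

```c
#include <assert.h>

#define TTLOWAT 50   /* assumed low water mark, for illustration only */

/* The test as described for line 8074: exact match only. */
int wakeup_exact(int count)
{
    return count == 0 || count == TTLOWAT;
}

/* Hypothetical alternative: succeed for any count at or below the mark. */
int wakeup_at_or_below(int count)
{
    assert(count >= 0);
    return count <= TTLOWAT;
}

/*
 * Model of the transmitter interrupt path: each interrupt sends one
 * character and then applies the exact test. Returns the number of
 * interrupts which pass before a "wakeup" is issued.
 */
int interrupts_until_wakeup(int count)
{
    int n = 0;

    while (count > 0) {
        count--;                /* "ttstart" removes one character */
        n++;
        if (wakeup_exact(count))
            break;
    }
    return n;
}
```

Starting one above the mark, the writer is woken after a single
interrupt. Starting one below it (because some other call on
"ttstart" crossed the boundary), no "wakeup" occurs until the queue
has drained completely, which is the hiatus described above.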



B. Inactive Terminals

When the last special file for a terminal is closed, "klclose"
(8055) is called and resets (8059) the "ISOPEN" and "CARR_ON" flags.
However the "read enable" bit of the receiver control status
register is not reset, so that incoming characters may still be
received and will be stored away (8087) in the terminal's "raw"
input queue by "klrint" (8078), and "ttyinput" (8333), which do not
test the "CARR_ON" flag, to see if the terminal is logically
connected.

These characters may accumulate for a long time and clog up the
character buffer storage. Only when the "raw" input queue reaches
256 characters ("TTYHOG", 8349) will the contents of this queue be
thrown away. It does seem therefore, that a statement to disable
reader interrupts should be included in "klclose" before line 8058.
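
What such a statement might look like can be sketched against a
software model of the register group. This is an assumption-laden
illustration, not a patch to "kl.c": the structure is an ordinary
variable rather than a device mapped at a UNIBUS address, the
structure name and the mask names are illustrative, and the bit
positions are those given in the description of the Receiver Status
Register in the previous chapter.

```c
#include <assert.h>

/* Register group layout as described: four consecutive words. */
struct klregs_sketch {
    unsigned short klrcsr;   /* receiver status          */
    unsigned short klrbuf;   /* receiver data buffer     */
    unsigned short kltcsr;   /* transmitter status       */
    unsigned short kltbuf;   /* transmitter data buffer  */
};

#define RDRENB  0001   /* bit 0: reader enable        */
#define DSRDY   0002   /* bit 1: data terminal ready  */
#define IENABLE 0100   /* bit 6: interrupt enable     */

/* What "klopen" is described as doing at line 8051 (pattern 0103). */
void rx_enable(struct klregs_sketch *r)
{
    assert(r != 0);
    r->klrcsr |= IENABLE | DSRDY | RDRENB;
}

/* The suggested addition to "klclose": stop receiver interrupts. */
void rx_disable(struct klregs_sketch *r)
{
    assert(r != 0);
    r->klrcsr &= (unsigned short)~IENABLE;
}
```

After "rx_enable" the modelled status register holds 0103, and after
"rx_disable" it holds 0003, so that the terminal remains ready but no
further receiver interrupts would be requested.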

-000-

Well, that's all, folks ...

Now that you, oh long-suffering, exhausted reader have reached this
point, you will have no trouble in disposing of the last remaining
file, "mem.c" (Sheet 90). And on this note, we end this discussion
of the UNIX Operating System Source Code.

Of course there are lots more device drivers for your patient
examination, and in truth the whole UNIX Time-sharing System Source
Code has hardly been scratched. So this is not really

THE END



CHAPTER TWENTY-SIX

Suggested Exercises

Any operating system design involves many subjective and ad hoc
judgements on the part of the system's designers. At many places in
the UNIX source code, you will find yourself wondering "Why did they
do it that way?", "What would happen if I changed this?"

The following exercises express some of these questions. Some can be
answered from an examination of the source code alone after a study
in more depth; others require some experimental probing and
measurement, for which read-only access to the file "/dev/kmem" via
terminal will prove invaluable; and still others really require the
construction and testing of experimental versions of the operating
system.

Section One

1.1 Devise changes to "malloc" (2528) to implement the Best Fit
algorithm.

1.2 Rewrite the procedure "mfree" (2556) to render its function more
easily discernible by the reader.

1.3 Investigate the adequacy of the sizes of the arrays "coremap"
and "swapmap" (0203, 0204). How should "CMAPSIZ" and "SMAPSIZ"
change when "NPROC" is increased?

1.4 Prove that "malloc" and "mfree" jointly solve the memory
allocation problem correctly.

1.5 By monitoring the contents of "coremap", estimate the efficiency
with which main memory is utilised. Estimate also the cost of
compacting "in use areas" of main memory from time to time to reduce
memory fragmentation. Hence decide whether it would be worthwhile to
extend the present memory allocation scheme to include memory
compaction.

1.6 In setting the first six kernel page description registers, UNIX
does not make use of all the hardware protection features that are
available, e.g. some pages which contain only pure text could be
made read-only. Devise changes to the code to maximise the use of
the available hardware protection.

1.7 Compile the program

    char *init "/etc/init";
    main ( ) {
        execl (init, init, 0);
        while (1);
    }

and compare the result with the contents of the array "icode"
(1516).

1.8 Investigate the size required for kernel mode stack areas. Hence
show that the 367 word area which is provided is adequate.

1.9 If main memory consists of several independent memory modules
and one of these, not the last, is down, "main" will not include
memory modules beyond the one which is down, in the list of
available space in "coremap". Devise some simple changes to "main"
to handle this situation. What other parts of the system would also
need revision?

1.10 Rewrite the routines "estabur" (1650) and "sureg" (1739) so
that they will work as efficiently as possible on the PDP11/40. How
often are these routines used in practice? Would it really be
worthwhile trying to implement your improved versions?

1.11 Investigate the overheads involved in initiating a new process.
Perform a series of measurements for a set of different sized
programs under different conditions.

1.12 Evaluate the following scheme which is intended by Ken Thompson
as the basis for a revised scheduling algorithm:
    A number "p" is kept for each process, stored as "p_cpu". "p" is
incremented by one every clock tick that the process is found to be
executing. "p" therefore accumulates the CPU usage. Every second,
each value of "p" is replaced by four fifths of its value rounded to
the nearest integer. This means that "p" has values which are
bounded by zero and the solution of the equation
{ k = 0.8*(k + HZ) } i.e. 4*HZ. Hence if HZ is 50 or 60, and "p" is
integerised, "p" can be stored in one byte.

1.13 The "proc" table is always searched via a direct linear search.
As the table size is increased, the search overheads also increase.
Survey the alternatives for improving the search mechanism, when
"NPROC" is say 300.
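
The bound quoted in exercise 1.12 can be checked numerically before
the scheme itself is evaluated. The sketch below is only an
arithmetic aid and makes its own assumptions: the process is taken to
be executing at every clock tick, the decay is applied once per
second after the ticks have been added, and the function names are
hypothetical.

```c
#include <assert.h>

/*
 * One second in the life of a process which is always executing:
 * "p" gains hz ticks and is then replaced by four fifths of its
 * value, rounded to the nearest integer.
 */
int decay_step(int p, int hz)
{
    assert(p >= 0 && hz > 0);
    p += hz;
    return (4 * p + 2) / 5;     /* nearest integer to 0.8 * p */
}

/* Iterate from zero until the value stops changing. */
int steady_state(int hz)
{
    int p = 0;

    for (int i = 0; i < 1000; i++) {
        int next = decay_step(p, hz);
        if (next == p)
            break;
        p = next;
    }
    return p;
}
```

Because of the rounding, the value reached from below settles
slightly under 4*HZ rather than exactly on it. Whether the value
held between one decay and the next also fits in one byte is left as
part of the evaluation asked for in the exercise.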

UNIX Operating System 26-1 Suggested Exercises


Section Two

2.1 Explain in detail how the system reacts to a floating point trap
which occurs when the processor is in kernel mode.

2.2 When a process dies, a "zombie" record is written to disk, and is
subsequently read back by the parent. Devise a scheme for passing back
the necessary information to the parent which will avoid the overhead
of the two i/o operations.

2.3 Document "backup" (1012).

2.4 It is relatively easy using the "shell" to set up a set of
asynchronous processes which will flood your terminal with useless
output. Trying to stop these processes individually can be a problem,
since their identifying numbers may not be known. Use of the command
"kill 0" is usually an act of sheer desperation. Devise an alternative
scheme, e.g. based on the use of messages such as "kill -99", which
will be effective, but more selective.

2.5 Design a form of coroutine jump which will cause control to pass
more efficiently between a program which is being traced, and its
parent.

Section Three

3.1 Rewrite the procedure "sched" to avoid the use of "goto"
statements.

3.2 Modify "sched" so that the text segment and data segment for a
program will possibly be allocated in separate main memory areas if a
single large area is not immediately available.

3.3 If the system crashes and must be "rebooted" the contents of the
buffers which were not written out at the time of the crash are lost.
However if a core dump is taken, the contents of the buffers can be
obtained and hence the contents of the disk can be brought completely
up to date. Outline a detailed plan for carrying out this scheme. How
effective do you think it would be?

3.4 Explain why the buffer areas declared on line 4720 are 514, and
not 512, characters long.

3.5 Explain how deadlock situations may arise if there are too few
"large" buffers available. What measures can you suggest to alleviate
the problem, assuming that increasing the number of buffers is not
possible?

Section Four

4.1 Devise a scheme for labelling file system volumes and checking
these labels when the volumes are mounted.

4.2 Discuss the problems of supporting ANSI standard labelled tapes
under UNIX, and propose a solution.

4.3 Design a scheme for providing index sequential access to files.

4.4 The emergence of the "sticky bit" (see "CHMOD(I)" in the UPM)
confirms that there are some residual advantages in allocating all the
space for a file contiguously. Discuss the merits of making
"contiguous files" more generally available.

4.5 Devise a technique to measure the efficiency of pipes. Apply the
technique and report your results.

4.6 Devise modifications to "pipe.c" which will make pipes more
efficient according to the following scheme: whenever the "read"
pointer is greater than 512, rotate the non-null block numbers in the
"inode" and decrease both the "read" and "write" pointers by 512.

Section Five

5.1 By monitoring the number of free buffers or otherwise, determine
whether the number of character buffers provided at your installation
is adequate.

5.2 Perform measurements and/or experiments to determine whether the
character buffer blocks would be more efficiently utilised if they
consisted of four or eight characters, rather than six, per block.

5.3 Redesign the line printer driver to handle overprinting and
backspacing more efficiently in the sense of minimising the number of
print cycles.

5.4 Document "mmread" (9016) and "mmwrite" (9042).

General

6.1 The easiest way to vary the main memory space used by the
operating system is to vary "NBUF". If this is forbidden, propose the
best way to:

(a) reduce the space required by 500 words;

(b) utilise an additional 500 words.

6.2 Discuss the merits of "C" as a systems programming language. What
features are missing? or superfluous?

-oOo-
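As a starting point for exercise 4.6, the rotation step it describes can be modelled outside the kernel. The following is only a sketch under stated assumptions, not code from "pipe.c": the structure "pipe_model", its field names and the function "pipe_rotate" are all hypothetical, a block number of zero is taken to mean "no block allocated", and a "read" pointer of 512 or more is treated as the trigger, which is one reading of the exercise's "greater than 512".

```c
#define BLKSIZE 512   /* block size in bytes, as in the exercise */
#define NADDR   8     /* assumed number of block-number slots per inode */

/* Hypothetical user-space model of the pipe state held in an inode. */
struct pipe_model {
    int  addr[NADDR]; /* block numbers; 0 means "no block allocated" */
    long rp;          /* "read" pointer (byte offset)  */
    long wp;          /* "write" pointer (byte offset) */
};

/*
 * While the first block has been completely read, move its block
 * number to the end of the run of non-null block numbers and pull
 * both pointers back by one block.
 */
void pipe_rotate(struct pipe_model *p)
{
    while (p->rp >= BLKSIZE) {
        int n = 0, i, first;

        /* count the leading non-null block numbers */
        while (n < NADDR && p->addr[n] != 0)
            n++;

        /* rotate left by one; nothing to move for 0 or 1 blocks */
        if (n > 1) {
            first = p->addr[0];
            for (i = 0; i + 1 < n; i++)
                p->addr[i] = p->addr[i + 1];
            p->addr[n - 1] = first;
        }

        p->rp -= BLKSIZE;
        p->wp -= BLKSIZE;
    }
}
```

For example, starting from block numbers 11, 12, 13 with the "read" pointer at 600 and the "write" pointer at 1300, one call leaves the block numbers as 12, 13, 11 and the pointers at 88 and 788. The consumed block is kept at the end of the list so that the writer can reuse it, which is the point of rotating rather than discarding; locking and the interaction with "readp" and "writep" are deliberately left out.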



Procedure Index

Procedure            Page    Procedure            Page    Procedure            Page

access (6746)        19-4    iput (7344)          20-5    readi (6221)         18-5
alloc (6956)         20-4    issig (3991)         13-3    readp (7758)         21-1
aretu (0734)         8-2     itrunc (7414)        20-4    retu (0740)          8-2
bawrite (4856)       17-3    iupdat (7374)        20-5    rexit (3205)         13-4
bdwrite (4836)       17-3    kill (3630)          13-2    rkaddr (5420)        16-2
bflush (5229)        17-3    klclose (8055)       24-4    rkintr (5451)        16-2
binit (5055)         17-2    klopen (8023)        24-4    rkstrategy (5389)    16-2
bmap (6415)          18-6    klrint (8078)        24-4    savu (0725)          8-2
bread (4754)         17-3    klsgtty (8090)       24-2    sbreak (3354)        12-4
breada (4773)        17-3    klxint (8070)        24-4    sched (1940)         6-4
brelse (4869)        17-2    link (5909)          19-3    sched (1940)         14-2
bwrite (4809)        17-3    lpcanon (8879)       22-3    setpri (2156)        8-2
call (0776)          10-2    lpclose (8863)       22-3    setrun (2134)        8-3
canon (8274)         25-1    lpint (8976)         22-2    sgtty (8201)         24-2
cinit (8234)         23-2    lpopen (8850)        22-1    signal (3949)        13-3
clock (3725)         11-1    lpoutput (8986)      22-2    sleep (2066)         6-4
close (5846)         18-3    lpstart (8967)       22-2    sleep (2066)         8-3
closef (6643)        18-4    lpwrite (8870)       22-3    smount (6086)        20-2
clrbuf (5038)        17-1    main (1550)          6-2     ssig (3614)          13-2
core (4094)          13-3    main revisited       6-5     start (0612)         6-1
creat (5781)         18-3    maknode (7455)       19-4    stop (4016)          13-5
deverror (2447)      5-6     malloc (2528)        5-2     stty (8183)          24-2
devstart (5096)      16-2    maptab (8117)        23-4    sumount (6144)       20-4
devtab (4551)        15-1    mfree (2556)         5-3     sureg (1739)         7-4
estabur (1650)       7-4     mknod (5952)         19-4    swap (5196)          15-2
exec (3020)          12-3    namei (7518)         19-1    swtch (2178)         6-4
exit (3219)          13-4    newproc (1826)       7-5     swtch revisited      8-4
expand (2268)        8-3     open (5763)          18-3    swtch (2178)         8-2
file (5507)          18-2    open1 (5804)         18-3    timeout (3845)       11-2
flushtty (8252)      25-1    open1 revisited      18-3    trap (2693)          12-1
fork (3322)          12-4    panic (2419)         5-5     ttread (8535)        25-1
free (7000)          20-5    partab (7947)        23-4    ttrstrt (8486)       25-3
fuibyte (0814)       10-1    physio (5259)        17-3    ttstart (8505)       25-3
fuiword (0844)       10-1    pipe (7723)          21-1    ttwrite (8550)       25-3
getblk (4921)        17-1    plock (7862)         21-1    ttyinput (8333)      25-2
getc (0930)          23-2    prdev (2433)         5-6     ttyoutput (8373)     25-3
getfs (7167)         20-3    prele (7882)         21-1    ttystty (8577)       24-2
grow (4136)          13-3    printf (2340)        5-3     unlink (3510)        19-4
ifree (7134)         20-5    printn (2369)        5-4     update (7201)        20-3
iget (7276)          20-2    procxmt (4204)       13-6    wait (3270)          13-5
iinit (6922)         20-2    psig (4043)          13-3    wait (3270)          13-4
incore (4899)        17-1    psignal (3963)       13-3    wakeup (2113)        8-3
inode (5659)         18-2    ptrace (4164)        13-5    wdir (7477)          19-3
iodone (5018)        16-2    putc (0967)          23-2    wflushtty (8217)     25-1
iomove (6364)        18-6    putchar (2386)       5-4     writep (7805)        21-1
iput (7344)          18-4    rdwr (5731)          18-5    xalloc (4433)        14-3
                                                          xfree (4398)         14-3
                                                          xswap (4368)         14-3

Source Code Line Index Page 1

Line Page Line Page Line Page Line Page Line Page

0512 10-1 fuiword 0844 10-1 1615 6-3 sched 1940 14-2 2189 8-2
0518 10-3 0846 10-1 1627 6-4 1958 6-4 2193 6-4
0570 10-2 0848 10-1 1627 6-5 1958 14-2 2193 8-2
0852 10-1 1628 6-5 1960 6-4 2195 8-2
start 0612 6-1 0853 10-1 1629 6-5 1966 6-4 2196 8-2
0613 6-1 0854 10-1 1630 6-5 1966 14-2 2201 6-4
0615 6-1 0855 10-1 1635 6-5 1968 6-4 2218 6-4
0619 6-1 0856 10-1 1637 6-4 1976 14-2 2224 8-2
0632 6-2 0857 10-1 1982 14-2 2228 6-5
0634 6-2 0876 10-1 estabur 1650 7-4 1990 14-2 2228 8-2
0641 6-2 0878 10-1 1654 7-4 2003 14-2 2228 8-4
0646 6-2 0880 10-2 1664 7-4 2005 14-2 2229 6-5
0647 6-2 1667 7-4 2013 14-2 2229 8-2
0649 6-2 getc 0930 23-2 1672 7-4 2022 14-2 2230 8-2
0668 6-2 0931 23-2 1677 7-4 2032 14-2 2240 6-5
0669 6-2 0934 23-2 1682 7-4 2042 14-2 2240 8-4
0936 23-2 1703 7-4 2044 14-2 2242 8-4
savu 0725 8-2 0937 23-2 1711 7-4 2247 6-5
0938 23-2 1714 7-4 sleep 2066 6-4 2247 8-2
aretu 0734 8-2 0939 23-2
0940 23-2 sureg 1739 7-4 sleep 2066 8-3 expand 2268 8-3
retu 0740 8-2 0941 23-2 1743 7-5 2070 6-4 2277 8-3
0942 23-2 1744 7-5 2070 8-3 2281 8-3
0756 10-1 0947 23-2 1752 7-5 2071 6-4 2283 8-3
0756 10-3 0949 23-2 1754 7-5 2072 6-4 2284 8-4
0757 10-1 0950 23-2 1762 7-5 2072 8-3 2285 8-4
0757 10-3 0952 23-2 2075 8-3 2286 8-4
0759 10-3 0953 23-2 newproc 1826 7-5 2080 8-3 2287 8-4
0762 10-3 0954 23-2 1841 7-5 2084 8-3
0765 10-2 0957 23-2 1846 7-5 2087 8-3 printf 2340 5-3
0766 10-2 0961 23-2 1860 7-5 2093 6-4 2341 5-4
0767 10-2 0962 23-2 1861 7-5 2346 5-4
0772 10-3 1876 7-5 wakeup 2113 8-3 2348 5-4
0773 10-3 putc 0967 23-2 1879 7-5 2349 5-4
0774 10-3 1883 7-5 setrun 2134 8-3 2350 5-4
1421 10-3 1889 7-5 2140 8-3 2351 5-4
call 0776 10-2 1422 10-3 1890 7-5 2141 8-3 2353 5-4
0777 10-2 1896 7-5 2143 8-3 2354 5-4
0779 10-2 main 1550 6-2 1902 7-5 2356 5-4
0780 10-2 1903 7-5 setpri 2156 8-2 2361 5-4
0781 10-2 main 1550 6-5 1904 7-5 2161 8-2 2362 5-4
0783 10-2 1559 6-2 1905 7-5 2165 8-3
0788 10-2 1560 6-2 1906 7-5 printn 2369 5-4
0799 10-2 1562 6-2 1907 7-5 swtch 2178 6-4 2375 5-4
0800 10-2 1582 6-3 1908 7-5
0802 10-2 1583 6-3 1913 7-5 swtch 2178 8-2 putchar 2386 5-4
0803 10-2 1589 6-3 1917 7-5 2391 5-4
0804 10-2 1599 6-3 1918 7-5 swtch 2178 8-4 2393 5-5
0805 10-2 1607 6-3 2184 6-4 2395 5-5
1613 6-3 sched 1940 6-4 2184 8-2 2397 5-5
fuibyte 0814 10-1 1614 6-3 2189 6-4 2398 5-5
Source Code Line Index Page 2

Line Page Line Page Line Page Line Page Line Page

2399 5-5 2776 12-3 3301 13-5 3801 11-2 4168 13-5
2400 5-5 2793 12-1 3302 13-5 3803 11-2 4172 13-6
2405 5-5 2794 12-2 3303 13-5 3804 11-2 4181 13-6
2406 5-5 2796 12-2 3309 13-5 3806 11-2 4183 13-6
2810 12-2 3313 13-4 3810 11-2 4187 13-6
panic 2419 5-5 2818 12-2 3820 11-2 4188 13-6
2420 5-5 2821 12-2 fork 3322 12-4 3824 11-2 4189 13-6
2421 5-5 2822 12-2 3335 12-4 4191 13-6
2422 5-5 2823 12-2 3344 12-4 timeout 3845 11-2
procxmt 4204 13-6
prdev 2433 5-6 exec 3020 12-3 sbreak 3354 12-4 signal 3949 13-3 4209 13-6
3034 12-3 3364 12-4 4211 13-6
deverror 2447 5-6 3037 12-3 3371 12-4 psignal 3963 13-3
3040 12-3 3376 12-4 3966 13-3 xswap 4368 14-3
malloc 2528 5-2 3041 12-3 3386 12-4 3971 13-3 4373 14-3
2534 5-2 3052 12-3 3973 13-3 4375 14-3
2535 5-2 3064 12-3 unlink 3510 19-4 3975 13-3 4378 14-3
2536 5-2 3071 12-3 3515 19-4 4379 14-3
2537 5-2 3090 12-3 3518 19-4 issig 3991 13-3 4382 14-3
2538 5-2 3095 12-3 3519 19-4 3997 13-3 4388 14-3
2539 5-2 3105 12-3 3522 19-4 3998 13-3
2542 5-2 3127 12-3 3528 19-4 4000 13-3 xfree 4398 14-3
2543 5-2 3129 12-4 3529 19-4 4003 13-3 4402 14-3
3130 12-4 4006 13-3 4403 14-3
mfree 2556 5-3 3158 12-4 ssig 3614 13-2 4406 14-3
2564 5-3 3186 12-4 3619 13-2 stop 4016 13-5 4408 14-3
2565 5-3 3194 12-4 3623 13-2 4022 13-5 4411 14-3
2566 5-3 3195 12-4 3624 13-2 4023 13-5
2567 5-3 3196 12-4 3625 13-2 4028 13-5 xalloc 4433 14-3
2568 5-3 4029 13-5 4439 14-3
2569 5-3 rexit 3205 13-4 kill 3630 13-2 4441 14-3
2576 5-3 3637 13-2 psig 4043 13-3 4452 14-3
2579 5-3 exit 3219 13-4 3639 13-2 4054 13-3 4459 14-3
3224 13-4 3649 13-2 4055 13-3 4460 14-3
trap 2693 12-1 3225 13-4 4056 13-3 4461 14-3
2698 12-1 3227 13-4 clock 3725 11-1 4057 13-3 4462 14-3
2700 12-1 3233 13-4 3740 11-1 4066 13-3 4463 14-3
2701 12-1 3233 13-4 3743 11-1 4079 13-3 4464 14-3
2702 12-1 3234 13-4 3748 11-1 4080 13-3 4467 14-3
2716 12-1 3237 13-4 3750 11-1 4473 14-3
2719 12-1 3238 13-4 3759 11-1 core 4094 13-3 4475 14-3
2721 12-2 3239 13-4 3766 11-1
2723 12-2 3241 13-4 3767 11-1 grow 4136 13-3 devtab 4551 15-1
2733 12-2 3243 13-4 3773 11-1 4141 13-3

2752 12-2 3245 13-4 3787 11-1 4143 13-3 bread 4754 17-3
2754 12-2 3788 11-1 4144 13-3
2755 12-3 wait 3270 13-4 3792 11-1 4146 13-4 breada 4773 17-3
2760 12-3 3277 13-4 3795 11-2 4148 13-4 4780 17-3
2765 12-3 3280 13-4 3797 11-2 4156 13-4 4781 17-3
2770 12-3 3298 13-4 3798 11-2 4788 17-3
2771 12-3 3300 13-4 3800 11-2 ptrace 4164 13-5 4793 17-3
Source Code Line Index Page 3

Line Page Line Page Line Page Line Page Line Page

4798 17-3 5397 16-2 5917 19-3 6318 18-6 7016 20-5
4803 17-3 5399 16-2 5921 19-3 7019 20-5
5407 16-2 5926 19-3 iomove 6364 18-6 7025 20-5
bwrite 4809 17-3 5414 16-2 5927 19-3
4820 17-3 5933 19-3 bmap 6415 18-6 ifree 7134 20-5
4823 17-3 rkaddr 5420 16-2 5935 19-3 6423 18-6
5940 19-3 6427 18-6 getfs 7167 20-3
bdwrite 4836 17-3 rkintr 5451 16-2 5941 19-3 6431 18-6
4844 17-3 5455 16-2 6435 18-6 update 7201 20-3
4847 17-3 5459 16-2 mknod 5952 19-4 6438 18-6 7207 20-3
5460 16-2 6442 18-6 7210 20-3
bawrite 4856 17-3 5461 16-2 smount 6086 20-2 6448 18-6 7211 20-3
5462 16-2 6093 20-2 6456 18-6 7213 20-3
brelse 4869 17-2 5463 16-2 6096 20-2 7217 20-3
5469 16-2 6103 20-2 closef 6643 18-4 7223 20-3
incore 4899 17-1 5472 16-2 6133 20-2 6649 18-4 7229 20-3
6111 20-2 6655 18-4 7230 20-3
getblk 4921 17-1 file 5507 18-2 6113 20-2 6657 18-4
4943 17-2 6116 20-2 iget 7276 20-2
4953 17-2 inode 5659 18-2 6124 20-2 access 6746 19-4 7285 20-2
4963 17-2 6130 20-2 6753 19-4 7286 20-3
4966 17-2 rdwr 5731 18-5 6763 19-4 7287 20-3
4967 17-2 5736 18-5 sumount 6144 20-4 6769 19-4 7290 20-3
5739 18-5 6154 20-4 6774 19-4 7292 20-3
iodone 5018 16-2 5743 18-5 6161 20-4 7293 20-3
5746 18-5 6168 20-4 iinit 6922 20-2 7302 20-3
clrbuf 5038 17-1 5755 18-5 6926 20-2 7306 20-3
5756 18-5 readi 6221 18-5 6931 20-2 7309 20-3
binit 5055 17-2 6230 18-5 6933 20-2 7314 20-3
open 5763 18-3 6232 18-5 6936 20-2 7319 20-3
devstart 5096 16-2 6233 18-5 6938 20-2 7326 20-3
creat 5781 18-3 6238 18-5 6939 20-2 7328 20-3
swap 5196 15-2 5786 18-3 6239 18-5
5200 15-2 5787 18-3 6240 18-5 alloc 6956 20-4 iput 7344 18-4
5202 15-2 5788 18-3 6241 18-5 6961 20-4
5206 15-2 5793 18-3 6242 18-5 6962 20-4 iput 7344 20-5
5207 15-2 6243 18-5 6967 20-4 7350 18-4
5208 15-2 open1 5804 18-3 6246 18-5 6968 20-4 7352 18-4
5210 15-2 5812 18-3 6248 18-5 6970 20-4 7357 18-4
5212 15-2 5813 18-3 6250 18-5 6971 20-4 7358 18-4
5213 15-2 5824 18-3 6251 18-5 6972 20-4
5214 15-2 5826 18-3 6252 18-5 6975 20-4 iupdat 7374 20-5
5216 15-2 5827 18-3 6253 18-5 6978 20-4 7383 20-5
5218 15-2 5831 18-3 6255 18-5 6982 20-4 7386 20-5
5219 15-2 5832 18-3 6256 18-5 6983 20-4 7389 20-5
5839 18-3 6258 18-5 7391 20-5
bflush 5229 17-3 6260 18-6 free 7000 20-5 7396 20-3
close 5846 18-3 6261 18-6 7005 20-5 7396 20-5
physio 5259 17-3 6303 18-6 7006 20-5 7400 20-5
link 5909 19-3 6311 18-6 7013 20-5
rkstrategy 5389 16-2 5914 19-3 6312 18-6 7014 20-5 itrunc 7414 20-4
Source Code Line Index Page 4

Line Page Line Page Line Page Line Page Line Page

7421 20-4 7799 21-1 cinit 8234 23-2 8424 25-4 8858 22-2
7423 20-4 8239 23-2 8425 25-4 8859 22-2
7425 20-4 writep 7805 21-1 8240 23-2 8426 25-4
7427 20-5 7828 21-1 8241 23-2 8428 25-4 lpclose 8863 22-3
7438 20-5 7835 21-1 8242 23-2 8431 25-4
7439 20-5 8244 23-2 8434 25-4 lpwrite 8870 22-3
7443 20-5 plock 7862 21-1 8439 25-4
7445 20-5 flushtty 8252 25-1 8451 25-4 lpcanon 8879 22-3
prele 7882 21-1 8258 25-1 8453 25-4 8884 22-3
maknode 7455 19-4 8259 25-1 8458 25-4 8885 22-3
partab 7947 23-4 8260 25-1 8459 25-4 8887 22-3
wdir 7477 19-3 8261 25-1 8462 25-4 8909 22-4
klopen 8023 24-4 8263 25-1 8467 25-4 8910 22-4
namei 7518 19-1 8026 24-4 8264 25-1 8469 25-4 8911 22-4
7531 19-1 8030 24-4 8265 25-1 8472 25-4 8915 22-4
7532 19-1 8031 24-4 8475 25-4 8917 22-4
7534 19-2 8033 24-4 canon 8274 25-1 8921 22-4
7535 19-2 8039 24-4 8284 25-1 ttrstrt 8486 25-3 8925 22-4
7537 19-2 8045 24-4 8285 25-2 8926 22-4
7542 19-2 8046 24-4 8286 25-2 ttstart 8505 25-3 8927 22-4
7550 19-2 8047 24-4 8291 25-2 8514 25-3 8929 22-4
7563 19-2 8048 24-4 8292 25-2 8518 25-3 8934 22-4
7570 19-2 8051 24-4 8293 25-2 8520 25-3 8949 22-4
7589 19-2 8052 24-4 8297 25-2 8521 25-3 8950 22-4
7592 19-2 8298 25-2 8522 25-3 8954 22-4
7699 19-2 klclose 8055 24-4 8299 25-2 8524 25-3 8959 22-4
7606 19-2 8057 24-4 8300 25-2
7607 19-2 8058 24-4 8302 25-2 ttread 8535 25-1 lpstart 8967 22-2
7622 19-2 8059 24-4 8304 25-2 8541 25-1
7636 19-2 8306 25-2 8543 25-1 lpint 8976 22-2
7645 19-2 klxint 8070 24-4 8309 25-2 8544 25-1 8980 22-3
7647 19-2 8310 25-2 8981 22-3
7657 19-2 klrint 8078 24-4 8315 25-2 ttwrite 8550 25-3
7662 19-2 8083 24-4 8319 25-2 8556 25-3 lpoutput 8986 22-2
7664 19-2 8084 24-4 8322 25-2 8558 25-3 8988 22-2
7665 19-2 8085 24-4 8560 25-3 8990 22-2
8086 24-4 ttyinput 8333 25-2 8561 25-3 8991 22-2
pipe 7723 21-1 8342 25-2 8562 25-3
7728 21-1 klsgtty 8090 24-2 8344 25-2 8563 25-3
7731 21-1 8349 25-2 8566 25-3
7736 21-1 maptab 8117 23-4 8353 25-3 8568 25-3
7744 21-1 8355 25-3
7746 21-1 stty 8183 24-2 8356 25-3 ttystty 8577 24-2
8357 25-3 8589 24-2
readp 7758 21-1 sgtty 8201 24-2 8361 25-3 8591 24-2
7763 21-1 8206 24-2 8592 24-2
7768 21-1 8209 24-2 ttyoutput 8373 25-3 8593 24-2
7776 21-1 8213 24-2 8400 25-3
7786 21-1 8407 25-3 lpopen 8850 22-1
7787 21-1 wflushtty 8217 25-1 8414 25-4 8853 22-2
7789 21-1 8423 25-4 8857 22-2
