Pintos
by Ben Pfaff
Short Contents
1 Introduction, page 1
2 Project 1: Threads, page 9
3 Project 2: User Programs, page 22
4 Project 3: Virtual Memory, page 39
5 Project 4: File Systems, page 50
A Reference Guide, page 58
B 4.4BSD Scheduler, page 91
C Coding Standards, page 96
D Project Documentation, page 99
E Debugging Tools, page 102
F Development Tools, page 113
G Installing Pintos, page 114
Bibliography, page 117
License, page 119
Table of Contents
1 Introduction, page 1
  1.1 Getting Started, page 1
    1.1.1 Source Tree Overview, page 1
    1.1.2 Building Pintos, page 2
    1.1.3 Running Pintos, page 3
    1.1.4 Debugging versus Testing, page 4
  1.2 Grading, page 5
    1.2.1 Testing, page 5
    1.2.2 Design, page 5
      1.2.2.1 Design Document, page 6
      1.2.2.2 Source Code, page 7
  1.3 Legal and Ethical Issues, page 7
  1.4 Acknowledgements, page 8
  1.5 Trivia, page 8
2 Project 1: Threads, page 9
  2.1 Background, page 9
    2.1.1 Understanding Threads, page 9
    2.1.2 Source Files, page 10
      2.1.2.1 devices code, page 11
      2.1.2.2 lib files, page 12
    2.1.3 Synchronization, page 14
    2.1.4 Development Suggestions, page 14
  2.2 Requirements, page 15
    2.2.1 Design Document, page 15
    2.2.2 Alarm Clock, page 15
    2.2.3 Priority Scheduling, page 16
    2.2.4 Advanced Scheduler, page 16
  2.3 FAQ, page 17
    2.3.1 Alarm Clock FAQ, page 19
    2.3.2 Priority Scheduling FAQ, page 19
    2.3.3 Advanced Scheduler FAQ, page 20
3 Project 2: User Programs, page 22
  3.3 Requirements, page 28
    3.3.1 Design Document, page 28
    3.3.2 Process Termination Messages, page 29
    3.3.3 Argument Passing, page 29
    3.3.4 System Calls, page 29
    3.3.5 Denying Writes to Executables, page 32
  3.4 FAQ, page 33
    3.4.1 Argument Passing FAQ, page 34
    3.4.2 System Calls FAQ, page 35
  3.5 80x86 Calling Convention, page 35
    3.5.1 Program Startup Details, page 36
    3.5.2 System Call Details, page 37
4 Project 3: Virtual Memory, page 39
5 Project 4: File Systems, page 50
    5.4.1 Indexed Files FAQ, page 56
    5.4.2 Subdirectories FAQ, page 56
    5.4.3 Buffer Cache FAQ, page 56
Appendix A Reference Guide, page 58
  A.1 Loading, page 58
    A.1.1 The Loader, page 58
    A.1.2 Low-Level Kernel Initialization, page 59
    A.1.3 High-Level Kernel Initialization, page 59
    A.1.4 Physical Memory Map, page 60
  A.2 Threads, page 61
    A.2.1 struct thread, page 61
    A.2.2 Thread Functions, page 63
    A.2.3 Thread Switching, page 65
  A.3 Synchronization, page 66
    A.3.1 Disabling Interrupts, page 66
    A.3.2 Semaphores, page 67
    A.3.3 Locks, page 68
    A.3.4 Monitors, page 68
      A.3.4.1 Monitor Example, page 69
    A.3.5 Optimization Barriers, page 70
  A.4 Interrupt Handling, page 71
    A.4.1 Interrupt Infrastructure, page 72
    A.4.2 Internal Interrupt Handling, page 73
    A.4.3 External Interrupt Handling, page 74
  A.5 Memory Allocation, page 75
    A.5.1 Page Allocator, page 75
    A.5.2 Block Allocator, page 76
  A.6 Virtual Addresses, page 77
  A.7 Page Table, page 79
    A.7.1 Creation, Destruction, and Activation, page 79
    A.7.2 Inspection and Updates, page 79
    A.7.3 Accessed and Dirty Bits, page 80
    A.7.4 Page Table Details, page 80
      A.7.4.1 Structure, page 80
      A.7.4.2 Page Table Entry Format, page 82
      A.7.4.3 Page Directory Entry Format, page 83
  A.8 Hash Table, page 84
    A.8.1 Data Types, page 84
    A.8.2 Basic Functions, page 85
    A.8.3 Search Functions, page 86
    A.8.4 Iteration Functions, page 87
    A.8.5 Hash Table Example, page 88
    A.8.6 Auxiliary Data, page 89
    A.8.7 Synchronization, page 89
Appendix B 4.4BSD Scheduler, page 91
  B.1 Niceness, page 91
  B.2 Calculating Priority, page 91
  B.3 Calculating recent_cpu, page 92
  B.4 Calculating load_avg, page 93
  B.5 Summary, page 93
  B.6 Fixed-Point Real Arithmetic, page 94
Appendix C Coding Standards, page 96
Appendix D Project Documentation, page 99
Appendix E Debugging Tools, page 102
  E.1 printf()
  E.2 ASSERT
  E.3 Function and Parameter Attributes
  E.4 Backtraces
    E.4.1 Example
  E.5 GDB
    E.5.1 Using GDB
    E.5.2 Example GDB Session
    E.5.3 FAQ
  E.6 Triple Faults
  E.7 Modifying Bochs
  E.8 Tips
Appendix F Development Tools, page 113
Appendix G Installing Pintos, page 114
Bibliography, page 117
  Hardware References, page 117
  Software References, page 117
  Operating System Design References, page 118
License, page 119
Chapter 1: Introduction
1 Introduction
Welcome to Pintos. Pintos is a simple operating system framework for the 80x86 architecture. It supports kernel threads, loading and running user programs, and a file system, but it implements all of these in a very simple way. In the Pintos projects, you and your project team will strengthen its support in all three of these areas. You will also add a virtual memory implementation.

Pintos could, theoretically, run on a regular IBM-compatible PC. Unfortunately, it is impractical to supply every student a dedicated PC for use with Pintos. Therefore, we will run Pintos projects in a system simulator, that is, a program that simulates an 80x86 CPU and its peripheral devices accurately enough that unmodified operating systems and software can run under it. In class we will use the Bochs and QEMU simulators. Pintos has also been tested with VMware Player.

These projects are hard. They have a reputation for taking a lot of time, and deservedly so. We will do what we can to reduce the workload, such as providing a lot of support material, but there is plenty of hard work that needs to be done. We welcome your feedback. If you have suggestions on how we can reduce the unnecessary overhead of assignments, cutting them down to the important underlying issues, please let us know.

This chapter explains how to get started working with Pintos. You should read the entire chapter before you start work on any of the projects.
The term uname -m expands to a value such as x86_64 that indicates the type of computer you're logged into.
threads/ Source code for the base kernel, which you will modify starting in project 1.
userprog/ Source code for the user program loader, which you will modify starting with project 2.
vm/ An almost empty directory. You will implement virtual memory here in project 3.
filesys/ Source code for a basic file system. You will use this file system starting with project 2, but you will not modify it until project 4.
devices/ Source code for I/O device interfacing: keyboard, timer, disk, etc. You will modify the timer implementation in project 1. Otherwise you should have no need to change this code.
lib/ An implementation of a subset of the standard C library. The code in this directory is compiled into both the Pintos kernel and, starting from project 2, user programs that run under it. In both kernel code and user programs, headers in this directory can be included using the #include <...> notation. You should have little need to modify this code.
lib/kernel/ Parts of the C library that are included only in the Pintos kernel. This also includes implementations of some data types that you are free to use in your kernel code: bitmaps, doubly linked lists, and hash tables. In the kernel, headers in this directory can be included using the #include <...> notation.
lib/user/ Parts of the C library that are included only in Pintos user programs. In user programs, headers in this directory can be included using the #include <...> notation.
tests/ Tests for each project. You can modify this code if it helps you test your submission, but we will replace it with the originals before we run the tests.
examples/ Example user programs for use starting with project 2.
misc/ utils/ These files may come in handy if you decide to try working with Pintos on your own machine. Otherwise, you can ignore them.
Makefile A copy of pintos/src/Makefile.build. It describes how to build the kernel. See [Adding Source Files], page 17, for more information.
kernel.o Object file for the entire kernel. This is the result of linking object files compiled from each individual kernel source file into a single object file. It contains debug information, so you can run GDB (see Section E.5 [GDB], page 105) or backtrace (see Section E.4 [Backtraces], page 103) on it.
kernel.bin Memory image of the kernel, that is, the exact bytes loaded into memory to run the Pintos kernel. This is just kernel.o with debug information stripped out. Stripping the debug information saves a lot of space, which keeps the kernel from bumping up against a 512 kB size limit imposed by the kernel loader's design.
loader.bin Memory image for the kernel loader, a small chunk of code written in assembly language that reads the kernel from disk into memory and starts it up. It is exactly 512 bytes long, a size fixed by the PC BIOS.

Subdirectories of build contain object files (.o) and dependency files (.d), both produced by the compiler. The dependency files tell make which source files need to be recompiled when other source or header files are changed.
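The two size constraints above can be expressed as simple checks. This is a minimal sketch: the constant names and helper functions are hypothetical, and the 512-byte sector size used to count how many disk sectors a kernel image occupies is an assumption (the usual PC disk sector size), not something stated here.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOADER_SIZE        512u            /* loader.bin: exactly 512 bytes. */
#define KERNEL_SIZE_LIMIT  (512u * 1024u)  /* kernel.bin: 512 kB limit. */
#define SECTOR_SIZE        512u            /* Assumed disk sector size. */

/* True if a loader image of SIZE bytes has exactly the required size. */
bool loader_size_ok (size_t size)
{
  return size == LOADER_SIZE;
}

/* True if a kernel image of SIZE bytes is no larger than the limit. */
bool kernel_size_ok (size_t size)
{
  return size <= KERNEL_SIZE_LIMIT;
}

/* Number of sectors needed to hold SIZE bytes, rounding up. */
size_t sectors_needed (size_t size)
{
  return (size + SECTOR_SIZE - 1) / SECTOR_SIZE;
}
```

Under these assumptions, a kernel image at the full 512 kB limit spans 1,024 sectors.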
You can log serial output to a file by redirecting at the command line, e.g. pintos run alarm-multiple > logfile.

The pintos program offers several options for configuring the simulator or the virtual hardware. If you specify any options, they must precede the commands passed to the Pintos kernel and be separated from them by --, so that the whole command looks like pintos option ... -- argument .... Invoke pintos without any arguments to see a list of available options. Options can select a simulator to use: the default is Bochs, but --qemu selects QEMU. You can run the simulator with a debugger (see Section E.5 [GDB], page 105). You can set the amount of memory to give the VM. Finally, you can select how you want VM output to be displayed: use -v to turn off the VGA display, -t to use your terminal window as the VGA display instead of opening a new window (Bochs only), or -s to suppress serial input from stdin and output to stdout.

The Pintos kernel has commands and options other than run. These are not very interesting for now, but you can see a list of them using -h, e.g. pintos -h.
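The option/argument convention can be illustrated with a small helper that finds where the kernel's arguments begin. It is a hypothetical sketch, not the pintos program's own parser: it treats everything after the first -- as kernel arguments, and everything after the program name as kernel arguments when no -- is present (that is, when no options were given).

```c
#include <string.h>

/* Returns the index in ARGV of the first argument meant for the Pintos
   kernel: the element after the first "--", or 1 (right after the
   program name) if there is no "--" separator. */
int first_kernel_arg (int argc, char *argv[])
{
  for (int i = 1; i < argc; i++)
    if (strcmp (argv[i], "--") == 0)
      return i + 1;
  return 1;
}
```

For pintos --qemu -v -- run alarm-multiple, the helper returns 4, the index of run.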
The QEMU simulator is available as an alternative to Bochs (use --qemu when invoking pintos). The QEMU simulator is much faster than Bochs, but it only supports real-time simulation and does not have a reproducible mode.
1.2 Grading
We will grade your assignments based on test results and design quality, each of which comprises 50% of your grade.
1.2.1 Testing
Your test result grade will be based on our tests. Each project has several tests, each of which has a name beginning with tests/. To completely test your submission, invoke make check from the project build directory. This will build and run each test and print a pass or fail message for each one. When a test fails, make check also prints some details of the reason for failure. After running all the tests, make check also prints a summary of the test results.

For project 1, the tests will probably run faster in Bochs. For the rest of the projects, they will run much faster in QEMU. make check will select the faster simulator by default, but you can override its choice by specifying SIMULATOR=--bochs or SIMULATOR=--qemu on the make command line.

You can also run individual tests one at a time. A given test t writes its output to t.output, then a script scores the output as pass or fail and writes the verdict to t.result. To run and grade a single test, make the .result file explicitly from the build directory, e.g. make tests/threads/alarm-multiple.result. If make says that the test result is up-to-date, but you want to re-run it anyway, either run make clean or delete the .output file by hand.

By default, each test provides feedback only at completion, not during its run. If you prefer, you can observe the progress of each test by specifying VERBOSE=1 on the make command line, as in make check VERBOSE=1. You can also provide arbitrary options to the pintos run by the tests with PINTOSOPTS='...', e.g. make check PINTOSOPTS='-j 1' to select a jitter value of 1 (see Section 1.1.4 [Debugging versus Testing], page 4).

All of the tests and related files are in pintos/src/tests. Before we test your submission, we will replace the contents of that directory by a pristine, unmodified copy, to ensure that the correct tests are used. Thus, you can modify some of the tests if that helps in debugging, but we will run the originals.

All software has bugs, so some of our tests may be flawed.
If you think a test failure is a bug in the test, not a bug in your code, please point it out. We will look at it and fix it if necessary.

Please don't try to take advantage of our generosity in giving out our test suite. Your code has to work properly in the general case, not just for the test cases we supply. For example, it would be unacceptable to explicitly base the kernel's behavior on the name of the running test case. Such attempts to side-step the test cases will receive no credit. If you think your solution may be in a gray area here, please ask us about it.
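The naming convention for individual tests, where test t writes t.output and its verdict goes to t.result, is mechanical enough to capture in a helper. The function below is hypothetical and only builds the file names; in practice the Pintos makefiles derive these targets for you.

```c
#include <stdio.h>
#include <string.h>

/* Writes "TEST.EXT" (for example "tests/threads/alarm-multiple.result")
   into BUF, which holds BUFSIZE bytes. Returns 0 on success, or -1 if
   the name would not fit (snprintf reports the untruncated length). */
int test_file_name (char *buf, size_t bufsize, const char *test,
                    const char *ext)
{
  int n = snprintf (buf, bufsize, "%s.%s", test, ext);
  return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
}
```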
1.2.2 Design
We will judge your design based on the design document and the source code that you submit. We will read your entire design document and much of your source code. Don't forget that design quality, including the design document, is 50% of your project grade. It is better to spend one or two hours writing a good design document than it is to spend that time getting the last 5% of the points for tests and then trying to rush through writing the design document in the last 15 minutes.
can be made as rough or informal arguments (formal language or proofs are unnecessary). An incomplete, evasive, or non-responsive design document or one that strays from the template without good reason may be penalized. Incorrect capitalization, punctuation, spelling, or grammar can also cost points. See Appendix D [Project Documentation], page 99, for a sample design document for a fictitious project.
In the context of Stanford's CS 140 course, please respect the spirit and the letter of the honor code by refraining from reading any homework solutions available online or elsewhere. Reading the source code for other operating system kernels, such as Linux or FreeBSD, is allowed, but do not copy code from them literally. Please cite the code that inspired your own in your design documentation.
1.4 Acknowledgements
The Pintos core and this documentation were originally written by Ben Pfaff. Additional features were contributed by Anthony Romano. The GDB macros supplied with Pintos were written by Godmar Back, and their documentation is adapted from his work. The original structure and form of Pintos were inspired by the Nachos instructional operating system from the University of California, Berkeley ([Christopher]). The Pintos projects and documentation originated with those designed for Nachos by current and former CS 140 teaching assistants at Stanford University, including at least Yu Ping, Greg Hutchins, Kelly Shaw, Paul Twohey, Sameer Qureshi, and John Rector. Example code for monitors (see Section A.3.4 [Monitors], page 68) is from classroom slides originally by Dawson Engler and updated by Mendel Rosenblum.
1.5 Trivia
Pintos originated as a replacement for Nachos with a similar design. Since then Pintos has greatly diverged from the Nachos design. Pintos differs from Nachos in two important ways. First, Pintos runs on real or simulated 80x86 hardware, but Nachos runs as a process on a host operating system. Second, Pintos is written in C like most real-world operating systems, but Nachos is written in C++.

Why the name Pintos? First, like nachos, pinto beans are a common Mexican food. Second, Pintos is small and a pint is a small amount. Third, like drivers of the eponymous car, students are likely to have trouble with blow-ups.
2 Project 1: Threads
In this assignment, we give you a minimally functional thread system. Your job is to extend the functionality of this system to gain a better understanding of synchronization problems. You will be working primarily in the threads directory for this assignment, with some work in the devices directory on the side. Compilation should be done in the threads directory. Before you read the description of this project, you should read all of the following sections: Chapter 1 [Introduction], page 1, Appendix C [Coding Standards], page 96, Appendix E [Debugging Tools], page 102, and Appendix F [Development Tools], page 113. You should at least skim the material from Section A.1 [Pintos Loading], page 58 through Section A.5 [Memory Allocation], page 75, especially Section A.3 [Synchronization], page 66. To complete this project you will also need to read Appendix B [4.4BSD Scheduler], page 91.
2.1 Background
2.1.1 Understanding Threads
The first step is to read and understand the code for the initial thread system. Pintos already implements thread creation and thread completion, a simple scheduler to switch between threads, and synchronization primitives (semaphores, locks, condition variables, and optimization barriers).

Some of this code might seem slightly mysterious. If you haven't already compiled and run the base system, as described in the introduction (see Chapter 1 [Introduction], page 1), you should do so now. You can read through parts of the source code to see what's going on. If you like, you can add calls to printf() almost anywhere, then recompile and run to see what happens and in what order. You can also run the kernel in a debugger and set breakpoints at interesting spots, single-step through code and examine data, and so on.

When a thread is created, you are creating a new context to be scheduled. You provide a function to be run in this context as an argument to thread_create(). The first time the thread is scheduled and runs, it starts from the beginning of that function and executes in that context. When the function returns, the thread terminates. Each thread, therefore, acts like a mini-program running inside Pintos, with the function passed to thread_create() acting like main().

At any given time, exactly one thread runs and the rest, if any, become inactive. The scheduler decides which thread to run next. (If no thread is ready to run at any given time, then the special idle thread, implemented in idle(), runs.) Synchronization primitives can force context switches when one thread needs to wait for another thread to do something.

The mechanics of a context switch are in threads/switch.S, which is 80x86 assembly code. (You don't have to understand it.) It saves the state of the currently running thread and restores the state of the thread we're switching to. Using the GDB debugger, slowly trace through a context switch to see what happens (see Section E.5 [GDB], page 105).
You can set a breakpoint on schedule() to start out, and then single-step from there.[1] Be sure to keep track of each thread's address and state, and what procedures are on the call stack for each thread. You will notice that when one thread calls switch_threads(), another thread starts running, and the first thing the new thread does is to return from switch_threads(). You will understand the thread system once you understand why and how the switch_threads() that gets called is different from the switch_threads() that returns. See Section A.2.3 [Thread Switching], page 65, for more information.

Warning: In Pintos, each thread is assigned a small, fixed-size execution stack just under 4 kB in size. The kernel tries to detect stack overflow, but it cannot do so perfectly. You may cause bizarre problems, such as mysterious kernel panics, if you declare large data structures as non-static local variables, e.g. int buf[1000];. Alternatives to stack allocation include the page allocator and the block allocator (see Section A.5 [Memory Allocation], page 75).
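As a rough illustration of the warning: with 4-byte ints, int buf[1000]; occupies 4,000 bytes, nearly the entire stack of just under 4 kB. A minimal sketch of moving such a buffer off the stack follows. It assumes kernel code would call the block allocator's malloc() and free() from threads/malloc.h; on a host, the standard library's versions stand in. The function name is hypothetical.

```c
#include <stdlib.h>

/* Fills an N-element int buffer with 0..N-1 and returns the sum, or -1
   if allocation fails. The buffer lives on the heap rather than being
   declared as `int buf[1000];` on a small kernel thread stack. */
long sum_with_heap_buffer (size_t n)
{
  int *buf = malloc (n * sizeof *buf);
  if (buf == NULL)
    return -1;

  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      buf[i] = (int) i;
      sum += buf[i];
    }

  free (buf);   /* Heap memory must be released explicitly. */
  return sum;
}
```

The trade-off is that heap allocation can fail and must be freed, so the caller has to handle the -1 case; the page allocator is the other alternative when a whole page is needed.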
kernel.lds.S The linker script used to link the kernel. Sets the load address of the kernel and arranges for start.S to be near the beginning of the kernel image. See Section A.1.1 [Pintos Loader], page 58, for details. Again, you should not need to look at this code or modify it, but it's here in case you're curious.
init.c init.h Kernel initialization, including main(), the kernel's main program. You should look over main() at least to see what gets initialized. You might want to add your own initialization code here. See Section A.1.3 [High-Level Kernel Initialization], page 59, for details.
thread.c thread.h Basic thread support. Much of your work will take place in these files. thread.h defines struct thread, which you are likely to modify in all four projects. See Section A.2.1 [struct thread], page 61 and Section A.2 [Threads], page 61 for more information.
[1] GDB might tell you that schedule() doesn't exist, which is arguably a GDB bug. You can work around this by setting the breakpoint by filename and line number, e.g. break thread.c:ln where ln is the line number of the first declaration in schedule().
switch.S switch.h Assembly language routine for switching threads. Already discussed above. See Section A.2.2 [Thread Functions], page 63, for more information.
palloc.c palloc.h Page allocator, which hands out system memory in multiples of 4 kB pages. See Section A.5.1 [Page Allocator], page 75, for more information.
malloc.c malloc.h A simple implementation of malloc() and free() for the kernel. See Section A.5.2 [Block Allocator], page 76, for more information.
interrupt.c interrupt.h Basic interrupt handling and functions for turning interrupts on and off. See Section A.4 [Interrupt Handling], page 71, for more information.
intr-stubs.S intr-stubs.h Assembly code for low-level interrupt handling. See Section A.4.1 [Interrupt Infrastructure], page 72, for more information.
synch.c synch.h Basic synchronization primitives: semaphores, locks, condition variables, and optimization barriers. You will need to use these for synchronization in all four projects. See Section A.3 [Synchronization], page 66, for more information.
Functions for I/O port access. This is mostly used by source code in the devices directory that you won't have to touch.
Functions and macros for working with virtual addresses and page table entries. These will be more important to you in project 3. For now, you can ignore them.
Macros that define a few bits in the 80x86 flags register. Probably of no interest. See [IA32-v1], section 3.4.3, EFLAGS Register, for more information.
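Because the page allocator works in whole 4 kB pages, a request for some number of bytes has to be rounded up to a page count, which is the kind of value you would pass to palloc_get_multiple(). The sketch assumes 4,096-byte pages and defines its own rounding macro rather than relying on the exact contents of lib/round.h; both names are hypothetical.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u   /* Assumed page size: 4 kB. */

/* Integer division of X by STEP, rounding up. */
#define PAGES_DIV_ROUND_UP(X, STEP) (((X) + (STEP) - 1) / (STEP))

/* Number of whole pages needed to hold BYTES bytes. */
size_t pages_for_bytes (size_t bytes)
{
  return PAGES_DIV_ROUND_UP (bytes, PAGE_BYTES);
}
```

For example, a 10,000-byte request needs 3 pages (12,288 bytes), wasting 2,288 bytes; small objects are better served by malloc().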
serial.c serial.h Serial port driver. Again, printf() calls this code for you, so you don't need to do so yourself. It handles serial input by passing it to the input layer (see below).
block.c block.h An abstraction layer for block devices, that is, random-access, disk-like devices that are organized as arrays of fixed-size blocks. Out of the box, Pintos supports two types of block devices: IDE disks and partitions. Block devices, regardless of type, won't actually be used until project 2.
ide.c ide.h Supports reading and writing sectors on up to 4 IDE disks.
partition.c partition.h Understands the structure of partitions on disks, allowing a single disk to be carved up into multiple regions (partitions) for independent use.
kbd.c kbd.h Keyboard driver. Handles keystrokes, passing them to the input layer (see below).
input.c input.h Input layer. Queues input characters passed along by the keyboard or serial drivers.
intq.c intq.h Interrupt queue, for managing a circular queue that both kernel threads and interrupt handlers want to access. Used by the keyboard and serial drivers.
rtc.c rtc.h Real-time clock driver, to enable the kernel to determine the current date and time. By default, this is only used by threads/init.c to choose an initial seed for the random number generator.
speaker.c, speaker.h
    Driver that can produce tones on the PC speaker.
pit.c, pit.h
    Code to configure the 8254 Programmable Interrupt Timer. This code is used by both devices/timer.c and devices/speaker.c because each device uses one of the PIT's output channels.
ctype.h, inttypes.h, limits.h, stdarg.h, stdbool.h, stddef.h, stdint.h, stdio.c, stdio.h, stdlib.c, stdlib.h, string.c, string.h
    A subset of the standard C library. See Section C.2 [C99], page 96, for information on a few recently introduced pieces of the C library that you might not have encountered before. See Section C.3 [Unsafe String Functions], page 97, for information on what's been intentionally left out for safety.
debug.c, debug.h
    Functions and macros to aid debugging. See Appendix E [Debugging Tools], page 102, for more information.
random.c, random.h
    Pseudo-random number generator. The actual sequence of random values will not vary from one Pintos run to another unless you do one of three things: specify a new random seed value on the -rs kernel command-line option on each run, use a simulator other than Bochs, or specify the -r option to pintos.
round.h
    Macros for rounding.
syscall-nr.h
    System call numbers. Not used until project 2.
kernel/list.c, kernel/list.h
    Doubly linked list implementation. Used all over the Pintos code, and you'll probably want to use it a few places yourself in project 1.
kernel/bitmap.c, kernel/bitmap.h
    Bitmap implementation. You can use this in your code if you like, but you probably won't have any need for it in project 1.
kernel/hash.c, kernel/hash.h
    Hash table implementation. Likely to come in handy for project 3.
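The kernel/list.c entry describes an intrusive list: each item embeds a list element, and a macro converts a pointer to that element back into a pointer to the enclosing structure (list_entry() in Pintos). The self-contained sketch below re-creates the idea in miniature so it compiles on a host. The mini_* names and struct item are hypothetical illustrations, not the Pintos API, which provides many more operations in lib/kernel/list.h.

```c
#include <assert.h>
#include <stddef.h>

/* A miniature intrusive list in the style of lib/kernel/list.h.
   The mini_* names are hypothetical, not the Pintos API. */
struct mini_elem {
  struct mini_elem *prev, *next;
};

struct mini_list {
  struct mini_elem head, tail;  /* Sentinel elements at both ends. */
};

/* Recovers the enclosing structure from a pointer to its embedded
   element, the same trick Pintos's list_entry() macro uses. */
#define mini_entry(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((char *) (ELEM) - offsetof (STRUCT, MEMBER)))

static void mini_init (struct mini_list *l) {
  l->head.prev = NULL;
  l->head.next = &l->tail;
  l->tail.prev = &l->head;
  l->tail.next = NULL;
}

/* Links E in just before the tail sentinel. */
static void mini_push_back (struct mini_list *l, struct mini_elem *e) {
  assert (e != NULL);
  e->prev = l->tail.prev;
  e->next = &l->tail;
  l->tail.prev->next = e;
  l->tail.prev = e;
}

static struct mini_elem *mini_begin (struct mini_list *l) { return l->head.next; }
static struct mini_elem *mini_end (struct mini_list *l) { return &l->tail; }

/* An example item type: the element is embedded, so putting an item on
   a list needs no extra allocation. */
struct item {
  int value;
  struct mini_elem elem;
};

/* Sums the values of every item on L by walking the embedded elements. */
static int sum_items (struct mini_list *l) {
  int sum = 0;
  for (struct mini_elem *e = mini_begin (l); e != mini_end (l); e = e->next)
    sum += mini_entry (e, struct item, elem)->value;
  return sum;
}
```

Because the element lives inside the item, linking an item never allocates memory, which is convenient wherever allocation would be awkward.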
2.1.3 Synchronization
Proper synchronization is an important part of the solutions to these problems. Any synchronization problem can be easily solved by turning interrupts off: while interrupts are off, there is no concurrency, so there's no possibility for race conditions. Therefore, it's tempting to solve all synchronization problems this way, but don't. Instead, use semaphores, locks, and condition variables to solve the bulk of your synchronization problems. Read the tour section on synchronization (see Section A.3 [Synchronization], page 66) or the comments in threads/synch.c if you're unsure what synchronization primitives may be used in what situations.

In the Pintos projects, the only class of problem best solved by disabling interrupts is coordinating data shared between a kernel thread and an interrupt handler. Because interrupt handlers can't sleep, they can't acquire locks. This means that data shared between kernel threads and an interrupt handler must be protected within a kernel thread by turning off interrupts.

This project only requires accessing a little bit of thread state from interrupt handlers. For the alarm clock, the timer interrupt needs to wake up sleeping threads. In the advanced scheduler, the timer interrupt needs to access a few global and per-thread variables. When you access these variables from kernel threads, you will need to disable interrupts to prevent the timer interrupt from interfering.

When you do turn off interrupts, take care to do so for the least amount of code possible, or you can end up losing important things such as timer ticks or input events. Turning off interrupts also increases the interrupt handling latency, which can make a machine feel sluggish if taken too far.

The synchronization primitives themselves in synch.c are implemented by disabling interrupts. You may need to increase the amount of code that runs with interrupts disabled here, but you should still try to keep it to a minimum.
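A minimal sketch of the pattern for data shared with an interrupt handler: save the old interrupt level, touch the shared data, then restore the saved level. The names intr_disable(), intr_set_level(), INTR_OFF and INTR_ON follow threads/interrupt.h, but the definitions below are host-side stand-ins that only record the level in a variable, so the example compiles outside the kernel. The ticks counter, on_timer_tick() and read_ticks() are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-ins for the Pintos interrupt API (threads/interrupt.h).
   In the kernel these change the CPU's interrupt flag; here they only
   record the level so the sketch can run anywhere. */
enum intr_level { INTR_OFF, INTR_ON };
static enum intr_level current_level = INTR_ON;

static enum intr_level intr_disable (void) {
  enum intr_level old = current_level;
  current_level = INTR_OFF;
  return old;
}

static enum intr_level intr_set_level (enum intr_level level) {
  enum intr_level old = current_level;
  current_level = level;
  return old;
}

/* Shared between kernel threads and the timer interrupt handler. */
static int64_t ticks;

/* Simulated timer interrupt handler. Handlers cannot take locks, so the
   kernel threads that read TICKS must keep interrupts off instead. */
static void on_timer_tick (void) {
  ticks++;
}

/* Kernel-thread side. A 64-bit load is not atomic on 32-bit 80x86, so an
   interrupt could otherwise arrive between its two halves. Restoring the
   saved level, rather than forcing interrupts on, keeps this correct when
   the caller already had interrupts disabled. */
static int64_t read_ticks (void) {
  enum intr_level old_level = intr_disable ();
  int64_t t = ticks;
  intr_set_level (old_level);
  return t;
}
```

The disabled region covers a single read, in line with the advice to keep interrupts off for as little code as possible.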
Disabling interrupts can be useful for debugging, if you want to make sure that a section of code is not interrupted. You should remove debugging code before turning in your project. (Don't just comment it out, because that can make the code difficult to read.)

There should be no busy waiting in your submission. A tight loop that calls thread_yield() is one form of busy waiting.
Instead, we recommend integrating your team's changes early and often, using a source code control system such as Git (see Section F.3 [Git], page 113). This is less likely to produce surprises, because everyone can see everyone else's code as it is written, instead of just when it is finished. These systems also make it possible to review changes and, when a change introduces a bug, drop back to working versions of code.

You should expect to run into bugs that you simply don't understand while working on this and subsequent projects. When you do, reread the appendix on debugging tools, which is filled with useful debugging tips that should help you to get back up to speed (see Appendix E [Debugging Tools], page 102). Be sure to read the section on backtraces (see Section E.4 [Backtraces], page 103), which will help you to get the most out of every kernel panic or assertion failure.
2.2 Requirements
2.2.1 Design Document
Before you turn in your project, you must copy the project 1 design document template into your source tree under the name pintos/src/threads/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See Appendix D [Project Documentation], page 99, for a sample design document that goes along with a fictitious project.
[Function] Suspends execution of the calling thread until time has advanced by at least x timer ticks. Unless the system is otherwise idle, the thread need not wake up after exactly x ticks. Just put it on the ready queue after they have waited for the right amount of time.

timer_sleep() is useful for threads that operate in real-time, e.g. for blinking the cursor once per second. The argument to timer_sleep() is expressed in timer ticks, not in milliseconds or any other unit. There are TIMER_FREQ timer ticks per second, where TIMER_FREQ is a macro defined in devices/timer.h. The default value is 100. We don't recommend changing this value, because any change is likely to cause many of the tests to fail.
Separate functions timer_msleep(), timer_usleep(), and timer_nsleep() do exist for sleeping a specific number of milliseconds, microseconds, or nanoseconds, respectively, but these will call timer_sleep() automatically when necessary. You do not need to modify them. If your delays seem too short or too long, reread the explanation of the -r option to pintos (see Section 1.1.4 [Debugging versus Testing], page 4). The alarm clock implementation is not needed for later projects, although it could be useful for project 4.
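One common way to implement timer_sleep() without busy waiting is a list of sleeping threads ordered by wake-up tick, which the timer interrupt checks on every tick. The sketch below assumes that design; struct sleeper, sleep_until() and wake_due() are hypothetical names, and comments mark where real kernel code would call thread_block() and thread_unblock() with interrupts disabled. It is a single-threaded simulation, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread sleep state; in Pintos these fields would
   probably live in struct thread, and the list would be a struct list. */
struct sleeper {
  int64_t wakeup_tick;          /* Earliest tick at which to wake. */
  int awake;                    /* Set once "unblocked". */
  struct sleeper *next;
};

/* Sleeping threads, ordered by wakeup_tick, earliest first. */
static struct sleeper *sleep_list;

/* timer_sleep() side: record the wake-up time and insert in order.
   Real code would return at once for ticks <= 0, do this with interrupts
   off, and then call thread_block(). */
static void sleep_until (struct sleeper *s, int64_t now, int64_t ticks) {
  struct sleeper **p = &sleep_list;
  s->wakeup_tick = now + ticks;
  s->awake = 0;
  while (*p != NULL && (*p)->wakeup_tick <= s->wakeup_tick)
    p = &(*p)->next;
  s->next = *p;
  *p = s;
}

/* Timer interrupt side: wake every thread that is due, stopping at the
   first one that is not. Returns the number of threads woken. */
static int wake_due (int64_t now) {
  int woken = 0;
  while (sleep_list != NULL && sleep_list->wakeup_tick <= now) {
    struct sleeper *s = sleep_list;
    sleep_list = s->next;
    s->awake = 1;               /* Real code: thread_unblock(). */
    woken++;
  }
  return woken;
}
```

Keeping the list sorted means the interrupt handler can stop at the first thread that is not yet due, which keeps the work done in the timer interrupt small.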
[Function] Sets the current thread's priority to new_priority. If the current thread no longer has the highest priority, yields.

[Function] Returns the current thread's priority. In the presence of priority donation, returns the higher (donated) priority.
You need not provide any interface to allow a thread to directly modify other threads' priorities. The priority scheduler is not used in any later project.
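The yield rule for setting a priority can be sketched as a small host-side simulation. Under stated assumptions: struct sim_thread, the ready_priorities array and sim_thread_yield() are hypothetical stand-ins for struct thread, the ready list and thread_yield(); PRI_MIN and PRI_MAX use the Pintos values 0 and 63; and the thread_mlfqs check reflects the rule, stated for the advanced scheduler, that priority-setting calls are ignored under -mlfqs. With priority donation, this function would update only the base priority.

```c
#include <assert.h>
#include <stdbool.h>

#define PRI_MIN 0               /* Lowest priority, as in threads/thread.h. */
#define PRI_MAX 63              /* Highest priority. */

/* Host-side stand-ins: the running thread, the priorities of ready threads,
   and a counter in place of thread_yield(). */
struct sim_thread { int priority; };

static struct sim_thread running;
static int ready_priorities[16];
static int ready_count;
static bool thread_mlfqs;       /* True when booted with -mlfqs. */
static int yield_count;

static void sim_thread_yield (void) {
  yield_count++;
}

/* Highest priority among ready threads, or PRI_MIN - 1 if there are none. */
static int max_ready_priority (void) {
  int max = PRI_MIN - 1;
  for (int i = 0; i < ready_count; i++)
    if (ready_priorities[i] > max)
      max = ready_priorities[i];
  return max;
}

/* Sets the running thread's priority, then yields if some ready thread now
   has strictly higher priority. Ignored under the 4.4BSD scheduler. */
static void sim_set_priority (int new_priority) {
  assert (new_priority >= PRI_MIN && new_priority <= PRI_MAX);
  if (thread_mlfqs)
    return;
  running.priority = new_priority;
  if (max_ready_priority () > running.priority)
    sim_thread_yield ();
}
```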
Like the priority scheduler, the advanced scheduler chooses the thread to run based on priorities. However, the advanced scheduler does not do priority donation. Thus, we recommend that you have the priority scheduler working, except possibly for priority donation, before you start work on the advanced scheduler.

You must write your code to allow us to choose a scheduling algorithm policy at Pintos startup time. By default, the priority scheduler must be active, but we must be able to choose the 4.4BSD scheduler with the -mlfqs kernel option. Passing this option sets thread_mlfqs, declared in threads/thread.h, to true when the options are parsed by parse_options(), which happens early in main().

When the 4.4BSD scheduler is enabled, threads no longer directly control their own priorities. The priority argument to thread_create() should be ignored, as well as any calls to thread_set_priority(), and thread_get_priority() should return the thread's current priority as set by the scheduler. The advanced scheduler is not used in any later project.
2.3 FAQ
How much code will I need to write?
Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion.

The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.

    devices/timer.c       |   42 +++++
    threads/fixed-point.h |  120 ++++++++++++++++++
    threads/synch.c       |   88 ++++++++++++
    threads/thread.c      |  196 ++++++++++++++++++++++++++---
    threads/thread.h      |   23 +++
    5 files changed, 440 insertions(+), 29 deletions(-)

fixed-point.h is a new file added by the reference solution.

How do I update the Makefiles when I add a new source file?
To add a .c file, edit the top-level Makefile.build. Add the new file to variable dir_SRC, where dir is the directory where you added the file. For this project, that means you should add it to threads_SRC or devices_SRC. Then run make. If your new file doesn't get compiled, run make clean and then try again.

When you modify the top-level Makefile.build and re-run make, the modified version should be automatically copied to threads/build/Makefile. The converse is not true, so any changes will be lost the next time you run make clean from the threads directory. Unless your changes are truly temporary, you should prefer to edit Makefile.build.

A new .h file does not require editing the Makefiles.
What does "warning: no previous prototype for 'func'" mean?
It means that you defined a non-static function without preceding it by a prototype. Because non-static functions are intended for use by other .c files, for safety they should be prototyped in a header file included before their definition. To fix the problem, add a prototype in a header file that you include, or, if the function isn't actually used by other .c files, make it static.

What is the interval between timer interrupts?
Timer interrupts occur TIMER_FREQ times per second. You can adjust this value by editing devices/timer.h. The default is 100 Hz. We don't recommend changing this value, because any changes are likely to cause many of the tests to fail.

How long is a time slice?
There are TIME_SLICE ticks per time slice. This macro is declared in threads/thread.c. The default is 4 ticks.
We don't recommend changing this value, because any changes are likely to cause many of the tests to fail.

How do I run the tests?
See Section 1.2.1 [Testing], page 5.

Why do I get a test failure in pass()?
You are probably looking at a backtrace that looks something like this:

    0xc0108810: debug_panic (lib/kernel/debug.c:32)
    0xc010a99f: pass (tests/threads/tests.c:93)
    0xc010bdd3: test_mlfqs_load_1 (...threads/mlfqs-load-1.c:33)
    0xc010a8cf: run_test (tests/threads/tests.c:51)
    0xc0100452: run_task (threads/init.c:283)
    0xc0100536: run_actions (threads/init.c:333)
    0xc01000bb: main (threads/init.c:137)
This is just confusing output from the backtrace program. It does not actually mean that pass() called debug_panic(). In fact, fail() called debug_panic() (via the PANIC() macro). GCC knows that debug_panic() does not return, because it is declared NO_RETURN (see Section E.3 [Function and Parameter Attributes], page 102), so it doesn't include any code in fail() to take control when debug_panic() returns. This means that the return address on the stack looks like it is at the beginning of the function that happens to follow fail() in memory, which in this case happens to be pass(). See Section E.4 [Backtraces], page 103, for more information.

How do interrupts get re-enabled in the new thread following schedule()?
Every path into schedule() disables interrupts. They eventually get re-enabled by the next thread to be scheduled. Consider the possibilities: the new thread is running in switch_thread() (but see below), which is called by schedule(), which is called by one of a few possible functions:
• thread_exit(), but we'll never switch back into such a thread, so it's uninteresting.
• thread_yield(), which immediately restores the interrupt level upon return from schedule().
• thread_block(), which is called from multiple places:
    • sema_down(), which restores the interrupt level before returning.
    • idle(), which enables interrupts with an explicit assembly STI instruction.
    • wait() in devices/intq.c, whose callers are responsible for re-enabling interrupts.

There is a special case when a newly created thread runs for the first time. Such a thread calls intr_enable() as the first action in kernel_thread(), which is at the bottom of the call stack for every kernel thread but the first.
Can a thread's priority change while it is on the ready queue?
Yes. Consider a ready, low-priority thread L that holds a lock. High-priority thread H attempts to acquire the lock and blocks, thereby donating its priority to ready thread L.

Can a thread's priority change while it is blocked?
Yes. While a thread that has acquired lock L is blocked for any reason, its priority can increase by priority donation if a higher-priority thread attempts to acquire L. This case is checked by the priority-donate-sema test.

Can a thread added to the ready list preempt the processor?
Yes. If a thread added to the ready list has higher priority than the running thread, the correct behavior is to immediately yield the processor. It is not acceptable to wait for the next timer interrupt. The highest priority thread should run as soon as it is runnable, preempting whatever thread is currently running.

How does thread_set_priority() affect a thread receiving donations?
It sets the thread's base priority. The thread's effective priority becomes the higher of the newly set priority or the highest donated priority. When the donations are released, the thread's priority becomes the one set through the function call. This behavior is checked by the priority-donate-lower test.

Doubled test names in output make them fail.
Suppose you are seeing output in which some test names are doubled, like this:

    (alarm-priority) begin
    (alarm-priority) (alarm-priority) Thread priority 30 woke up.
    Thread priority 29 woke up.
    (alarm-priority) Thread priority 28 woke up.

What is happening is that output from two threads is being interleaved. That is, one thread is printing "(alarm-priority) Thread priority 29 woke up.\n" and another thread is printing "(alarm-priority) Thread priority 30 woke up.\n", but the first thread is being preempted by the second in the middle of its output.

This problem indicates a bug in your priority scheduler.
After all, a thread with priority 29 should not be able to run while a thread with priority 30 has work to do.

Normally, the implementation of the printf() function in the Pintos kernel attempts to prevent such interleaved output by acquiring a console lock for the duration of the printf call and releasing it afterwards. However, the test name, e.g., (alarm-priority), and the message following it are output using two calls to printf, resulting in the console lock being acquired and released twice.
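The thread_set_priority() answer in this FAQ (the base priority changes, and the effective priority is the higher of the base and the highest donation) can be modeled in a few lines. struct prio_info, donate() and clear_donations() are hypothetical; the sketch assumes a single lock and no nested donation, so clearing every donation on release is a deliberate simplification.

```c
#include <assert.h>

#define MAX_DONATIONS 8         /* Hypothetical bound, for the sketch only. */

/* Hypothetical priority bookkeeping for one thread. */
struct prio_info {
  int base_priority;                 /* Set by thread_set_priority(). */
  int donations[MAX_DONATIONS];      /* Priorities donated by waiters. */
  int donation_count;
};

/* The value thread_get_priority() would report: the higher of the base
   priority and the highest donation. */
static int effective_priority (const struct prio_info *t) {
  assert (t->donation_count >= 0 && t->donation_count <= MAX_DONATIONS);
  int p = t->base_priority;
  for (int i = 0; i < t->donation_count; i++)
    if (t->donations[i] > p)
      p = t->donations[i];
  return p;
}

/* A higher-priority thread blocks on a lock this thread holds. */
static void donate (struct prio_info *t, int priority) {
  assert (t->donation_count < MAX_DONATIONS);
  t->donations[t->donation_count++] = priority;
}

/* The thread releases the lock its donors were waiting for. */
static void clear_donations (struct prio_info *t) {
  t->donation_count = 0;
}
```

Lowering the base priority while a donation is active leaves the effective priority unchanged until the donation is released, which is the behavior the FAQ describes.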
Can I use one queue instead of 64 queues?
Yes. In general, your implementation may differ from the description, as long as its behavior is the same.

Some scheduler tests fail and I don't understand why. Help!
If your implementation mysteriously fails some of the advanced scheduler tests, try the following:
• Read the source files for the tests that you're failing, to make sure that you understand what's going on. Each one has a comment at the top that explains its purpose and expected results.
• Double-check your fixed-point arithmetic routines and your use of them in the scheduler routines.
• Consider how much work your implementation does in the timer interrupt. If the timer interrupt handler takes too long, then it will take away most of a timer tick from the thread that the timer interrupt preempted. When it returns control to that thread, it therefore won't get to do much work before the next timer interrupt arrives. That thread will therefore get blamed for a lot more CPU time than it actually got a chance to use. This raises the interrupted thread's recent CPU count, thereby lowering its priority. It can cause scheduling decisions to change. It also raises the load average.
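Since fixed-point routines come up in the list above: the 4.4BSD scheduler appendix suggests a 17.14 format, an int whose low 14 bits hold the fraction (f = 2^14). The helpers below are one sketch of that format; the fixed_t type and fp_* names are hypothetical, and the reference solution's fixed-point.h may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* 17.14 fixed-point: an int whose low 14 bits hold the fraction. */
typedef int fixed_t;
#define FP_F (1 << 14)

static fixed_t fp_from_int (int n) { return n * FP_F; }

/* Converts to an integer, rounding toward zero. */
static int fp_to_int_trunc (fixed_t x) { return x / FP_F; }

/* Converts to an integer, rounding to nearest. */
static int fp_to_int_round (fixed_t x) {
  return x >= 0 ? (x + FP_F / 2) / FP_F : (x - FP_F / 2) / FP_F;
}

static fixed_t fp_add_int (fixed_t x, int n) { return x + n * FP_F; }

/* Products and quotients widen to 64 bits so the intermediate value does
   not overflow before it is rescaled. */
static fixed_t fp_mul (fixed_t x, fixed_t y) { return (fixed_t) ((int64_t) x * y / FP_F); }
static fixed_t fp_div (fixed_t x, fixed_t y) { return (fixed_t) ((int64_t) x * FP_F / y); }

static fixed_t fp_div_int (fixed_t x, int n) { return x / n; }
```

A frequent bug is multiplying two fixed-point values in plain int arithmetic: the raw product overflows once it reaches 2^31, which already happens for 4 × 4 (2^16 × 2^16 = 2^32). As a check, one update of the appendix's formula load_avg = (59/60) * load_avg + (1/60) * ready_threads, starting from 0 with one ready thread, gives about 0.0167, which rounds to 2 when scaled by 100.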
3.1 Background
Up to now, all of the code you have run under Pintos has been part of the operating system kernel. This means, for example, that all the test code from the last assignment ran as part of the kernel, with full access to privileged parts of the system. Once we start running user programs on top of the operating system, this is no longer true. This project deals with the consequences.

We allow more than one process to run at a time. Each process has one thread (multithreaded processes are not supported). User programs are written under the illusion that they have the entire machine. This means that when you load and run multiple processes at a time, you must manage memory, scheduling, and other state correctly to maintain this illusion.

In the previous project, we compiled our test code directly into your kernel, so we had to require certain specific function interfaces within the kernel. From now on, we will test your operating system by running user programs. This gives you much greater freedom. You must make sure that the user program interface meets the specifications described here, but given that constraint you are free to restructure or rewrite kernel code however you wish.
syscall.c, syscall.h
    Whenever a user process wants to access some kernel functionality, it invokes a system call. This is a skeleton system call handler. Currently, it just prints a message and terminates the user process. In part 2 of this project you will add code to do everything else needed by system calls.
exception.c, exception.h
    When a user process performs a privileged or prohibited operation, it traps into the kernel as an "exception" or "fault."¹ These files handle exceptions. Currently all exceptions simply print a message and terminate the process. Some, but not all, solutions to project 2 require modifying page_fault() in this file.
gdt.c, gdt.h
    The 80x86 is a segmented architecture. The Global Descriptor Table (GDT) is a table that describes the segments in use. These files set up the GDT. You should not need to modify these files for any of the projects. You can read the code if you're interested in how the GDT works.
tss.c, tss.h
    The Task-State Segment (TSS) is used for 80x86 architectural task switching. Pintos uses the TSS only for switching stacks when a user process enters an interrupt handler, as does Linux. You should not need to modify these files for any of the projects. You can read the code if you're interested in how the TSS works.

¹ We will treat these terms as synonyms. There is no standard distinction between them, although Intel processor manuals make a minor distinction between them on 80x86.
• File size is fixed at creation time. The root directory is represented as a file, so the number of files that may be created is also limited.
• File data is allocated as a single extent, that is, data in a single file must occupy a contiguous range of sectors on disk. External fragmentation can therefore become a serious problem as a file system is used over time.
• No subdirectories.
• File names are limited to 14 characters.
• A system crash mid-operation may corrupt the disk in a way that cannot be repaired automatically. There is no file system repair tool anyway.

One important feature is included: Unix-like semantics for filesys_remove() are implemented. That is, if a file is open when it is removed, its blocks are not deallocated and it may still be accessed by any threads that have it open, until the last one closes it. See [Removing an Open File], page 35, for more information.

You need to be able to create a simulated disk with a file system partition. The pintos-mkdisk program provides this functionality. From the userprog/build directory, execute pintos-mkdisk filesys.dsk --filesys-size=2. This command creates a simulated disk named filesys.dsk that contains a 2 MB Pintos file system partition. Then format the file system partition by passing -f -q on the kernel's command line: pintos -f -q. The -f option causes the file system to be formatted, and -q causes Pintos to exit as soon as the format is done.

You'll need a way to copy files in and out of the simulated file system. The pintos -p ("put") and -g ("get") options do this. To copy file into the Pintos file system, use the command pintos -p file -- -q. (The -- is needed because -p is for the pintos script, not for the simulated kernel.) To copy it to the Pintos file system under the name newname, add -a newname: pintos -p file -a newname -- -q. The commands for copying files out of a VM are similar, but substitute -g for -p.
Incidentally, these commands work by passing special commands extract and append on the kernel's command line and copying to and from a special simulated "scratch" partition. If you're very curious, you can look at the pintos script as well as filesys/fsutil.c to learn the implementation details.

Here's a summary of how to create a disk with a file system partition, format the file system, copy the echo program into the new disk, and then run echo, passing argument x. (Argument passing won't work until you implement it.) It assumes that you've already built the examples in examples and that the current directory is userprog/build:

    pintos-mkdisk filesys.dsk --filesys-size=2
    pintos -f -q
    pintos -p ../../examples/echo -a echo -- -q
    pintos -q run 'echo x'

The three final steps can actually be combined into a single command:

    pintos-mkdisk filesys.dsk --filesys-size=2
    pintos -p ../../examples/echo -a echo -- -f -q run 'echo x'

If you don't want to keep the file system disk around for later use or inspection, you can even combine all four steps into a single command. The --filesys-size=n option creates
a temporary file system partition approximately n megabytes in size just for the duration of the pintos run. The Pintos automatic test suite makes extensive use of this syntax:

    pintos --filesys-size=2 -p ../../examples/echo -a echo -- -f -q run 'echo x'

You can delete a file from the Pintos file system using the rm file kernel action, e.g. pintos -q rm file. Also, ls lists the files in the file system and cat file prints a file's contents to the display.
and, if a user process is running, the user virtual memory of the running process. However, even in the kernel, an attempt to access memory at an unmapped user virtual address will cause a page fault.
}
Each of these functions assumes that the user address has already been verified to be below PHYS_BASE. They also assume that you've modified page_fault() so that a page fault in the kernel merely sets eax to 0xffffffff and copies its former value into eip.
3.3 Requirements
3.3.1 Design Document
Before you turn in your project, you must copy the project 2 design document template into your source tree under the name pintos/src/userprog/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See Appendix D [Project Documentation], page 99, for a sample design document that goes along with a fictitious project.
[System Call] Terminates Pintos by calling shutdown_power_off() (declared in devices/shutdown.h). This should be seldom used, because you lose some information about possible deadlock situations, etc.
[System Call] Terminates the current user program, returning status to the kernel. If the process's parent waits for it (see below), this is the status that will be returned. Conventionally, a status of 0 indicates success and nonzero values indicate errors.

[System Call] Runs the executable whose name is given in cmd_line, passing any given arguments, and returns the new process's program id (pid). Must return pid -1, which otherwise should not be a valid pid, if the program cannot load or run for any reason. Thus, the parent process cannot return from the exec until it knows whether the child process successfully loaded its executable. You must use appropriate synchronization to ensure this.
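The synchronization exec needs can be sketched with a semaphore that the child raises once it knows whether loading succeeded. This host-side version assumes POSIX threads and semaphores (sem_wait() and sem_post()) as stand-ins for Pintos thread creation and sema_down()/sema_up(); struct exec_info, fake_load(), sim_exec() and the pid counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Shared between the parent's exec and the new child. A POSIX semaphore
   stands in for a Pintos struct semaphore. */
struct exec_info {
  const char *cmd_line;
  sem_t load_done;              /* Raised once the load outcome is known. */
  bool success;
};

static int next_pid = 2;        /* Hypothetical pid allocator. */

/* Hypothetical loader: only command lines starting with "echo" load. */
static bool fake_load (const char *cmd_line) {
  return strncmp (cmd_line, "echo", 4) == 0;
}

/* Child side: try to load, publish the result, then wake the parent.
   The child must not touch INFO afterward, since it lives on the
   parent's stack. */
static void *child_start (void *aux) {
  struct exec_info *info = aux;
  info->success = fake_load (info->cmd_line);
  sem_post (&info->load_done);  /* Like sema_up(). */
  return NULL;
}

/* Parent side: start the child, then block until it reports whether the
   executable loaded. Returns a new pid, or -1 on failure. */
static int sim_exec (const char *cmd_line) {
  struct exec_info info = { .cmd_line = cmd_line, .success = false };
  pthread_t child;
  if (sem_init (&info.load_done, 0, 0) != 0)
    return -1;
  if (pthread_create (&child, NULL, child_start, &info) != 0) {
    sem_destroy (&info.load_done);
    return -1;
  }
  sem_wait (&info.load_done);   /* Like sema_down(). */
  pthread_join (child, NULL);   /* Host cleanup only; a real child keeps running. */
  sem_destroy (&info.load_done);
  return info.success ? next_pid++ : -1;
}
```

Keeping the shared record on the parent's stack is safe only because the parent cannot return before the child has signaled; the child must finish with the record before raising the semaphore.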
[System Call] Waits for a child process pid and retrieves the child's exit status.

If pid is still alive, waits until it terminates. Then, returns the status that pid passed to exit. If pid did not call exit(), but was terminated by the kernel (e.g. killed due to an exception), wait(pid) must return -1. It is perfectly legal for a parent process to wait for child processes that have already terminated by the time the parent calls wait, but the kernel must still allow the parent to retrieve its child's exit status, or learn that the child was terminated by the kernel.

wait must fail and return -1 immediately if any of the following conditions is true:
• pid does not refer to a direct child of the calling process. pid is a direct child of the calling process if and only if the calling process received pid as a return value from a successful call to exec. Note that children are not inherited: if A spawns child B and B spawns child process C, then A cannot wait for C, even if B is dead. A call to wait(C) by process A must fail. Similarly, orphaned processes are not assigned to a new parent if their parent process exits before they do.
• The process that calls wait has already called wait on pid. That is, a process may wait for any given child at most once.

Processes may spawn any number of children, wait for them in any order, and may even exit without having waited for some or all of their children. Your design should consider all the ways in which waits can occur. All of a process's resources, including its struct thread, must be freed whether its parent ever waits for it or not, and regardless of whether the child exits before or after its parent.

You must ensure that Pintos does not terminate until the initial process exits. The supplied Pintos code tries to do this by calling process_wait() (in userprog/process.c) from main() (in threads/init.c).
We suggest that you implement process_wait() according to the comment at the top of the function and then implement the wait system call in terms of process_wait(). Implementing this system call requires considerably more work than any of the rest.
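One way to meet these wait requirements is a status record shared by parent and child, reference-counted so that whichever side finishes with it last frees it. Everything here (struct child_status, new_child(), child_exit(), sim_wait(), parent_exit()) is hypothetical, and the blocking and locking a real kernel needs are left out: the sketch assumes the child has already exited by the time wait is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record shared by a parent and one child. It lives outside
   both threads, so it survives whichever of them exits first. */
struct child_status {
  int pid;
  int exit_status;              /* Stays -1 if the kernel kills the child. */
  bool exited;
  int refs;                     /* 2 while both parent and child hold it. */
  struct child_status *next;    /* Link in the parent's list of children. */
};

/* Parent side of a successful exec: remember the new direct child. */
static struct child_status *new_child (struct child_status **children, int pid) {
  struct child_status *cs = calloc (1, sizeof *cs);
  if (cs == NULL)
    return NULL;
  cs->pid = pid;
  cs->exit_status = -1;
  cs->refs = 2;
  cs->next = *children;
  *children = cs;
  return cs;
}

/* Drops one reference; the last holder frees the record. */
static void status_release (struct child_status *cs) {
  assert (cs->refs > 0);
  if (--cs->refs == 0)
    free (cs);
}

/* Child side of exit(status). Real code would also wake a waiting parent. */
static void child_exit (struct child_status *cs, int status) {
  cs->exit_status = status;
  cs->exited = true;
  status_release (cs);
}

/* Parent side of wait(pid). Unlinking the record makes a second wait on
   the same pid fail, and pids that are not direct children are never found. */
static int sim_wait (struct child_status **children, int pid) {
  for (struct child_status **p = children; *p != NULL; p = &(*p)->next) {
    struct child_status *cs = *p;
    if (cs->pid == pid) {
      assert (cs->exited);      /* Real code would block here until it is. */
      int status = cs->exit_status;
      *p = cs->next;
      status_release (cs);
      return status;
    }
  }
  return -1;
}

/* Parent exits: give up every child record it never waited for. */
static void parent_exit (struct child_status **children) {
  while (*children != NULL) {
    struct child_status *cs = *children;
    *children = cs->next;
    status_release (cs);
  }
}
```

Because the record is freed only when both references are gone, a child that outlives its parent and a parent that never waits both avoid leaks and dangling pointers.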
[System Call] Creates a new file called file initially initial_size bytes in size. Returns true if successful, false otherwise. Creating a new file does not open it: opening the new file is a separate operation which would require an open system call.
[System Call] Deletes the file called file. Returns true if successful, false otherwise. A file may be removed regardless of whether it is open or closed, and removing an open file does not close it. See [Removing an Open File], page 35, for details.
[System Call] Opens the file called file. Returns a nonnegative integer handle called a "file descriptor" (fd), or -1 if the file could not be opened.

File descriptors numbered 0 and 1 are reserved for the console: fd 0 (STDIN_FILENO) is standard input, fd 1 (STDOUT_FILENO) is standard output. The open system call will never return either of these file descriptors, which are valid as system call arguments only as explicitly described below.

Each process has an independent set of file descriptors. File descriptors are not inherited by child processes.

When a single file is opened more than once, whether by a single process or different processes, each open returns a new file descriptor. Different file descriptors for a single file are closed independently in separate calls to close and they do not share a file position.

int filesize (int fd)
[System Call] Returns the size, in bytes, of the file open as fd.

int read (int fd, void *buffer, unsigned size)
[System Call] Reads size bytes from the file open as fd into buffer. Returns the number of bytes actually read (0 at end of file), or -1 if the file could not be read (due to a condition other than end of file). Fd 0 reads from the keyboard using input_getc().
[System Call] Writes size bytes from buffer to the open file fd. Returns the number of bytes actually written, which may be less than size if some bytes could not be written.

Writing past end-of-file would normally extend the file, but file growth is not implemented by the basic file system. The expected behavior is to write as many bytes as possible up to end-of-file and return the actual number written, or 0 if no bytes could be written at all.

Fd 1 writes to the console. Your code to write to the console should write all of buffer in one call to putbuf(), at least as long as size is not bigger than a few hundred bytes. (It is reasonable to break up larger buffers.) Otherwise, lines of text output by different processes may end up interleaved on the console, confusing both human readers and our grading scripts.

[System Call] Changes the next byte to be read or written in open file fd to position, expressed in bytes from the beginning of the file. (Thus, a position of 0 is the file's start.)

A seek past the current end of a file is not an error. A later read obtains 0 bytes, indicating end of file. A later write extends the file, filling any unwritten gap with zeros. (However, in Pintos files have a fixed length until project 4 is complete, so writes past end of file will return an error.) These semantics are implemented in the file system and do not require any special effort in system call implementation.
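The console rule for write() to fd 1 can be sketched as follows. putbuf() is the Pintos console routine named in the write description; the version here is a host stand-in that appends to a buffer and counts calls, and CONSOLE_CHUNK (256) is an assumed value for "a few hundred bytes". console_write() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONSOLE_CHUNK 256       /* Assumed cutoff for "a few hundred bytes". */

/* Host stand-in for Pintos's putbuf(): records the output and counts
   calls instead of writing to the console. */
static char out[4096];
static size_t out_len;
static int putbuf_calls;

static void putbuf (const char *buffer, size_t n) {
  assert (out_len + n <= sizeof out);
  memcpy (out + out_len, buffer, n);
  out_len += n;
  putbuf_calls++;
}

/* write() to fd 1: a single putbuf() call for small buffers, so that the
   real putbuf() can emit them without interleaving, and CONSOLE_CHUNK-sized
   pieces for larger ones. Returns the number of bytes written. */
static int console_write (const void *buffer, unsigned size) {
  const char *p = buffer;
  unsigned left = size;
  while (left > 0) {
    unsigned n = left < CONSOLE_CHUNK ? left : CONSOLE_CHUNK;
    putbuf (p, n);
    p += n;
    left -= n;
  }
  return (int) size;
}
```

A 600-byte write, for instance, becomes three calls of 256, 256 and 88 bytes, while anything up to 256 bytes goes out in one call.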
[System Call] Returns the position of the next byte to be read or written in open file fd, expressed in bytes from the beginning of the file.
[System Call] Closes file descriptor fd. Exiting or terminating a process implicitly closes all its open file descriptors, as if by calling this function for each one.
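The descriptor rules for open and close (0 and 1 reserved, a fresh descriptor for every open, independent closes) suggest a per-process table. The sketch below assumes a fixed-size array indexed by fd; struct fd_table, FD_MAX and the fd_* functions are hypothetical, and struct file is left incomplete because the table only stores pointers. Separate file positions would come from each open producing its own file object, not from the table itself. On exit, a process would walk the table and close every remaining entry.

```c
#include <assert.h>
#include <stddef.h>

#define FD_FIRST 2              /* 0 and 1 are reserved for the console. */
#define FD_MAX 128              /* Hypothetical per-process limit. */

struct file;                    /* Incomplete: the table only stores pointers. */

/* Hypothetical per-process descriptor table. */
struct fd_table {
  struct file *files[FD_MAX];
};

/* Installs F under the lowest free descriptor and returns it, or -1 if
   the table is full. Each open gets its own descriptor, even when the
   same file is already open. */
static int fd_install (struct fd_table *t, struct file *f) {
  assert (f != NULL);
  for (int fd = FD_FIRST; fd < FD_MAX; fd++)
    if (t->files[fd] == NULL) {
      t->files[fd] = f;
      return fd;
    }
  return -1;
}

/* Returns the file open as FD, or NULL if FD is not valid here. */
static struct file *fd_lookup (const struct fd_table *t, int fd) {
  if (fd < FD_FIRST || fd >= FD_MAX)
    return NULL;
  return t->files[fd];
}

/* Removes FD and returns its file so the caller can close it; other
   descriptors for the same file are left untouched. */
static struct file *fd_remove (struct fd_table *t, int fd) {
  struct file *f = fd_lookup (t, fd);
  if (f != NULL)
    t->files[fd] = NULL;
  return f;
}
```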
The file defines other syscalls. Ignore them for now. You will implement some of them in project 3 and the rest in project 4, so be sure to design your system with extensibility in mind.

To implement syscalls, you need to provide ways to read and write data in user virtual address space. You need this ability before you can even obtain the system call number, because the system call number is on the user's stack in the user's virtual address space. This can be a bit tricky: what if the user provides an invalid pointer, a pointer into kernel memory, or a block partially in one of those regions? You should handle these cases by terminating the user process. We recommend writing and testing this code before implementing any other system call functionality. See Section 3.1.5 [Accessing User Memory], page 27, for more information.

You must synchronize system calls so that any number of user processes can make them at once. In particular, it is not safe to call into the file system code provided in the filesys directory from multiple threads at once. Your system call implementation must treat the file system code as a critical section. Don't forget that process_execute() also accesses files. For now, we recommend against modifying code in the filesys directory.

We have provided you a user-level function for each system call in lib/user/syscall.c. These provide a way for user processes to invoke each system call from a C program. Each uses a little inline assembly code to invoke the system call and (if appropriate) returns the system call's return value.

When you're done with this part, and forevermore, Pintos should be bulletproof. Nothing that a user program can do should ever cause the OS to crash, panic, fail an assertion, or otherwise malfunction. It is important to emphasize this point: our tests will try to break your system calls in many, many ways. You need to think of all the corner cases and handle them.
The sole way a user program should be able to cause the OS to halt is by invoking the halt system call. If a system call is passed an invalid argument, acceptable options include returning an error value (for those calls that return a value), returning an undefined value, or terminating the process. See Section 3.5.2 [System Call Details], page 37, for details on how system calls work.
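One approach to the pointer checks described in this section is to validate a user range before touching it: reject anything at or above PHYS_BASE, then confirm that every page the range covers is mapped. In Pintos the pieces would likely be is_user_vaddr() (threads/vaddr.h) and pagedir_get_page() (userprog/pagedir.c). This host-side sketch replaces the page-table lookup with a hypothetical fixed list of mapped pages so it runs anywhere; page_is_mapped() and user_range_ok() are illustrative names. The page_fault()-based alternative from Section 3.1.5 avoids the per-page lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096
#define PHYS_BASE 0xc0000000u   /* Pintos default: user space lies below 3 GB. */

/* Stand-in for a page-table lookup such as pagedir_get_page() != NULL:
   a tiny fixed set of "mapped" user pages. */
static const uintptr_t mapped_pages[] = { 0x08048000u, 0x08049000u, 0xbffff000u };

static bool page_is_mapped (uintptr_t page) {
  for (size_t i = 0; i < sizeof mapped_pages / sizeof mapped_pages[0]; i++)
    if (mapped_pages[i] == page)
      return true;
  return false;
}

/* Returns true if every byte of [UADDR, UADDR + SIZE) is a mapped user
   address. The range check rejects buffers that start in kernel space or
   straddle PHYS_BASE; checking one address per page covers the rest. */
static bool user_range_ok (uintptr_t uaddr, size_t size) {
  if (size == 0)
    return true;
  if (uaddr == 0 || uaddr >= PHYS_BASE || size > PHYS_BASE - uaddr)
    return false;
  for (uintptr_t page = uaddr & ~(uintptr_t) (PGSIZE - 1);
       page < uaddr + size; page += PGSIZE)
    if (!page_is_mapped (page))
      return false;
  return true;
}
```

A system call handler would run a check like this on the stack words holding the call number and arguments, and on every buffer or string a call receives, terminating the process when it fails.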
Closing a file will also re-enable writes. Thus, to deny writes to a process's executable, you must keep it open as long as the process is still running.
3.4 FAQ
How much code will I need to write?
Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion.
The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.
 threads/thread.c     |   13
 threads/thread.h     |   26 +
 userprog/exception.c |    8
 userprog/process.c   |  247 ++++++++++++++-
 userprog/syscall.c   |  468 ++++++++++++++++++++++++++++++
 userprog/syscall.h   |    1
 6 files changed, 725 insertions(+), 38 deletions(-)
The kernel always panics when I run pintos -p file -- -q.
Did you format the file system (with pintos -f)?
Is your file name too long? The file system limits file names to 14 characters. A command like pintos -p ../../examples/echo -- -q will exceed the limit. Use pintos -p ../../examples/echo -a echo -- -q to put the file under the name echo instead.
Is the file system full?
Does the file system already contain 16 files? The base Pintos file system has a 16-file limit.
The file system may be so fragmented that there's not enough contiguous space for your file.
When I run pintos -p ../file --, file isn't copied.
By default, files are written under the name you use to refer to them, so in this case the file copied in would be named ../file. You probably want to run pintos -p ../file -a file -- instead.
You can list the files in your file system with pintos -q ls.
All my user programs die with page faults.
This will happen if you haven't implemented argument passing (or haven't done so correctly). The basic C library for user programs tries to read argc and argv off the stack. If the stack isn't properly set up, this causes a page fault.
All my user programs die with "system call!"
You'll have to implement system calls before you see anything else.
Every reasonable program tries to make at least one system call (exit()) and most programs make more than that. Notably, printf() invokes the write system call. The default system call handler just prints "system call!" and terminates the program. Until then, you can use hex_dump() to convince yourself that argument passing is implemented correctly (see Section 3.5.1 [Program Startup Details], page 36).
How can I disassemble user programs?
The objdump utility (on 80x86 hosts) or i386-elf-objdump (on SPARC hosts) can disassemble entire user programs or object files. Invoke it as objdump -d file. You can use GDB's disassemble command to disassemble individual functions (see Section E.5 [GDB], page 105).
Why do many C include files not work in Pintos programs?
Can I use libfoo in my Pintos programs?
The C library we provide is very limited. It does not include many of the features that are expected of a real operating system's C library. The C library must be built specifically for the operating system (and architecture), since it must make system calls for I/O and memory allocation. (Not all functions do, of course, but usually the library is compiled as a unit.)
The chances are good that the library you want uses parts of the C library that Pintos doesn't implement. It will probably take at least some porting effort to make it work under Pintos. Notably, the Pintos user program C library does not have a malloc() implementation.
How do I compile new user programs?
Modify src/examples/Makefile, then run make.
Can I run user programs under a debugger?
Yes, with some limitations. See Section E.5 [GDB], page 105.
What's the difference between tid_t and pid_t?
A tid_t identifies a kernel thread, which may have a user process running in it (if created with process_execute()) or not (if created with thread_create()). It is a data type used only in the kernel.
A pid_t identifies a user process. It is used by user processes and the kernel in the exec and wait system calls.
You can choose whatever suitable types you like for tid_t and pid_t. By default, they're both int.
You can make them a one-to-one mapping, so that the same values in both identify the same process, or you can use a more complex mapping. It's up to you.
Is PHYS_BASE fixed?
No. You should be able to support PHYS_BASE values that are any multiple of 0x10000000 from 0x80000000 to 0xf0000000, simply via recompilation.
2. The caller pushes the address of its next instruction (the return address) on the stack and jumps to the first instruction of the callee. A single 80x86 instruction, CALL, does both.
3. The callee executes. When it takes control, the stack pointer points to the return address, the first argument is just above it, the second argument is just above the first argument, and so on.
4. If the callee has a return value, it stores it into register EAX.
5. The callee returns by popping the return address from the stack and jumping to the location it specifies, using the 80x86 RET instruction.
6. The caller pops the arguments off the stack.
Consider a function f() that takes three int arguments. This diagram shows a sample stack frame as seen by the callee at the beginning of step 3 above, supposing that f() is invoked as f(1, 2, 3). The initial stack address is arbitrary:
                             +----------------+
                  0xbffffe7c |        3       |
                  0xbffffe78 |        2       |
                  0xbffffe74 |        1       |
stack pointer --> 0xbffffe70 | return address |
                             +----------------+
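To make the frame concrete, the following C code simulates the caller's side of the convention for f(1, 2, 3). It does not execute real x86 instructions. It pushes the arguments right to left and then pushes the return address, as CALL would. The stack is modeled as an array of 32-bit words. The starting address 0xbffffe80 and the return address 0x08048123 are arbitrary values, chosen so that the result matches the diagram above. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64
#define STACK_TOP   0xbffffe80u   /* arbitrary; chosen to match the diagram */

/* Simulated stack: stack[i] holds the word at address
   STACK_TOP - 4 * (STACK_WORDS - i). */
static uint32_t stack[STACK_WORDS];
static uint32_t esp = STACK_TOP;

static uint32_t *word_at (uint32_t addr) {
  return &stack[STACK_WORDS - (STACK_TOP - addr) / 4];
}

static void push (uint32_t value) {
  assert (esp > STACK_TOP - 4 * STACK_WORDS);   /* no overflow */
  esp -= 4;
  *word_at (esp) = value;
}

/* What RET does with the stack: pop the return address. */
static uint32_t pop (void) {
  uint32_t value = *word_at (esp);
  esp += 4;
  return value;
}

/* Caller side of f(a, b, c): arguments right to left, then the return
   address as CALL would push it. */
static void call_f (uint32_t a, uint32_t b, uint32_t c, uint32_t ret_addr) {
  push (c);
  push (b);
  push (a);
  push (ret_addr);
}

/* Caller cleanup after the callee returns (step 6). */
static void pop_args (int n) {
  esp += 4u * (uint32_t) n;
}
```

After call_f(1, 2, 3, ...) the simulated esp is 0xbffffe70 and holds the return address, with 1, 2, and 3 at successively higher addresses. After pop() and pop_args(3), esp is back at its starting value.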
The table below shows the state of the stack and the relevant registers right before the beginning of the user program, assuming PHYS_BASE is 0xc0000000:
Address     Name            Data        Type
0xbffffffc  argv[3][...]    bar\0       char[4]
0xbffffff8  argv[2][...]    foo\0       char[4]
0xbffffff5  argv[1][...]    -l\0        char[3]
0xbfffffed  argv[0][...]    /bin/ls\0   char[8]
0xbfffffec  word-align      0           uint8_t
0xbfffffe8  argv[4]         0           char *
0xbfffffe4  argv[3]         0xbffffffc  char *
0xbfffffe0  argv[2]         0xbffffff8  char *
0xbfffffdc  argv[1]         0xbffffff5  char *
0xbfffffd8  argv[0]         0xbfffffed  char *
0xbfffffd4  argv            0xbfffffd8  char **
0xbfffffd0  argc            4           int
0xbfffffcc  return address  0           void (*) ()
In this example, the stack pointer would be initialized to 0xbfffffcc.
As shown above, your code should start the stack at the very top of the user virtual address space, in the page just below virtual address PHYS_BASE (defined in threads/vaddr.h).
You may find the non-standard hex_dump() function, declared in <stdio.h>, useful for debugging your argument passing code. Here's what it would show in the above example:
bfffffc0                                      00 00 00 00 |            ....|
bfffffd0  04 00 00 00 d8 ff ff bf-ed ff ff bf f5 ff ff bf |................|
bfffffe0  f8 ff ff bf fc ff ff bf-00 00 00 00 00 2f 62 69 |............./bi|
bffffff0  6e 2f 6c 73 00 2d 6c 00-66 6f 6f 00 62 61 72 00 |n/ls.-l.foo.bar.|
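The layout in the table can be reproduced by a short routine. This sketch makes several simplifying assumptions:
- It writes into a simulated 256-byte slice at the top of user space, not a real user page.
- It assumes PHYS_BASE is 0xc0000000.
- It uses a hypothetical fixed limit of 16 arguments.
- It does not check that everything fits in one page. A real kernel would write through its own mapping of the freshly allocated stack page and would need that check.

Strings are pushed last argument first, which is why "bar" ends up at the highest address in the table. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PHYS_BASE   0xc0000000u  /* assumed, as in the table above */
#define STACK_BYTES 256u         /* simulated slice at the top of user space */
#define MAX_ARGS    16           /* hypothetical limit for this sketch */

/* stack[i] models the byte at user address PHYS_BASE - STACK_BYTES + i.
   Static storage starts out zeroed, like a freshly zeroed page. */
static uint8_t stack[STACK_BYTES];

/* Translates a simulated user address into a pointer into stack[]. */
static uint8_t *host_ptr (uint32_t uaddr) {
  return stack + (uaddr - (PHYS_BASE - STACK_BYTES));
}

static void push_bytes (uint32_t *esp, const void *src, size_t n) {
  *esp -= (uint32_t) n;
  memcpy (host_ptr (*esp), src, n);
}

static void push_word (uint32_t *esp, uint32_t w) {
  push_bytes (esp, &w, sizeof w);
}

static uint32_t read_word (uint32_t uaddr) {
  uint32_t w;
  memcpy (&w, host_ptr (uaddr), sizeof w);
  return w;
}

/* Lays out ARGC strings from ARGV as in the table; returns the new esp. */
static uint32_t setup_args (int argc, char *argv[]) {
  uint32_t esp = PHYS_BASE;
  uint32_t str_addr[MAX_ARGS];
  assert (argc > 0 && argc <= MAX_ARGS);

  for (int i = argc - 1; i >= 0; i--) {   /* strings, last one first */
    push_bytes (&esp, argv[i], strlen (argv[i]) + 1);
    str_addr[i] = esp;
  }
  esp &= ~(uint32_t) 3;                   /* word-align; skipped bytes stay 0 */
  push_word (&esp, 0);                    /* argv[argc] is a null pointer */
  for (int i = argc - 1; i >= 0; i--)     /* argv[argc - 1] ... argv[0] */
    push_word (&esp, str_addr[i]);
  uint32_t argv_addr = esp;               /* address of argv[0] */
  push_word (&esp, argv_addr);            /* argv */
  push_word (&esp, (uint32_t) argc);      /* argc */
  push_word (&esp, 0);                    /* fake return address */
  return esp;
}
```

Called with the four strings of "/bin/ls -l foo bar", setup_args() returns 0xbfffffcc. On a little-endian machine, the bytes it writes match the table and the hex_dump() output above.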
syscall_handler() as the esp member of the struct intr_frame passed to it. (struct intr_frame is on the kernel stack.)
The 80x86 convention for function return values is to place them in the EAX register. System calls that return a value can do so by modifying the eax member of struct intr_frame.
You should try to avoid writing large amounts of repetitive code for implementing system calls. Each system call argument, whether an integer or a pointer, takes up 4 bytes on the stack. You should be able to take advantage of this to avoid writing much near-identical code for retrieving each system call's arguments from the stack.
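One way to avoid that repetition combines two pieces:
- A table that records how many 4-byte arguments each call takes.
- A single helper that fetches argument n from esp + 4 * (n + 1). The word at esp itself is the system call number.

The sketch below is self-contained, so several names are hypothetical stand-ins. struct fake_intr_frame replaces Pintos's struct intr_frame, and the call numbers and handlers replace those in lib/syscall-nr.h. The sketch also skips the user-pointer validation that a real handler must perform before reading each word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct intr_frame: only the two members
   discussed above. */
struct fake_intr_frame {
  const uint32_t *esp;   /* user stack pointer at the time of the call */
  uint32_t eax;          /* return value register */
};

typedef uint32_t syscall_fn (const uint32_t args[]);

/* Hypothetical system calls used only for this sketch. */
static uint32_t sys_add (const uint32_t a[]) { return a[0] + a[1]; }
static uint32_t sys_double (const uint32_t a[]) { return a[0] * 2; }

static const struct {
  syscall_fn *fn;
  int argc;              /* number of 4-byte arguments */
} syscall_table[] = {
  { sys_add, 2 },        /* call number 0 */
  { sys_double, 1 },     /* call number 1 */
};

#define MAX_SYSCALL_ARGS 3

/* Argument N (0-based) lives just above the call number at esp[0].
   A real kernel must validate each address before reading it. */
static uint32_t get_arg (const struct fake_intr_frame *f, int n) {
  return f->esp[n + 1];
}

static void syscall_handler (struct fake_intr_frame *f) {
  uint32_t number = f->esp[0];
  size_t count = sizeof syscall_table / sizeof syscall_table[0];
  assert (number < count);   /* placeholder: a real handler kills the process */
  uint32_t args[MAX_SYSCALL_ARGS];
  for (int i = 0; i < syscall_table[number].argc; i++)
    args[i] = get_arg (f, i);
  f->eax = syscall_table[number].fn (args);
}
```

Adding a call then means adding one table row and one handler, with no new argument-fetching code.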
4.1 Background
4.1.1 Source Files
You will work in the vm directory for this project. The vm directory contains only Makefiles. The only change from userprog is that this new Makefile turns on the setting -DVM. All code you write will be in new files or in files introduced in earlier projects.
You will probably be encountering just a few files for the first time:
devices/block.h
devices/block.c
    Provides sector-based read and write access to block devices. You will use this interface to access the swap partition as a block device.
4.1.2.1 Pages
A page, sometimes called a virtual page, is a continuous region of virtual memory 4,096 bytes (the page size) in length. A page must be page-aligned, that is, start on a virtual address evenly divisible by the page size. Thus, a 32-bit virtual address can be divided into a 20-bit page number and a 12-bit page offset (or just offset), like this:
     31               12 11        0
    +-------------------+-----------+
    |    Page Number    |   Offset  |
    +-------------------+-----------+
             Virtual Address
Each process has an independent set of user (virtual) pages, which are those pages below virtual address PHYS_BASE, typically 0xc0000000 (3 GB). The set of kernel (virtual) pages, on the other hand, is global, remaining the same regardless of what thread or process is active. The kernel may access both user and kernel pages, but a user process may access only its own user pages. See Section 3.1.4 [Virtual Memory Layout], page 25, for more information.
Pintos provides several useful functions for working with virtual addresses. See Section A.6 [Virtual Addresses], page 77, for details.
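The split is a shift and a mask. Pintos already provides helpers for this in threads/vaddr.h, such as pg_no(), pg_ofs(), and pg_round_down(). The standalone functions below use hypothetical names and only illustrate the arithmetic for 4,096-byte pages.

```c
#include <assert.h>
#include <stdint.h>

#define PGBITS 12                       /* number of offset bits */
#define PGSIZE (1u << PGBITS)           /* 4,096 bytes */
#define PGMASK (PGSIZE - 1)             /* low 12 bits set: 0x00000fff */

static uint32_t page_number (uint32_t va) { return va >> PGBITS; }
static uint32_t page_offset (uint32_t va) { return va & PGMASK; }
static uint32_t page_start  (uint32_t va) { return va & ~PGMASK; }

/* Rebuilds an address from its two fields. */
static uint32_t make_va (uint32_t pg_number, uint32_t offset) {
  assert (offset < PGSIZE && pg_number < (1u << (32 - PGBITS)));
  return (pg_number << PGBITS) | offset;
}
```

For example, 0x0804a123 splits into page number 0x0804a and offset 0x123, and its page starts at 0x0804a000.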
4.1.2.2 Frames
A frame, sometimes called a physical frame or a page frame, is a continuous region of physical memory. Like pages, frames must be page-size and page-aligned. Thus, a 32-bit physical address can be divided into a 20-bit frame number and a 12-bit frame offset (or just offset), like this:
     31               12 11        0
    +-------------------+-----------+
    |    Frame Number   |   Offset  |
    +-------------------+-----------+
             Physical Address
The 80x86 doesn't provide any way to directly access memory at a physical address. Pintos works around this by mapping kernel virtual memory directly to physical memory: the first page of kernel virtual memory is mapped to the first frame of physical memory, the second page to the second frame, and so on. Thus, frames can be accessed through kernel virtual memory.
Pintos provides functions for translating between physical addresses and kernel virtual addresses. See Section A.6 [Virtual Addresses], page 77, for details.
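Because of this one-to-one mapping, each translation is a single subtraction or addition of PHYS_BASE. Pintos provides ptov() and vtop() in threads/vaddr.h for this purpose. The sketch below reimplements only the arithmetic on plain integers, using hypothetical names and assuming PHYS_BASE is 0xc0000000.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_BASE 0xc0000000u   /* assumed default */

/* Kernel virtual address -> physical address. */
static uint32_t kvaddr_to_phys (uint32_t kva) {
  assert (kva >= PHYS_BASE);    /* only kernel addresses are mapped this way */
  return kva - PHYS_BASE;
}

/* Physical address -> kernel virtual address.  Only the first
   2^32 - PHYS_BASE bytes of physical memory (1 GB here) are reachable. */
static uint32_t phys_to_kvaddr (uint32_t pa) {
  assert (pa < 0u - PHYS_BASE);
  return pa + PHYS_BASE;
}
```

The frame number of a kernel address is then kvaddr_to_phys(kva) >> 12.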
data of your choice. The frame table allows Pintos to efficiently implement an eviction policy, by choosing a page to evict when no frames are free.
The frames used for user pages should be obtained from the user pool, by calling palloc_get_page(PAL_USER). You must use PAL_USER to avoid allocating from the kernel pool, which could cause some test cases to fail unexpectedly (see [Why PAL_USER?], page 49). If you modify palloc.c as part of your frame table implementation, be sure to retain the distinction between the two pools.
The most important operation on the frame table is obtaining an unused frame. This is easy when a frame is free. When none is free, a frame must be made free by evicting some page from its frame.
If no frame can be evicted without allocating a swap slot, but swap is full, panic the kernel. Real OSes apply a wide range of policies to recover from or prevent such situations, but these policies are beyond the scope of this project.
The process of eviction comprises roughly the following steps:
1. Choose a frame to evict, using your page replacement algorithm. The accessed and dirty bits in the page table, described below, will come in handy.
2. Remove references to the frame from any page table that refers to it. Unless you have implemented sharing, only a single page should refer to a frame at any given time.
3. If necessary, write the page to the file system or to swap.
The evicted frame may then be used to store a different page.
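For step 1, one common choice is the clock (second-chance) algorithm. The sketch below models only the victim choice, and its struct and function names are hypothetical. Each entry carries a copy of the accessed bit. In Pintos you would instead read and clear the real bit through the page directory, with pagedir_is_accessed() and pagedir_set_accessed() in userprog/pagedir.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FRAME_CNT 4            /* tiny table for illustration */

struct frame_entry {
  bool in_use;                 /* holds a user page */
  bool accessed;               /* stand-in for the page's accessed bit */
};

static struct frame_entry frames[FRAME_CNT];
static size_t clock_hand;

/* Returns the index of a frame to evict.  A recently accessed frame gets
   a second chance: its bit is cleared and the hand moves on.  Two sweeps
   always suffice, because the first clears every bit it passes. */
static size_t choose_victim (void) {
  for (size_t steps = 0; steps < 2 * FRAME_CNT; steps++) {
    size_t idx = clock_hand;
    struct frame_entry *f = &frames[idx];
    clock_hand = (clock_hand + 1) % FRAME_CNT;
    if (!f->in_use)
      continue;
    if (f->accessed)
      f->accessed = false;     /* second chance */
    else
      return idx;
  }
  assert (!"no frame in use");
  return 0;
}
```

The hand persists between calls, so frames that were just skipped are examined last next time. That ordering is what makes the clock algorithm approximate LRU.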
You may use the BLOCK_SWAP block device for swapping, obtaining the struct block that represents it by calling block_get_role(). From the vm/build directory, use the command pintos-mkdisk swap.dsk --swap-size=n to create a disk named swap.dsk that contains an n-MB swap partition. Afterward, swap.dsk will automatically be attached as an extra disk when you run pintos. Alternatively, you can tell pintos to use a temporary n-MB swap disk for a single run with --swap-size=n.
Swap slots should be allocated lazily, that is, only when they are actually required by eviction. Reading data pages from the executable and writing them to swap immediately at process startup is not lazy. Swap slots should not be reserved to store particular pages.
Free a swap slot when its contents are read back into a frame.
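Swap slots map naturally onto a bitmap. A page is 4,096 bytes and a sector is 512 bytes, so each slot spans 8 consecutive sectors. The sketch below shows only the bookkeeping, using hypothetical names and a fixed slot count. Pintos's lib/kernel/bitmap.c offers bitmap_scan_and_flip() for the same job, and the actual transfers would go through block_read() and block_write() on the swap device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES       4096
#define SECTOR_BYTES     512
#define SECTORS_PER_SLOT (PAGE_BYTES / SECTOR_BYTES)   /* 8 */
#define SLOT_CNT         64                            /* hypothetical size */
#define SLOT_NONE        ((size_t) -1)

static bool slot_used[SLOT_CNT];   /* one flag per page-sized slot */

/* Called only when eviction actually needs a slot (lazy allocation).
   Returns SLOT_NONE if swap is full; the caller then panics. */
static size_t swap_slot_alloc (void) {
  for (size_t i = 0; i < SLOT_CNT; i++)
    if (!slot_used[i]) {
      slot_used[i] = true;
      return i;
    }
  return SLOT_NONE;
}

/* Called once the slot's contents have been read back into a frame. */
static void swap_slot_free (size_t slot) {
  assert (slot < SLOT_CNT && slot_used[slot]);
  slot_used[slot] = false;
}

/* First sector of SLOT on the swap device. */
static uint32_t swap_slot_sector (size_t slot) {
  return (uint32_t) (slot * SECTORS_PER_SLOT);
}
```

A real implementation would also protect slot_used with a lock, because several processes may evict pages at the same time.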
}
A similar program with full error handling is included as mcat.c in the examples directory, which also contains mcp.c as a second example of mmap.
Your submission must be able to track what memory is used by memory-mapped files. This is necessary to properly handle page faults in the mapped regions and to ensure that mapped files do not overlap any other segments within the process.
After this step, your kernel should still pass all the project 2 test cases.
2. Supplemental page table and page fault handler (see Section 4.1.4 [Managing the Supplemental Page Table], page 42). Change process.c to record the necessary information in the supplemental page table when loading an executable and setting up its stack. Implement loading of code and data segments in the page fault handler. For now, consider only valid accesses.
After this step, your kernel should pass all of the project 2 functionality test cases, but only some of the robustness tests.
From here, you can implement stack growth, mapped files, and page reclamation on process exit in parallel.
The next step is to implement eviction (see Section 4.1.5 [Managing the Frame Table], page 42). Initially you could choose the page to evict randomly. At this point, you need to consider how to manage accessed and dirty bits and aliasing of user and kernel pages. Synchronization is also a concern: how do you deal with it if process A faults on a page whose frame process B is in the process of evicting? Finally, implement an eviction strategy such as the clock algorithm.
4.3 Requirements
This assignment is an open-ended design problem. We are going to say as little as possible about how to do things. Instead we will focus on what functionality we require your OS to support. We will expect you to come up with a design that makes sense. You will have the freedom to choose how to handle page faults, how to organize the swap partition, how to implement paging, etc.
4.3.2 Paging
Implement paging for segments loaded from executables. All of these pages should be loaded lazily, that is, only as the kernel intercepts page faults for them. Upon eviction, pages modified since load (e.g. as indicated by the dirty bit) should be written to swap. Unmodified pages, including read-only pages, should never be written to swap because they can always be read back from the executable.
Implement a global page replacement algorithm that approximates LRU. Your algorithm should perform at least as well as the simple variant of the second chance or clock algorithm.
Your design should allow for parallelism. If one page fault requires I/O, in the meantime processes that do not fault should continue executing and other page faults that do not require I/O should be able to complete. This will require some synchronization effort.
You'll need to modify the core of the program loader, which is the loop in load_segment() in userprog/process.c. Each time around the loop, page_read_bytes receives the number of bytes to read from the executable file and page_zero_bytes receives the number of bytes to initialize to zero following the bytes read. The two always sum to PGSIZE (4,096). The handling of a page depends on these variables' values:
If page_read_bytes equals PGSIZE, the page should be demand paged from the underlying file on its first access.
If page_zero_bytes equals PGSIZE, the page does not need to be read from disk at all because it is all zeroes. You should handle such pages by creating a new page consisting of all zeroes at the first page fault.
Otherwise, neither page_read_bytes nor page_zero_bytes equals PGSIZE. In this case, an initial part of the page is to be read from the underlying file and the remainder zeroed.
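The three cases can be made concrete with a small helper that walks a segment the same way the load_segment() loop does. For each page, it computes page_read_bytes and page_zero_bytes and classifies the page. The enum, struct, and function names are hypothetical. A real loader would record this information in the supplemental page table instead of returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u

enum page_kind {
  PAGE_FILE,     /* all PGSIZE bytes come from the file */
  PAGE_ZERO,     /* all zeroes: no disk read needed */
  PAGE_MIXED     /* leading part from the file, remainder zeroed */
};

struct page_plan {
  uint32_t read_bytes;
  uint32_t zero_bytes;
  enum page_kind kind;
};

/* Fills PLAN[] for a segment with READ_BYTES bytes of file data followed
   by ZERO_BYTES zero bytes (their sum must be a multiple of PGSIZE, as in
   load_segment()).  Returns the number of pages planned. */
static size_t plan_segment (uint32_t read_bytes, uint32_t zero_bytes,
                            struct page_plan plan[], size_t max_pages) {
  size_t n = 0;
  assert ((read_bytes + zero_bytes) % PGSIZE == 0);
  while (read_bytes > 0 || zero_bytes > 0) {
    assert (n < max_pages);
    uint32_t page_read_bytes = read_bytes < PGSIZE ? read_bytes : PGSIZE;
    uint32_t page_zero_bytes = PGSIZE - page_read_bytes;
    plan[n].read_bytes = page_read_bytes;
    plan[n].zero_bytes = page_zero_bytes;
    plan[n].kind = page_read_bytes == PGSIZE ? PAGE_FILE
                   : page_zero_bytes == PGSIZE ? PAGE_ZERO
                   : PAGE_MIXED;
    read_bytes -= page_read_bytes;
    zero_bytes -= page_zero_bytes;
    n++;
  }
  return n;
}
```

For example, a segment with 5,000 file bytes and 7,288 zero bytes spans three pages: one read entirely from the file, one with 904 file bytes and 3,192 zeroes, and one all zeroes.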
This rule is common but not universal. One modern exception is the x86-64 System V ABI, which designates 128 bytes below the stack pointer as a "red zone" that may not be modified by signal or interrupt handlers.
[System Call] Maps the file open as fd into the process's virtual address space. The entire file is mapped into consecutive virtual pages starting at addr.
Your VM system must lazily load pages in mmap regions and use the mmapped file itself as backing store for the mapping. That is, evicting a page mapped by mmap writes it back to the file it was mapped from.
If the file's length is not a multiple of PGSIZE, then some bytes in the final mapped page "stick out" beyond the end of the file. Set these bytes to zero when the page is faulted in from the file system, and discard them when the page is written back to disk.
If successful, this function returns a "mapping ID" that uniquely identifies the mapping within the process. On failure, it must return -1, which otherwise should not be a valid mapping ID, and the process's mappings must be unchanged.
A call to mmap may fail if the file open as fd has a length of zero bytes. It must fail if addr is not page-aligned or if the range of pages mapped overlaps any existing set of mapped pages, including the stack or pages mapped at executable load time. It must also fail if addr is 0, because some Pintos code assumes virtual page 0 is not mapped. Finally, file descriptors 0 and 1, representing console input and output, are not mappable.
[System Call] Unmaps the mapping designated by mapping, which must be a mapping ID returned by a previous call to mmap by the same process that has not yet been unmapped.
All mappings are implicitly unmapped when a process exits, whether via exit or by any other means. When a mapping is unmapped, whether implicitly or explicitly, all pages written to by the process are written back to the file, and pages not written must not be. The pages are then removed from the process's list of virtual pages.
Closing or removing a file does not unmap any of its mappings. Once created, a mapping is valid until munmap is called or the process exits, following the Unix convention. See [Removing an Open File], page 35, for more information. You should use the file_reopen function to obtain a separate and independent reference to the file for each of its mappings.
If two or more processes map the same file, there is no requirement that they see consistent data. Unix handles this by making the two mappings share the same physical page, but the mmap system call also has an argument allowing the client to specify whether the page is shared or private (i.e. copy-on-write).
prevent this, a page may be evicted from its frame even while it is being accessed by kernel code. If kernel code accesses such non-resident user pages, a page fault will result.
While accessing user memory, your kernel must either be prepared to handle such page faults, or it must prevent them from occurring. The kernel must prevent such page faults while it is holding resources it would need to acquire to handle these faults. In Pintos, such resources include locks acquired by the device driver(s) that control the device(s) containing the file system and swap space. As a concrete example, you must not allow page faults to occur while a device driver accesses a user buffer passed to file_read, because you would not be able to invoke the driver while handling such faults.
Preventing such page faults requires cooperation between the code within which the access occurs and your page eviction code. For instance, you could extend your frame table to record when a page contained in a frame must not be evicted. (This is also referred to as "pinning" or "locking" the page in its frame.) Pinning restricts your page replacement algorithm's choices when looking for pages to evict, so be sure to pin pages no longer than necessary, and avoid pinning pages when it is not necessary.
4.4 FAQ
How much code will I need to write?
Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion.
This summary is relative to the Pintos base code, but the reference solution for project 3 starts from the reference solution to project 2. See Section 3.4 [Project 2 FAQ], page 33, for the summary of project 2.
The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.
 Makefile.build       |    4
 devices/timer.c      |   42
 threads/init.c       |    5
 threads/interrupt.c  |    2
 threads/thread.c     |   31
 threads/thread.h     |   37
 userprog/exception.c |   12
 userprog/pagedir.c   |   10
 userprog/process.c   |  319
 userprog/syscall.c   |  545
 userprog/syscall.h   |    1
 vm/frame.c           |  162
 vm/frame.h           |   23
 vm/page.c            |  297
 vm/page.h            |   50
 vm/swap.c            |   85
 vm/swap.h            |   11
 17 files changed, 1532 insertions(+), 104 deletions(-)
Do we need a working Project 2 to implement Project 3?
Yes.
What extra credit is available?
Extra credit may be available if you implement page sharing (some classes offer extra credit and some don't; check your class-specific information to see if extra credit is available). Page sharing means that if multiple processes use the same executable file, read-only pages (such as code pages) can be shared among those processes instead of creating separate copies for each process. If you carefully designed your data structures, sharing of read-only pages should not make this part significantly harder.
How do we resume a process after we have handled a page fault?
Returning from page_fault() resumes the current user process (see Section A.4.2 [Internal Interrupt Handling], page 73). It will then retry the instruction to which the instruction pointer points.
Why do user processes sometimes fault above the stack pointer?
You might notice that, in the stack growth tests, the user program faults on an address that is above the user program's current stack pointer, even though the PUSH and PUSHA instructions would cause faults 4 and 32 bytes below the current stack pointer.
This is not unusual. The PUSH and PUSHA instructions are not the only instructions that can trigger user stack growth. For instance, a user program may allocate stack space by decrementing the stack pointer using a SUB $n, %esp instruction, and then use a MOV ..., m(%esp) instruction to write to a stack location within the allocated space that is m bytes above the current stack pointer. Such accesses are perfectly valid, and your kernel must grow the user program's stack to allow those accesses to succeed.
Does the virtual memory system need to support data segment growth?
No. The size of the data segment is determined by the linker.
We still have no dynamic allocation in Pintos (although it is possible to "fake" it at the user level by using memory-mapped files). Supporting data segment growth should add little additional complexity to a well-designed system.
Why should I use PAL_USER for allocating page frames?
Passing PAL_USER to palloc_get_page() causes it to allocate memory from the user pool, instead of the main kernel pool. Running out of pages in the user pool just causes user programs to page, but running out of pages in the kernel pool will cause many failures because so many kernel functions need to obtain memory. You can layer some other allocator on top of palloc_get_page() if you like, but it should be the underlying mechanism.
Also, you can use the -ul kernel command-line option to limit the size of the user pool, which makes it easy to test your VM implementation with various user memory sizes.
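Following the stack-growth question earlier in this FAQ, a page-fault handler might use a heuristic like the one below to decide whether a fault in the stack region is legitimate growth. It accepts any fault at or above the stack pointer, as the FAQ answer requires. It also accepts faults up to 32 bytes below the stack pointer, which covers PUSH (4 bytes) and PUSHA (32 bytes). The 8 MB cap on stack size and all names are assumptions for illustration. The caller is responsible for supplying the user stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHYS_BASE   0xc0000000u          /* assumed default */
#define STACK_LIMIT (8u * 1024 * 1024)   /* hypothetical 8 MB cap */
#define PUSHA_SLACK 32u                  /* PUSHA faults 32 bytes below esp */

/* Decides whether a fault at FAULT_ADDR, with user stack pointer ESP,
   should be handled by allocating a new stack page. */
static bool is_stack_growth (uint32_t fault_addr, uint32_t esp) {
  assert (esp <= PHYS_BASE);
  if (fault_addr >= PHYS_BASE || fault_addr < PHYS_BASE - STACK_LIMIT)
    return false;                        /* outside the stack region */
  /* At or above esp (e.g. after SUB $n, %esp), or just below it for
     PUSH and PUSHA. */
  return fault_addr + PUSHA_SLACK >= esp;
}
```

With esp at 0xbfffe000, faults at 0xbfffdffc (PUSH) and 0xbfffdfe0 (PUSHA) count as growth, but a fault at 0xbfffdfdf does not.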
5.1 Background
5.1.1 New Code
Here are some files that are probably new to you. These are in the filesys directory except where indicated:
fsutil.c
    Simple utilities for the file system that are accessible from the kernel command line.
filesys.h
filesys.c
    Top-level interface to the file system. See Section 3.1.2 [Using the File System], page 23, for an introduction.
directory.h
directory.c
    Translates file names to inodes. The directory data structure is stored as a file.
inode.h
inode.c
    Manages the data structure representing the layout of a file's data on disk.
file.h
file.c
    Translates file reads and writes to disk sector reads and writes.
lib/kernel/bitmap.h
lib/kernel/bitmap.c
    A bitmap data structure along with routines for reading and writing the bitmap to disk files.
Our file system has a Unix-like interface, so you may also wish to read the Unix man pages for creat, open, close, read, write, lseek, and unlink. Our file system has calls that are similar, but not identical, to these. The file system translates these calls into disk operations.
All the basic functionality is there in the code above, so that the file system is usable from the start, as you've seen in the previous two projects. However, it has severe limitations which you will remove.
While most of your work will be in filesys, you should be prepared for interactions with all previous parts.
5.3 Requirements
5.3.1 Design Document
Before you turn in your project, you must copy the project 4 design document template into your source tree under the name pintos/src/filesys/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See Appendix D [Project Documentation], page 99, for a sample design document that goes along with a fictitious project.
5.3.3 Subdirectories
Implement a hierarchical name space. In the basic file system, all files live in a single directory. Modify this to allow directory entries to point to files or to other directories.
Make sure that directories can expand beyond their original size just as any other file can.
The basic file system has a 14-character limit on file names. You may retain this limit for individual file name components, or may extend it, at your option. You must allow full path names to be much longer than 14 characters.
Maintain a separate current directory for each process. At startup, set the root as the initial process's current directory. When one process starts another with the exec system call, the child process inherits its parent's current directory. After that, the two processes' current directories are independent, so that either process changing its own current directory has no effect on the other. (This is why, under Unix, the cd command is a shell built-in, not an external program.)
Update the existing system calls so that, anywhere a file name is provided by the caller, an absolute or relative path name may be used. The directory separator character is forward slash (/). You must also support special file names "." and "..", which have the same meanings as they do in Unix.
Update the open system call so that it can also open directories. Of the existing system calls, only close needs to accept a file descriptor for a directory.
Update the remove system call so that it can delete empty directories (other than the root) in addition to regular files. Directories may only be deleted if they do not contain any files or subdirectories (other than "." and ".."). You may decide whether to allow deletion of a directory that is open by a process or in use as a process's current working directory. If it is allowed, then attempts to open files (including "." and "..") or create new files in a deleted directory must be disallowed.
Implement the following new system calls:
[System Call] Changes the current working directory of the process to dir, which may be relative or absolute. Returns true if successful, false on failure.
[System Call] Creates the directory named dir, which may be relative or absolute. Returns true if successful, false on failure. Fails if dir already exists or if any directory name in dir, besides the last, does not already exist. That is, mkdir("/a/b/c") succeeds only if /a/b already exists and /a/b/c does not.
[System Call] Reads a directory entry from file descriptor fd, which must represent a directory. If successful, stores the null-terminated file name in name, which must have room for READDIR_MAX_LEN + 1 bytes, and returns true. If no entries are left in the directory, returns false.
"." and ".." should not be returned by readdir.
If the directory changes while it is open, then it is acceptable for some entries not to be read at all or to be read multiple times. Otherwise, each directory entry should be read once, in any order.
READDIR_MAX_LEN is defined in lib/user/syscall.h. If your file system supports longer file names than the basic file system, you should increase this value from the default of 14.
[System Call] Returns true if fd represents a directory, false if it represents an ordinary file.
[System Call] Returns the inode number of the inode associated with fd, which may represent an ordinary file or a directory.
An inode number persistently identifies a file or directory. It is unique during the file's existence. In Pintos, the sector number of the inode is suitable for use as an inode number.
We have provided ls and mkdir user programs, which are straightforward once the above syscalls are implemented. We have also provided pwd, which is not so straightforward. The shell program implements cd internally.
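Several of the calls above must split a path into the directory to search and its final component, as in the mkdir("/a/b/c") example. This applies to mkdir and to any call that creates or removes a name. The sketch below does only that lexical split. It leaves "." and ".." components for the directory lookup to resolve, treats an empty directory part as "the current directory", and enforces the basic 14-character limit on the final component. COMPONENT_MAX and split_path() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMPONENT_MAX 14   /* basic file system's per-name limit */

/* Splits PATH into its directory part (copied into DIR, DIR_SIZE bytes)
   and final component (copied into NAME):
     "/a/b/c" -> "/a/b" + "c"    "/c" -> "/" + "c"    "c" -> "" + "c"
   Repeated and trailing slashes are tolerated.  Returns false if there
   is no final component (e.g. "/"), if it is longer than COMPONENT_MAX,
   or if the directory part does not fit in DIR. */
static bool split_path (const char *path, char *dir, size_t dir_size,
                        char name[COMPONENT_MAX + 1]) {
  assert (path != NULL && dir != NULL && name != NULL);
  size_t len = strlen (path);
  while (len > 1 && path[len - 1] == '/')      /* drop trailing slashes */
    len--;
  size_t start = len;
  while (start > 0 && path[start - 1] != '/')  /* find last component */
    start--;
  size_t name_len = len - start;
  if (name_len == 0 || name_len > COMPONENT_MAX)
    return false;
  memcpy (name, path + start, name_len);
  name[name_len] = '\0';

  size_t dir_len = start;
  while (dir_len > 1 && path[dir_len - 1] == '/')  /* "/a/b/" -> "/a/b" */
    dir_len--;
  if (dir_len >= dir_size)
    return false;
  memcpy (dir, path, dir_len);
  dir[dir_len] = '\0';
  return true;
}
```

mkdir would then open the directory part (failing if it does not exist) and add the final name to it (failing if that name is already present).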
The pintos extract and append commands should now accept full path names, assuming that the directories used in the paths have already been created. This should not require any significant extra effort on your part.
5.3.5 Synchronization
The provided file system requires external synchronization, that is, callers must ensure that only one thread can be running in the file system code at once. Your submission must adopt a finer-grained synchronization strategy that does not require external synchronization. To the extent possible, operations on independent entities should be independent, so that they do not need to wait on each other.
Operations on different cache blocks must be independent. In particular, when I/O is required on a particular block, operations on other blocks that do not require I/O should proceed without having to wait for the I/O to complete.
Multiple processes must be able to access a single file at once. Multiple reads of a single file must be able to complete without waiting for one another. When writing to a file does not extend the file, multiple processes should also be able to write a single file at once. A read of a file by one process when the file is being written by another process is allowed to show that none, all, or part of the write has completed. (However, after the write system call returns to its caller, all subsequent readers must see the change.) Similarly, when two processes simultaneously write to the same part of a file, their data may be interleaved.
On the other hand, extending a file and writing data into the new section must be atomic. Suppose processes A and B both have a given file open and both are positioned at end-of-file. If A reads and B writes the file at the same time, A may read all, part, or none of what B writes. However, A may not read data other than what B writes, e.g. if B's data is all nonzero bytes, A is not allowed to see any zeros.
Operations on different directories should take place concurrently. Operations on the same directory may wait for one another.
Keep in mind that only data shared by multiple threads needs to be synchronized. In the base file system, struct file and struct dir are accessed only by a single thread.
5.4 FAQ
How much code will I need to write?
Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion. This summary is relative to the Pintos base code, but the reference solution for project 4 is based on the reference solution to project 3. Thus, the reference solution runs with virtual memory enabled. See Section 4.4 [Project 3 FAQ], page 48, for the summary of project 3.
The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.
    Makefile.build       |    5
    devices/timer.c      |   42 ++
    filesys/Make.vars    |    6
    filesys/cache.c      |  473 +++++++++++++++++++++++++
    filesys/cache.h      |   23 +
    filesys/directory.c  |   99 ++++
    filesys/directory.h  |    3
    filesys/file.c       |    4
    filesys/filesys.c    |  194 +++++++++
    filesys/filesys.h    |    5
    filesys/free-map.c   |   45 +-
    filesys/free-map.h   |    4
    filesys/fsutil.c     |    8
    filesys/inode.c      |  444 ++++++++++++++++++----
    filesys/inode.h      |   11
    threads/init.c       |    5
    threads/interrupt.c  |    2
    threads/thread.c     |   32 +
    threads/thread.h     |   38 +
    userprog/exception.c |   12
    userprog/pagedir.c   |   10
    userprog/process.c   |  332 +++++++++++++---
    userprog/syscall.c   |  582 ++++++++++++++++++++++++++++++
    userprog/syscall.h   |    1
    vm/frame.c           |  161 ++++++++
    vm/frame.h           |   23 +
    vm/page.c            |  297 +++++++++++++++
    vm/page.h            |   50 ++
    vm/swap.c            |   85 ++++
    vm/swap.h            |   11
    30 files changed, 2721 insertions(+), 286 deletions(-)
Can BLOCK_SECTOR_SIZE change?
No, BLOCK_SECTOR_SIZE is fixed at 512. For IDE disks, this value is a fixed property of the hardware. Other disks do not necessarily have a 512-byte sector, but for simplicity Pintos only supports those that do.
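Because BLOCK_SECTOR_SIZE is fixed, converting a byte offset within a file into a sector index and an offset within that sector is plain integer arithmetic. This is a minimal sketch; sector_index(), sector_offset() and sectors_for() are hypothetical helper names, not functions from the Pintos sources.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SECTOR_SIZE 512   /* Fixed, as stated above. */

/* Index of the file sector that holds byte POS. */
static inline uint32_t sector_index (uint32_t pos) { return pos / BLOCK_SECTOR_SIZE; }

/* Offset of byte POS within that sector. */
static inline uint32_t sector_offset (uint32_t pos) { return pos % BLOCK_SECTOR_SIZE; }

/* Number of sectors needed to hold SIZE bytes, rounding up. */
static inline uint32_t sectors_for (uint32_t size) {
  return (size + BLOCK_SECTOR_SIZE - 1) / BLOCK_SECTOR_SIZE;
}
```

For example, byte 1000 of a file lives in sector 1 at offset 488, and a 513-byte file needs two sectors.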
kernel memory then you have to count it against the 64-block limit. The same rule applies to anything that's similar to a block of disk data, such as a struct inode_disk without the length or sector_cnt members. That means you'll have to change the way the inode implementation accesses its corresponding on-disk inode right now, since it currently just embeds a struct inode_disk in struct inode and reads the corresponding sector from disk when it's created. Keeping extra copies of inodes would subvert the 64-block limitation that we place on your cache. You can store a pointer to inode data in struct inode, but if you do so you should carefully make sure that this does not limit your OS to 64 simultaneously open files. You can also store other information to help you find the inode when you need it. Similarly, you may store some metadata along with each of your 64 cache entries.
You can keep a cached copy of the free map permanently in memory if you like. It doesn't have to count against the cache size.
byte_to_sector() in filesys/inode.c uses the struct inode_disk directly, without first reading that sector from wherever it was in the storage hierarchy. This will no longer work. You will need to change byte_to_sector() to obtain the struct inode_disk from the cache before using it.
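The following is a minimal single-threaded sketch of a buffer cache that holds at most 64 sectors. It evicts with the clock (second-chance) algorithm and writes dirty sectors back on eviction. It is an illustration under assumptions, not the reference solution: the simulated model_disk array replaces real block-device I/O, and cache_get() and the entry layout are hypothetical. A real cache also needs the per-block synchronization described at the start of this excerpt. Code such as byte_to_sector() would then read the struct inode_disk through a call like cache_get() on the inode's sector instead of keeping its own copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SECTOR_SIZE 512
#define CACHE_SIZE 64            /* At most 64 sectors in memory. */
#define MODEL_DISK_SECTORS 256   /* Hypothetical simulated disk size. */

typedef uint32_t block_sector_t;

/* Simulated disk standing in for real block-device reads and writes. */
static uint8_t model_disk[MODEL_DISK_SECTORS][BLOCK_SECTOR_SIZE];

struct cache_entry {
  bool valid;              /* Does this entry hold a sector? */
  bool dirty;              /* Modified since it was read from disk? */
  bool accessed;           /* Used since the clock hand last passed? */
  block_sector_t sector;   /* Sector held, if valid. */
  uint8_t data[BLOCK_SECTOR_SIZE];
};

static struct cache_entry cache[CACHE_SIZE];
static unsigned clock_hand;

/* Chooses an entry to reuse with the clock algorithm, writing the old
   contents back to disk first if they were modified. */
static struct cache_entry *cache_evict (void) {
  for (;;) {
    struct cache_entry *e = &cache[clock_hand];
    clock_hand = (clock_hand + 1) % CACHE_SIZE;
    if (!e->valid)
      return e;
    if (e->accessed) {
      e->accessed = false;             /* Give it a second chance. */
      continue;
    }
    if (e->dirty)
      memcpy (model_disk[e->sector], e->data, BLOCK_SECTOR_SIZE);
    e->valid = false;
    return e;
  }
}

/* Returns the cached copy of SECTOR, reading it in on a miss.
   Pass WILL_WRITE = true if the caller is going to modify the data. */
uint8_t *cache_get (block_sector_t sector, bool will_write) {
  assert (sector < MODEL_DISK_SECTORS);
  for (int i = 0; i < CACHE_SIZE; i++) {
    struct cache_entry *e = &cache[i];
    if (e->valid && e->sector == sector) {
      e->accessed = true;
      e->dirty |= will_write;
      return e->data;
    }
  }
  struct cache_entry *e = cache_evict ();
  memcpy (e->data, model_disk[sector], BLOCK_SECTOR_SIZE);
  e->valid = true;
  e->dirty = will_write;
  e->accessed = true;
  e->sector = sector;
  return e->data;
}
```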
A.1 Loading
This section covers the Pintos loader and basic kernel initialization.
The next block of functions we call initializes the kernel's memory system. palloc_init() sets up the kernel page allocator, which doles out memory one or more pages at a time (see Section A.5.1 [Page Allocator], page 75). malloc_init() sets up the allocator that handles allocations of arbitrary-size blocks of memory (see Section A.5.2 [Block Allocator], page 76). paging_init() sets up a page table for the kernel (see Section A.7 [Page Table], page 79).
In projects 2 and later, main() also calls tss_init() and gdt_init().
The next set of calls initializes the interrupt system. intr_init() sets up the CPU's interrupt descriptor table (IDT) to ready it for interrupt handling (see Section A.4.1 [Interrupt Infrastructure], page 72), then timer_init() and kbd_init() prepare for handling timer interrupts and keyboard interrupts, respectively. input_init() prepares to merge serial and keyboard input into one stream. In projects 2 and later, we also prepare to handle interrupts caused by user programs using exception_init() and syscall_init().
Now that interrupts are set up, we can start the scheduler with thread_start(), which creates the idle thread and enables interrupts. With interrupts enabled, interrupt-driven serial port I/O becomes possible, so we use serial_init_queue() to switch to that mode. Finally, timer_calibrate() calibrates the timer for accurate short delays.
If the file system is compiled in, as it will be starting in project 2, we initialize the IDE disks with ide_init(), then the file system with filesys_init().
Boot is complete, so we print a message.
Function run_actions() now parses and executes actions specified on the kernel command line, such as "run" to run a test (in project 1) or a user program (in later projects).
Finally, if "-q" was specified on the kernel command line, we call shutdown_power_off() to terminate the machine simulator. Otherwise, main() calls thread_exit(), which allows any other running threads to continue running.
A.2 Threads
A.2.1 struct thread
The main Pintos data structure for threads is struct thread, declared in threads/thread.h.
struct thread
[Structure] Represents a thread or a user process. In the projects, you will have to add your own members to struct thread. You may also change or delete the definitions of existing members.
Every struct thread occupies the beginning of its own page of memory. The rest of the page is used for the thread's stack, which grows downward from the end of the page. It looks like this:
                      4 kB +---------------------------------+
                           |          kernel stack           |
                           |                |                |
                           |                |                |
                           |                V                |
                           |         grows downward          |
                           |                                 |
                           |                                 |
                           |                                 |
                           |                                 |
                           |                                 |
                           |                                 |
                           |                                 |
                           |                                 |
    sizeof (struct thread) +---------------------------------+
                           |              magic              |
                           |                :                |
                           |                :                |
                           |             status              |
                           |               tid               |
                      0 kB +---------------------------------+
This has two consequences. First, struct thread must not be allowed to grow too big. If it does, then there will not be enough room for the kernel stack. The base struct thread is only a few bytes in size. It probably should stay well under 1 kB.
Second, kernel stacks must not be allowed to grow too large. If a stack overflows, it will corrupt the thread state. Thus, kernel functions should not allocate large structures or arrays as non-static local variables. Use dynamic allocation with malloc() or palloc_get_page() instead (see Section A.5 [Memory Allocation], page 75).
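Every struct thread sits at the base of its own page. Any address on a thread's kernel stack can therefore be rounded down to the page boundary to find that thread, and the magic member can then be checked to catch a stack that has grown into the structure. This is a minimal user-space sketch under several assumptions: a trimmed-down hypothetical struct model_thread, an arbitrary MODEL_MAGIC value, and C11 aligned_alloc() standing in for the page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE 4096
#define MODEL_MAGIC 0xdeadbeefu   /* Arbitrary value for this sketch. */

/* Hypothetical, trimmed-down thread header; the real struct has more members. */
struct model_thread {
  int tid;
  unsigned magic;   /* Last member: a stack overflowing downward reaches it first. */
};

/* Rounds any address down to the start of the page containing it. */
static void *page_base (const void *p) {
  return (void *) ((uintptr_t) p & ~(uintptr_t) (PGSIZE - 1));
}

/* Finds the thread whose page contains SP (e.g. an address on its kernel
   stack) and checks that its header has not been overwritten. */
struct model_thread *thread_from_stack_addr (const void *sp) {
  struct model_thread *t = page_base (sp);
  assert (t->magic == MODEL_MAGIC);
  return t;
}

/* Allocates one page-aligned page and places the thread header at its base;
   the rest of the page would serve as the thread's stack. */
struct model_thread *model_thread_create (int tid) {
  struct model_thread *t = aligned_alloc (PGSIZE, PGSIZE);
  if (t != NULL) {
    t->tid = tid;
    t->magic = MODEL_MAGIC;
  }
  return t;
}
```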
tid_t tid
[Member of struct thread] The thread's thread identifier or tid. Every thread must have a tid that is unique over the entire lifetime of the kernel. By default, tid_t is a typedef for int and each new thread receives the numerically next higher tid, starting from 1 for the initial process. You can change the type and the numbering scheme if you like.
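The following is a minimal sketch of the default numbering scheme, using POSIX threads in place of Pintos locks; allocate_tid() is a hypothetical name here. The lock matters because two threads created concurrently could otherwise read the same next_tid value and receive duplicate tids.

```c
#include <assert.h>
#include <pthread.h>

typedef int tid_t;

/* Hands out tids 1, 2, 3, ... The lock makes the read-increment-return
   sequence atomic with respect to other threads calling this function. */
tid_t allocate_tid (void) {
  static pthread_mutex_t tid_lock = PTHREAD_MUTEX_INITIALIZER;
  static tid_t next_tid = 1;
  pthread_mutex_lock (&tid_lock);
  tid_t tid = next_tid++;
  pthread_mutex_unlock (&tid_lock);
  return tid;
}
```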
THREAD_RUNNING
[Thread State] The thread is running. Exactly one thread is running at a given time. thread_current() returns the running thread.
THREAD_READY
[Thread State] The thread is ready to run, but it's not running right now. The thread could be selected to run the next time the scheduler is invoked. Ready threads are kept in a doubly linked list called ready_list.
THREAD_BLOCKED
[Thread State] The thread is waiting for something, e.g. a lock to become available or an interrupt to be invoked. The thread won't be scheduled again until it transitions to the THREAD_READY state with a call to thread_unblock(). This is most conveniently done indirectly, using one of the Pintos synchronization primitives that block and unblock threads automatically (see Section A.3 [Synchronization], page 66).
There is no a priori way to tell what a blocked thread is waiting for, but a backtrace can help (see Section E.4 [Backtraces], page 103).
THREAD_DYING
[Thread State] The thread will be destroyed by the scheduler after switching to the next thread.
char name[16]
[Member of struct thread] The thread's name as a string, or at least the first few characters of it.
uint8_t *stack
[Member of struct thread] Every thread has its own stack to keep track of its state. When the thread is running, the CPU's stack pointer register tracks the top of the stack and this member is unused. But when the CPU switches to another thread, this member saves the thread's stack pointer. No other members are needed to save the thread's registers, because the other registers that must be saved are saved on the stack.
When an interrupt occurs, whether in the kernel or a user program, a struct intr_frame is pushed onto the stack. When the interrupt occurs in a user program, the struct intr_frame is always at the very top of the page. See Section A.4 [Interrupt Handling], page 71, for more information.
int priority
[Member of struct thread] A thread priority, ranging from PRI_MIN (0) to PRI_MAX (63). Lower numbers correspond to lower priorities, so that priority 0 is the lowest priority and priority 63 is the highest. Pintos as provided ignores thread priorities, but you will implement priority scheduling in project 1 (see Section 2.2.3 [Priority Scheduling], page 16).
struct list_elem allelem
[Member of struct thread] This list element is used to link the thread into the list of all threads. Each thread is inserted into this list when it is created and removed when it exits. The thread_foreach() function should be used to iterate over all threads.
struct list_elem elem
[Member of struct thread] A list element used to put the thread into doubly linked lists, either ready_list (the list of threads ready to run) or a list of threads waiting on a semaphore in sema_down(). It can do double duty because a thread waiting on a semaphore is not ready, and vice versa.
uint32_t *pagedir
[Member of struct thread] Only present in project 2 and later. See Section 4.1.2.3 [Page Tables], page 40.
unsigned magic
[Member of struct thread] Always set to THREAD_MAGIC, which is just an arbitrary number defined in threads/thread.c, and used to detect stack overflow. thread_current() checks that the magic member of the running thread's struct thread is set to THREAD_MAGIC. Stack overflow tends to change this value, triggering the assertion. For greatest benefit, as you add members to struct thread, leave magic at the end.
void thread_init (void)
[Function] Called by main() to initialize the thread system. Its main purpose is to create a struct thread for Pintos's initial thread. This is possible because the Pintos loader puts the initial thread's stack at the top of a page, in the same position as any other Pintos thread.
Before thread_init() runs, thread_current() will fail because the running thread's magic value is incorrect. Lots of functions call thread_current() directly or indirectly, including lock_acquire() for locking a lock, so thread_init() is called early in Pintos initialization.
void thread_start (void)
[Function] Called by main() to start the scheduler. Creates the idle thread, that is, the thread that is scheduled when no other thread is ready. Then enables interrupts, which as a side effect enables the scheduler because the scheduler runs on return from the timer interrupt, using intr_yield_on_return() (see Section A.4.3 [External Interrupt Handling], page 74).
void thread_tick (void)
[Function] Called by the timer interrupt at each timer tick. It keeps track of thread statistics and triggers the scheduler when a time slice expires.
void thread_print_stats (void)
[Function] Called during Pintos shutdown to print thread statistics.
tid_t thread_create (const char *name, int priority, thread_func *func, void *aux)
[Function] Creates and starts a new thread named name with the given priority, returning the new thread's tid. The thread executes func, passing aux as the function's single argument.
thread_create() allocates a page for the thread's struct thread and stack and initializes its members, then it sets up a set of fake stack frames for it (see Section A.2.3 [Thread Switching], page 65). The thread is initialized in the blocked state, then unblocked just before returning, which allows the new thread to be scheduled (see [Thread States], page 62).
void thread_func (void *aux)
[Type] This is the type of the function passed to thread_create(), whose aux argument is passed along as the function's argument.
void thread_block (void)
[Function] Transitions the running thread from the running state to the blocked state (see [Thread States], page 62). The thread will not run again until thread_unblock() is called on it, so you'd better have some way arranged for that to happen. Because thread_block() is so low-level, you should prefer to use one of the synchronization primitives instead (see Section A.3 [Synchronization], page 66).
void thread_unblock (struct thread *thread)
[Function] Transitions thread, which must be in the blocked state, to the ready state, allowing it to resume running (see [Thread States], page 62). This is called when the event that the thread is waiting for occurs, e.g. when the lock that the thread is waiting on becomes available.
struct thread *thread_current (void)
[Function] Returns the running thread.
tid_t thread_tid (void)
[Function] Returns the running thread's thread id. Equivalent to thread_current ()->tid.
const char *thread_name (void)
[Function] Returns the running thread's name. Equivalent to thread_current ()->name.
void thread_exit (void) NO_RETURN
[Function] Causes the current thread to exit. Never returns, hence NO_RETURN (see Section E.3 [Function and Parameter Attributes], page 102).
void thread_yield (void)
[Function] Yields the CPU to the scheduler, which picks a new thread to run. The new thread might be the current thread, so you can't depend on this function to keep this thread from running for any particular length of time.
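void thread_foreach (thread_action_func *action, void *aux)
[Function] Iterates over all threads t and invokes action(t, aux) on each. action must refer to a function that matches the signature given by thread_action_func():
void thread_action_func (struct thread *thread, void *aux)
[Type] Performs some action on thread, given aux.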
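The callback style of thread_foreach() can be illustrated with an action that scans for the highest-priority thread, the kind of query a priority scheduler needs. This is a user-space sketch under assumptions: a fixed array stands in for the list of all threads, and struct model_thread, model_thread_foreach() and find_max_priority() are hypothetical names. Only the shape of the (thread, aux) callback follows the text.

```c
#include <assert.h>
#include <stddef.h>

#define PRI_MIN 0
#define PRI_MAX 63

/* Hypothetical trimmed-down thread record. */
struct model_thread {
  int tid;
  int priority;
};

/* Same shape as thread_action_func: a thread plus an auxiliary pointer. */
typedef void thread_action_func (struct model_thread *t, void *aux);

/* Fixed table standing in for the list of all threads. */
static struct model_thread all_threads[] = { {1, 31}, {2, 10}, {3, 45}, {4, 45} };

void model_thread_foreach (thread_action_func *action, void *aux) {
  for (size_t i = 0; i < sizeof all_threads / sizeof all_threads[0]; i++)
    action (&all_threads[i], aux);
}

/* Action: AUX points to a struct model_thread * that ends up pointing to
   the first thread seen with the highest priority. */
static void find_max_priority (struct model_thread *t, void *aux) {
  struct model_thread **best = aux;
  assert (t->priority >= PRI_MIN && t->priority <= PRI_MAX);
  if (*best == NULL || t->priority > (*best)->priority)
    *best = t;
}
```

A caller passes a pointer to its result variable as aux, e.g. `struct model_thread *best = NULL; model_thread_foreach (find_max_priority, &best);`. Here that yields the thread with tid 3, because ties keep the earlier thread.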
void thread_set_priority (int new_priority)
int thread_get_priority (void)
[Function] Stubs to set and get thread priority. See Section 2.2.3 [Priority Scheduling], page 16.
int thread_get_nice (void)
void thread_set_nice (int new_nice)
int thread_get_recent_cpu (void)
int thread_get_load_avg (void)
[Function] Stubs for the advanced scheduler. See Appendix B [4.4BSD Scheduler], page 91.
This is because switch_threads() takes arguments on the stack and the 80x86 SVR4 calling convention requires the caller, not the called function, to remove them when the call is complete. See [SysV-i386] chapter 3 for details.
The final stack frame is for kernel_thread(), which enables interrupts and calls the thread's function (the function passed to thread_create()). If the thread's function returns, it calls thread_exit() to terminate the thread.
A.3 Synchronization
If sharing of resources between threads is not handled in a careful, controlled fashion, the result is usually a big mess. This is especially the case in operating system kernels, where faulty sharing can crash the entire machine. Pintos provides several synchronization primitives to help out.
enum intr_level
[Type] One of INTR_OFF or INTR_ON, denoting that interrupts are disabled or enabled, respectively.
enum intr_level intr_get_level (void)
[Function] Returns the current interrupt state.
enum intr_level intr_set_level (enum intr_level level)
[Function] Turns interrupts on or off according to level. Returns the previous interrupt state.
enum intr_level intr_enable (void)
[Function] Turns interrupts on. Returns the previous interrupt state.
enum intr_level intr_disable (void)
[Function] Turns interrupts off. Returns the previous interrupt state.
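The usual pattern is to save the level returned by intr_disable() and later restore it with intr_set_level(), rather than calling intr_enable(). That way the code stays correct when its caller had already disabled interrupts. The sketch below models the interrupt flag with an ordinary global variable so that it runs in user space. It is a simulation, not the Pintos implementation, and increment_counter() is a hypothetical example.

```c
#include <assert.h>

enum intr_level { INTR_OFF, INTR_ON };

/* Simulated CPU interrupt flag (an assumption of this model). */
static enum intr_level model_level = INTR_ON;

enum intr_level intr_get_level (void) { return model_level; }

enum intr_level intr_set_level (enum intr_level level) {
  enum intr_level old = model_level;
  model_level = level;
  return old;
}

enum intr_level intr_disable (void) { return intr_set_level (INTR_OFF); }
enum intr_level intr_enable (void) { return intr_set_level (INTR_ON); }

static int counter;

/* Safe to call whether or not the caller already disabled interrupts:
   it restores whatever level it found rather than forcing them back on. */
void increment_counter (void) {
  enum intr_level old_level = intr_disable ();
  counter++;
  intr_set_level (old_level);
}
```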
A.3.2 Semaphores
A semaphore is a nonnegative integer together with two operators that manipulate it atomically, which are:
"Down" or "P": wait for the value to become positive, then decrement it.
"Up" or "V": increment the value (and wake up one waiting thread, if any).
A semaphore initialized to 0 may be used to wait for an event that will happen exactly once. For example, suppose thread A starts another thread B and wants to wait for B to signal that some activity is complete. A can create a semaphore initialized to 0, pass it to B as it starts it, and then "down" the semaphore. When B finishes its activity, it "ups" the semaphore. This works regardless of whether A "downs" the semaphore or B "ups" it first.
A semaphore initialized to 1 is typically used for controlling access to a resource. Before a block of code starts using the resource, it "downs" the semaphore, then after it is done with the resource it "ups" the semaphore. In such a case a lock, described below, may be more appropriate.
Semaphores can also be initialized to values larger than 1. These are rarely used.
Semaphores were invented by Edsger Dijkstra and first used in the THE operating system ([Dijkstra]).
Pintos' semaphore type and operations are declared in threads/synch.h.
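The thread A/thread B example can be modeled in user space. This sketch builds a semaphore from a POSIX mutex and condition variable. That is an assumption of the sketch: Pintos builds its semaphores from interrupt disabling and thread blocking instead. struct model_sema and thread_b() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space semaphore. */
struct model_sema {
  pthread_mutex_t mutex;    /* Protects value. */
  pthread_cond_t positive;  /* Signaled when value may have become positive. */
  unsigned value;
};

void model_sema_init (struct model_sema *s, unsigned value) {
  pthread_mutex_init (&s->mutex, NULL);
  pthread_cond_init (&s->positive, NULL);
  s->value = value;
}

/* "Down" or "P": wait for a positive value, then decrement it. */
void model_sema_down (struct model_sema *s) {
  pthread_mutex_lock (&s->mutex);
  while (s->value == 0)
    pthread_cond_wait (&s->positive, &s->mutex);
  s->value--;
  pthread_mutex_unlock (&s->mutex);
}

/* "Up" or "V": increment the value and wake one waiter, if any. */
void model_sema_up (struct model_sema *s) {
  pthread_mutex_lock (&s->mutex);
  s->value++;
  pthread_cond_signal (&s->positive);
  pthread_mutex_unlock (&s->mutex);
}

/* Thread B: performs "some activity", then signals thread A. */
static int shared_result;
static void *thread_b (void *aux) {
  struct model_sema *done = aux;
  shared_result = 42;
  model_sema_up (done);
  return NULL;
}
```

Because an "up" that happens before A's "down" simply leaves the value at 1, A returns immediately in that case. This is why the order of the two operations does not matter.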
struct semaphore
[Type] Represents a semaphore.
void sema_init (struct semaphore *sema, unsigned value)
[Function] Initializes sema as a new semaphore with the given initial value.
void sema_down (struct semaphore *sema)
[Function] Executes the "down" or "P" operation on sema, waiting for its value to become positive and then decrementing it by one.
bool sema_try_down (struct semaphore *sema)
[Function] Tries to execute the "down" or "P" operation on sema, without waiting. Returns true if sema was successfully decremented, or false if it was already zero and thus could not be decremented without waiting. Calling this function in a tight loop wastes CPU time, so use sema_down() or find a different approach instead.
void sema_up (struct semaphore *sema)
[Function] Executes the "up" or "V" operation on sema, incrementing its value. If any threads are waiting on sema, wakes one of them up.
Unlike most synchronization primitives, sema_up() may be called inside an external interrupt handler (see Section A.4.3 [External Interrupt Handling], page 74).
Semaphores are internally built out of disabling interrupts (see Section A.3.1 [Disabling Interrupts], page 66) and thread blocking and unblocking (thread_block() and thread_unblock()). Each semaphore maintains a list of waiting threads, using the linked list implementation in lib/kernel/list.c.
A.3.3 Locks
A lock is like a semaphore with an initial value of 1 (see Section A.3.2 [Semaphores], page 67). A lock's equivalent of "up" is called "release", and the "down" operation is called "acquire".
Compared to a semaphore, a lock has one added restriction: only the thread that acquires a lock, called the lock's "owner", is allowed to release it. If this restriction is a problem, it's a good sign that a semaphore should be used, instead of a lock.
Locks in Pintos are not "recursive", that is, it is an error for the thread currently holding a lock to try to acquire that lock.
Lock types and functions are declared in threads/synch.h.
struct lock
[Type] Represents a lock.
void lock_init (struct lock *lock)
[Function] Initializes lock as a new lock. The lock is not initially owned by any thread.
void lock_acquire (struct lock *lock)
[Function] Acquires lock for the current thread, first waiting for any current owner to release it if necessary.
bool lock_try_acquire (struct lock *lock)
[Function] Tries to acquire lock for use by the current thread, without waiting. Returns true if successful, false if the lock is already owned. Calling this function in a tight loop is a bad idea because it wastes CPU time, so use lock_acquire() instead.
void lock_release (struct lock *lock)
[Function] Releases lock, which the current thread must own.
bool lock_held_by_current_thread (const struct lock *lock)
[Function] Returns true if the running thread owns lock, false otherwise. There is no function to test whether an arbitrary thread owns a lock, because the answer could change before the caller could act on it.
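The ownership rule and the no-recursion rule can both be enforced by recording which thread holds the lock. This is a user-space sketch that uses POSIX threads. struct model_lock and its functions are hypothetical names, and Pintos's own struct lock is built on a semaphore rather than on pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_lock {
  pthread_mutex_t guard;     /* Protects held and holder. */
  pthread_cond_t released;   /* Signaled whenever the lock is released. */
  bool held;                 /* Is the lock currently owned? */
  pthread_t holder;          /* Owner; meaningful only while held is true. */
};

void model_lock_init (struct model_lock *lock) {
  pthread_mutex_init (&lock->guard, NULL);
  pthread_cond_init (&lock->released, NULL);
  lock->held = false;
}

bool model_lock_held_by_current_thread (struct model_lock *lock) {
  pthread_mutex_lock (&lock->guard);
  bool mine = lock->held && pthread_equal (lock->holder, pthread_self ());
  pthread_mutex_unlock (&lock->guard);
  return mine;
}

void model_lock_acquire (struct model_lock *lock) {
  pthread_mutex_lock (&lock->guard);
  /* Not recursive: the owner must not try to acquire it again. */
  assert (!(lock->held && pthread_equal (lock->holder, pthread_self ())));
  while (lock->held)
    pthread_cond_wait (&lock->released, &lock->guard);
  lock->held = true;
  lock->holder = pthread_self ();
  pthread_mutex_unlock (&lock->guard);
}

/* Acquires the lock only if it is free right now; never waits. */
bool model_lock_try_acquire (struct model_lock *lock) {
  pthread_mutex_lock (&lock->guard);
  bool success = !lock->held;
  if (success) {
    lock->held = true;
    lock->holder = pthread_self ();
  }
  pthread_mutex_unlock (&lock->guard);
  return success;
}

void model_lock_release (struct model_lock *lock) {
  pthread_mutex_lock (&lock->guard);
  /* Only the owner may release the lock. */
  assert (lock->held && pthread_equal (lock->holder, pthread_self ()));
  lock->held = false;
  pthread_cond_signal (&lock->released);
  pthread_mutex_unlock (&lock->guard);
}
```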
A.3.4 Monitors
A monitor is a higher-level form of synchronization than a semaphore or a lock. A monitor consists of data being synchronized, plus a lock, called the monitor lock, and one or more condition variables. Before it accesses the protected data, a thread first acquires the monitor lock. It is then said to be "in the monitor". While in the monitor, the thread has control over all the protected data, which it may freely examine or modify. When access to the protected data is complete, it releases the monitor lock.
Condition variables allow code in the monitor to wait for a condition to become true. Each condition variable is associated with an abstract condition, e.g. "some data has arrived for processing" or "over 10 seconds has passed since the user's last keystroke". When code in the monitor needs to wait for a condition to become true, it "waits" on the associated condition variable, which releases the lock and waits for the condition to be signaled. If, on the other hand, it has caused one of these conditions to become true, it "signals" the condition to wake up one waiter, or "broadcasts" the condition to wake all of them.
The theoretical framework for monitors was laid out by C. A. R. Hoare ([Hoare]). Their practical usage was later elaborated in a paper on the Mesa operating system ([Lampson]). Condition variable types and functions are declared in threads/synch.h.
struct condition
[Type] Represents a condition variable.
void cond_init (struct condition *cond)
[Function] Initializes cond as a new condition variable.
void cond_wait (struct condition *cond, struct lock *lock)
[Function] Atomically releases lock (the monitor lock) and waits for cond to be signaled by some other piece of code. After cond is signaled, reacquires lock before returning. lock must be held before calling this function.
Sending a signal and waking up from a wait are not an atomic operation. Thus, typically cond_wait()'s caller must recheck the condition after the wait completes and, if necessary, wait again. See the next section for an example.
void cond_signal (struct condition *cond, struct lock *lock)
[Function] If any threads are waiting on cond (protected by monitor lock lock), then this function wakes up one of them. If no threads are waiting, returns without performing any action. lock must be held before calling this function.
void cond_broadcast (struct condition *cond, struct lock *lock)
[Function] Wakes up all threads, if any, waiting on cond (protected by monitor lock lock). lock must be held before calling this function.
      cond_signal (&not_empty, &lock);     /* buf can't be empty anymore. */
      lock_release (&lock);
    }
    char get (void) {
      char ch;
      lock_acquire (&lock);
      while (n == 0)                       /* Can't read buf as long as it's empty. */
        cond_wait (&not_empty, &lock);
      ch = buf[tail++ % BUF_SIZE];         /* Get ch from buf. */
      n--;
      cond_signal (&not_full, &lock);      /* buf can't be full anymore. */
      lock_release (&lock);
      return ch;
    }
Note that BUF_SIZE must divide evenly into SIZE_MAX + 1 for the above code to be completely correct. Otherwise, it will fail the first time head wraps around to 0. In practice, BUF_SIZE would ordinarily be a power of 2.
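For a runnable version of the bounded-buffer monitor, the example can be transcribed to POSIX threads. The transcription maps struct lock to pthread_mutex_t and struct condition to pthread_cond_t. This is a sketch under that assumption, not Pintos code, and the producer() helper is hypothetical. As with cond_wait(), each wait sits inside a while loop that rechecks the condition after waking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BUF_SIZE 8   /* A power of 2, so it divides SIZE_MAX + 1. */

/* The monitor: shared data plus its lock and two condition variables. */
static char buf[BUF_SIZE];
static size_t n;            /* Number of characters in buf. */
static size_t head, tail;   /* Next write and read positions (mod BUF_SIZE). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void put (char ch) {
  pthread_mutex_lock (&lock);
  while (n == BUF_SIZE)                     /* Can't add to buf while it's full. */
    pthread_cond_wait (&not_full, &lock);
  buf[head++ % BUF_SIZE] = ch;
  n++;
  pthread_cond_signal (&not_empty);         /* buf can't be empty anymore. */
  pthread_mutex_unlock (&lock);
}

char get (void) {
  pthread_mutex_lock (&lock);
  while (n == 0)                            /* Can't read buf while it's empty. */
    pthread_cond_wait (&not_empty, &lock);
  char ch = buf[tail++ % BUF_SIZE];
  n--;
  pthread_cond_signal (&not_full);          /* buf can't be full anymore. */
  pthread_mutex_unlock (&lock);
  return ch;
}

/* Producer thread: writes a string that is longer than buf. */
static void *producer (void *aux) {
  const char *msg = aux;
  for (const char *p = msg; *p != '\0'; p++)
    put (*p);
  return NULL;
}
```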
Finally, optimization barriers can be used to force the ordering of memory reads or writes. For example, suppose we add a "feature" that, whenever a timer interrupt occurs, the character in global variable timer_put_char is printed on the console, but only if global Boolean variable timer_do_put is true. The best way to set up x to be printed is then to use an optimization barrier, like this:
    timer_put_char = x;
    barrier ();
    timer_do_put = true;
Without the barrier, the code is buggy because the compiler is free to reorder operations when it doesn't see a reason to keep them in the same order. In this case, the compiler doesn't know that the order of assignments is important, so its optimizer is permitted to exchange their order. There's no telling whether it will actually do this, and it is possible that passing the compiler different optimization flags or using a different version of the compiler will produce different behavior.
Another solution is to disable interrupts around the assignments. This does not prevent reordering, but it prevents the interrupt handler from intervening between the assignments. It also has the extra runtime cost of disabling and re-enabling interrupts:
    enum intr_level old_level = intr_disable ();
    timer_put_char = x;
    timer_do_put = true;
    intr_set_level (old_level);
A third solution is to mark the declarations of timer_put_char and timer_do_put as volatile. This keyword tells the compiler that the variables are externally observable and restricts its latitude for optimization. However, the semantics of volatile are not well-defined, so it is not a good general solution. The base Pintos code does not use volatile at all.
The following is not a solution, because locks neither prevent interrupts nor prevent the compiler from reordering the code within the region where the lock is held:
    lock_acquire (&timer_lock);        /* INCORRECT CODE */
    timer_put_char = x;
    timer_do_put = true;
    lock_release (&timer_lock);
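With GCC-compatible compilers, an optimization barrier can be written as an empty inline assembly statement with a "memory" clobber. Pintos's barrier() macro is defined along these lines, though the exact spelling below is an assumption. schedule_timer_char() is a hypothetical wrapper around the example above.

```c
#include <assert.h>
#include <stdbool.h>

/* Compiler-only barrier (GCC/Clang extension): the "memory" clobber forbids
   moving memory accesses across this point. It emits no CPU instruction. */
#define barrier() __asm__ volatile ("" : : : "memory")

char timer_put_char;
bool timer_do_put;

/* Publishes X for the timer interrupt handler: the character must be
   stored before the flag that tells the handler to print it. */
void schedule_timer_char (char x) {
  timer_put_char = x;
  barrier ();
  timer_do_put = true;
}
```

The barrier constrains only the compiler. That is sufficient in this scenario because the interrupt handler runs on the same CPU as the code that sets the variables.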
The compiler treats invocation of any function defined externally, that is, in another source file, as a limited form of optimization barrier. Specifically, the compiler assumes that any externally defined function may access any statically or dynamically allocated data and any local variable whose address is taken. This often means that explicit barriers can be omitted. It is one reason that Pintos contains few explicit barriers.
A function defined in the same source file, or in a header included by the source file, cannot be relied upon as an optimization barrier. This applies even to invocation of a function before its definition, because the compiler may read and parse the entire source file before performing optimization.
void intr_handler_func (struct intr_frame *frame)
[Type] This is how an interrupt handler function must be declared. Its frame argument (see below) allows it to determine the cause of the interrupt and the state of the thread that was interrupted.
struct intr_frame
[Type] The stack frame of an interrupt handler, as saved by the CPU, the interrupt stubs, and intr_entry(). Its most interesting members are described below.
uint32_t edi
uint32_t esi
uint32_t ebp
uint32_t esp_dummy
uint32_t ebx
uint32_t edx
uint32_t ecx
uint32_t eax
uint16_t es
uint16_t ds
[Member of struct intr_frame] Register values in the interrupted thread, pushed by intr_entry(). The esp_dummy value isn't actually used (refer to the description of PUSHA in [IA32-v2b] for details).
uint32_t vec_no
[Member of struct intr_frame] The interrupt vector number, ranging from 0 to 255.
uint32_t error_code
[Member of struct intr_frame] The "error code" pushed on the stack by the CPU for some internal interrupts.
void (*eip) (void)
[Member of struct intr_frame] The address of the next instruction to be executed by the interrupted thread.
void *esp
[Member of struct intr_frame] The interrupted thread's stack pointer.
const char *intr_name (uint8_t vec)
[Function] Returns the name of the interrupt numbered vec, or "unknown" if the interrupt has no registered name.
preempted by other kernel threads. Thus, they do need to synchronize with other threads on shared data and other resources (see Section A.3 [Synchronization], page 66).
Internal interrupt handlers can be invoked recursively. For example, the system call handler might cause a page fault while attempting to read user memory. Deep recursion would risk overflowing the limited kernel stack (see Section A.2.1 [struct thread], page 61), but should be unnecessary.
void intr_register_int (uint8_t vec, int dpl, enum intr_level level, intr_handler_func *handler, const char *name)
[Function] Registers handler to be called when internal interrupt numbered vec is triggered. Names the interrupt name for debugging purposes.
If level is INTR_ON, external interrupts will be processed normally during the interrupt handler's execution, which is normally desirable. Specifying INTR_OFF will cause the CPU to disable external interrupts when it invokes the interrupt handler. The effect is slightly different from calling intr_disable() inside the handler, because that leaves a window of one or more CPU instructions in which external interrupts are still enabled. This is important for the page fault handler; refer to the comments in userprog/exception.c for details.
dpl determines how the interrupt can be invoked. If dpl is 0, then the interrupt can be invoked only by kernel threads. Otherwise dpl should be 3, which allows user processes to invoke the interrupt with an explicit INT instruction. The value of dpl doesn't affect user processes' ability to invoke the interrupt indirectly, e.g. an invalid memory reference will cause a page fault regardless of dpl.
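The dpl rule can be modeled with a table of 256 entries and the x86 check for software interrupts: an explicit INT executed at privilege level CPL is permitted only if CPL is at most the gate's DPL. This is a user-space sketch. struct model_gate, model_register_int(), model_int_allowed() and model_intr_name() are hypothetical, and the level parameter is omitted because the model never runs handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct intr_frame;                                   /* Opaque in this model. */
typedef void intr_handler_func (struct intr_frame *);

/* Hypothetical gate record for one interrupt vector. */
struct model_gate {
  intr_handler_func *handler;   /* NULL if nothing is registered. */
  int dpl;                      /* 0: kernel only; 3: user INT also allowed. */
  const char *name;             /* For debugging messages. */
};

static struct model_gate gates[256];

void model_register_int (uint8_t vec, int dpl, intr_handler_func *handler,
                         const char *name) {
  assert (dpl == 0 || dpl == 3);
  assert (gates[vec].handler == NULL);     /* Each vector is registered once. */
  gates[vec] = (struct model_gate) { handler, dpl, name };
}

/* Would an explicit INT VEC executed at privilege level CPL (0 = kernel,
   3 = user) be allowed? Indirect causes such as page faults skip this check. */
bool model_int_allowed (uint8_t vec, int cpl) {
  return gates[vec].handler != NULL && cpl <= gates[vec].dpl;
}

const char *model_intr_name (uint8_t vec) {
  return gates[vec].name != NULL ? gates[vec].name : "unknown";
}

/* A do-nothing handler for demonstration. */
static void dummy_handler (struct intr_frame *f) { (void) f; }
```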
IDT, it also initializes the PICs for interrupt handling. The PICs also must be acknowledged at the end of processing for each external interrupt. intr_handler() takes care of that by calling pic_end_of_interrupt(), which properly signals the PICs. The following functions relate to external interrupts.
void intr_register_ext (uint8_t vec, intr_handler_func *handler, const char *name)
[Function] Registers handler to be called when external interrupt numbered vec is triggered. Names the interrupt name for debugging purposes. The handler will run with interrupts disabled.
bool intr_context (void)
[Function] Returns true if we are running in an interrupt context, otherwise false. Mainly used in functions that might sleep or that otherwise should not be called from interrupt context, in this form:
    ASSERT (!intr_context ());
void intr_yield_on_return (void)
[Function] When called in an interrupt context, causes thread_yield() to be called just before the interrupt returns. Used in the timer interrupt handler when a thread's time slice expires, to cause a new thread to be scheduled.
fail due to fragmentation, so requests for multiple contiguous pages should be limited as much as possible. Pages may not be allocated from interrupt context, but they may be freed. When a page is freed, all of its bytes are cleared to 0xcc, as a debugging aid (see Section E.8 [Debugging Tips], page 112). Page allocator types and functions are described below.
void *palloc_get_page (enum palloc_flags flags)
void *palloc_get_multiple (enum palloc_flags flags, size_t page_cnt)
[Function] Obtains and returns one page, or page_cnt contiguous pages, respectively. Returns a null pointer if the pages cannot be allocated.
The flags argument may be any combination of the following flags:
PAL_ASSERT
[Page Allocator Flag] If the pages cannot be allocated, panic the kernel. This is only appropriate during kernel initialization. User processes should never be permitted to panic the kernel.
PAL_ZERO
[Page Allocator Flag] Zero all the bytes in the allocated pages before returning them. If not set, the contents of newly allocated pages are unpredictable.
PAL_USER
[Page Allocator Flag] Obtain the pages from the user pool. If not set, pages are allocated from the kernel pool.
void palloc_free_page (void *page)
void palloc_free_multiple (void *pages, size_t page_cnt)
[Function] Frees one page, or page_cnt contiguous pages, respectively, starting at pages. All of the pages must have been obtained using palloc_get_page() or palloc_get_multiple().
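Fragmentation is easiest to see in a small model. The sketch below manages a hypothetical 8-page pool with one "used" flag per page and a first-fit search for contiguous runs. It supports PAL_ZERO and PAL_ASSERT and fills freed pages with 0xcc. It ignores PAL_USER because it has only one pool. The flag values and the model_palloc_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PGSIZE 4096
#define POOL_PAGES 8   /* Hypothetical, deliberately tiny pool. */

/* Flag values chosen for this model. */
enum palloc_flags {
  PAL_ASSERT = 001,   /* Fail loudly instead of returning NULL. */
  PAL_ZERO = 002,     /* Zero the pages before returning them. */
  PAL_USER = 004      /* Accepted but ignored: the model has one pool. */
};

static unsigned char pool[POOL_PAGES * PGSIZE];
static bool used[POOL_PAGES];   /* One flag per page, like a bitmap. */

/* First-fit search for PAGE_CNT free, contiguous pages. */
void *model_palloc_get_multiple (enum palloc_flags flags, size_t page_cnt) {
  if (page_cnt == 0)
    return NULL;
  for (size_t start = 0; start + page_cnt <= POOL_PAGES; start++) {
    size_t i = 0;
    while (i < page_cnt && !used[start + i])
      i++;
    if (i < page_cnt)
      continue;                       /* Run interrupted by a used page. */
    for (i = 0; i < page_cnt; i++)
      used[start + i] = true;
    void *pages = pool + start * PGSIZE;
    if (flags & PAL_ZERO)
      memset (pages, 0, page_cnt * PGSIZE);
    return pages;
  }
  assert (!(flags & PAL_ASSERT));     /* The real allocator panics here. */
  return NULL;
}

void model_palloc_free_multiple (void *pages, size_t page_cnt) {
  size_t start = (size_t) ((unsigned char *) pages - pool) / PGSIZE;
  assert (start + page_cnt <= POOL_PAGES);
  memset (pages, 0xcc, page_cnt * PGSIZE);   /* Debugging aid: poison freed pages. */
  for (size_t i = 0; i < page_cnt; i++) {
    assert (used[start + i]);
    used[start + i] = false;
  }
}
```

For example, after allocating runs of 3, 2, and 3 pages and then freeing the two 3-page runs, six pages are free. A request for 4 contiguous pages still fails.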
As long as a page can be obtained from the page allocator, small allocations always succeed. Most small allocations do not require a new page from the page allocator at all, because they are satisfied using part of a page already allocated. However, large allocations always require calling into the page allocator, and any allocation that needs more than one contiguous page can fail due to fragmentation, as already discussed in the previous section. Thus, you should minimize the number of large allocations in your code, especially those over approximately 4 kB each.
When a block is freed, all of its bytes are cleared to 0xcc, as a debugging aid (see Section E.8 [Debugging Tips], page 112).
The block allocator may not be called from interrupt context.
The block allocator functions are described below. Their interfaces are the same as the standard C library functions of the same names.
void *malloc (size_t size)
[Function] Obtains and returns a new block, from the kernel pool, at least size bytes long. Returns a null pointer if size is zero or if memory is not available.
void *calloc (size_t a, size_t b)
[Function] Obtains and returns a new block, from the kernel pool, at least a * b bytes long. The block's contents will be cleared to zeros. Returns a null pointer if a or b is zero or if insufficient memory is available.
void *realloc (void *block, size_t new_size)
[Function] Attempts to resize block to new_size bytes, possibly moving it in the process. If successful, returns the new block, in which case the old block must no longer be accessed. On failure, returns a null pointer, and the old block remains valid.
A call with block null is equivalent to malloc(). A call with new_size zero is equivalent to free().
void free (void *block)
[Function] Frees block, which must have been previously returned by malloc(), calloc(), or realloc() (and not yet freed).
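A failed realloc() leaves the old block valid. The result should therefore go into a temporary rather than directly back into the only pointer to that block. This is a minimal sketch using the standard C library, whose interface the text says these functions share; struct int_vec and int_vec_push() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct int_vec {
  int *items;
  size_t len, cap;
};

/* Appends VALUE, doubling capacity as needed. On failure returns false
   and leaves V exactly as it was. */
bool int_vec_push (struct int_vec *v, int value) {
  if (v->len == v->cap) {
    if (v->cap > SIZE_MAX / 2 / sizeof *v->items)
      return false;                          /* Doubling would overflow. */
    size_t new_cap = v->cap != 0 ? v->cap * 2 : 4;
    /* Assign to a temporary: if realloc() fails, v->items is still valid
       and must not be overwritten with NULL. realloc(NULL, n) acts like malloc(n). */
    int *p = realloc (v->items, new_cap * sizeof *p);
    if (p == NULL)
      return false;
    v->items = p;
    v->cap = new_cap;
  }
  v->items[v->len++] = value;
  return true;
}
```

One portability caveat: standard C does not guarantee that realloc() with a size of zero behaves like free(), so hosted code should call free() explicitly rather than rely on the equivalence described above.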
PGSHIFT
PGBITS
[Macro] The bit index (0) and number of bits (12) of the offset part of a virtual address, respectively.
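PGMASK
[Macro] A bit mask with the bits in the page offset set to 1, the rest set to 0 (0xfff).
PGSIZE
[Macro] The page size in bytes (4,096).
unsigned pg_ofs (const void *va)
[Function] Returns the page offset in virtual address va.
uintptr_t pg_no (const void *va)
[Function] Returns the virtual page number in virtual address va.
void *pg_round_down (const void *va)
[Function] Returns the start of the virtual page that va points within, that is, va with the page offset set to 0.
void *pg_round_up (const void *va)
[Function] Returns va rounded up to the nearest page boundary.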
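The page-address helpers reduce to shifts and masks. This is a minimal sketch assuming 4 kB pages, as described above. The definitions are written from the macro descriptions and are not copied from threads/vaddr.h.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT 0                                   /* Index of first offset bit. */
#define PGBITS 12                                   /* Number of offset bits. */
#define PGSIZE (1u << PGBITS)                       /* 4096 bytes. */
#define PGMASK (((1u << PGBITS) - 1) << PGSHIFT)    /* 0xfff. */

/* Offset of VA within its page. */
static inline unsigned pg_ofs (const void *va) { return (uintptr_t) va & PGMASK; }

/* Virtual page number of VA. */
static inline uintptr_t pg_no (const void *va) { return (uintptr_t) va >> PGBITS; }

/* VA with the page offset cleared. */
static inline void *pg_round_down (const void *va) {
  return (void *) ((uintptr_t) va & ~(uintptr_t) PGMASK);
}

/* VA rounded up to the next page boundary (unchanged if already aligned). */
static inline void *pg_round_up (const void *va) {
  return (void *) (((uintptr_t) va + PGSIZE - 1) & ~(uintptr_t) PGMASK);
}
```

For example, address 0x12345 has page number 0x12 and offset 0x345. It rounds down to 0x12000 and up to 0x13000.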
Virtual memory in Pintos is divided into two regions: user virtual memory and kernel virtual memory (see Section 3.1.4 [Virtual Memory Layout], page 25). The boundary between them is PHYS_BASE:
PHYS_BASE
[Macro] Base address of kernel virtual memory. It defaults to 0xc0000000 (3 GB), but it may be changed to any multiple of 0x10000000 from 0x80000000 to 0xf0000000.
User virtual memory ranges from virtual address 0 up to PHYS_BASE. Kernel virtual memory occupies the rest of the virtual address space, from PHYS_BASE up to 4 GB.
bool is_user_vaddr (const void *va)
bool is_kernel_vaddr (const void *va)
[Function] Returns true if va is a user or kernel virtual address, respectively, false otherwise.
The 80x86 doesn't provide any way to directly access memory given a physical address. This ability is often necessary in an operating system kernel, so Pintos works around it by mapping kernel virtual memory one-to-one to physical memory. That is, virtual address PHYS_BASE accesses physical address 0, virtual address PHYS_BASE + 0x1234 accesses physical address 0x1234, and so on up to the size of the machine's physical memory. Thus, adding PHYS_BASE to a physical address obtains a kernel virtual address that accesses that address; conversely, subtracting PHYS_BASE from a kernel virtual address obtains the corresponding physical address. Header threads/vaddr.h provides a pair of functions to do these translations:
[Function] Returns the kernel virtual address corresponding to physical address pa, which should be between 0 and the number of bytes of physical memory. [Function] Returns the physical address corresponding to va, which must be a kernel virtual address.
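Kernel virtual memory maps physical memory one-to-one starting at PHYS_BASE, so each translation is a single addition or subtraction. The sketch below assumes the default PHYS_BASE of 0xc0000000 and models addresses as uint32_t values. phys_to_kvirt(), kvirt_to_phys(), is_user_addr(), and is_kernel_addr() are hypothetical stand-ins for the functions described above, which take pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default boundary between user and kernel virtual memory (3 GB). */
#define PHYS_BASE 0xc0000000u

/* Hypothetical helpers on 32-bit addresses held in uint32_t.  They
   only compute addresses; nothing is dereferenced. */

static inline bool
is_user_addr (uint32_t va)
{
  return va < PHYS_BASE;
}

static inline bool
is_kernel_addr (uint32_t va)
{
  return va >= PHYS_BASE;
}

/* Physical address -> kernel virtual address: add PHYS_BASE. */
static inline uint32_t
phys_to_kvirt (uint32_t pa)
{
  assert ((uint64_t) pa + PHYS_BASE <= UINT32_MAX);  /* Stays below 4 GB. */
  return pa + PHYS_BASE;
}

/* Kernel virtual address -> physical address: subtract PHYS_BASE. */
static inline uint32_t
kvirt_to_phys (uint32_t va)
{
  assert (is_kernel_addr (va));
  return va - PHYS_BASE;
}
```

For example, physical address 0x1234 corresponds to kernel virtual address 0xc0001234. A typical user address such as 0x08048000 lies below PHYS_BASE.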
[Function] Creates and returns a new page table. The new page table contains Pintos's normal kernel virtual page mappings, but no user virtual mappings. Returns a null pointer if memory cannot be obtained.
[Function] Frees all of the resources held by pd, including the page table itself and the frames that it maps. [Function] Activates pd. The active page table is the one used by the CPU to translate memory references.
bool pagedir_set_page (uint32_t *pd, void *upage, void *kpage, bool writable)
[Function]
Adds to pd a mapping from user page upage to the frame identified by kernel virtual address kpage. If writable is true, the page is mapped read/write; otherwise, it is mapped read-only. User page upage must not already be mapped in pd. Kernel page kpage should be a kernel virtual address obtained from the user pool with palloc_get_page(PAL_USER) (see [Why PAL_USER?], page 49). Returns true if successful, false on failure. Failure will occur if additional memory required for the page table cannot be obtained.
[Function] Looks up the frame mapped to uaddr in pd. Returns the kernel virtual address for that frame, if uaddr is mapped, or a null pointer if it is not. [Function]
Other bits in the page table for page are preserved, permitting the accessed and dirty bits (see the next section) to be checked. This function has no effect if page is not mapped.
bool pagedir_is_dirty (uint32_t *pd, const void *page) [Function]
bool pagedir_is_accessed (uint32_t *pd, const void *page) [Function]
Returns true if page directory pd contains a page table entry for page that is marked dirty (or accessed). Otherwise, returns false.
void pagedir_set_dirty (uint32_t *pd, const void *page, bool value) [Function]
void pagedir_set_accessed (uint32_t *pd, const void *page, bool value) [Function]
If page directory pd has a page table entry for page, then its dirty (or accessed) bit is set to value.
A.7.4.1 Structure
The top-level paging data structure is a page called the page directory (PD) arranged as an array of 1,024 32-bit page directory entries (PDEs), each of which represents 4 MB of virtual memory. Each PDE may point to the physical address of another page called a page table (PT) arranged, similarly, as an array of 1,024 32-bit page table entries (PTEs), each of which translates a single 4 kB virtual page to a physical page.
Translation of a virtual address into a physical address follows the three-step process illustrated in the diagram below:[2]
1. The most-significant 10 bits of the virtual address (bits 22...31) index the page directory. If the PDE is marked present, the physical address of a page table is read from the PDE thus obtained. If the PDE is marked not present then a page fault occurs.
[2] Actually, virtual-to-physical translation on the 80x86 architecture occurs via an intermediate "linear address," but Pintos (and most modern 80x86 OSes) set up the CPU so that linear and virtual addresses are one and the same. Thus, you can effectively ignore this CPU feature.
2. The next 10 bits of the virtual address (bits 12...21) index the page table. If the PTE is marked present, the physical address of a data page is read from the PTE thus obtained. If the PTE is marked not present then a page fault occurs.
3. The least-significant 12 bits of the virtual address (bits 0...11) are added to the data page's physical base address, yielding the final physical address.
     31                  22 21                  12 11                   0
    +----------------------+----------------------+----------------------+
    | Page Directory Index |   Page Table Index   |     Page Offset      |
    +----------------------+----------------------+----------------------+
[Diagram: the page directory index selects one of the 1,024 entries (0...1,023) in the Page Directory, which points to a Page Table; the page table index selects one of that table's 1,024 entries, which points to a Data Page; the page offset selects the byte within the Data Page.]
Pintos provides some macros and functions that are useful for working with raw page tables:
PTSHIFT [Macro]
PTBITS [Macro]
The starting bit index (12) and number of bits (10), respectively, in a page table index.
PTMASK [Macro]
A bit mask with the bits in the page table index set to 1 and the rest set to 0 (0x3ff000).
PTSPAN [Macro]
The number of bytes of virtual address space that a single page table page covers (4,194,304 bytes, or 4 MB).
PDSHIFT [Macro]
PDBITS [Macro]
The starting bit index (22) and number of bits (10), respectively, in a page directory index.
PDMASK [Macro]
A bit mask with the bits in the page directory index set to 1 and other bits set to 0 (0xffc00000).
uintptr_t pd_no (const void *va) [Function]
uintptr_t pt_no (const void *va) [Function]
Returns the page directory index or page table index, respectively, for virtual address va. These functions are defined in threads/pte.h.
[Function] This function is defined in
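The shift and mask values above divide a 32-bit virtual address into three fields. The sketch below derives each mask from its shift and bit count, then extracts the indices in the way pd_no() and pt_no() are described. FIELD_MASK(), dir_index(), and table_index() are hypothetical names, and addresses are modeled as uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* CNT one-bits starting at bit SHIFT (hypothetical helper macro). */
#define FIELD_MASK(SHIFT, CNT) ((((uint32_t) 1 << (CNT)) - 1) << (SHIFT))

#define PTSHIFT 12                                  /* Page table index: */
#define PTBITS  10                                  /* bits 12...21. */
#define PTMASK  FIELD_MASK (PTSHIFT, PTBITS)        /* 0x3ff000 */
#define PTSPAN  ((uint32_t) 1 << PTBITS << PTSHIFT) /* 4 MB per table. */

#define PDSHIFT (PTSHIFT + PTBITS)                  /* Directory index: */
#define PDBITS  10                                  /* bits 22...31. */
#define PDMASK  FIELD_MASK (PDSHIFT, PDBITS)        /* 0xffc00000 */

/* Page directory index of va (top 10 bits). */
static inline uint32_t
dir_index (uint32_t va)
{
  return (va & PDMASK) >> PDSHIFT;
}

/* Page table index of va (middle 10 bits). */
static inline uint32_t
table_index (uint32_t va)
{
  return (va & PTMASK) >> PTSHIFT;
}
```

For example, 0x0804a123 has directory index 32, table index 74 (0x4a), and page offset 0x123.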
PTE_P [Macro]
Bit 0, the present bit. When this bit is 1, the other bits are interpreted as described below. When this bit is 0, any attempt to access the page will page fault. The remaining bits are then not used by the CPU and may be used by the OS for any purpose.
PTE_W [Macro]
Bit 1, the read/write bit. When it is 1, the page is writable. When it is 0, write attempts will page fault.
PTE_U [Macro]
Bit 2, the user/supervisor bit. When it is 1, user processes may access the page. When it is 0, only the kernel may access the page (user accesses will page fault). Pintos clears this bit in PTEs for kernel virtual memory, to prevent user processes from accessing them.
PTE_A [Macro]
Bit 5, the accessed bit. See Section A.7.3 [Page Table Accessed and Dirty Bits], page 80.
PTE_D [Macro]
Bit 6, the dirty bit. See Section A.7.3 [Page Table Accessed and Dirty Bits], page 80.
PTE_AVL [Macro]
Bits 9...11, available for operating system use. Pintos, as provided, does not use them and sets them to 0.
PTE_ADDR [Macro]
Bits 12...31, the top 20 bits of the physical address of a frame. The low 12 bits of the frame's address are always 0.
Other bits are either reserved or uninteresting in a Pintos context and should be set to 0. Header threads/pte.h defines three functions for working with page table entries:
[Function] Returns a page table entry that points to page, which should be a kernel virtual address. The PTE's present bit will be set. It will be marked for kernel-only access. If writable is true, the PTE will also be marked read/write; otherwise, it will be read-only.
[Function] Returns a page table entry that points to page, which should be the kernel virtual address of a frame in the user pool (see [Why PAL_USER?], page 49). The PTE's present bit will be set and it will be marked to allow user-mode access. If writable is true, the PTE will also be marked read/write; otherwise, it will be read-only.
[Function] Returns the kernel virtual address for the frame that pte points to. The pte may be present or not-present; if it is not-present then the pointer returned is only meaningful if the address bits in the PTE actually represent a physical address.
[Function] Returns a page directory entry that points to page, which should be the kernel virtual address of a page table page. The PDE's present bit will be set, it will be marked to allow user-mode access, and it will be marked read/write.
[Function] Returns the kernel virtual address for the page table page that pde, which must be marked present, points to.
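The small host-side simulation below connects the entry format to the three-step translation described in Section A.7.4.1. It models physical memory as a static array of 4 kB frames, builds entries from the bits listed above, and walks a page directory by following the numbered steps. All of its names are hypothetical: make_user_pte(), clear_accessed(), table_at(), translate(), and demo_translate(). Note that the user-PTE builder here takes a physical address directly, whereas the function described above takes a kernel virtual address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entry bits as described above. */
#define PTE_P    0x001u        /* Bit 0: present. */
#define PTE_W    0x002u        /* Bit 1: read/write. */
#define PTE_U    0x004u        /* Bit 2: user/supervisor. */
#define PTE_A    0x020u        /* Bit 5: accessed. */
#define PTE_D    0x040u        /* Bit 6: dirty. */
#define PTE_ADDR 0xfffff000u   /* Bits 12...31: frame address. */

/* Builds a present, user-accessible entry for the page-aligned frame
   at physical address frame_pa. */
static inline uint32_t
make_user_pte (uint32_t frame_pa, bool writable)
{
  assert ((frame_pa & ~PTE_ADDR) == 0);
  return frame_pa | PTE_P | PTE_U | (writable ? PTE_W : 0);
}

/* Returns pte with its accessed bit cleared and all other bits kept. */
static inline uint32_t
clear_accessed (uint32_t pte)
{
  return pte & ~PTE_A;
}

/* Simulated physical memory: 8 frames of 1,024 32-bit entries, so the
   table at physical address pa lives in frame pa >> 12. */
static uint32_t phys_mem[8][1024];

static uint32_t *
table_at (uint32_t pa)
{
  assert ((pa >> 12) < 8);
  return phys_mem[pa >> 12];
}

/* Walks the page directory at physical address pd_pa.  Returns the
   physical address for va, or UINT32_MAX where the CPU would raise a
   page fault. */
static uint32_t
translate (uint32_t pd_pa, uint32_t va)
{
  uint32_t pde = table_at (pd_pa)[va >> 22];              /* Step 1. */
  uint32_t pte;

  if (!(pde & PTE_P))
    return UINT32_MAX;
  pte = table_at (pde & PTE_ADDR)[(va >> 12) & 0x3ff];    /* Step 2. */
  if (!(pte & PTE_P))
    return UINT32_MAX;
  return (pte & PTE_ADDR) + (va & 0xfff);                 /* Step 3. */
}

/* Usage: frame 0 holds the page directory, frame 1 a page table, and
   the page containing 0x0804a123 maps to the frame at 0x5000. */
static uint32_t
demo_translate (uint32_t va)
{
  table_at (0)[0x0804a123u >> 22] = 0x1000u | PTE_P | PTE_W | PTE_U;
  table_at (0x1000)[(0x0804a123u >> 12) & 0x3ff]
    = make_user_pte (0x5000u, true);
  return translate (0, va);
}
```

Here directory entry 32 points to the page table in frame 1. Entry 74 of that table maps to the frame at 0x5000, so 0x0804a123 translates to 0x5123. The sketch uses UINT32_MAX as a fault marker, which is safe only because the simulated memory is far smaller than 4 GB.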
struct hash
[Type] Represents an entire hash table. The actual members of struct hash are opaque. That is, code that uses a hash table should not access struct hash members directly, nor should it need to. Instead, use hash table functions and macros.
struct hash_elem
[Type] Embed a struct hash_elem member in the structure you want to include in a hash table. Like struct hash, struct hash_elem is opaque. All functions for operating on hash table elements actually take and return pointers to struct hash_elem, not pointers to your hash table's real element type.
You will often need to obtain a struct hash_elem given a real element of the hash table, and vice versa. Given a real element of the hash table, you may use the & operator to obtain a pointer to its struct hash_elem. Use the hash_entry() macro to go the other direction.
[Macro] Returns a pointer to the structure that elem, a pointer to a struct hash_elem, is embedded within. You must provide type, the name of the structure that elem is inside, and member, the name of the member in type that elem points to.
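Macros such as hash_entry() typically work by subtracting the member's offset, obtained with offsetof() from <stddef.h>, from the member's address. The sketch below demonstrates that technique using a hypothetical ENTRY_OF() macro and stand-in types. Pintos's own definition may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an embedded element type such as struct hash_elem. */
struct demo_elem
  {
    struct demo_elem *next;
  };

/* Converts a pointer to MEMBER, embedded in STRUCT, back into a
   pointer to the enclosing STRUCT. */
#define ENTRY_OF(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((char *) (ELEM) - offsetof (STRUCT, MEMBER)))

/* Hypothetical element type with the embedded element in the middle. */
struct page_info
  {
    int addr;                   /* Key. */
    struct demo_elem elem;      /* Embedded element. */
    int flags;                  /* Non-key data. */
  };

static struct page_info sample = { 0x1000, { NULL }, 7 };
```

Because the offset is subtracted, the embedded element can sit anywhere in the structure, not just at the start.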
For example, suppose h is a struct hash_elem * variable that points to a struct thread member (of type struct hash_elem) named h_elem. Then, hash_entry (h, struct thread, h_elem) yields the address of the struct thread that h points within. See Section A.8.5 [Hash Table Example], page 88, for an example.
Each hash table element must contain a key, that is, data that identifies and distinguishes elements, which must be unique among elements in the hash table. (Elements may also contain non-key data that need not be unique.) While an element is in a hash table, its key data must not be changed. Instead, if need be, remove the element from the hash table, modify its key, then reinsert the element.
For each hash table, you must write two functions that act on keys: a hash function and a comparison function. These functions must match the following prototypes:
[Type]
Returns a hash of element's data, as a value anywhere in the range of unsigned int. The hash of an element should be a pseudo-random function of the element's key. It must not depend on non-key data in the element or on any non-constant data other than the key. Pintos provides the following functions as a suitable basis for hash functions.
[Function] Returns a hash of the size bytes starting at buf. The implementation is the general-purpose Fowler-Noll-Vo hash for 32-bit words. [Function] [Function] Returns a hash of null-terminated string s.
If your key is a single piece of data of an appropriate type, it is sensible for your hash function to directly return the output of one of these functions. For multiple pieces of data, you may wish to combine the output of more than one call to them using, e.g., the ^ (exclusive or) operator. Finally, you may entirely ignore these functions and write your own hash function from scratch, but remember that your goal is to build an operating system kernel, not to design a hash function. See Section A.8.6 [Hash Auxiliary Data], page 89, for an explanation of aux.
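To illustrate both points, the sketch below implements a 32-bit FNV-1 byte hash. It starts from the standard offset basis, then for each byte multiplies by the FNV prime and XORs in the byte. It then combines two field hashes with ^ to hash a two-part key. The sketch follows the spirit of the byte-hashing function described above, but fnv1_hash_bytes(), struct file_block_key, and file_block_hash() are hypothetical, and Pintos's exact code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV32_BASIS 2166136261u   /* FNV 32-bit offset basis. */
#define FNV32_PRIME 16777619u     /* FNV 32-bit prime. */

/* 32-bit FNV-1: multiply by the prime, then XOR in each byte. */
static uint32_t
fnv1_hash_bytes (const void *buf_, size_t size)
{
  const unsigned char *buf = buf_;
  uint32_t hash = FNV32_BASIS;

  assert (buf != NULL || size == 0);
  while (size-- > 0)
    hash = (hash * FNV32_PRIME) ^ *buf++;
  return hash;
}

/* Hypothetical two-part key. */
struct file_block_key
  {
    int inode;
    int sector;
  };

/* Hashes each key field separately and combines them with ^. */
static uint32_t
file_block_hash (const struct file_block_key *k)
{
  return fnv1_hash_bytes (&k->inode, sizeof k->inode)
         ^ fnv1_hash_bytes (&k->sector, sizeof k->sector);
}

/* Usage data: two keys whose fields are swapped. */
static const struct file_block_key key_a = { 1, 2 };
static const struct file_block_key key_b = { 2, 1 };
```

One design consequence of ^ is that it is symmetric, so key_a and key_b always hash to the same value. This is still a valid hash function, because equal keys hash equally and only the distribution suffers. Hashing the fields in an order-dependent way avoids the collision.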
bool hash_less_func (const struct hash_elem *a, const struct hash_elem *b, void *aux)
[Type]
Compares the keys stored in elements a and b. Returns true if a is less than b, false if a is greater than or equal to b. If two elements compare equal, then they must hash to equal values. See Section A.8.6 [Hash Auxiliary Data], page 89, for an explanation of aux. See Section A.8.5 [Hash Table Example], page 88, for hash and comparison function examples. A few functions accept a pointer to a third kind of function as an argument:
[Type]
bool hash_init (struct hash *hash, hash_hash_func *hash_func, hash_less_func *less_func, void *aux)
[Function]
Initializes hash as a hash table with hash_func as hash function, less_func as comparison function, and aux as auxiliary data. Returns true if successful, false on failure. hash_init() calls malloc() and fails if memory cannot be allocated.
See Section A.8.6 [Hash Auxiliary Data], page 89, for an explanation of aux, which is most often a null pointer.
[Function] Removes all the elements from hash, which must have been previously initialized with hash_init(). If action is non-null, then it is called once for each element in the hash table, which gives the caller an opportunity to deallocate any memory or other resources used by the element. For example, if the hash table elements are dynamically allocated using malloc(), then action could free() the element. This is safe because hash_clear() will not access the memory in a given hash element after calling action on it. However, action must not call any function that may modify the hash table, such as hash_insert() or hash_delete().
[Function] If action is non-null, calls it for each element in the hash, with the same semantics as a call to hash_clear(). Then, frees the memory held by hash. Afterward, hash must not be passed to any hash table function, absent an intervening call to hash_init().
[Function] Returns the number of elements currently stored in hash.
size_t hash_size (struct hash *hash)
bool hash_empty (struct hash *hash)
[Function] Returns true if hash currently contains no elements, false if hash contains at least one element.
struct hash_elem * hash_insert (struct hash *hash, struct hash_elem *element)
[Function]
Searches hash for an element equal to element. If none is found, inserts element into hash and returns a null pointer. If the table already contains an element equal to element, it is returned without modifying hash.
struct hash_elem * hash_replace (struct hash *hash, struct hash_elem *element)
[Function]
Inserts element into hash. Any element equal to element already in hash is removed. Returns the element removed, or a null pointer if hash did not contain an element equal to element.
The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using malloc(), then the caller must free() the element after it is no longer needed.
The element passed to the following functions is only used for hashing and comparison purposes. It is never actually inserted into the hash table. Thus, only key data in the
element needs to be initialized, and other data in the element will not be used. It often makes sense to declare an instance of the element type as a local variable, initialize the key data, and then pass the address of its struct hash_elem to hash_find() or hash_delete(). See Section A.8.5 [Hash Table Example], page 88, for an example. (Large structures should not be allocated as local variables. See Section A.2.1 [struct thread], page 61, for more information.)
struct hash_elem * hash_find (struct hash *hash, struct hash_elem *element)
[Function]
Searches hash for an element equal to element. Returns the element found, if any, or a null pointer otherwise.
struct hash_elem * hash_delete (struct hash *hash, struct hash_elem *element)
[Function]
Searches hash for an element equal to element. If one is found, it is removed from hash and returned. Otherwise, a null pointer is returned and hash is unchanged. The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using malloc(), then the caller must free() the element after it is no longer needed.
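The return-value conventions of hash_insert(), hash_replace(), hash_find(), and hash_delete() are easiest to compare side by side. The toy chained hash table below is keyed by int and mimics only those conventions. It is a hypothetical illustration, not the Pintos hash table implementation, and it has no resizing and no hash or comparison callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

struct toy_elem
  {
    int key;
    struct toy_elem *next;
  };

struct toy_hash
  {
    struct toy_elem *buckets[NBUCKETS];
  };

/* Returns the link that points to the element whose key equals E's,
   or the null link at the end of E's bucket if there is none. */
static struct toy_elem **
find_link (struct toy_hash *h, const struct toy_elem *e)
{
  struct toy_elem **link = &h->buckets[(unsigned) e->key % NBUCKETS];

  while (*link != NULL && (*link)->key != e->key)
    link = &(*link)->next;
  return link;
}

/* Like hash_insert(): returns an existing equal element, leaving the
   table unchanged, or inserts E and returns NULL. */
static struct toy_elem *
toy_insert (struct toy_hash *h, struct toy_elem *e)
{
  struct toy_elem **link = find_link (h, e);

  if (*link != NULL)
    return *link;
  e->next = NULL;
  *link = e;
  return NULL;
}

/* Like hash_replace(): inserts E and returns the equal element it
   displaced, or NULL. */
static struct toy_elem *
toy_replace (struct toy_hash *h, struct toy_elem *e)
{
  struct toy_elem **link = find_link (h, e);
  struct toy_elem *old = *link;

  e->next = old != NULL ? old->next : NULL;
  *link = e;
  return old;
}

/* Like hash_find(): only E's key is examined. */
static struct toy_elem *
toy_find (struct toy_hash *h, const struct toy_elem *e)
{
  return *find_link (h, e);
}

/* Like hash_delete(): unlinks and returns the equal element, or NULL. */
static struct toy_elem *
toy_delete (struct toy_hash *h, const struct toy_elem *e)
{
  struct toy_elem **link = find_link (h, e);
  struct toy_elem *found = *link;

  if (found != NULL)
    *link = found->next;
  return found;
}

/* Usage data: three elements sharing key 42; the table is static, so
   every bucket starts out NULL. */
static struct toy_hash table;
static struct toy_elem elem_a = { 42, NULL };
static struct toy_elem elem_b = { 42, NULL };
static struct toy_elem probe = { 42, NULL };
```

When you insert a second element with an existing key, the first element is returned and the table is unchanged. When you replace, the new element takes the old one's place and the old one is returned, and the caller then becomes responsible for it.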
[Function] Calls action once for each element in hash, in arbitrary order. action must not call any function that may modify the hash table, such as hash_insert() or hash_delete(). action must not modify key data in elements, although it may modify any other data.
The second interface is based on an iterator data type. Idiomatically, iterators are used as follows:
    struct hash_iterator i;

    hash_first (&i, h);
    while (hash_next (&i))
      {
        struct foo *f = hash_entry (hash_cur (&i), struct foo, elem);
        ...do something with f...
      }
struct hash_iterator
[Type] Represents a position within a hash table. Calling any function that may modify a hash table, such as hash_insert() or hash_delete(), invalidates all iterators within that hash table. Like struct hash and struct hash_elem, struct hash_iterator is opaque.
[Function]
[Function] Advances iterator to the next element in hash, and returns that element. Returns a null pointer if no elements remain. After hash_next() returns null for iterator, calling it again yields undefined behavior.
[Function] Returns the value most recently returned by hash_next() for iterator. Yields undefined behavior after hash_first() has been called on iterator but before hash_next() has been called for the first time.
    struct hash pages;

    hash_init (&pages, page_hash, page_less, NULL);
Now we can manipulate the hash table we've created. If p is a pointer to a struct page, we can insert it into the hash table with:
    hash_insert (&pages, &p->hash_elem);
If there's a chance that pages might already contain a page with the same addr, then we should check hash_insert()'s return value.
To search for an element in the hash table, use hash_find(). This takes a little setup, because hash_find() takes an element to compare against. Here's a function that will find and return a page based on a virtual address, assuming that pages is defined at file scope:
    /* Returns the page containing the given virtual address,
       or a null pointer if no such page exists. */
    struct page *
    page_lookup (const void *address)
    {
      struct page p;
      struct hash_elem *e;

      p.addr = address;
      e = hash_find (&pages, &p.hash_elem);
      return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
    }
struct page is allocated as a local variable here on the assumption that it is fairly small. Large structures should not be allocated as local variables. See Section A.2.1 [struct thread], page 61, for more information.
A similar function could delete a page by address using hash_delete().
A.8.7 Synchronization
The hash table does not do any internal synchronization. It is the caller's responsibility to synchronize calls to hash table functions. In general, any number of functions that examine but do not modify the hash table, such as hash_find() or hash_next(), may execute simultaneously. However, these functions cannot safely execute at the same time as any
function that may modify a given hash table, such as hash_insert() or hash_delete(), nor may more than one function that can modify a given hash table execute safely at once. It is also the caller's responsibility to synchronize access to data in hash table elements. How to synchronize access to this data depends on how it is designed and organized, as with any other data structure.
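The rule above allows either many concurrent readers or one modifier alone, which is exactly the contract of a reader-writer lock. Pintos itself provides no such primitive. The hosted-C sketch below therefore substitutes POSIX pthread_rwlock_t to illustrate the rule, with an int array standing in for the hash table. table_lookup() and table_store() are hypothetical names. On older C libraries, link with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define TABLE_SIZE 16

static int table[TABLE_SIZE];          /* Stand-in for a hash table. */
static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Examines without modifying, so a shared (read) lock suffices and
   any number of lookups may hold it at once. */
static int
table_lookup (int slot)
{
  int value;

  assert (slot >= 0 && slot < TABLE_SIZE);
  pthread_rwlock_rdlock (&table_lock);
  value = table[slot];
  pthread_rwlock_unlock (&table_lock);
  return value;
}

/* Modifies, so it takes the lock exclusively: no lookup or other store
   can run at the same time.  Returns the previous value. */
static int
table_store (int slot, int value)
{
  int old;

  assert (slot >= 0 && slot < TABLE_SIZE);
  pthread_rwlock_wrlock (&table_lock);
  old = table[slot];
  table[slot] = value;
  pthread_rwlock_unlock (&table_lock);
  return old;
}
```

Inside the Pintos kernel, the simplest correct alternative is to hold a single struct lock around every hash table call, at the cost of also serializing readers.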
B.1 Niceness
Thread priority is dynamically determined by the scheduler using a formula given below. However, each thread also has an integer nice value that determines how "nice" the thread should be to other threads. A nice of zero does not affect thread priority. A positive nice, to the maximum of 20, decreases the priority of a thread and causes it to give up some CPU time it would otherwise receive. On the other hand, a negative nice, to the minimum of -20, tends to take away CPU time from other threads.
The initial thread starts with a nice value of zero. Other threads start with a nice value inherited from their parent thread. You must implement the functions described below, which are for use by test programs. We have provided skeleton definitions for them in threads/thread.c.
[Function]
[Function] Sets the current thread's nice value to new_nice and recalculates the thread's priority based on the new value (see Section B.2 [Calculating Priority], page 91). If the running thread no longer has the highest priority, yields.
initialization. It is also recalculated once every fourth clock tick, for every thread. In either case, it is determined by the formula
    priority = PRI_MAX - (recent_cpu / 4) - (nice * 2),
where recent_cpu is an estimate of the CPU time the thread has used recently (see below) and nice is the thread's nice value. The result should be rounded down to the nearest integer (truncated). The coefficients 1/4 and 2 on recent_cpu and nice, respectively, have been found to work well in practice but lack deeper meaning. The calculated priority is always adjusted to lie in the valid range PRI_MIN to PRI_MAX.
This formula gives a thread that has received CPU time recently lower priority for being reassigned the CPU the next time the scheduler runs. This is key to preventing starvation: a thread that has not received any CPU time recently will have a recent_cpu of 0, which, barring a high nice value, should ensure that it receives CPU time soon.
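As a worked check of the formula, recent_cpu = 20 and nice = 2 give priority = 63 - 20/4 - 2*2 = 54. The sketch below computes the same value. It assumes recent_cpu is stored in 17.14 fixed point (scale factor F = 1 << 14) and that PRI_MIN and PRI_MAX are 0 and 63. calc_priority() is a hypothetical name.

```c
#include <assert.h>

#define PRI_MIN 0
#define PRI_MAX 63
#define F (1 << 14)          /* Assumed 17.14 fixed-point scale factor. */

/* priority = PRI_MAX - (recent_cpu / 4) - (nice * 2), where
   recent_cpu_fp is 17.14 fixed point and nice is an integer.  The
   result is truncated toward zero, then clamped to the valid range. */
static int
calc_priority (int recent_cpu_fp, int nice)
{
  int priority;

  assert (nice >= -20 && nice <= 20);
  priority = (PRI_MAX * F - recent_cpu_fp / 4 - nice * 2 * F) / F;
  if (priority < PRI_MIN)
    priority = PRI_MIN;
  else if (priority > PRI_MAX)
    priority = PRI_MAX;
  return priority;
}
```

Dividing a fixed-point value by the integer 4 needs no conversion. The sketch truncates only once, after combining all three terms, so recent_cpu = 6 with nice = 0 gives 61.5, which truncates to 61.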
Assumptions made by some of the tests require that these recalculations of recent_cpu be made exactly when the system tick counter reaches a multiple of a second, that is, when timer_ticks () % TIMER_FREQ == 0, and not at any other time. The value of recent_cpu can be negative for a thread with a negative nice value. Do not clamp negative recent_cpu to 0. You may need to think about the order of calculations in this formula. We recommend computing the coefficient of recent_cpu first, then multiplying. Some students have reported that multiplying load_avg by recent_cpu directly can cause overflow. You must implement thread_get_recent_cpu(), for which there is a skeleton in threads/thread.c.
[Function] Returns 100 times the current thread's recent_cpu value, rounded to the nearest integer.
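Following the advice above, this sketch computes the coefficient (2*load_avg)/(2*load_avg + 1) first and only then multiplies it by recent_cpu. Each fixed-point multiply and divide widens to int64_t. The sketch assumes 17.14 fixed point, and update_recent_cpu(), fp_mul(), and fp_div() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define F (1 << 14)          /* Assumed 17.14 fixed-point scale factor. */

/* Fixed-point multiply, widened to 64 bits before rescaling. */
static inline int
fp_mul (int x, int y)
{
  return (int) ((int64_t) x * y / F);
}

/* Fixed-point divide, widened to 64 bits before rescaling. */
static inline int
fp_div (int x, int y)
{
  return (int) ((int64_t) x * F / y);
}

/* recent_cpu = (2*load_avg)/(2*load_avg + 1) * recent_cpu + nice,
   with recent_cpu and load_avg in fixed point and nice an integer. */
static int
update_recent_cpu (int recent_cpu, int load_avg, int nice)
{
  int twice_load;
  int coeff;

  assert (load_avg >= 0);                /* Keeps the divisor positive. */
  twice_load = 2 * load_avg;
  coeff = fp_div (twice_load, twice_load + F);   /* "+ 1" becomes + F. */
  return fp_mul (coeff, recent_cpu) + nice * F;  /* Add integer nice. */
}
```

With load_avg = 1, recent_cpu = 3, and nice = 0, the exact answer is 2. The sketch returns 32,766 in 17.14 format (about 1.9999) because the coefficient 2/3 is truncated before the multiply. Rounding to nearest when reporting the value recovers 2.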
[Function] Returns 100 times the current system load average, rounded to the nearest integer.
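Both getters report 100 times a fixed-point quantity, rounded to the nearest integer. The minimal sketch below assumes 17.14 fixed point. It multiplies by 100 first, because a fixed-point value times an integer needs no rescaling, and then rounds, sending negative values away from zero. times_100_rounded() is a hypothetical helper. It widens to int64_t so that large recent_cpu values cannot overflow the multiplication.

```c
#include <assert.h>
#include <stdint.h>

#define F (1 << 14)          /* Assumed 17.14 fixed-point scale factor. */

/* Returns 100 * value_fp rounded to the nearest integer, where
   value_fp is 17.14 fixed point. */
static int
times_100_rounded (int value_fp)
{
  int64_t x = (int64_t) value_fp * 100;   /* Fixed point times integer. */

  return (int) (x >= 0 ? (x + F / 2) / F : (x - F / 2) / F);
}
```

For example, a load average of 1/60 (about 0.0167) is reported as 2.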
B.5 Summary
The following formulas summarize the calculations required to implement the scheduler. They are not a complete description of scheduler requirements.
Every thread has a nice value between -20 and 20 directly under its control. Each thread also has a priority, between 0 (PRI_MIN) and 63 (PRI_MAX), which is recalculated using the following formula every fourth tick:
    priority = PRI_MAX - (recent_cpu / 4) - (nice * 2)
recent_cpu measures the amount of CPU time a thread has received recently. On each timer tick, the running thread's recent_cpu is incremented by 1. Once per second, every thread's recent_cpu is updated this way:
    recent_cpu = (2*load_avg)/(2*load_avg + 1) * recent_cpu + nice
load_avg estimates the average number of threads ready to run over the past minute. It is initialized to 0 at boot and recalculated once per second as follows:
    load_avg = (59/60)*load_avg + (1/60)*ready_threads
where ready_threads is the number of threads that are either running or ready to run at time of update (not including the idle thread).
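The load-average update can be written using two fixed-point constants, 59/60 and 1/60, each computed once. The minimal sketch below assumes 17.14 fixed point, and update_load_avg() and fp_mul() are hypothetical names. The term (1/60)*ready_threads multiplies a fixed-point value by an integer, so it needs no rescaling.

```c
#include <assert.h>
#include <stdint.h>

#define F (1 << 14)          /* Assumed 17.14 fixed-point scale factor. */

/* Fixed-point multiply, widened to 64 bits before rescaling. */
static inline int
fp_mul (int x, int y)
{
  return (int) ((int64_t) x * y / F);
}

/* load_avg = (59/60)*load_avg + (1/60)*ready_threads, where load_avg
   is fixed point and ready_threads counts running and ready threads,
   excluding the idle thread. */
static int
update_load_avg (int load_avg, int ready_threads)
{
  const int fifty_nine_sixtieths = 59 * F / 60;   /* 16,110 */
  const int one_sixtieth = F / 60;                /* 273 */

  assert (ready_threads >= 0);
  return fp_mul (fifty_nine_sixtieths, load_avg)
         + one_sixtieth * ready_threads;          /* Fixed point * int. */
}
```

Starting from load_avg = 0 with one ready thread, the first update yields 273, which is 1/60 in 17.14 format (16,384 / 60, truncated).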
Because we are working in binary, the decimal point might more correctly be called the binary point, but the meaning should be clear.
This section has consistently used multiplication or division by f, instead of q-bit shifts, for two reasons. First, multiplication and division do not have the surprising operator precedence of the C shift operators. Second, multiplication and division are well-defined on negative operands, but the C shift operators are not. Take care with these issues in your implementation.
The following table summarizes how fixed-point arithmetic operations can be implemented in C. In the table, x and y are fixed-point numbers, n is an integer, fixed-point numbers are in signed p.q format where p + q = 31, and f is 1 << q:
    Convert n to fixed point:                       n * f
    Convert x to integer (rounding toward zero):    x / f
    Convert x to integer (rounding to nearest):     (x + f / 2) / f if x >= 0,
                                                    (x - f / 2) / f if x <= 0.
    Add x and y:                                    x + y
    Subtract y from x:                              x - y
    Add x and n:                                    x + n * f
    Subtract n from x:                              x - n * f
    Multiply x by y:                                ((int64_t) x) * y / f
    Multiply x by n:                                x * n
    Divide x by y:                                  ((int64_t) x) * f / y
    Divide x by n:                                  x / n
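Each row of the table becomes a small inline function. The sketch below assumes q = 14 (a 17.14 format, so f = 16,384) and a hypothetical fixed_t typedef, and its function names are illustrative rather than part of Pintos. As in the table, the fixed-by-fixed multiply and divide widen to int64_t so that the intermediate result cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

#define Q 14                 /* Assumed number of fraction bits. */
#define F (1 << Q)           /* f = 1 << q */

typedef int fixed_t;         /* Hypothetical name for a p.q value. */

static inline fixed_t int_to_fp (int n) { return n * F; }
static inline int fp_to_int_zero (fixed_t x) { return x / F; }

/* Rounds to nearest, with halves rounded away from zero. */
static inline int
fp_to_int_nearest (fixed_t x)
{
  return x >= 0 ? (x + F / 2) / F : (x - F / 2) / F;
}

static inline fixed_t fp_add (fixed_t x, fixed_t y) { return x + y; }
static inline fixed_t fp_sub (fixed_t x, fixed_t y) { return x - y; }
static inline fixed_t fp_add_int (fixed_t x, int n) { return x + n * F; }
static inline fixed_t fp_sub_int (fixed_t x, int n) { return x - n * F; }
static inline fixed_t fp_mul_int (fixed_t x, int n) { return x * n; }
static inline fixed_t fp_div_int (fixed_t x, int n) { return x / n; }

/* Fixed times fixed: widen, multiply, then rescale by f. */
static inline fixed_t
fp_mul (fixed_t x, fixed_t y)
{
  return (fixed_t) ((int64_t) x * y / F);
}

/* Fixed divided by fixed: widen and prescale by f, then divide. */
static inline fixed_t
fp_div (fixed_t x, fixed_t y)
{
  return (fixed_t) ((int64_t) x * F / y);
}
```

For example, dividing 5 by 2 in fixed point gives 2.5. This converts to 2 when rounding toward zero and to 3 when rounding to nearest, and -5 / 2 rounds to -3.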
C.1 Style
Style, for the purposes of our grading, refers to how readable your code is. At minimum, this means that your code is well formatted, your variable names are descriptive and your functions are decomposed and well commented. Any other factors which make it hard (or easy) for us to read or use your code will be reflected in your style grade.
The existing Pintos code is written in the GNU style and largely follows the GNU Coding Standards. We encourage you to follow the applicable parts of them too, especially chapter 5, "Making the Best Use of C." Using a different style won't cause actual problems, but it's ugly to see gratuitous differences in style from one function to another. If your code is too ugly, it will cost you points.
Please limit C source file lines to at most 79 characters long.
Pintos comments sometimes refer to external standards or specifications by writing a name inside square brackets, like this: [IA32-v3a]. These names refer to the reference names used in this documentation (see [Bibliography], page 117).
If you remove existing Pintos code, please delete it from your source file entirely. Don't just put it into a comment or a conditional compilation directive, because that makes the resulting code hard to read.
We're only going to do a compile in the directory for the project being submitted. You don't need to make sure that the previous projects also compile.
Project code should be written so that all of the subproblems for the project function together, that is, without the need to rebuild with different macros defined, etc. If you do extra credit work that changes normal Pintos behavior so as to interfere with grading, then you must implement it so that it only acts that way when given a special command-line option of the form -name, where name is a name of your choice. You can add such an option by modifying parse_options() in threads/init.c.
The introduction describes additional coding style requirements (see Section 1.2.2 [Design], page 5).
C.2 C99
The Pintos source code uses a few features of the C99 standard library that were not in the original 1989 standard for C. Many programmers are unaware of these features, so we will describe them. The new features used in Pintos are mostly in new headers:
<stdbool.h>
    Defines macros bool, a 1-bit type that takes on only the values 0 and 1, true, which expands to 1, and false, which expands to 0.
<stdint.h>
    On systems that support them, this header defines types intn_t and uintn_t for n = 8, 16, 32, 64, and possibly other values. These are two's complement signed and unsigned types, respectively, with the given number of bits.
    On systems where it is possible, this header also defines types intptr_t and uintptr_t, which are integer types big enough to hold a pointer.
    On all systems, this header defines types intmax_t and uintmax_t, which are the system's signed and unsigned integer types with the widest ranges.
    For every signed integer type type_t defined here, as well as for ptrdiff_t defined in <stddef.h>, this header also defines macros TYPE_MAX and TYPE_MIN that give the type's range. Similarly, for every unsigned integer type type_t defined here, as well as for size_t defined in <stddef.h>, this header defines a TYPE_MAX macro giving its maximum value.
<inttypes.h>
    <stdint.h> provides no straightforward way to format the types it defines with printf() and related functions. This header provides macros to help with that. For every intn_t defined by <stdint.h>, it provides macros PRIdn and PRIin for formatting values of that type with "%d" and "%i". Similarly, for every uintn_t, it provides PRIon, PRIun, PRIxn, and PRIXn.
    You use these something like this, taking advantage of the fact that the C compiler concatenates adjacent string literals:
        #include <inttypes.h>
        ...
        int32_t value = ...;
        printf ("value=%08"PRId32"\n", value);
    The % is not supplied by the PRI macros. As shown above, you supply it yourself and follow it by any flags, field width, etc.
<stdio.h>
    The printf() function has some new type modifiers for printing standard types:
    j   For intmax_t (e.g. %jd) or uintmax_t (e.g. %ju).
    z   For size_t (e.g. %zu).
    t   For ptrdiff_t (e.g. %td).
Pintos printf() also implements a nonstandard flag that groups large numbers with commas to make them easier to read.
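The following sketch uses the <inttypes.h> macros together with the j, z, and t modifiers. It calls the hosted C library's snprintf() instead of Pintos's printf(), so it does not demonstrate the nonstandard flag. format_values() is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>   /* PRId32, PRIx64; also pulls in <stdint.h>. */
#include <stddef.h>     /* size_t, ptrdiff_t. */
#include <stdio.h>
#include <string.h>     /* strcmp(), for checking results. */

/* Formats one value of each kind into a static buffer and returns it.
   Like any function returning a static buffer, it is not reentrant. */
static const char *
format_values (int32_t v32, uint64_t v64, size_t size, ptrdiff_t diff,
               intmax_t big)
{
  static char buf[96];

  snprintf (buf, sizeof buf, "%08" PRId32 " %" PRIx64 " %zu %td %jd",
            v32, v64, size, diff, big);
  return buf;
}
```

The field width in %08 includes the minus sign, so -42 is printed as -0000042.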
strncpy()
    This function can leave its destination buffer without a null string terminator. It also has performance problems. Again, use strlcpy().
strcat()
    Same issue as strcpy(). Use strlcat() instead. Again, refer to comments in its source code in lib/string.c for documentation.
strncat()
    The meaning of its buffer size argument is surprising. Again, use strlcat().
strtok()
    Uses global data, so it is unsafe in threaded programs such as kernels. Use strtok_r() instead, and see its source code in lib/string.c for documentation and an example.
sprintf()
    Same issue as strcpy(). Use snprintf() instead. Refer to comments in lib/stdio.h for documentation.
vsprintf()
    Same issue as strcpy(). Use vsnprintf() instead.
If you try to use any of these functions, the error message will give you a hint by referring to an identifier like dont_use_sprintf_use_snprintf.
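strtok_r() stores its position in a pointer supplied by the caller rather than in a hidden global, so two threads can tokenize different strings at the same time. The sketch below splits a command line into words in place. It uses the POSIX strtok_r() from a hosted C library, assumed here to have the same calling convention as the Pintos version in lib/string.c. split_words() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* Declares strtok_r() on hosted systems. */
#include <assert.h>
#include <string.h>

/* Splits LINE in place into at most MAX_WORDS space-separated words,
   storing pointers into LINE in WORDS.  Returns the number of words. */
static int
split_words (char *line, char *words[], int max_words)
{
  char *save_ptr;        /* Tokenizer state lives here, not in a global. */
  char *token;
  int count = 0;

  assert (line != NULL && max_words > 0);
  for (token = strtok_r (line, " ", &save_ptr);
       token != NULL && count < max_words;
       token = strtok_r (NULL, " ", &save_ptr))
    words[count++] = token;
  return count;
}

/* Usage data: note the runs of spaces between words. */
static char demo_line[] = "echo  hello   world";
static char *demo_words[8];
```

Runs of delimiters are skipped, so "echo  hello   world" produces exactly three words.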
[Function] Blocks the current thread until thread tid exits. If A is the running thread and B is the argument, then we say that A joins B.
Incidentally, the argument is a thread id, instead of a thread pointer, because a thread pointer is not unique over time. That is, when a thread dies, its memory may be, whether immediately or much later, reused for another thread. If thread A over time had two children B and C that were stored at the same address, then thread_join(B) and thread_join(C) would be ambiguous.
A thread may only join its immediate children. Calling thread_join() on a thread that is not the caller's child should cause the caller to return immediately. Children are not inherited, that is, if A has child B and B has child C, then A always returns immediately should it try to join C, even if B is dead.
A thread need not ever be joined. Your solution should properly free all of a thread's resources, including its struct thread, whether it is ever joined or not, and regardless of whether the child exits before or after its parent. That is, a thread should be freed exactly once in all cases.
Joining a given thread is idempotent. That is, joining a thread multiple times is equivalent to joining it once, because it has already exited at the time of the later joins. Thus, joins on a given thread after the first should return immediately.
You must handle all the ways a join can occur: nested joins (A joins B, then B joins C), multiple joins (A joins B, then A joins C), and so on.
>> If you have any preliminary comments on your submission, notes for
>> the TAs, or extra credit, please give them here.
(This is a sample design document.)
>> Please cite any offline or online sources you consulted while
>> preparing your submission, other than the Pintos documentation,
>> course text, and lecture notes.
None.
                 JOIN
                 ====
---- DATA STRUCTURES ----
>> Copy here the declaration of each new or changed struct or struct
>> member, global or static variable, typedef, or enumeration.
>> Identify the purpose of each in 25 words or less.
A "latch" is a new synchronization primitive. Acquires block
until the first release. Afterward, all ongoing and future
acquires pass immediately.
    /* Latch. */
    struct latch
      {
        bool released;
        struct lock monitor_lock;
        struct condition rel_cond;
      };
Added to struct thread:
    /* Members for implementing thread_join(). */
    struct latch ready_to_die;       /* Release when thread about to die. */
    struct semaphore can_die;        /* Up when thread allowed to die. */
    struct list children;            /* List of child threads. */
    struct list_elem children_elem;  /* Element of children list. */
---- ALGORITHMS ----
>> Briefly describe your implementation of thread_join() and how it
>> interacts with thread termination.
thread_join() finds the joined child on the thread's list of
children and waits for the child to exit by acquiring the child's
101
ready_to_die latch. When thread_exit() is called, the thread releases its ready_to_die latch, allowing the parent to continue. ---- SYNCHRONIZATION --->> >> >> >> >> Consider parent thread P with child thread C. How do you ensure proper synchronization and avoid race conditions when P calls wait(C) before C exits? After C exits? How do you ensure that all resources are freed in each case? How about when P terminates without waiting, before C exits? After C exits? Are there any special cases?
C waits in thread_exit() for P to die before it finishes its own exit, using the can_die semaphore "down"ed by C and "up"ed by P as it exits. Regardless of whether whether C has terminated, there is no race on wait(C), because C waits for Ps permission before it frees itself. Regardless of whether P waits for C, P still "up"s Cs can_die semaphore when P dies, so C will always be freed. (However, freeing Cs resources is delayed until Ps death.) The initial thread is a special case because it has no parent to wait for it or to "up" its can_die semaphore. Therefore, its can_die semaphore is initialized to 1. ---- RATIONALE --->> Critique your design, pointing out advantages and disadvantages in >> your design choices. This design has the advantage of simplicity. Encapsulating most of the synchronization logic into a new "latch" structure abstracts what little complexity there is into a separate layer, making the design easier to reason about. Also, all the new data members are in struct thread, with no need for any extra dynamic allocation, etc., that would require extra management code. On the other hand, this design is wasteful in that a child thread cannot free itself before its parent has terminated. A parent thread that creates a large number of short-lived child threads could unnecessarily exhaust kernel memory. This is probably acceptable for implementing kernel threads, but it may be a bad idea for use with user processes because of the larger number of resources that user processes tend to own.
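The latch from the sample design document can be sketched as follows. This is a host-side version built on an assumption: POSIX pthread_mutex_t and pthread_cond_t stand in for Pintos's struct lock and struct condition so that the code runs outside the kernel. The names latch_init(), latch_acquire(), latch_release() and latch_waiter() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-side stand-in for the design document's struct latch. */
struct latch
  {
    bool released;                 /* Has the latch been released yet? */
    pthread_mutex_t monitor_lock;  /* Plays the role of struct lock. */
    pthread_cond_t rel_cond;       /* Plays the role of struct condition. */
  };

static void
latch_init (struct latch *l)
{
  l->released = false;
  pthread_mutex_init (&l->monitor_lock, NULL);
  pthread_cond_init (&l->rel_cond, NULL);
}

/* Blocks until the first release; returns immediately afterward. */
static void
latch_acquire (struct latch *l)
{
  pthread_mutex_lock (&l->monitor_lock);
  while (!l->released)             /* Recheck after every wakeup. */
    pthread_cond_wait (&l->rel_cond, &l->monitor_lock);
  pthread_mutex_unlock (&l->monitor_lock);
}

/* Releases the latch and wakes every waiter.  Releasing twice is harmless. */
static void
latch_release (struct latch *l)
{
  pthread_mutex_lock (&l->monitor_lock);
  l->released = true;
  pthread_cond_broadcast (&l->rel_cond);
  pthread_mutex_unlock (&l->monitor_lock);
}

/* Example thread body (hypothetical): waits on the latch passed as ARG,
   the way a parent waits on a child's ready_to_die latch. */
static void *
latch_waiter (void *arg)
{
  latch_acquire (arg);
  return arg;
}
```

The while loop, rather than a single if, rechecks released after every wakeup, so a spurious wakeup cannot let an acquire through early. Broadcasting instead of signaling wakes every waiter at once. This matches the rule that all ongoing acquires pass as soon as the latch is released.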
E.1 printf()
Don't underestimate the value of printf(). The way printf() is implemented in Pintos, you can call it from practically anywhere in the kernel, whether it's in a kernel thread or an interrupt handler, almost regardless of what locks are held.

printf() is useful for more than just examining data. It can also help figure out when and where something goes wrong, even when the kernel crashes or panics without a useful error message. The strategy is to sprinkle calls to printf() with different strings (e.g. "<1>", "<2>", ...) throughout the pieces of code you suspect are failing. If you don't even see <1> printed, then something bad happened before that point; if you see <1> but not <2>, then something bad happened between those two points; and so on. Based on what you learn, you can then insert more printf() calls in the new, smaller region of code you suspect. Eventually you can narrow the problem down to a single statement. See Section E.6 [Triple Faults], page 111, for a related technique.
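The breadcrumb technique can be illustrated with a hypothetical suspect function instrumented with numbered printf() calls. The function and its contents are made up; only the pattern matters. On a host system, stdout may be buffered and lose its last lines when a program crashes, so fprintf (stderr, ...) is the safer choice there.

```c
#include <stdio.h>

/* Hypothetical function under suspicion, instrumented with breadcrumbs.
   If "<2>" appears but "<3>" never does, the fault lies between them. */
static int
suspect_function (const int *table, int n)
{
  int sum = 0;

  printf ("<1>\n");                /* Reached the function at all? */
  for (int i = 0; i < n; i++)
    sum += table[i];
  printf ("<2>\n");                /* Survived the loop? */
  if (n > 0)
    sum /= n;
  printf ("<3>\n");                /* Survived the division? */
  return sum;
}
```

Once the failing region is known, you would move or add breadcrumbs inside it and repeat until one statement remains.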
E.2 ASSERT
Assertions are useful because they can catch problems early, before they'd otherwise be noticed. Ideally, each function should begin with a set of assertions that check its arguments for validity. (Initializers for a function's local variables are evaluated before assertions are checked, so be careful not to assume that an argument is valid in an initializer.) You can also sprinkle assertions throughout the body of functions in places where you suspect things are likely to go wrong. They are especially useful for checking loop invariants.

Pintos provides the ASSERT macro, defined in <debug.h>, for checking assertions.
ASSERT (expression)
[Macro] Tests the value of expression. If it evaluates to zero (false), the kernel panics. The panic message includes the expression that failed, its file and line number, and a backtrace, which should help you to find the problem. See Section E.4 [Backtraces], page 103, for more information.
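The initializer pitfall mentioned above is easy to fall into. The sketch below uses a hypothetical host-side stand-in for ASSERT that prints the failed expression, file and line and then calls abort(). The real macro in <debug.h> panics the kernel and prints a backtrace instead. The names struct point, sum_bad() and sum_good() are made up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical host-side stand-in for Pintos's ASSERT. */
#define ASSERT(CONDITION)                                          \
  do                                                               \
    {                                                              \
      if (!(CONDITION))                                            \
        {                                                          \
          fprintf (stderr, "%s:%d: assertion `%s' failed.\n",      \
                   __FILE__, __LINE__, #CONDITION);                \
          abort ();                                                \
        }                                                          \
    }                                                              \
  while (0)

struct point
  {
    int x, y;
  };

/* Wrong: the initializer dereferences P before ASSERT can check it,
   so a null P crashes before the assertion ever runs. */
static int
sum_bad (const struct point *p)
{
  int total = p->x + p->y;
  ASSERT (p != NULL);             /* Too late to help. */
  return total;
}

/* Right: validate the argument first, then use it. */
static int
sum_good (const struct point *p)
{
  ASSERT (p != NULL);
  return p->x + p->y;
}
```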
UNUSED
[Macro] Appended to a function parameter to tell the compiler that the parameter might not be used within the function. It suppresses the warning that would otherwise appear.

NO_RETURN
[Macro] Appended to a function prototype to tell the compiler that the function never returns. It allows the compiler to fine-tune its warnings and its code generation.

NO_INLINE
[Macro] Appended to a function prototype to tell the compiler to never emit the function in-line. Occasionally useful to improve the quality of backtraces (see below).

[Macro] Appended to a function prototype to tell the compiler that the function takes a printf()-like format string as the argument numbered format (starting from 1) and that the corresponding value arguments start at the argument numbered first. This lets the compiler tell you if you pass the wrong argument types.
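These macros are conventionally thin wrappers around GCC attributes. The sketch below assumes definitions of that kind; check <debug.h> for the ones Pintos actually uses. The printf-format macro's name did not survive in this text, so the sketch calls it FORMAT_CHECK, a hypothetical name. The functions callback(), fatal(), helper() and log_msg() are also illustrative.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed GCC-attribute definitions (see <debug.h> for Pintos's own). */
#define UNUSED __attribute__ ((unused))
#define NO_RETURN __attribute__ ((noreturn))
#define NO_INLINE __attribute__ ((noinline))
#define FORMAT_CHECK(FMT, FIRST) __attribute__ ((format (printf, FMT, FIRST)))

/* AUX is deliberately ignored; UNUSED silences -Wunused-parameter. */
static int
callback (int value, void *aux UNUSED)
{
  return value * 2;
}

/* Declared NO_RETURN: the compiler knows control never comes back. */
void fatal (const char *msg) NO_RETURN;

void
fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* NO_INLINE keeps this in its own stack frame, so it shows up in backtraces. */
static NO_INLINE int
helper (int x)
{
  return x + 1;
}

/* Argument 2 is the format and checked values start at argument 3, so
   log_msg (1, "%d\n", "oops") would draw a -Wformat warning. */
static int log_msg (int level, const char *fmt, ...) FORMAT_CHECK (2, 3);

static int
log_msg (int level, const char *fmt, ...)
{
  va_list args;
  int written = 0;

  va_start (args, fmt);
  if (level > 0)
    written = vprintf (fmt, args);   /* Print only when enabled. */
  va_end (args);
  return written;
}
```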
E.4 Backtraces
When the kernel panics, it prints a "backtrace," that is, a summary of how your program got where it is, as a list of addresses inside the functions that were running at the time of the panic. You can also insert a call to debug_backtrace(), prototyped in <debug.h>, to print a backtrace at any point in your code. debug_backtrace_all(), also declared in <debug.h>, prints backtraces of all threads.

The addresses in a backtrace are listed as raw hexadecimal numbers, which are difficult to interpret. We provide a tool called backtrace to translate these into function names and source file line numbers. Give it the name of your kernel.o as the first argument and the hexadecimal numbers composing the backtrace (including the 0x prefixes) as the remaining arguments. It outputs the function name and source file line numbers that correspond to each address.

If the translated form of a backtrace is garbled, or doesn't make sense (e.g. function A is listed above function B, but B doesn't call A), then it's a good sign that you're corrupting a kernel thread's stack, because the backtrace is extracted from the stack. Alternatively, it could be that the kernel.o you passed to backtrace is not the same kernel that produced the backtrace.

Sometimes backtraces can be confusing without any corruption. Compiler optimizations can cause surprising behavior. When a function has called another function as its final action (a tail call), the calling function may not appear in a backtrace at all. Similarly, when function A calls another function B that never returns, the compiler may optimize such that an unrelated function C appears in the backtrace instead of A. Function C is simply the function that happens to be in memory just after A. In the threads project, this is commonly seen in backtraces for test failures; see [pass() Fails], page 18, for more information.
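Pintos's debug_backtrace() has no direct host counterpart, but glibc offers a comparable facility that shows the same idea. It collects the return addresses on the current stack and prints them as raw addresses. The sketch below is a host-side analogue only. It assumes glibc's <execinfo.h>, which is not available in every C library, and print_host_backtrace() is a hypothetical name.

```c
#include <execinfo.h>
#include <unistd.h>

/* Host-side analogue of a kernel backtrace: capture up to 32 return
   addresses from the current call stack and write them to stderr.
   Returns the number of frames captured. */
static int
print_host_backtrace (void)
{
  void *frames[32];
  int depth = backtrace (frames, 32);

  backtrace_symbols_fd (frames, depth, STDERR_FILENO);
  return depth;
}
```

As with Pintos backtraces, the raw addresses need symbol information to be readable. Compiling with -g lets a tool such as addr2line map them to source lines. Tail calls and inlining can drop frames here too.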
E.4.1 Example
Here's an example. Suppose that Pintos printed out the following call stack, which is taken from an actual Pintos submission for the file system project:

    Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.

You would then invoke the backtrace utility as shown below, cutting and pasting the backtrace information into the command line. This assumes that kernel.o is in the current directory. You would of course enter all of the following on a single shell command line, even though that would overflow our margins here:
    backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8

The backtrace output would then look something like this:

    0xc0106eff: debug_panic (lib/debug.c:86)
    0xc01102fb: file_seek (filesys/file.c:405)
    0xc010dc22: seek (userprog/syscall.c:744)
    0xc010cf67: syscall_handler (userprog/syscall.c:444)
    0xc0102319: intr_handler (threads/interrupt.c:334)
    0xc010325a: intr_entry (threads/intr-stubs.S:38)
    0x0804812c: (unknown)
    0x08048a96: (unknown)
    0x08048ac8: (unknown)

(You will probably not see exactly the same addresses if you run the command above on your own kernel binary, because the source code you compiled and the compiler you used are probably different.)

The first line in the backtrace refers to debug_panic(), the function that implements kernel panics. Because backtraces commonly result from kernel panics, debug_panic() will often be the first function shown in a backtrace.

The second line shows file_seek() as the function that panicked, in this case as the result of an assertion failure. In the source code tree used for this example, line 405 of filesys/file.c is the assertion

    ASSERT (file_ofs >= 0);

(This line was also cited in the assertion failure message.) Thus, file_seek() panicked because it was passed a negative file offset argument.

The third line indicates that seek() called file_seek(), presumably without validating the offset argument. In this submission, seek() implements the seek system call.

The fourth line shows that syscall_handler(), the system call handler, invoked seek().

The fifth and sixth lines are the interrupt handler entry path.

The remaining lines are for addresses below PHYS_BASE. This means that they refer to addresses in the user program, not in the kernel.
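The kernel-versus-user split in the last paragraph comes down to one comparison against PHYS_BASE. The sketch below assumes Pintos's default PHYS_BASE of 0xc0000000. It writes that value as the hypothetical constant PHYS_BASE_ADDR so as not to clash with the real macro, then classifies the addresses from the example call stack. The names is_kernel_addr() and count_user_frames() are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Pintos's default PHYS_BASE is 0xc0000000.  A separate name
   avoids clashing with the kernel's own PHYS_BASE macro. */
#define PHYS_BASE_ADDR 0xc0000000u

/* Kernel virtual addresses lie at or above PHYS_BASE. */
static bool
is_kernel_addr (uint32_t addr)
{
  return addr >= PHYS_BASE_ADDR;
}

/* Counts the addresses in TRACE[0..N) that belong to the user program. */
static size_t
count_user_frames (const uint32_t *trace, size_t n)
{
  size_t user = 0;

  for (size_t i = 0; i < n; i++)
    if (!is_kernel_addr (trace[i]))
      user++;
  return user;
}
```

For the nine addresses in the example, the last three are classified as user-program frames. These are the three (unknown) lines in the kernel-only backtrace output.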
If you know what user program was running when the kernel panicked, you can re-run backtrace on the user program, like so (typing the command on a single line, of course):

    backtrace tests/filesys/extended/grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8

The results look like this:

    0xc0106eff: (unknown)
    0xc01102fb: (unknown)
    0xc010dc22: (unknown)
    0xc010cf67: (unknown)
    0xc0102319: (unknown)
    0xc010325a: (unknown)
    0x0804812c: test_main (...xtended/grow-too-big.c:20)
    0x08048a96: main (tests/main.c:10)
    0x08048ac8: _start (lib/user/entry.c:9)

You can even specify both the kernel and the user program names on the command line, like so:

    backtrace kernel.o tests/filesys/extended/grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8

The result is a combined backtrace:

    In kernel.o:
    0xc0106eff: debug_panic (lib/debug.c:86)
    0xc01102fb: file_seek (filesys/file.c:405)
    0xc010dc22: seek (userprog/syscall.c:744)
    0xc010cf67: syscall_handler (userprog/syscall.c:444)
    0xc0102319: intr_handler (threads/interrupt.c:334)
    0xc010325a: intr_entry (threads/intr-stubs.S:38)
    In tests/filesys/extended/grow-too-big:
    0x0804812c: test_main (...xtended/grow-too-big.c:20)
    0x08048a96: main (tests/main.c:10)
    0x08048ac8: _start (lib/user/entry.c:9)

Here's an extra tip for anyone who read this far: backtrace is smart enough to strip the "Call stack:" header and "." trailer from the command line if you include them. This can save you a little bit of trouble in cutting and pasting. Thus, the following command prints the same output as the first one we used:

    backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
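The header-and-trailer stripping described in the tip is simple to reproduce. The following is a hypothetical C sketch of that parsing step, not the backtrace tool's actual code. It skips a leading "Call stack:", stops at a trailing ".", and relies on strtoul() with base 16, which accepts an optional 0x prefix.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses up to MAX hex addresses from LINE into OUT, ignoring an optional
   leading "Call stack:" header and a trailing '.'.  Returns the count. */
static size_t
parse_call_stack (const char *line, uint32_t *out, size_t max)
{
  static const char header[] = "Call stack:";
  const char *p = line;
  size_t n = 0;

  if (strncmp (p, header, strlen (header)) == 0)
    p += strlen (header);              /* Skip the header. */

  while (n < max)
    {
      char *end;
      unsigned long value;

      while (isspace ((unsigned char) *p))
        p++;
      if (*p == '\0' || *p == '.')     /* End of input or the trailer. */
        break;
      value = strtoul (p, &end, 16);   /* Base 16 accepts "0x". */
      if (end == p)                    /* Not a number: stop. */
        break;
      out[n++] = (uint32_t) value;
      p = end;
    }
  return n;
}
```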
E.5 GDB
You can run Pintos under the supervision of the GDB debugger. First, start Pintos with the --gdb option, e.g. pintos --gdb -- run mytest. Second, open a second terminal on the same machine and use pintos-gdb to invoke GDB on kernel.o:[1]

    pintos-gdb kernel.o

and issue the following GDB command:

    target remote localhost:1234

Now GDB is connected to the simulator over a local network connection. You can now issue any normal GDB commands. If you issue the c command, the simulated BIOS will take control, load Pintos, and then Pintos will run in the usual way. You can pause the process at any point with Ctrl+C.

[1] pintos-gdb is a wrapper around gdb (80x86) or i386-elf-gdb (SPARC) that loads the Pintos macros at startup.
c
[GDB Command] Continues execution until Ctrl+C or the next breakpoint.

break
[GDB Command] Sets a breakpoint at function, at line within file, or at address. (Use a 0x prefix to specify an address in hex.) Use break main to make GDB stop when Pintos starts running.

p expression
[GDB Command] Evaluates the given expression and prints its value. If the expression contains a function call, that function will actually be executed.

l *address
[GDB Command] Lists a few lines of code around address. (Use a 0x prefix to specify an address in hex.)

bt
[GDB Command] Prints a stack backtrace similar to that output by the backtrace program described above.

p/a address
[GDB Command] Prints the name of the function or variable that occupies address. (Use a 0x prefix to specify an address in hex.)

disassemble function
[GDB Command] Disassembles function.
We also provide a set of macros specialized for debugging Pintos, written by Godmar Back. You can type help user-defined for basic help with the macros. Here is an overview of their functionality, based on Godmar's documentation:
debugpintos
[GDB Macro] Attach debugger to a waiting pintos process on the same machine. Shorthand for target remote localhost:1234.
dumplist list type element
[GDB Macro] Prints the elements of list, which should be a struct list that contains elements of the given type (without the word struct) in which element is the struct list_elem member that links the elements. Example: dumplist all_list thread allelem prints all elements of struct thread that are linked in struct list all_list using the struct list_elem allelem which is part of struct thread.
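The element argument to dumplist names the struct list_elem member embedded in each structure. Conceptually, printing such a list means walking the embedded elements and converting each one back into a pointer to its containing structure. The sketch below models that conversion with a simplified, hypothetical singly linked list and an offsetof()-based list_entry() macro. Pintos's real <list.h> is doubly linked with head and tail sentinels, so treat this only as an illustration of the idea. The names struct thread_stub and sum_tids() are made up.

```c
#include <stddef.h>

/* Simplified, hypothetical list element embedded in each structure. */
struct list_elem
  {
    struct list_elem *next;
  };

/* Converts a pointer to MEMBER inside STRUCT back into a STRUCT pointer. */
#define list_entry(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((char *) (ELEM) - offsetof (STRUCT, MEMBER)))

/* Hypothetical stand-in for struct thread. */
struct thread_stub
  {
    int tid;
    struct list_elem allelem;   /* Links all threads, like all_list. */
  };

/* Walks the list starting at HEAD, the way dumplist walks all_list
   using allelem, and sums the tids of the containing structures. */
static int
sum_tids (struct list_elem *head)
{
  int sum = 0;

  for (struct list_elem *e = head; e != NULL; e = e->next)
    sum += list_entry (e, struct thread_stub, allelem)->tid;
  return sum;
}
```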
btthread thread
[GDB Macro] Shows the backtrace of thread, which is a pointer to the struct thread of the thread whose backtrace it should show. For the current thread, this is identical to the bt (backtrace) command. It also works for any thread suspended in schedule(), provided you know where its kernel stack page is located.

btthreadlist list element
[GDB Macro] Shows the backtraces of all threads in list, the struct list in which the threads are kept. Specify element as the struct list_elem field used inside struct thread to link the threads together. Example: btthreadlist all_list allelem shows the backtraces of all threads contained in struct list all_list, linked together by allelem. This command is useful to determine where your threads are stuck when a deadlock occurs. Please see the example scenario below.

btthreadall
[GDB Macro] Short-hand for btthreadlist all_list allelem.

btpagefault
[GDB Macro] Print a backtrace of the current thread after a page fault exception. Normally, when a page fault exception occurs, GDB will stop with a message that might say:[2]

    Program received signal 0, Signal 0.
    0xc0102320 in intr0e_stub ()

In that case, the bt command might not give a useful backtrace. Use btpagefault instead.

You may also use btpagefault for page faults that occur in a user process. In this case, you may wish to also load the user program's symbol table using the loadusersymbols macro.

hook-stop
[GDB Macro] GDB invokes this macro every time the simulation stops, which Bochs will do for every processor exception, among other reasons. If the simulation stops due to a page fault, hook-stop will print a message saying whether the page fault occurred in the kernel or in user code, with further explanation.

If the exception occurred from user code, hook-stop will say:

    pintos-debug: a page fault exception occurred in user mode
    pintos-debug: hit c to continue, or s to step to intr_handler

In Project 2, a page fault in a user process leads to the termination of the process. You should expect those page faults to occur in the robustness tests where we test that your kernel properly terminates processes that try to access invalid addresses. To debug those, set a breakpoint in page_fault() in exception.c, which you will need to modify accordingly.

In Project 3, a page fault in a user process no longer automatically leads to the termination of a process. Instead, it may require reading in data for the page the process was trying to access, either because it was swapped out or because this is the first time it's accessed. In either case, you will reach page_fault() and need to take the appropriate action there.

If the page fault did not occur in user mode while executing a user process, then it occurred in kernel mode while executing kernel code. In this case, hook-stop will print this message:

    pintos-debug: a page fault occurred in kernel mode

followed by the output of the btpagefault command.

Before Project 3, a page fault exception in kernel code is always a bug in your kernel, because your kernel should never crash. Starting with Project 3, the situation will change if you use the get_user() and put_user() strategy to verify user memory accesses (see Section 3.1.5 [Accessing User Memory], page 27).

[2] To be precise, GDB will stop only when running under Bochs. When running under QEMU, you must set a breakpoint in the page_fault function to stop execution when a page fault occurs. In that case, the btpagefault macro is unnecessary.
Then, I open a second window on the same machine and start GDB:
    $ pintos-gdb kernel.o
    GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux-gnu"...
    Using host libthread_db library "/lib/libthread_db.so.1".

    Pintos booting with 4,096 kB RAM...
    Kernel command line: -q -mlfqs run mlfqs-load-1
    374 pages available in kernel pool.
    373 pages available in user pool.
    Calibrating timer... 102,400 loops/s.
    Boot complete.
    Executing mlfqs-load-1:
    (mlfqs-load-1) begin
    (mlfqs-load-1) spinning for up to 45 seconds, please wait...
    (mlfqs-load-1) load average rose to 0.5 after 42 seconds
    (mlfqs-load-1) sleeping for another 10 seconds, please wait...

...until it gets stuck because of the bug I had introduced. I hit Ctrl+C in the debugger window:

    Program received signal 0, Signal 0.
    0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
    649       while (i <= PRI_MAX && list_empty (&ready_list[i]))
    (gdb)
The thread that was running when I interrupted Pintos was the idle thread. If I run backtrace, it shows this backtrace:
    (gdb) bt
    #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
    #1  0xc0101778 in schedule () at ../../threads/thread.c:714
    #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
    #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
    #4  0xc010145a in kernel_thread (function=0xc01013ff , aux=0x0)
        at ../../threads/thread.c:575
    #5  0x00000000 in ?? ()
Not terribly useful. What I'd really like to know is what's up with the other thread (or threads). Since I keep all threads in a linked list called all_list, linked together by a struct list_elem member named allelem, I can use the btthreadlist macro from the macro library I wrote. btthreadlist iterates through the list of threads and prints the backtrace for each thread:
    (gdb) btthreadlist all_list allelem
    pintos-debug: dumping backtrace of thread main @0xc002f000
    #0  0xc0101820 in schedule () at ../../threads/thread.c:722
    #1  0xc0100f8f in thread_block () at ../../threads/thread.c:324
    #2  0xc0104755 in timer_sleep (ticks=1000) at ../../devices/timer.c:141
    #3  0xc010bf7c in test_mlfqs_load_1 () at ../../tests/threads/mlfqs-load-1.c:49
    #4  0xc010aabb in run_test (name=0xc0007d8c "mlfqs-load-1")
        at ../../tests/threads/tests.c:50
    #5  0xc0100647 in run_task (argv=0xc0110d28) at ../../threads/init.c:281
    #6  0xc0100721 in run_actions (argv=0xc0110d28) at ../../threads/init.c:331
    #7  0xc01000c7 in main () at ../../threads/init.c:140

    pintos-debug: dumping backtrace of thread idle @0xc0116000
    #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
    #1  0xc0101778 in schedule () at ../../threads/thread.c:714
    #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
    #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
    #4  0xc010145a in kernel_thread (function=0xc01013ff , aux=0x0)
        at ../../threads/thread.c:575
    #5  0x00000000 in ?? ()
In this case, there are only two threads, the idle thread and the main thread. The kernel stack pages (to which the struct thread points) are at 0xc0116000 and 0xc002f000, respectively. The main thread is stuck in timer_sleep(), called from test_mlfqs_load_1. Knowing where threads are stuck can be tremendously useful, for instance when diagnosing deadlocks or unexplained hangs.
loadusersymbols
[GDB Macro] You can also use GDB to debug a user program running under Pintos. To do that, use the loadusersymbols macro to load the program's symbol table:

    loadusersymbols program

where program is the name of the program's executable (in the host file system, not in the Pintos file system). For example, you may issue:
    (gdb) loadusersymbols tests/userprog/exec-multiple
    add symbol table from file "tests/userprog/exec-multiple" at
        .text_addr = 0x80480a0
    (gdb)
After this, you should be able to debug the user program the same way you would the kernel, by placing breakpoints, inspecting data, etc. Your actions apply to every user program running in Pintos, not just to the one you want to debug, so be careful in interpreting the results: GDB does not know which process is currently active (because that is an abstraction the Pintos kernel creates). Also, a name that appears in both the kernel and the user program will actually refer to the kernel name. (The latter problem can be avoided by giving the user executable name on the GDB command line, instead of kernel.o, and then using loadusersymbols to load kernel.o.) loadusersymbols is implemented via GDB's add-symbol-file command.
E.5.3 FAQ
GDB can't connect to Bochs.
If the target remote command fails, then make sure that both GDB and pintos are running on the same machine by running hostname in each terminal. If the names printed differ, then you need to open a new terminal for GDB on the machine running pintos.

GDB doesn't recognize any of the macros.
If you start GDB with pintos-gdb, it should load the Pintos macros automatically. If you start GDB some other way, then you must issue the command source pintosdir/src/misc/gdb-macros, where pintosdir is the root of your Pintos directory, before you can use them.

Can I debug Pintos with DDD?
Yes, you can. DDD invokes GDB as a subprocess, so you'll need to tell it to invoke pintos-gdb instead:

    ddd --gdb --debugger pintos-gdb

Can I use GDB inside Emacs?
Yes, you can. Emacs has special support for running GDB as a subprocess. Type M-x gdb and enter your pintos-gdb command at the prompt. The Emacs manual has information on how to use its debugging features in a section titled "Debuggers."
GDB is doing something weird.
If you notice strange behavior while using GDB, there are three possibilities: a bug in your modified Pintos, a bug in Bochs's interface to GDB or in GDB itself, or a bug in the original Pintos code. The first and second are quite likely, and you should seriously consider both. We hope that the third is less likely, but it is also possible.
not very hard. You start by retrieving the source code for Bochs 2.2.6 from http://bochs.sourceforge.net and saving the file bochs-2.2.6.tar.gz into a directory. The script pintos/src/misc/bochs-2.2.6-build.sh applies a number of patches contained in pintos/src/misc to the Bochs tree, then builds Bochs and installs it in a directory of your choice. Run this script without arguments to learn usage instructions. To use your bochs binary with pintos, put it in your PATH, and make sure that it is earlier than /usr/class/cs140/`uname -m`/bin/bochs.

Of course, to get any good out of this you'll have to actually modify Bochs. Instructions for doing this are firmly out of the scope of this document. However, if you want to debug page faults as suggested above, a good place to start adding printf()s is BX_CPU_C::dtranslate_linear() in cpu/paging.cc.
E.8 Tips
The page allocator in threads/palloc.c and the block allocator in threads/malloc.c set every byte of memory to 0xcc when it is freed. Thus, if you see an attempt to dereference a pointer like 0xcccccccc, or some other reference to 0xcc, there's a good chance you're trying to reuse a page that's already been freed. Also, byte 0xcc is the CPU opcode for "invoke interrupt 3," so if you see an error like Interrupt 0x03 (#BP Breakpoint Exception), then Pintos tried to execute code in a freed page or block.

An assertion failure on the expression sec_no < d->capacity indicates that Pintos tried to access a file through an inode that has been closed and freed. Freeing an inode clears its starting sector number to 0xcccccccc, which is not a valid sector number for disks smaller than about 1.6 TB.
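Both numbers in this tip can be checked with a few lines of C. The sketch below fills a buffer with 0xcc the way freed memory is filled, reads a 32-bit word back out, and computes how many bytes a sector number of 0xcccccccc would address. The 512-byte sector size is an assumption stated in the code. The names read_word_after_poison() and poison_sector_bytes() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Mimics a stale field read from freed memory: every byte is 0xcc,
   so any 32-bit pointer or sector number read back is 0xcccccccc. */
static uint32_t
read_word_after_poison (void)
{
  unsigned char block[16];
  uint32_t word;

  memset (block, 0xcc, sizeof block);   /* What freeing does here. */
  memcpy (&word, block, sizeof word);   /* Read a "stale" field back. */
  return word;
}

/* Bytes addressed by sector number 0xcccccccc, assuming 512-byte sectors. */
static uint64_t
poison_sector_bytes (void)
{
  return (uint64_t) 0xccccccccu * 512;
}
```

0xcccccccc times 512 is 1,759,218,604,032 bytes, or about 1.6 times 2^40 bytes. That is where the "about 1.6 TB" figure comes from.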
F.1 Tags
Tags are an index to the functions and global variables declared in a program. Many editors, including Emacs and vi, can use them. The Makefile in pintos/src produces Emacs-style tags with the command make TAGS or vi-style tags with make tags. In Emacs, use M-. to follow a tag in the current window, C-x 4 . in a new window, or C-x 5 . in a new frame. If your cursor is on a symbol name for any of those commands, it becomes the default target. If a tag name has multiple definitions, M-0 M-. jumps to the next one. To jump back to where you were before you followed the last tag, use M-*.
F.2 cscope
The cscope program also provides an index to functions and variables declared in a program. It has some features that tag facilities lack. Most notably, it can find all the points in a program at which a given function is called. The Makefile in pintos/src produces cscope indexes when it is invoked as make cscope. Once the index has been generated, run cscope from a shell command line; no command-line arguments are normally necessary. Then use the arrow keys to choose one of the search criteria listed near the bottom of the terminal, type in an identifier, and hit Enter. cscope will then display the matches in the upper part of the terminal. You may use the arrow keys to choose a particular match; if you then hit Enter, cscope will invoke the default system editor and position the cursor on that match. To start a new search, type Tab. To exit cscope, type Ctrl-d. Emacs and some versions of vi have their own interfaces to cscope. For information on how to use these interfaces, visit the cscope home page.
F.3 Git
It's crucial that you use a source code control system to manage your Pintos code. This will allow you to keep track of your changes and coordinate changes made by different people in the project. For this class we recommend that you use Git; if you followed the instructions on getting started, a Git repository will already have been created for you. If you don't already know how to use Git, we recommend that you read the Pro Git book online.
F.4 VNC
VNC stands for Virtual Network Computing. It is, in essence, a remote display system which allows you to view a computing desktop environment not only on the machine where it is running, but from anywhere on the Internet and from a wide variety of machine architectures. It is already installed on the lab machines. For more information, look at the VNC Home Page.
5. Pintos should now be ready for use. If you have the Pintos reference solutions, which are provided only to faculty and their teaching assistants, then you may test your installation by running make check in the top-level tests directory. The tests take between 20 minutes and 1 hour to run, depending on the speed of your hardware.

6. Optional: Build the documentation by running make dist in the top-level doc directory. This creates a WWW subdirectory within doc that contains HTML and PDF versions of the documentation, plus the design document templates and various hardware specifications referenced by the documentation. Building the PDF version of the manual requires Texinfo and TeX (see above). You may install WWW wherever you find most useful.

The doc directory is not included in the .tar.gz distributed for Pintos. It is in the Pintos CVS tree available via :pserver:[email protected]:/var/lib/cvs, in the pintos module. The CVS tree is not the authoritative source for Stanford course materials, which should be obtained from the course website.
bochs-2.2.6-page-fault-segv.patch
Makes the GDB stub report a SIGSEGV to the debugger when a page-fault exception occurs, instead of "signal 0." The former can be ignored with handle SIGSEGV nostop but the latter cannot.

bochs-2.2.6-paranoia.patch
Fixes a compile error with modern versions of GCC.

bochs-2.2.6-solaris-link.patch
Needed on Solaris hosts. Do not apply it elsewhere.

To apply all the patches, cd into the Bochs directory, then type:

    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-big-endian.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-jitter.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-triple-fault.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-ms-extensions.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-solaris-tty.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-page-fault-segv.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-paranoia.patch
    patch -p1 < $PINTOSDIR/src/misc/bochs-2.2.6-solaris-link.patch

You will have to supply the proper $PINTOSDIR, of course. You can use patch's --dry-run option if you want to test whether the patches would apply cleanly before trying to apply them.

Sample commands to build and install Bochs for Pintos are supplied in src/misc/bochs-2.2.6-build.sh.
Appendix G: Bibliography
Bibliography
Hardware References
[IA32-v1]. IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture. Basic 80x86 architecture and programming environment. Available via developer.intel.com. Section numbers in this document refer to revision 18.

[IA32-v2a]. IA-32 Intel Architecture Software Developer's Manual Volume 2A: Instruction Set Reference A-M. 80x86 instructions whose names begin with A through M. Available via developer.intel.com. Section numbers in this document refer to revision 18.

[IA32-v2b]. IA-32 Intel Architecture Software Developer's Manual Volume 2B: Instruction Set Reference N-Z. 80x86 instructions whose names begin with N through Z. Available via developer.intel.com. Section numbers in this document refer to revision 18.

[IA32-v3a]. IA-32 Intel Architecture Software Developer's Manual Volume 3A: System Programming Guide. Operating system support, including segmentation, paging, tasks, interrupt and exception handling. Available via developer.intel.com. Section numbers in this document refer to revision 18.

[FreeVGA]. FreeVGA Project. Documents the VGA video hardware used in PCs.

[kbd]. Keyboard scancodes. Documents PC keyboard interface.

[ATA-3]. AT Attachment-3 Interface (ATA-3) Working Draft. Draft of an old version of the ATA aka IDE interface for the disks used in most desktop PCs.

[PC16550D]. National Semiconductor PC16550D Universal Asynchronous Receiver/Transmitter with FIFOs. Datasheet for a chip used for PC serial ports.

[8254]. Intel 8254 Programmable Interval Timer. Datasheet for PC timer chip.

[8259A]. Intel 8259A Programmable Interrupt Controller (8259A/8259A-2). Datasheet for PC interrupt controller chip.

[MC146818A]. Motorola MC146818A Real Time Clock Plus Ram (RTC). Datasheet for PC real-time clock chip.
Software References
[ELF1]. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2 Book I: Executable and Linking Format. The ubiquitous format for executables in modern Unix systems.

[ELF2]. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2 Book II: Processor Specific (Intel Architecture). 80x86-specific parts of ELF.

[ELF3]. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2 Book III: Operating System Specific (UNIX System V Release 4). Unix-specific parts of ELF.

[SysV-ABI]. System V Application Binary Interface: Edition 4.1. Specifies how applications interface with the OS under Unix.

[SysV-i386]. System V Application Binary Interface: Intel386 Architecture Processor Supplement: Fourth Edition. 80x86-specific parts of the Unix interface.
[SysV-ABI-update]. System V Application Binary Interface, DRAFT, 24 April 2001. A draft of a revised version of [SysV-ABI] which was never completed.

[SUSv3]. The Open Group, Single UNIX Specification V3, 2001.

[Partitions]. A. E. Brouwer, Minimal partition table specification, 1999.

[IntrList]. R. Brown, Ralf Brown's Interrupt List, 2000.
License
Pintos, including its documentation, is subject to the following license:

Copyright © 2004, 2005, 2006 Board of Trustees, Leland Stanford Jr. University. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

A few individual files in Pintos were originally derived from other projects, but they have been extensively modified for use in Pintos. The original code falls under the original license, and modifications for Pintos are additionally covered by the Pintos license above.

In particular, code derived from Nachos is subject to the following license:

Copyright © 1992-1996 The Regents of the University of California. All rights reserved.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without written agreement is hereby granted, provided that the above copyright notice and the following two paragraphs appear in all copies of this software.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.