UCAM-CL-TR-798
ISSN 1476-2986
Number 798
Computer Laboratory
Periklis Akritidis
June 2011
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/
© 2011 Periklis Akritidis
http://www.cl.cam.ac.uk/techreports/
ISSN 1476-2986
Practical memory safety for C
Periklis Akritidis
Acknowledgements
I would like to thank my supervisor Steve Hand for his guidance, as well as my
hosts at Microsoft Research Cambridge, Manuel Costa and Miguel Castro; I greatly
enjoyed working with them. I am grateful to Brad Karp and Prof. Alan Mycroft for
their valuable feedback, and to Prof. Jon Crowcroft for his encouragement. I am
also indebted to my previous supervisors and colleagues, especially Prof. Evangelos
Markatos for introducing me to research.
Over the last few years, I have enjoyed the company of my labmates Amitabha Roy,
Bharat Venkatakrishnan, David Miller, Myoung Jin Nam, Carl Forsell, Tassos Noulas,
Haris Rotsos, and Eva Kalyvianaki. I am also grateful to my friends Mina Brimpari,
Daniel Holland, and Lefteris Garyfallidis for their support and company.
Lastly, I am immensely grateful to my parents, Themistoklis and Maria, for their
crucial early support. Thank you!
Contents
Summary 3
Acknowledgements 5
Table of contents 9
List of figures 11
List of tables 13
1 Introduction 15
1.1 The problem 15
1.2 State of the art 16
1.3 Requirements and challenges 17
1.3.1 Adequate protection 17
1.3.2 Good performance 17
1.3.3 Pristine sources 18
1.3.4 Binary compatibility 18
1.3.5 Low false positive rate 18
1.4 Hypothesis and contributions 19
1.4.1 Hypothesis 19
1.4.2 New integrity properties 19
1.4.3 New implementations 21
1.4.4 Experimental results 21
1.5 Organisation 22
2 Background 23
2.1 History 23
2.2 Common vulnerabilities 24
2.3 Possible attack targets 25
2.4 Integrity guarantees 27
5 Byte-granularity isolation 83
5.1 Overview 83
5.2 Protection model 86
5.3 Interposition library 88
5.4 Instrumentation 92
5.4.1 Encoding rights 93
5.4.2 ACL tables 94
5.4.3 Avoiding accesses to conflict tables 95
5.4.4 Table access 96
5.4.5 Synchronisation 99
5.5 Experimental evaluation 99
5.5.1 Effectiveness 100
5.5.2 Performance 102
5.6 Discussion 104
7 Conclusions 119
7.1 Summary and lessons learnt 119
7.2 Future directions 120
Bibliography 121
Chapter 1
Introduction
C and C++ are holding their ground [52, 157] against new, memory-safe languages
thanks to their capacity for high performance execution and low-level systems pro-
gramming, and, of course, the abundance of legacy code. The lack of memory safety,
however, causes widespread security and reliability problems. Despite considerable
prior research, existing solutions are weak, expensive, or incompatible with legacy
code.
My work demonstrates that a spectrum of efficient, backwards-compatible solutions
is possible through careful engineering and judicious tradeoffs between performance
and error detection. The key idea is to enforce, at low cost, the minimum integrity
guarantees necessary to protect against memory-safety attacks and operating-system
crashes beyond what current practical solutions prevent, rather than aiming to detect
all memory-safety violations at a prohibitive cost.
Finally, while the rest of this dissertation focuses on programs written in C, the
observations and solutions are intended to apply equally well to programs written in
C++.
1 char buf[N];
2 char *q = buf;
3 while (*p)
4 *q++ = *p++;
Figure 1.1: The archetypal memory corruption error: writes through pointer q
may overwrite memory beyond the intended buffer buf if strlen(p)
can become >= N.
little processing and lots of I/O, resulting in frequent expensive hardware address-
space switches. In addition, existing solutions break backwards compatibility, as ex-
isting device drivers must be ported to new, coarse-grained APIs. While software-
based sandboxing solutions such as software fault isolation (SFI) can address the cost
of frequent hardware address-space switching, they still fail to preserve backwards
compatibility with legacy APIs at a reasonable cost.
Thus, the state of the art is neatly summarised in the adage: solutions are safe, fast,
and backwards-compatible—pick two. This hinders the adoption of comprehensive
solutions, since performance and legacy code are the main drivers of C and C++ use.
Hence, only fast, backwards-compatible solutions are likely to see wide adoption; but
to date, these do not provide sufficient protection to curb memory-safety problems in
practice.
1.4.1 Hypothesis
My thesis is that protecting the execution integrity of code written in memory-unsafe
languages against memory-safety errors can be made practical. A practical solu-
tion requires minimal porting of existing source code, incurs acceptable performance
degradation, allows incremental deployment, and avoids false alarms for almost all
programs. My work shows how to achieve these goals by striking a better balance between
protection and performance.
Baggy bounds
Traditional bounds checking prevents spatial memory-safety violations by detecting
memory accesses outside the intended object’s bounds. I observed, however, that
padding objects and permitting memory accesses to their padding does not compro-
mise security: some spatial memory safety violations are silently tolerated (they just
access the padding), while those that would access another object are still detected.
This observation enables tuning the bounds to increase performance. By padding
every object to a power-of-two size and aligning its base address to a multiple of its
padded size, bounds can be represented with the binary logarithm of the power-of-
two size, which can fit in a single byte for address spaces up to 2^256 bytes. That is
an eight-fold improvement over traditional bounds representations on 32-bit systems
that require eight bytes (four for the base address plus four for the object size). The
compact bounds representation can help replace expensive data structures used in
previous work with efficient ones, reducing the time to access the data structures, and,
despite trading space for time by padding objects, allowing for competitive memory
overhead due to less memory being used for the data structures storing the bounds.
Furthermore, with object bounds constrained this way, bounds checks can be stream-
lined into bit-pattern checks. Finally, on 64-bit architectures, it is possible to use spare
bits in the pointers to store the bounds without having to change the pointer size or
use an auxiliary data structure.
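As a concrete illustration of this encoding (the machinery is developed in Chapter 3), a 100-byte object is padded to 128 bytes and aligned on a 128-byte boundary; the single stored byte is then log2(128) = 7, and the base of the allocation can be recovered from any pointer into it by clearing the pointer's 7 least significant bits.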
Write integrity
A fundamental cost of many comprehensive solutions for memory safety, including
baggy bounds checking, is tracking the intended target object of a pointer. Intuitively,
write integrity approximates this relation without tracking pointers.
Write integrity uses interprocedural points-to analysis [7] at compile time to con-
servatively approximate the set of objects writable by an instruction, and enforces this
set at runtime using lightweight checks to prevent memory corruption. Addition-
ally, it introduces small guards between the original objects in the program. As these
guards are not in any writable set, they prevent sequential overflows (Section 2.2)
even when the static analysis is imprecise. WIT maintains call stack and heap meta-
data integrity (Section 2.4) because return addresses and heap metadata are excluded
from any permitted set by default, and can prevent double frees by checking free op-
erations similarly to writes in combination with excluding released memory from any
writable set. It prevents more memory errors on top of these, subject to the precision
of the analysis. In fact, subsequent write integrity checks can often nullify bugs due
to corruption of sub-objects, dangling pointers, and out-of-bounds reads.
Write integrity is coupled with control-flow integrity (CFI, Section 2.4) [86, 1, 2] to
reinforce each other. CFI prevents bypassing write checks and provides a second line
of defence. In turn, the CFI implementation does not have to check function returns,
as call-stack integrity is guaranteed by the write checks.
WIT has better coverage than solutions with comparable performance, and has
consistently low enough overhead to be used in practice for protecting user-space
applications. Its average runtime overhead is only 7% across a set of CPU-intensive
benchmarks and it is negligible when I/O is the bottleneck. Its memory overhead is
13% and can be halved on 64-bit architectures.
BGI extends WIT to isolate device drivers from each other and the kernel, offering
high protection with CPU overhead between 0% and 16%.
All three solutions satisfy the requirement of backwards compatibility, both at the
source and binary level. BBC and WIT can compile user-space C programs without
modifications, and these programs can be linked against unmodified binary libraries.
BGI can compile Windows drivers [114] without requiring changes to the source code,
and these drivers can coexist with unmodified binary drivers.
1.5 Organisation
The rest of this dissertation is organised as follows. Chapter 2 provides background
information related to this work. It summarises the long history of the problem,
classifies the weaknesses this work aims to address, and introduces various integrity
guarantees to give some background context for the design process.
The next three chapters present my work on providing effective and practical
protection for user-space and kernel-space C programs. Chapter 3 addresses spa-
tial safety using the baggy bounds integrity property. It shows how baggy bounds
checking (BBC) can be used to enforce spatial safety in production systems more
efficiently than traditional backwards-compatible bounds checking [82]. Special at-
tention is given to how 64-bit architectures can be used for faster protection. To
address temporal safety, BBC can be combined with existing techniques, as discussed
in Section 3.10.3, to provide a complete solution.
Chapters 4 and 5 present further work, motivated by two goals. First,
I tried to lower overheads further by making runtime checks even more lightweight.
This led to the formulation of write integrity testing (WIT) in Chapter 4, with a design
aiming to maximise the safety that can be provided using the cheapest-possible run-
time checks. WIT can also protect against some temporal memory-safety violations
due to uses of pointers to deallocated objects.
Next, I tried to address memory-safety issues in the context of legacy Windows de-
vice drivers, highlighting temporal-safety risks beyond memory deallocation, which
are addressed in Chapter 5. In short, I observed that lack of temporal access-control
allows memory corruption faults to propagate across kernel components. For ex-
ample, a memory error in a kernel extension corrupting an object allocated by the
extension but referenced by kernel data structures can cause kernel code to corrupt
memory when using the object. These errors can be prevented by enforcing dynamic
access rights according to the kernel API rules to prevent extensions from corrupting
objects they no longer own.
Finally, Chapter 6 critically reviews related work, and Chapter 7 concludes.
Chapter 2
Background
2.1 History
A look at the long history of memory-safety problems can highlight some challenges
faced by proposed solutions. Attackers have shown significant motivation, quickly
adapting to defences, and a spirit of full disclosure has emerged in condemnation of
“security through obscurity”.
After Dennis Ritchie developed C in 1972 as a general-purpose computer program-
ming language for use with the UNIX operating system, it quickly became popular
for developing portable application software. Bjarne Stroustrup started developing
C++ in 1979 based on C, inheriting its security and reliability problems. According
to several crude estimates [52, 157], most software today is written in one of these
languages.
Buffer overflows were identified as a security threat as early as 1972 [8], and the
earliest documented exploitation of a buffer overflow was in 1988 as one of several
exploits used by the Morris worm [116] to propagate over the Internet. Since then,
several high-profile Internet worms have exploited buffer overflows for their prop-
agation, including Code Red [178] in July 2001, Code Red II in August 2001, and
Nimda in September 2001, then SQL Slammer [102] in January 2003 and Blaster [13]
in August 2003, until attacker attention shifted to stealthier attacks such as botnets
and drive-by attacks that generate illegal revenue rather than mayhem.
Stack-based buffer-overflow vulnerabilities and their exploitation became widely known in 1996
through an influential step-by-step article [113] by Elias Levy (known as Aleph One).
Early stack-based buffer overflows targeting the saved function return address were
later extended to non-control-data attacks [37] and heap-based overflows [119, 44].
In August 1997 Alexander Peslyak (known as Solar Designer) showed how to bypass
the then promising non-executable stack defences [53]. Exploitation mechanisms for
memory errors beyond buffer overflows, such as integer overflows [23, 4], format-
string vulnerabilities, and dangling pointers soon followed. Format-string vulnerabil-
ities were discovered in June 2000, when a vulnerability in WU-FTP was exploited.
The first exploitation of a heap overflow was demonstrated in July 2000 by Solar
Designer [54]. Most browser-based heap-memory exploits now use a technique first
described in 2004 by SkyLined [137], called heap spraying, that overcomes the un-
predictability of heap memory-layout by filling up the heap with many attacker-
controlled objects allocated through a malicious web-page. The first exploit for a
dangling pointer vulnerability, latent in the Microsoft IIS web server since December
2005, was presented [3] in August 2007, complete with an instruction manual on how
to find and exploit dangling pointer vulnerabilities.
Twenty years after the first attack, we are witnessing frequent exploitation of buffer
overflows and other memory errors, as well as new attack targets. In 2003, buffer
overflows in licensed Xbox games were exploited to enable running unlicensed soft-
ware. Buffer overflows were also used for the same purpose targeting the PlayStation
2 and the Wii. Some trends have changed over the years. Instead of servers exposed
to malicious clients, Web browsers and their plugins are increasingly targeted through
malicious websites. Moreover, vulnerabilities in thousands of end-user systems may
be exploited for large-scale attacks affecting the security of third parties or the stability
of the Internet as a whole.
Integer errors Silent integer overflow (wraparound) errors, and integer coercion
(width and sign conversion) errors, while not memory errors themselves, may trig-
ger bound errors [23, 4]. For instance, a negative value may pass a signed integer
check guarding against values greater than a buffer’s size, but subsequent use as an
unsigned integer, e.g. by a function like memcpy, can overflow the buffer. Another ex-
ample is when a wrapped-around negative integer size is passed to malloc causing a
zero-sized buffer allocation that leads to an overflow when the buffer is later used.
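The following sketch illustrates the first case (hypothetical names, assuming a 32-bit int): a negative length passes the signed comparison but is reinterpreted by memcpy as a huge unsigned value, overflowing the buffer.

    #include <string.h>

    #define BUFSZ 64
    static char buf[BUFSZ];

    /* Sketch of a signed/unsigned coercion error: len is attacker-controlled. */
    void copy_input(const char *src, int len) {
        if (len < BUFSZ)            /* signed check: -1 < 64 holds */
            memcpy(buf, src, len);  /* len is converted to size_t: (size_t)-1 */
    }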
Figure 2.1: A sequential buffer overflow (a) cannot access Object 3 using a
pointer intended for Object 1 without accessing Object 2, but a ran-
dom access bound error (b) can access Object 3 without accessing
Object 2.
Use of uninitialised variables A similar vulnerability results from the use of unini-
tialised data, especially pointers. Uninitialised data may be controlled by the attacker,
but use of uninitialised data can be exploited even if the data is not controlled by the
attacker.
Double frees Another error related to manual memory management is the double-
free bug. If a program calls free twice on the same memory address, a vulnerable
heap-memory allocator re-enters the double-freed memory into its free list and subse-
quently returns the same memory chunk twice. When the second allocation request
for the doubly entered chunk is processed, the allocator erroneously interprets the
data stored in the first allocation as heap-metadata pointers.
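A minimal sketch of the pattern (hypothetical variable names; a hardened allocator may instead abort on the second free):

    #include <stdlib.h>

    int main(void) {
        char *a = malloc(64);
        free(a);
        free(a);              /* double free: the chunk enters the free list twice */
        char *b = malloc(64); /* the allocator hands out the freed chunk */
        char *c = malloc(64); /* a vulnerable allocator may hand out the same chunk again */
        /* b and c now alias, so data written through b can later be
           misinterpreted as heap metadata or silently corrupt c */
        (void)b; (void)c;
        return 0;
    }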
control flow to attacker code, or otherwise alter program behaviour. We will see,
however, that invalid reads and legitimate code can be used too. Here I discuss the
various critical program elements targeted by attackers, to help with evaluating the
effectiveness of proposals.
Heap-based and static variables Buffer overflows in heap-based and static buffers
can also be exploited to overwrite targets on the heap or in global variables, as dis-
cussed next.
Existing code As defences made data and stack sections non-executable or pre-
vented injecting new code, return-to-libc attacks were developed which divert exe-
cution to an existing function, using an additional portion of the stack to provide
arguments to this function. Possible targets include system to execute a shell or
VirtualProtect to make data executable again. An elaboration of this technique [134, 28]
targets short instruction sequences scattered in the executable that end with a return
instruction. Longer operations can be executed using the return instructions to chain
individual short sequences through a series of return addresses placed on the stack.
This technique greatly increases the number of illegal control-flow transition targets.
Exploiting reads Finally, even memory reads can indirectly corrupt memory. Illegal
reads of pointer values are particularly dangerous. For example, if a program reads
a pointer from attacker-controlled data and writes to memory through that pointer,
an attacker can divert the write to overwrite an arbitrary target. The heap spraying
technique can be used when the attacker has no control over the value read.
Chapter 3
Baggy bounds checking
3.1 Overview
Figure 3.1 shows the overall system architecture of BBC. It converts source code to
an intermediate representation (IR), identifies potentially unsafe pointer-arithmetic
operations, and inserts checks to ensure their results are within bounds. Then, it links
the generated code with a runtime support library and binary libraries—compiled
with or without checks—to create an executable hardened against bound errors.
Overall, the system is similar to previous backwards-compatible bounds-checking
systems for C, and follows the general bounds-checking approach introduced by Jones
and Kelly [82]. Given an in-bounds pointer to an object, this approach ensures that
any derived pointer points to the same object. It records bounds information for each
object in a bounds table. The bounds are associated with memory ranges, instead of
pointers. This table is updated on allocation and deallocation of objects: this is done
by the malloc family of functions for heap-based objects, on function entry and exit
for local variables, and on program startup for global variables. (The alloca function
is supported too.)
I will use the example in Figure 3.2 to illustrate how the Jones and Kelly approach
bounds-checks pointer arithmetic. The example is a simplified web server with a
buffer-overflow vulnerability. It is inspired by a vulnerability in nullhttpd [111] that
can be used to launch a non-control-data attack [37]. Line 7 was rewritten into lines 8
and 9 to highlight the erroneous pointer arithmetic in line 8.
When the web server in Figure 3.2 receives a CGI command, it calls function
ProcessCGIRequest with the message it received from the network and its size as argu-
ments. This function copies the command from the message to the global variable
cgiCommand and then calls ExecuteRequest to execute the command. The variable
cgiDir contains the pathname of the directory with the executables that can be in-
voked by CGI commands. ExecuteRequest first checks that cgiCommand does not
1 char cgiCommand[1024];
2 char cgiDir[1024];
3
4 void ProcessCGIRequest(char *msg, int sz) {
5 int i = 0;
6 while (i < sz) {
7 // cgiCommand[i] = msg[i];
8 char *p = cgiCommand + i;
9 *p = msg[i];
10 i++;
11 }
12
13 ExecuteRequest(cgiDir, cgiCommand);
14 }
Figure 3.2: Example vulnerable code: simplified web server with a buffer over-
flow vulnerability. Line 7 is split into lines 8 and 9 to highlight the
erroneous pointer arithmetic operation.
contain the substring "\\.." and then it concatenates cgiDir and cgiCommand to ob-
tain the pathname of the executable to run. Unfortunately, there is a bound error in
line 8: if the message is too long, the attacker can overwrite cgiDir, assuming the
compiler generated a memory layout where cgiDir immediately follows cgiCommand.
This allows the attacker to run any executable (for example, a command shell) with
the arguments supplied in the request message. This is one of the most challenging
types of attacks to detect because it is a non-control-data attack [37]: it does not violate
control-flow integrity.
The system identifies at compile time lines 8 and 9 as containing potentially-
dangerous pointer arithmetic, in this case adding i to pointer cgiCommand and in-
dexing array msg, and inserts code to perform checks at runtime. It also inserts code
to save the bounds of memory allocations in the bounds table. In the case of global
variables such as cgiCommand and cgiDir, this code is run on program startup for
each variable.
At runtime, the checks for the dangerous pointer arithmetic in line 8 use the source
pointer holding the address of cgiCommand to look up in the table the bounds of the
memory pointed to. The system then performs the pointer arithmetic operation, in
this case computing index i, but before using the result to access memory, it checks
if the result remains within bounds. If the resulting address is out-of-bounds, an
exception is raised. By checking all pointer arithmetic, the Jones and Kelly approach
maintains the invariant that every pointer used by the program is within bounds. That
is why pointers can be assumed to be within bounds prior to every pointer arithmetic
operation.
While not the case in this example, according to the C standard, an out-of-bounds
pointer pointing one element past an array can be legally used in pointer arithmetic to
produce an in-bounds pointer. In common practice, arbitrary out-of-bounds pointers
are used too. That is why the Jones and Kelly approach was subsequently modi-
fied [130] from raising an exception to marking out-of-bounds pointers and only rais-
ing an exception if they are dereferenced. Marked pointers must be handled specially
to retrieve their proper bounds. This mechanism is described in detail in Section 3.5.
To reduce space and time overhead at runtime, the solutions presented in this dis-
sertation, including BBC, perform a safety analysis to compute instructions and ob-
jects that are safe. A pointer arithmetic operation is safe if its result can be proven
at compile-time to always be within bounds. The result of a safe pointer arithmetic
operation is a safe pointer. Objects that are accessed only through safe pointers, or
not accessed through pointers at all, are themselves safe. The system does not have
to instrument safe pointer arithmetic operations, and does not have to enter safe ob-
jects into the bounds table. Variable i in the example of Figure 3.2 is safe because it
is never accessed through a pointer in the program. Even if we consider the implicit
stack-frame pointer used by the generated machine code to access local variables
relative to the function's stack frame, i remains safe: the constant offsets used in
pointer arithmetic accessing local variables are within bounds, assuming the stack-frame
pointer is not corrupted. (BBC protects the stack-frame pointer through
its bounds checks, and Chapter 4 offers even stronger guarantees.)
Figure 3.3: Baggy bounds encompass the entire memory allocated for an object,
including any padding appended to the object. Note that for compound objects
such as structures and arrays, only the bounds of the outermost object are
considered.
Baggy bounds checking differs from prior work based on the Jones and Kelly ap-
proach in its runtime mechanisms, highlighted in Figure 3.1 with a dashed box. In-
stead of exact bounds, it enforces baggy bounds. As shown in Figure 3.3, baggy
bounds checking enforces the allocation bounds which include the object and addi-
tional padding. The padding controls the size and memory alignment of allocations.
This is used by BBC to improve performance by enabling a very compact representa-
tion for the bounds. Previous techniques recorded two values in the bounds table for
every object: a pointer to the start of the object and its size, which together require at
least eight bytes on 32-bit systems. Instead, BBC pads and aligns objects (including
heap, local, and global variables) to powers of two, which enables using a single byte
to encode bounds by storing the binary logarithm of the allocation size in the bounds
table (using C-like notation):
e = log2(size)
Given this information and a pointer p, pointing possibly anywhere within a suitably
aligned and padded object, BBC can recover the allocation size and a pointer to the
start of the allocation with:
size = 1 << e
base = p & ~(size - 1)
Next, I discuss how BBC satisfies the requirements for protection, performance,
and backwards compatibility; how it can support pointers one or more bytes beyond
the target object; and additional implementation details followed by an experimental
evaluation.
3.2 Protection
BBC can detect bounds violations across allocation bounds, but silently tolerates vi-
olations of the object bounds that stay within the allocation bounds. For BBC to be
secure, it is crucial that such undetected violations cannot breach security.
It is clear that the overwrite targets of Section 2.3, including return addresses, func-
tion pointers or other security critical data, cannot be corrupted since accesses to the
padding area cannot overwrite other objects or runtime metadata. Also, reads can-
not be exploited to leak sensitive information, such as a password, from another live
object.
However, allowing reads from an uninitialised padding area can access information
from freed objects that are no longer in use. Such reads could leak secrets or access
data whose contents are controlled by the attacker. Recall from Section 2.3 that pointer
reads from attacker-controlled or uninitialised memory are particularly dangerous.
This vulnerability is prevented by clearing the padding on memory allocation.
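A minimal sketch of such an allocation path (hypothetical helper names; the real implementation uses a binary buddy allocator and also records the bounds in the table described below):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Round a request up to the next power of two, with a 16-byte minimum slot. */
    static size_t round_up_pow2(size_t n) {
        size_t p = 16;
        while (p < n)
            p <<= 1;
        return p;
    }

    /* Hypothetical BBC-style allocation wrapper: pad to a power of two, align the
       base to the padded size, and clear the padding so stale data cannot leak. */
    void *bbc_malloc(size_t n) {
        size_t size = round_up_pow2(n);
        char *p = aligned_alloc(size, size);
        if (p)
            memset(p + n, 0, size - n);  /* zero only the padding area */
        /* the bounds table entry for p would be set to log2(size) here */
        return p;
    }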
3.3 Performance
Baggy bounds enable two performance improvements over traditional bounds check-
ing. First, the compact bounds representation can be used to simplify and optimise the
data structure mapping memory addresses to bounds. Second, baggy bounds can be
checked more efficiently using bitwise operations without explicitly recovering both
bounds to perform two arithmetic comparisons, saving both machine registers and
instructions. These optimisation opportunities come at the cost of increased memory
usage due to the extra padding introduced by baggy bounds, and increased process-
ing due to having to zero this extra padding on memory allocation. As we shall see
in the experimental evaluation, however, simplifying the data structures can save con-
siderable space over previous solutions, as well as increase performance, making the
system’s space and time overheads very competitive.
Previous solutions implement the bounds table using a splay tree [82, 130]. BBC, on the other hand, implements the bounds table using a contiguous
array.
Figure 3.4: The bounds table can be kept small by partitioning memory into slots,
with one table entry per slot.
For baggy bounds, this array can be small because each entry uses a single byte.
Moreover, by partitioning memory into aligned slots of slotsize bytes, as shown in
Figure 3.4, and aligning objects to slot boundaries so that no two objects share a slot,
the bounds table can have one entry per slot rather than one per byte. Now the space
overhead of the table is 1/slotsize, which can be tuned to balance between memory
waste due to padding and table size.
Accesses to the table are fast because it is an array. To obtain a pointer to the table
entry corresponding to an address, ⌊address/slotsize⌋ is computed by right-shifting
the address by the constant log2(slotsize), and the result is added to the constant
base address of the table.
The resulting pointer can be used to retrieve the bounds information with a single
memory access, instead of having to traverse and splay a splay tree (as in previous
solutions [82, 130]).
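In C-like notation, and assuming the prototype's 16-byte slots, the lookup is a shift and a single one-byte load (a sketch; TABLE stands for the table's fixed base address):

    #include <stdint.h>

    extern unsigned char *TABLE;   /* one byte per 16-byte slot */

    /* Sketch of the bounds lookup: e = log2(allocation size) for the slot holding p. */
    static unsigned char lookup_bounds(const void *p) {
        return TABLE[(uintptr_t)p >> 4];   /* shift by log2(slotsize) = 4 */
    }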
Figure 3.5: Baggy bounds enables optimised bounds checks: we can verify that
pointer q derived from pointer p is within bounds by simply checking
that p and q have the same prefix with only the e least significant bits
modified, where e is the binary logarithm of the allocation size. The
check can be implemented using efficient unsigned shift operations.
Baggy bounds, however, enable an optimised bounds check that does not even need
to compute the lower and upper bounds. It uses directly the value of p and the value
of the binary logarithm of the allocation size, e, retrieved from the bounds table. The
constraints on allocation size and alignment ensure that q is within the allocation
bounds if it differs from p only in the e least significant bits. Therefore, it is sufficient
to XOR p with q, right-shift the result by e and check for zero, as shown in Figure 3.5.
Furthermore, for pointers q where sizeof(*q) > 1, we also need to check that
(char *) q + sizeof(*q) - 1 is within bounds to prevent an access to *q from cross-
ing the end of the object and the allocation bounds. Baggy bounds checking can avoid
this extra check if q points to a built-in type. Aligned accesses to these types cannot
overlap an allocation boundary because their size is a power of two and is less than
slotsize (assuming a sufficiently large choice of slotsize such as 16). In the absence of
casts, all accesses to built-in types generated by the compiler are aligned. Enforcing
alignment is cheap, because only casts have to be checked. When checking accesses
to structures not satisfying these constraints, both checks are performed.
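Putting the pieces together, the check on the result q of pointer arithmetic on p can be sketched in C as follows (hypothetical names; slow_path stands for the out-of-bounds handling of Section 3.5, and the second test is only emitted for accesses that may straddle a slot boundary):

    #include <stddef.h>
    #include <stdint.h>

    extern unsigned char *TABLE;
    extern void *slow_path(void *p, void *q);

    /* Sketch of the streamlined baggy bounds check: p and q must agree on all
       but the e least significant bits, where e is read from the bounds table. */
    static void *check_arith(void *p, void *q, size_t access_size) {
        unsigned char e = TABLE[(uintptr_t)p >> 4];
        if (((uintptr_t)p ^ (uintptr_t)q) >> e)
            return slow_path(p, q);
        /* extra check for accesses that are not aligned built-in types */
        if (((uintptr_t)p ^ ((uintptr_t)q + access_size - 1)) >> e)
            return slow_path(p, q);
        return q;
    }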
3.4 Interoperability
As discussed in Section 1.3, BBC must work even when instrumented code is linked
against libraries that are not instrumented. Protection must degrade gracefully with
uninstrumented libraries, and instrumented code must not raise false positives due to
objects allocated in uninstrumented libraries. This form of interoperability is impor-
tant because some libraries are distributed in binary form only.
Library code works with BBC because the size of pointers, and therefore the mem-
ory layout of structures, does not change. However, it is also necessary to ensure
graceful degradation when bounds are missing, so that instrumented code can access memory allocated by uninstrumented code without raising errors.
Figure 3.6: We can tell whether a pointer that is out-of-bounds by less than
slotsize/2 is below or above an allocation. This lets us correctly
adjust it to get a pointer to the object by respectively adding or
subtracting slotsize.
The next challenge is to recover a pointer to the referent object from the out-of-
bounds pointer without resorting to an additional data structure. This is possible
for the common case of near out-of-bounds pointers pointing at most slotsize/2 bytes
before or after the allocation bounds. Since the allocation bounds are aligned on slot
boundaries, a near out-of-bounds pointer is below or above the allocation depending
on whether it lies in the top or bottom half of an adjacent memory slot respectively,
as illustrated in Figure 3.6. Thus, a pointer within the referent object can be recov-
ered from a near out-of-bounds pointer by adding or subtracting slotsize bytes ac-
cordingly. This technique cannot handle out-of-bounds pointers more than slotsize/2
bytes outside the original allocation but, in Section 3.9.2, I show how to take advan-
tage of the spare bits in pointers on 64-bit architectures to increase this range. It is
also possible to support the remaining cases of out-of-bounds pointers using previ-
ous techniques [130, 104]. The advantage of using my mechanism for programs with
non-standard out-of-bounds pointers is that the problems of previous techniques (in-
cluding runtime and memory overheads) are limited to a small (or zero) subset of
out-of-bounds pointers; of course, programs that stick to the C standard suffer no
such problems.
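A sketch of this adjustment for the prototype's 16-byte slots (hypothetical name; only pointers at most slotsize/2 bytes outside the allocation are handled):

    #include <stdint.h>

    #define SLOT_SIZE 16

    /* A near out-of-bounds pointer in the bottom half of its slot lies just past
       the allocation, so a slot is subtracted; one in the top half lies just
       before the allocation, so a slot is added. */
    static char *recover_referent(char *q) {
        if (((uintptr_t)q & (SLOT_SIZE - 1)) < SLOT_SIZE / 2)
            return q - SLOT_SIZE;
        return q + SLOT_SIZE;
    }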
Moreover, pointers that are provably not dereferenced can be allowed to go out-of-
bounds. This can be useful for supporting idioms where pointers go wildly out-of-
bounds in the final iteration of a loop, e.g.:
1 for (i = 1; ; i<<=2)
2 if (buf + i >= buf + size)
3 break;
Relational operators (e.g. >=) must be instrumented to support comparing an out-
of-bounds pointer with an in-bounds one: the instrumentation must clear the top
bit of the pointers before comparing them. Interestingly, less instrumentation would
be necessary to support the legal case of one element past the end of an array only,
because setting the top bit would not affect the unsigned inequality testing used for
objects that cannot be accessed in unsafe ways. These are called safe objects, since
every access to them is itself safe. The current prototype only pads and aligns local
variables that are indexed unsafely in the enclosing function, or whose address is
taken, and therefore possibly leaked to other functions that may use them unsafely.
These variables are called unsafe.
3.7 Instrumentation
3.7.1 Bounds table
In implementing the bounds table, I chose a slotsize of 16 bytes which is small enough
to avoid penalising small allocations but large enough to keep the table memory use
low. Therefore, 1/16th of the virtual address space is reserved for the bounds table.
Since pages are allocated to the table on demand, this increases memory utilisation
by only 6.25%. On program startup, the address space required for the bounds ta-
ble is reserved, and a vectored exception handler (a Windows mechanism similar to
UNIX signal handlers) is installed to capture accesses to unallocated pages and allo-
cate missing table pages on demand. All the bytes in these pages are initialised by the
handler to the value 31, representing bounds encompassing all the memory address-
able by BBC programs on the x86 (an allocation size of 2^31 at base address 0). This
prevents out-of-bounds errors when instrumented code accesses memory allocated
by uninstrumented code, as discussed in Section 3.4. The memory alignment used by
the system memory allocator on 32-bit Windows is 8 bytes, but a slotsize of 16 bytes
could be supported using a custom memory allocator.
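For example, on 32-bit x86 the table has one byte for each of the 2^32 / 16 = 2^28 slots, i.e. 256 MB of reserved virtual address space (1/16 = 6.25% of the 4 GB total), of which physical pages are only committed on demand.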
The table can be placed in any fixed memory location. The current prototype places
the base of the table at address 40000000h. This has the downside that this 32-bit con-
1 if (IN_BOUNDS(p, &p[n-1])) {
2 for (i = 0; i < n; i++) {
3 if (p[i] == 0) break;
4 p[i] = 0;
5 }
6 } else {
7 for (i = 0; i < n; i++) {
8 if (p[i] == 0) break;
9 ASSERT(IN_BOUNDS(p, &p[i]));
10 p[i] = 0;
11 }
12 }
Figure 3.7: The compiler’s range analysis can determine that the range of variable
i is at most 0 . . . n − 1. However, the loop may exit before i reaches
n − 1. To prevent erroneously raising an alarm in that case, the
transformed code falls back to an instrumented version of the loop if
the hoisted check fails.
stant has to be encoded in the instrumentation, increasing code bloat. Using a base
address of 0h reduces the number of bytes needed to encode the instructions that
access the table by omitting the 32-bit constant. This arrangement can still accom-
modate the null pointer dereference landing zone at address 0h (a protected memory
range that ensures null pointer dereferences raise an access error). The lower part of
the table can be access protected because it is the unused image of the table memory
itself. I experimented with placing the base of the table at address 0h; however, it had
little impact on the runtime overhead in practice.
To pad and align heap objects, BBC replaces calls to the standard memory allocation
functions with calls to implementations based on the buddy allocator.
The stack frames of functions that contain unsafe local variables are aligned at
runtime by enlarging the stack frame and aligning the frame pointer, while global
and static variables are aligned and padded at compile time. Most compilers support
aligning variables using special declaration attributes; BBC uses the same mechanism
implicitly.
Unsafe function arguments need special treatment, because padding and aligning
them would violate the calling convention. Instead, they are copied by the function
prologue to appropriately aligned and padded local variables and all references in the
function body are changed to use their copies (except for uses by va_list that need
the address of the last explicit argument to correctly extract subsequent arguments).
This preserves the calling convention while enabling bounds checking for function
arguments accessed in unsafe ways.
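As an illustrative sketch of this transformation (hypothetical function, using the compiler's alignment attribute; the prototype performs the rewrite automatically during compilation):

    #include <stdio.h>

    /* Original: the address of the argument n escapes, so n is unsafe. */
    int read_count(int n) {
        scanf("%d", &n);
        return n;
    }

    /* Transformed: the prologue copies n into an aligned (and padded) local and
       the body uses the copy; callers and the calling convention are unaffected. */
    int read_count_instrumented(int n) {
        __declspec(align(16)) int n_copy = n;
        scanf("%d", &n_copy);
        return n_copy;
    }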
Unfortunately, the Windows runtime cannot align stack objects to more than 8k nor
static objects to more than 4k (configurable using the /ALIGN linker switch). To remove
this limitation, large automatic and static allocations could be replaced with dynamic
allocations, or the language runtime could be modified to support larger alignment
requirements. Instead, the current prototype deals with this by setting the bounds
table entries for such objects to 31, effectively disabling checks (but the performance
impact of the checks remains).
Many allocators (including the BBC buddy allocator) perform similar optimisations for calloc
in order to avoid unnecessary page accesses to over-provisioned memory allocations
that would not be touched otherwise. The current prototype supports this for heap
allocations by reusing the mechanism used for calloc.
Figure 3.8: The optimised flow in (b) requires only one comparison in the fast
path shown with bold lines vs. two comparisons for the unoptimised
case (a).
pointer arithmetic
    p = cgiCommand + i;
bounds lookup
    mov  eax, cgiCommand
    shr  eax, 4
    mov  al, byte ptr [TABLE + eax]
bounds check
    mov  ebx, cgiCommand
    xor  ebx, p
    shr  ebx, al
    jz   ok
    p = slowPath(cgiCommand, p)
ok:
3.8 Experimental evaluation
3.8.1 Performance
I evaluated the time and peak-memory overhead of BBC using the Olden bench-
marks [31] and the SPEC CINT2000 [156] integer benchmarks. I chose these
benchmarks to allow a comparison against results reported for some other solu-
tions [57, 170, 104]. In addition, to enable a more detailed comparison with splay-
tree-based approaches—including measuring their space overhead—I implemented a
BBC variant which uses the splay tree code from previous systems [82, 130]. This im-
plementation uses the standard allocator and lacks support for illegal out-of-bounds
pointers, but instruments the same operations as BBC. All benchmarks were compiled
with the Phoenix compiler using /O2 optimisation level and run on a 2.33 GHz Intel
Core 2 Duo processor with 2 GB of RAM. For each experiment, I present the average
of 3 runs; the variance was negligible.
I did not run eon from SPEC CINT2000 because it uses C++ features which are
not supported in the current prototype, such as operator new. For the splay-tree-
based implementation only, I did not run vpr due to the lack of support for illegal
out-of-bounds pointers. I also could not run gcc because of code that subtracted a
pointer from a NULL pointer and subtracted the result from NULL again to recover
the pointer. Running this would require more comprehensive support for out-of-
bounds pointers (such as that described in [130]).
I made the following modifications to some of the benchmarks: First, I modified
parser from SPEC CINT2000 to fix an overflow that triggered a bound error when
using the splay tree. It did not trigger an error with baggy bounds checking because
in the benchmark run, the overflow was entirely contained in the allocation. The
unchecked program also survived the bug because the object was small enough for
the overflow to be contained even in the padding added by the standard allocator.
Figure 3.10: Execution time for the Olden benchmarks using the buddy allocator
vs. the full BBC system, normalised by the execution time using the
standard system allocator without instrumentation.
Figure 3.11: Peak memory use with the buddy allocator alone vs. the full BBC
system for the Olden benchmarks, normalised by peak memory using
the standard allocator without instrumentation.
Figure 3.12: Execution time for SPEC CINT2000 benchmarks using the buddy
allocator vs. the full BBC system, normalised by the execution time
using the standard system allocator without instrumentation.
For comparison, the overhead reported for another bounds-checking system [57] on the same benchmarks and offering the same protection (modulo
allocation vs. object bounds) is 12%. Moreover, their system uses a technique (pool
allocation) which could be combined with BBC. Based on the breakdown of results
reported in [57], their overhead measured against a baseline using just pool allocation
is 15%, and it seems more reasonable to compare these two numbers, as both the
buddy allocator and pool allocation can be in principle applied independently on
either system.
Next I measured the system using the SPEC CINT2000 benchmarks. Figures 3.12
and 3.13 show the time and space overheads for SPEC CINT2000 benchmarks.
Figure 3.13: Peak memory use with the buddy allocator alone vs. the full BBC
system for SPEC CINT2000 benchmarks, normalised by peak mem-
ory using the standard allocator without instrumentation.
Figure 3.14: Execution time of baggy bounds checking vs. using a splay tree for
the Olden benchmark suite, normalised by the execution time using
the standard system allocator without instrumentation. Benchmarks
mst and health used too much memory and thrashed so their ex-
ecution times are excluded.
Figure 3.15: Execution time of baggy bounds checking vs. using a splay tree
for SPEC CINT2000 benchmarks, normalised by the execution time
using the standard system allocator without instrumentation.
The use of the buddy allocator has little effect on performance in general. The aver-
age runtime overhead of the full system with the benchmarks from SPEC CINT2000 is
60%. vpr has the highest overhead of 127% because its frequent use of illegal pointers
to fake base-one arrays invokes the slow path. I observed that adjusting the allocator
to pad each allocation with 8 bytes from below decreases the time overhead to 53%
with only 5% added to the memory usage, although in general I did not investigate
tuning the benchmarks like this. Interestingly, the overhead for mcf is a mere 16%
compared to the 185% in [170] but the overhead of gzip is 55% compared to 15%
in [170]. Such differences in performance are due to different levels of protection such
as checking structure field indexing and checking dereferences, the effectiveness of
different static analysis implementations in optimising away checks, and the different
compilers used.
To isolate these effects, I also measured BBC using the standard memory allocator
and the splay tree implementation from previous systems [82, 130]. Figure 3.14 shows
the time overhead for baggy bounds versus using a splay tree for the Olden bench-
marks. The splay tree runs out of physical memory for the last two Olden benchmarks
(mst, health) and slows down to a crawl, so I exclude them from the average of 30%
for the splay tree. Figure 3.15 compares the time overhead against using a splay tree
for the SPEC CINT2000 benchmarks. The overhead of the splay tree exceeds 100% for
all benchmarks, with an average of 900% compared to the average of 60% for baggy
bounds checking.
Perhaps the most interesting result in the evaluation was space overhead. Previous
solutions [82, 130, 57] do not report on the memory overheads of using splay trees,
so I measured the memory overhead of BBC when using splay trees and compared
it with the memory overhead of BBC when using the baggy-bounds buddy allocator.
Figure 3.16 shows that BBC has negligible memory overhead for Olden, as opposed
to the splay tree version’s 170% overhead. Interestingly Olden’s numerous small al-
Figure 3.16: Peak memory use of baggy bounds checking vs. using a splay tree
for the Olden benchmark suite, normalised by peak memory using
the standard allocator without instrumentation.
Figure 3.17: Peak memory use of baggy bounds checking vs. using a splay tree
for SPEC CINT2000 benchmarks, normalised by peak memory using
the standard allocator without instrumentation.
locations demonstrate the splay tree’s worst case memory usage by taking up more
space for the entries than for the objects.
On the other hand, Figure 3.17 shows that the splay tree version’s space overhead
for most SPEC CINT2000 benchmarks is very low. The overhead of BBC, however, is
even less (15% vs. 20%). Furthermore, the potential worst case of double the memory
was not encountered for baggy bounds in any of the experiments, while the splay tree
did exhibit greater than 100% overhead for one benchmark (twolf).
The memory overhead is also low, as expected, compared to approaches that track
metadata for each pointer. For example, Xu et al. [170] report 331% for Olden, and
Nagarakatte et al. [104] report an average of 87% using a hash-table (and 64% us-
ing a contiguous array) for Olden and a subset of SPEC CINT and SPEC CFP. For
the pointer-intensive Olden benchmarks alone, their overhead increases to
about 260% (or about 170% using the array). These systems suffer high memory over-
heads per pointer to provide temporal protection [170] or sub-object protection [104].
3.8.2 Effectiveness
I evaluated the effectiveness of BBC in preventing buffer overflows using the bench-
mark suite from [166]. The attacks required tuning to have any chance of success,
because BBC changes the stack frame layout and copies unsafe function arguments to
local variables while the benchmarks use the address of the first function argument
to find the location of the return address they aim to overwrite.
BBC prevented 17 out of 18 buffer overflows in the suite. It failed, however, to
prevent the overflow of an array inside a structure from overwriting a pointer inside
the same structure. This pointer was used to overwrite arbitrary memory, and if it
was a function pointer, it could have been used to directly execute arbitrary code.
This limitation is shared with other systems that detect memory errors at the level of
memory blocks [82, 130, 170, 57]. However, systems storing a base pointer and size
out-of-band can provide a second level of defence for overwritten pointers because
the associated bounds remain intact and can prevent violations when the overwritten
pointer is used. Unfortunately, this is not the case for baggy bounds, because, rather
than a pair of base address and size, it stores the bounds relative to the pointer value.
Figure 3.18: Throughput of Apache web server for varying numbers of concurrent
requests.
Figure 3.19: Throughput of NullHTTPd web server for varying numbers of con-
current requests.
Next, I measured the throughput of the Apache and NullHTTPd web servers. I managed to saturate the CPU by using the keep-alive option of the
benchmarking utility to reuse connections for subsequent requests. I issued repeated
requests for the servers’ default pages and varied the number of concurrent clients un-
til the throughput of the uninstrumented version levelled off (Figures 3.18 and 3.19).
Then I verified that the server’s CPU was saturated, and measured a throughput
decrease of 8% for Apache and 3% for NullHTTPd.
Finally, I built libpng, a notoriously vulnerability-prone library for processing im-
ages in the PNG file format [26]. Libpng is used by many applications to display
images. I successfully ran its test program for 1000 PNG files between 1–2kB found
on a desktop machine, and measured an average runtime overhead of 4% and a peak
memory overhead of 3.5%.
Program KSLOC
openssl-0.9.8k 397
Apache-2.2.11 474
nullhttpd-0.5.1 2
libpng-1.2.5 36
SPEC CINT2000 309
Olden 6
Total 1224
Table 3.1: Source lines of code in programs successfully built and run with BBC.
Figure 3.20: Use of pointer bits by AMD64 hardware and user-space Windows appli-
cations: (a) AMD64 hardware uses 48-bit virtual addresses, leaving the 16 most signif-
icant bits of a 64-bit pointer unused for addressing; user-space Windows applications
use 43 address bits, leaving the top 21 bits unused.
For interoperability, instrumented code must be able to use pointers from uninstru-
mented code and vice versa. I achieve the former by interpreting the default zero
value found in unused bits of user-space pointers as maximal bounds, so checks on
pointers missing bounds information will succeed. Supporting interoperability in the
other direction is harder because the extra bits are not sign extended as expected by
the hardware, raising a hardware exception when uninstrumented code accesses memory
through a tagged pointer.
I used the paging hardware to address this by mapping all addresses that differ
only in their tag bits to the same physical memory. This way, unmodified binary
libraries can dereference tagged pointers, and instrumented code avoids the cost of
clearing the tag too.
Figure 3.21: Use of pointer bits by in-bounds and out-of-bounds tagged pointers (64-bit
pointers, most significant bit first): (a) in-bounds pointers are divided into fields of
21, 5, and 38 bits; (b) out-of-bounds pointers are divided into fields of 13, 5, 8, and
38 bits.
Using 5 bits to encode the size, as shown in Figure 3.21 (a), allows checking object
sizes up to 2^32 bytes. To use the paging mechanism, these 5 bits have to come from the
43 bits supported by the Windows operating system, thus leaving 38 bits of address
space for programs.
To support 5 bits for bounds, 32 different virtual address regions must be mapped
to the same physical memory. I implemented this entirely in user space using the
CreateFileMapping and MapViewOfFileEx Windows API functions to replace the process
image, stack, and heap with a file backed by the system paging file (Windows termi-
nology for an anonymous mapping in operating systems using mmap) and mapped at
32 different virtual addresses in the process address space.
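A heavily simplified sketch of that user-space mapping (error handling omitted, and the region size, tag position, and base address are hypothetical parameters; the real prototype replaces the image, stack, and heap this way):

    #include <windows.h>

    /* Map one pagefile-backed region at 32 virtual addresses that differ only in
       the 5 tag bits, so dereferences through tagged pointers ignore the tag. */
    void *map_tag_views(SIZE_T region_size, int tag_shift, ULONG_PTR base) {
        HANDLE mapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                           0, (DWORD)region_size, /* region < 4 GB assumed */
                                           NULL);
        for (ULONG_PTR tag = 0; tag < 32; tag++) {
            LPVOID addr = (LPVOID)(base | (tag << tag_shift));
            MapViewOfFileEx(mapping, FILE_MAP_ALL_ACCESS, 0, 0, region_size, addr);
        }
        return (void *)base;
    }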
Now that 5 address bits are effectively ignored by the hardware, they can be used
to store the size of memory allocations. For heap allocations, malloc wrappers set the
tag bits in returned pointers. For locals and globals, the address taking operator “&”
is instrumented to properly tag the resulting pointer. The bit complement of the size
logarithm is stored to enable interoperability with untagged pointers by interpreting
their zero bit pattern as all bits set (representing a maximal allocation of 2^32).
With the bounds embedded in pointers, there is no need for a memory lookup to
check pointer arithmetic. Figure 3.22 shows the AMD64 code sequence for checking
pointer arithmetic using a tagged pointer. First, the encoded bounds are extracted
from the source pointer by right shifting a copy to bring the tag to the bottom 8 bits
of the register and XORing them with the value 0x1f to recover the size logarithm by
inverting the bottom 5 bits. Then the result of the arithmetic is checked by XORing
Figure 3.22: AMD64 code sequence inserted to check unsafe arithmetic with
tagged pointers. Note that the al register is an alias for the 8
least significant bits of the rax register.
the source and result pointers, shifting the result by the tag stored in al, and checking
for zero.
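In C, the same check can be sketched as follows (a sketch; the exact bit position of the tag is an assumption based on the 38-bit address space of Figure 3.21(a)):

    #include <stdint.h>

    #define ADDR_BITS 38   /* address bits available to the program */

    /* Recover e by inverting the 5 bit-complemented tag bits, then require the
       source pointer p and the result q to agree above bit e. */
    static int arith_in_bounds(uintptr_t p, uintptr_t q) {
        unsigned e = ((unsigned)(p >> ADDR_BITS) & 0x1f) ^ 0x1f;
        return ((p ^ q) >> e) == 0;
    }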
Similarly to the table-based implementation, checks on out-of-bounds pointers trig-
ger a bounds error to avoid an explicit check in the common case. To cause this, the
bits holding the size are set to zero for out-of-bounds pointers and the size is stored
using 5 more bits in the pointer, as shown in Figure 3.21 (b).
Figure 3.23: Normalised execution time on AMD64 with Olden benchmarks.
Figure 3.24: Normalised execution time on AMD64 with SPEC CINT2000 bench-
marks.
Figures 3.23 and 3.24 show the time overhead. The average when using a table-
based implementation on 64-bits is 4% for Olden and 72% for SPEC CINT2000—close
to the 32-bit results of Section 3.8. Figures 3.25 and 3.26 show the space overhead. The
average using a table is 21% for Olden and 11% for SPEC CINT2000. Olden’s space
overhead is higher than the 32-bit version; unlike the 32-bit case, the buddy allocator
contributes 14% on average to this overhead.
Figure 3.25: Normalised peak memory use on AMD64 with Olden benchmarks.
Figure 3.26: Normalised peak memory use on AMD64 with SPEC CINT2000
benchmarks.
Tagged pointers are 1–2% faster on average than using a table, but slower for bh
and especially crafty. Tagged pointers also use about 5% less memory for most
benchmarks, as expected from avoiding the table’s space overhead, except for a few
such as power and crafty. These exceptions arise because the prototype does not
map pages to different addresses on demand, but instead maps 32 30-bit regions of
virtual address space on program startup. Hence the fixed overhead is notable for
these benchmarks whose absolute memory usage is low.
While mapping multiple views was implemented entirely in user-space, a robust
implementation would probably require kernel support. The gains, however, appear
too small to justify the complexity.
3.10 Discussion
3.10.1 Alternative implementations
Two alternative 64-bit implementations are briefly discussed here. If code is trans-
formed by relocating unsafe local variables to the heap, which may have some run-
time cost, both the table and the tagged-pointer variants can be improved in new
ways. Instead of a buddy system, segregated allocation zones can be used. Placing
allocations of the same size adjacent to each other enables some optimisations. In the
table-based variant, slot sizes can increase by orders of magnitude, decreasing table
memory waste and improving cache performance. In the case of tagged pointers, al-
location zones can be placed in virtual addresses matching their size tags, removing
the need for instrumentation to set the tag bits. This almost makes tagged pointers
practical on 32-bit architectures, as address bits are reused for the bounds. In addi-
tion, safe variables and library data can be loaded to high addresses, with all size tag
bits set (corresponding to maximal bounds), removing the need for interoperability
instrumentation.
ates code that uses va_arg and a local variable instead of a named argument to access
the unnamed arguments. This is the only case of intentional access across objects I
have encountered. The other case involved an unused invalid read in the h264ref
benchmark in the SPEC CINT2006 suite:
    int dd, d[16];
    for (dd=d[k=0]; k<16; dd=d[++k])
        // ...
Here the bogus value obtained by reading past the end of the buffer is never used, rendering this violation harmless and hard to notice, even though it is clearly a bug
according to ISO C. Such problems, however, are less likely to go unnoticed with
writes, because they may corrupt memory. The solution in Chapter 4 can avoid such
false positives by checking only writes.
to less temporal protection than the simple implementation. Fortunately, the simple
implementation is compatible with BBC. The memory consumption of the simple
scheme can be improved by reusing physical memory pages backing deallocated ob-
jects, and it is less sensitive to fragmentation, as it can reuse memory within pools.
The resulting increase in address space use is not a problem in 64-bit systems, and
can be addressed in 32-bit systems with infrequent conservative garbage collection,
or simply ignored by allowing address space to wrap around and reuse pages in the
hope that the delay was sufficient to thwart attacks.
The next chapter presents an alternative solution that is even faster at the expense
of some spatial protection, and also adds a degree of temporal protection.
Chapter 4
Write integrity testing
This chapter presents WIT, a defence against memory errors that is fast, offers broad
coverage against attacks, is backwards-compatible, and does not incur false positives.
WIT differs from BBC and other bounds checking systems in how it addresses
the fundamental cost of tracking the intended referent objects of pointers to ensure
dereferences stay within bounds. Some bounds checking systems address it by at-
taching metadata to pointers, while the Jones and Kelly approach (used by BBC)
avoids pointer metadata by instrumenting pointer arithmetic to ensure pointers keep
pointing to their intended referent. BBC is fast because of its simple, streamlined
runtime pointer tracking. WIT goes even further by statically approximating tracking
to streamline checks even more, and using static analysis to increase protection when
possible.
At compile time, WIT uses interprocedural points-to analysis [76] to compute the
control-flow graph and the set of objects that can be written by each instruction in the
program. At runtime, WIT enforces write integrity (Section 1.4.2), that is, it prevents
instructions from modifying objects that are not in the set computed by the static
analysis. Additionally, WIT inserts small objects called guards between the original
objects in the program. Since the guards are not in any of the sets computed by the
static analysis, this allows WIT to prevent sequential overflows and underflows even
when the static analysis is imprecise. WIT also enforces control-flow integrity [1, 2], that
is, it ensures that the control-flow transfers at runtime are allowed by the control-flow
graph computed by the static analysis.
WIT uses the points-to analysis to assign a colour to each object and to each write
instruction such that all objects that can be written by an instruction have the same
colour. It instruments the code to record object colours at runtime and to check that
instructions write to the right colour. The colour of memory locations is recorded in
a colour table that is updated when objects are allocated and deallocated. Write checks
look up the colour of the memory location being written in the table and check if it is
equal to the colour of the write instruction. This ensures write integrity.
WIT also assigns a colour to indirect call instructions and to the entry points of
functions that have their address taken in the source code and thus may be called
indirectly, such that all functions that may be called by the same instruction have the
same colour. WIT instruments the code to record function colours in the colour ta-
ble and to check indirect calls. The indirect call checks look up the colour of the
target address in the table and check if it matches the colour of the indirect call in-
struction. These checks together with the write checks ensure control-flow integrity.
Control-flow integrity prevents the attacker from bypassing the checks and provides
an effective second line of defence against attacks that are not detected by the write
checks.
These mechanisms allow WIT to provide broad coverage using lightweight run-
time instrumentation. In the rest of this chapter I will explain WIT, and evaluate its
coverage and performance.
4.1 Overview
1 char cgiCommand[1024];
2 char cgiDir[1024];
3
4 void ProcessCGIRequest(char *msg, int sz) {
5 int i = 0;
6 while (i < sz) {
7 cgiCommand[i] = msg[i];
8 i++;
9 }
10
11 ExecuteRequest(cgiDir, cgiCommand);
12 }
Figure 4.1: Example vulnerable code: simplified web server with a buffer overflow
vulnerability.
WIT has both a compile-time and a runtime component. I will use the familiar
example from Chapter 3, repeated in Figure 4.1, to illustrate how both components
work. Recall that the web server of the example receives a request for a CGI command
of unknown size to be stored in the fixed size variable cgiCommand. By overflowing
cgiCommand, an attacker can overwrite cgiDir and coerce the web server into execut-
ing a program from anywhere in the file system.
At compile time, the system uses a points-to analysis [76] to compute the set of
objects that can be modified by each instruction in the program. For the example in
Figure 4.1, the analysis computes the set {i} for the instructions at lines 5 and 8, and
the set {cgiCommand} for the instruction at line 7.
To reduce space and time overhead at runtime, the system performs a write safety
analysis that identifies instructions and objects that are safe. An instruction is safe if
it cannot violate write integrity and an object is safe if all instructions that modify the
object (according to the points-to analysis) are safe. In particular, all read instructions
are safe, as well as write instructions to constant offsets from the data section or the
frame pointer, assuming the frame pointer is not corrupted (which WIT can guarantee
since saved frame-pointers are not within the write set of any program instruction).
In the example, the write safety analysis determines that instructions 5 and 8 are
safe because they can only modify i and, therefore, i is safe. It also determines that the write at line 7 is unsafe and, therefore, so is cgiCommand.
WIT also records in the colour table the colour of each function that can be called
indirectly. It inserts instrumentation to update the colour table at program start-up
time and to check the colour table on indirect calls. The indirect call checks compare
the colour of the indirect call instruction and its target. If the colours are different,
they raise an exception. There are no indirect calls in the example of Figure 4.1.
WIT can prevent all attacks that violate write integrity but, unlike baggy bounds
used in Chapter 3, which attacks violate this property depends on the precision of
the points-to analysis. For example, if two objects have the same colour, it may fail to
detect attacks that use a pointer to one object to write to the other. The results show
that the analysis is sufficiently precise for most programs to make this hard. Addi-
tionally, WIT can prevent many attacks regardless of the precision of the points-to
analysis. For example, it prevents: attacks that exploit buffer overflows and under-
flows that only allow sequential writes to increasing or decreasing addresses until an
object boundary is crossed (which are the most common type of memory-safety vul-
nerability); attacks that overwrite any safe objects (which include return addresses,
exception handler pointers, and data structures for dynamic linking); and attacks that
corrupt heap-management data structures.
Control-flow integrity provides an effective second line of defence when write
checks fail to detect an attack. WIT prevents all attacks that violate control-flow in-
tegrity but which attacks violate this property also depends on the precision of the
points-to analysis. For example, if many functions have the same colour as an indi-
rect call instruction, the attacker may be able to invoke any of those functions. In the
worst case, the analysis may assign the same colour to all functions that have their
address taken in the source code. Even in this worst case, an attacker that corrupts a
function pointer can only invoke one of these functions. Furthermore, these functions
do not include library functions invoked indirectly through the dynamic linking data
structures. Therefore, the attacker cannot use a corrupt function pointer to jump to
library code, to injected code or to other addresses in executable memory regions.
This makes it hard to launch attacks that subvert the intended control flow, which are
the most common.
WIT does not prevent out-of-bounds reads, and is, hence, vulnerable to disclosure
of confidential data. It protects, however, indirectly against out-of-bounds reads of
pointers. Storing pointer metadata out-of-band makes it hard to exploit out-of-bounds
pointer reads without violating write integrity or control-flow integrity in the process.
(Refer to Section 2.3 for a discussion of the threat posed by such out-of-bounds reads
vs. out-of-bounds reads in general). Therefore, I chose not to instrument reads to
achieve lower overhead, and to increase the precision of the analysis by avoiding
merging sets due to constraints caused by read instructions which are otherwise not
needed for writes.
WIT can prevent attacks on the example web server. The write check before line 7 fails and raises an exception if the attacker attempts to overflow cgiCommand. When
i is 1024, the colour of the location being written is 0 (which is the colour of the
guard) rather than 3 (which is the colour of cgiCommand). Even without guards, WIT
would be able to detect this attack because the colours of cgiCommand and cgiDir are
different.
4.2 Static analysis
The results of the points-to and write safety analysis are used to assign colours
to objects and to unsafe instructions. An iterative process is used to compute colour
sets, which include objects and unsafe pointer dereferences that must be assigned the
same colour because they may alias each other. Initially, there is a separate colour
set for each points-to set of an unsafe pointer: the initial colour set for a points-to set
p → {o1, ..., on} is {[p], o1, ..., on}. Then intersecting colour sets are merged until a
fixed point is reached. A distinct colour is assigned to each colour set: it is assigned
to all objects in the colour set and all instructions that write pointer dereferences in
the set. All the other objects in the original program are assigned colour zero. By
only considering points-to sets of unsafe pointers when computing colours, the false
negative rate and the overhead to maintain the colour table are reduced.
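The merging step can be sketched with a union-find structure over object identifiers; the array-based representation of points-to sets and the bound MAX_OBJS are assumptions of the sketch, and the colours of the write instructions themselves (assigned alongside their sets) are omitted.

    #define MAX_OBJS 1024
    #define MAX_SET  16

    static int parent[MAX_OBJS];
    static int set_colour[MAX_OBJS];

    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
    static void unite(int a, int b) { parent[find(a)] = find(b); }

    /* pts[i][0..len[i]-1] lists the objects in the points-to set of the i-th
     * unsafe pointer; colour[] receives the colour of every object. */
    void assign_colours(int npts, int pts[][MAX_SET], const int len[],
                        int nobjs, int colour[])
    {
        for (int o = 0; o < nobjs; o++) { parent[o] = o; set_colour[o] = 0; colour[o] = 0; }

        /* Objects in the same points-to set must share a colour; merging them
         * transitively yields the fixed point of merging intersecting colour sets. */
        for (int i = 0; i < npts; i++)
            for (int j = 1; j < len[i]; j++)
                unite(pts[i][0], pts[i][j]);

        /* Assign a distinct colour to each merged set; colours 0 and 1 remain
         * reserved for other objects and heap guards respectively. */
        int next = 2;
        for (int i = 0; i < npts; i++)
            for (int j = 0; j < len[i]; j++) {
                int r = find(pts[i][j]);
                if (set_colour[r] == 0) set_colour[r] = next++;
                colour[pts[i][j]] = set_colour[r];
            }
    }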
WIT uses a similar algorithm to assign colours to functions that have their ad-
dress taken in the source code and thus may be called indirectly. The differences
are that this version of the algorithm iterates over the points-to sets of pointers that
are used in indirect call instructions (except indirect calls to functions in dynamically
linked libraries), and that it only considers the objects in these sets that are functions.
Compiler generated indirect calls through the Windows import address table (IAT) to
library functions can be excluded because the IAT is protected from corruption by the
write checks. Each colour set is assigned a different colour that is different from 0,
1, and the colours assigned to unsafe objects. The rest of the code is assigned colour
zero.
4.3 Instrumentation
4.3.1 Colour table
WIT tracks colours in a table similar to the one used by BBC to track bounds, us-
ing 1 byte to represent the colour of a memory slot. The slot size, however, is 8
bytes, to match the alignment used by the 32-bit Windows standard memory alloca-
tor. Therefore, the table introduces a space overhead of 12.5%. Recall that to prevent
any two objects having to share a slot, all unsafe objects must be aligned to multiples
of 8 bytes. This does not introduce any memory overhead for dynamic allocations,
since the memory allocator on Windows already aligns memory sufficiently. How-
ever, since the stack on 32-bit Windows is only four-byte aligned, enforcing 8 byte
alignment for local variables by aligning the frame pointer in the function prologue would disrupt the frame-pointer omission optimisation, which lets compilers use the frame pointer as an extra general-purpose register. This optimisation is important for register-starved architectures such as the 32-bit x86.
WIT can satisfy the alignment constraints for unsafe local variables without aligning
the stack frame in the function prologue by using the following technique. It forces
unsafe objects and guard objects in the stack and data sections to be four byte aligned
(which the compiler can satisfy with padding alone) and inserts additional four-byte
aligned pads after unsafe objects. For an unsafe object of size s, the pad is eight bytes long if ⌈s/4⌉ is even and four bytes long if ⌈s/4⌉ is odd. WIT sets ⌈s/8⌉ colour-table entries to the colour of the unsafe object when the pad is four bytes long and ⌈s/8⌉ + 1 when the pad is eight bytes long. This workaround is not needed in 64-bit Windows or in operating systems having an 8- or 16-byte-aligned stack.
Figure 4.2: Ensuring that two objects with distinct colours never share the same
eight-byte slot. The pad after unsafe objects takes the colour of the
guard, the unsafe object, or both depending on the actual alignment
at runtime. The lowest addresses are at the bottom of the figure.
Figure 4.2 shows how the extra padding works. Depending on the alignment at
runtime, the pad gets the colour of the unsafe object, the guard, or both. All these con-
figurations are legal because the pads and guards should not be accessed by correct
programs and the storage locations occupied by unsafe objects are always coloured
correctly. Conceptually, the pads allow the guards to “move” to ensure that they do
not share a slot with the unsafe objects.
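The padding rule can be summarised by the following small sketch; the helper names are illustrative.

    /* Pad placed after an unsafe local of size s, and the number of
     * colour-table entries set to the object's colour. */
    static unsigned pad_bytes(unsigned s)
    {
        unsigned q = (s + 3) / 4;              /* ceil(s/4) */
        return (q % 2 == 0) ? 8 : 4;
    }

    static unsigned colour_table_entries(unsigned s)
    {
        unsigned e = (s + 7) / 8;              /* ceil(s/8) */
        return (pad_bytes(s) == 8) ? e + 1 : e;
    }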
Since the points-to analysis does not distinguish between different fields in objects
and between different elements in arrays, the same colour is assigned to all elements
of an array and to all the fields of an object. Therefore, it is not necessary to change
the layout of arrays and objects, which is important for backwards compatibility.
Eight bits are sufficient to represent enough colours because the write safety analy-
sis is very effective at reducing the number of objects that must be coloured. However,
it is possible that more bits will be required to represent colours in very large pro-
grams. If this ever happens, two solutions are possible: colour-table entries can be
increased to 16 bits, or they can remain at 8 bits at the expense of more false negatives.
As with BBC, the colour table is implemented as an efficient array. For a Windows
system with 2 GB of virtual address space available to the program, 256 MB of virtual
address space is reserved for the colour table. A vectored exception handler is used
to capture accesses to unallocated memory, allocate virtual memory, and initialise it
to zero. The base of the colour table is currently at address 40000000h. So to compute
the address of the colour-table entry for a storage location, its address is shifted right
by three, and added to 40000000h.
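In C, the table addressing and the write check it supports can be sketched as follows; the helper names are illustrative and the trap stands in for the exception raised by the real instrumentation.

    #include <stdint.h>

    #define COLOUR_TABLE_BASE 0x40000000u   /* base address used by the prototype */

    /* The colour-table entry for the 8-byte slot containing address p. */
    static inline uint8_t *colour_entry(uintptr_t p)
    {
        return (uint8_t *)((p >> 3) + COLOUR_TABLE_BASE);
    }

    /* The check inserted before an unsafe write by an instruction of colour c. */
    static inline void write_check(uintptr_t p, uint8_t c)
    {
        if (*colour_entry(p) != c)
            __builtin_trap();
    }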
Finally, an optimisation is possible to avoid the need for most guards between
stack and global objects, by laying them out such that adjacent objects have differ-
ent colours. This is not implemented in the current prototype.
corresponding to guard objects are set to 0 and those corresponding to the unsafe lo-
cal variable are set to two. The base of the colour table is at address 40000000h. The
final instruction restores the original value of ecx. The instrumentation for epilogues
is identical but it sets the colour-table entries to zero.
An alternative would be to update colour-table entries only on function entry for all
objects in the stack frame. This alternative adds significantly higher overhead because
on average only a small fraction of local variables are unsafe, while the approach
described above incurs no overhead to update the colour table when functions have
no unsafe locals or arguments, which is common for simple functions that are invoked
often.
The colour table is also updated when heap objects are allocated or freed. The
program is instrumented to call wrappers of the allocation functions such as malloc
and calloc. These wrappers receive the colour of the object being allocated as an
additional argument. They call the corresponding allocator and then set the colour
table entries for the allocated object to the argument colour. They set ⌈s/8⌉ entries for
an object of size s using memset. They also set the colour-table entries for the eight-
byte slots immediately before and after the object to colour 1. These two slots contain
a chunk header maintained by the standard allocator in Windows that can double as
a guard since it is never touched by application code. Calls to free are also replaced
by calls to a wrapper. This wrapper sets the colour-table entries of the object being
freed to zero and then invokes free. A different colour is used for guards in the heap
to detect some invalid uses of free (as explained in the next section).
Then it checks if the pointer argument points to an object with this colour, if it is eight
byte aligned, if it points in user space, and if the slot before this object has colour one.
An exception is raised if this check fails. The first check prevents double frees because
the colour of heap objects is reset to zero when they are freed. The last two checks
prevent frees whose argument is a pointer to a non-heap object or a pointer into the
middle of an allocated object. Recall that colour one is reserved for heap guards and
is never assigned to other memory locations.
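The following sketch outlines the allocation and deallocation wrappers just described. It assumes the 2 GB user-space split mentioned above, uses the MSVC _msize call to recover the allocation size, relies on the allocator’s eight-byte alignment, and assumes the expected colour is also passed to the free wrapper; the names are illustrative rather than the actual implementation.

    #include <stdlib.h>
    #include <malloc.h>     /* _msize (MSVC CRT) */
    #include <string.h>
    #include <stdint.h>

    #define COLOUR_TABLE_BASE 0x40000000u
    static inline uint8_t *colour_entry(uintptr_t p)
    {
        return (uint8_t *)((p >> 3) + COLOUR_TABLE_BASE);
    }

    /* Allocation wrapper: colour the object's slots and mark the chunk headers
     * before and after it as heap guards (colour 1). */
    void *wit_malloc(size_t s, uint8_t colour)
    {
        uint8_t *p = malloc(s);
        if (p == NULL) return NULL;
        memset(colour_entry((uintptr_t)p), colour, (s + 7) / 8);
        *colour_entry((uintptr_t)p - 8) = 1;     /* guard slot before the object */
        *colour_entry((uintptr_t)p + s) = 1;     /* guard slot after the object  */
        return p;
    }

    /* Deallocation wrapper: validate the argument, then clear its colours. */
    void wit_free(void *p, uint8_t colour)
    {
        uintptr_t a = (uintptr_t)p;
        if (*colour_entry(a) != colour          /* catches double frees          */
            || (a & 7) != 0                     /* must be eight-byte aligned    */
            || a >= 0x80000000u                 /* must point into user space    */
            || *colour_entry(a - 8) != 1)       /* slot before must be a guard   */
            __builtin_trap();
        memset(colour_entry(a), 0, (_msize(p) + 7) / 8);
        free(p);
    }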
Checks are also inserted before indirect calls. These checks look up the colour of
the target function in the colour table and compare this colour with the colour of the
indirect call instruction. An exception is raised if they do not match. For example,
the following x86 assembly code is generated to check an indirect call to the function
whose address is in edx and whose colour is supposed to be 20.
    shr  edx, 3
    cmp  byte ptr [edx+0x40000000], 20
    je   L1
    int  3
    L1:  shl  edx, 3
    call edx            ; indirect call
The first instruction shifts the function pointer right by three to compute the colour
table index of the first instruction in the target function. The cmp instruction checks
if the colour in the table is 20, the colour for allowed targets for this indirect call
instruction. If they are different, WIT raises an exception. If they are equal, the index
is shifted left by three to restore the original function pointer value and the function
is called.
Unlike the instruction sequence used for write checks, this sequence zeroes the three
least significant bits of the function pointer value. WIT aligns the first instruction in
a function on a 16-byte boundary (already the compiler default unless optimising for
space), so this has no effect if the function pointer value is correct. But it prevents
attacks that cause a control flow transfer into the middle of the first eight byte slot
of an allowed target function. Therefore, this instruction sequence ensures that the
indirect call transfers control to the first instruction of a call target that is allowed by
the static analysis. The checks on indirect calls are sufficient to enforce control-flow
integrity because all other control data is protected by the write checks.
The libc library is also instrumented to detect memory errors due to incorrect
use of libc functions without having to provide wrappers for every one. However,
instrumenting libc using the WIT version described so far would require a different
libc binary for each program. Instead, a special WIT variant is used for libraries. It
assigns the same well-known colour (different from zero or one) to all unsafe objects
allocated by the library and inserts guards around these objects. All safe objects
used by the library functions have colour zero. Before writes, the WIT variant for
libc checks that the colour of the location being written is greater than one, that is,
that the location is not a safe object or a guard object. These checks prevent libc
functions from violating control-flow integrity. They also prevent all common buffer
overflows due to incorrect use of libc functions. However, they cannot prevent attacks
that overwrite an unsafe object by exploiting format-string vulnerabilities with the %n
specifier. Such attacks can, however, be prevented with static analysis [136, 12, 126, 35] and
are in any case disallowed by some implementations [47].
Wrappers need to be written for functions that are not instrumented, including libc
functions written in assembly (for example, memcpy and strcpy) and for system calls
(for example, recv). These wrappers receive the colours of destination buffers as extra
arguments and scan the colour-table entries corresponding to the slots written by the
wrapped function to ensure that they have the right colour. Since the colour table is
very compact, these wrappers introduce little overhead.
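A wrapper for an uninstrumented routine such as memcpy might be sketched as follows; the name wit_memcpy and the per-slot loop are illustrative, and colour_entry is the helper from the earlier sketch, repeated here for completeness.

    #include <stdint.h>
    #include <string.h>

    #define COLOUR_TABLE_BASE 0x40000000u
    static inline uint8_t *colour_entry(uintptr_t p)
    {
        return (uint8_t *)((p >> 3) + COLOUR_TABLE_BASE);
    }

    /* The destination colour arrives as an extra argument; every slot the copy
     * will touch is checked before the uninstrumented routine runs. */
    void *wit_memcpy(void *dst, const void *src, size_t n, uint8_t dst_colour)
    {
        uintptr_t a = (uintptr_t)dst;
        for (uintptr_t slot = a & ~(uintptr_t)7; slot < a + n; slot += 8)
            if (*colour_entry(slot) != dst_colour)
                __builtin_trap();
        return memcpy(dst, src, n);
    }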
4.4 Protection
WIT prevents attacks that exploit buffer overflows and underflows by writing ele-
ments sequentially until an object boundary is crossed. These attacks are always
prevented because the write checks fail when a guard is about to be overwritten. This
type of attack is very common.
The write checks do not detect buffer overflows and underflows inside an object.
For example, they will not detect an overflow of an array inside a C structure that overwrites a function pointer, a data pointer, or some security-critical data in the same structure. In the first two cases, WIT can prevent the attacker from successfully
exploiting this type of overflow because the indirect call checks severely restrict the
targets of indirect calls and because the write checks may prevent writes using the cor-
rupt data pointer. Most backwards-compatible C bounds checkers [82, 170, 130, 57]
do not detect overflows inside objects and, unlike WIT, have no additional checks to
prevent successful exploits. Moreover, enforcing the bounds of sub-objects is inher-
ently ambiguous in C. Nevertheless, a recent proposal by Nagarakatte et al. [104] does
offer support for detecting overflows inside objects, although this comes at a cost of
up to 3× memory increase to store bounds for individual pointers.
The write checks prevent all attacks that attempt to overwrite code or objects with
colour zero. Since objects have colour zero by default, this includes many common
types of attacks. For example, return addresses, saved base pointers, and exception
handler pointers in the stack all have colour zero. Other common attack targets like
the import address table (IAT), which is used for dynamic linking, also have colour
zero. The write checks prevent the attacker from modifying code because the colours
assigned to indirect call targets are different from the colours assigned to unsafe ob-
jects and the rest of the code has colour zero.
WIT can prevent corruption of the heap-management data structures used by the
standard allocator in Windows without any changes to the allocator code. The checks
on free prevent corruption due to incorrect use of free, and the write checks pre-
vent corruption by unsafe aligned writes because the data structures have colour one
or zero. However, writes that are not aligned may overwrite the first few bytes of
the heap metadata after an object. Misaligned writes generate exceptions in many
architectures but they are allowed in the x86. To a large extent, misaligned writes
can be disallowed cheaply, by checking the alignment at pointer casts. Adding eight
bytes of padding at the end of each heap object can prevent corruption in all cases.
In most applications, this adds little space and time overhead but it can add signifi-
cant overhead in applications with many small allocations. This overhead may not be
justifiable because most programs avoid misaligned writes for portability and perfor-
mance, and recent versions of the Windows allocator can detect many cases of heap
metadata corruption with reasonable overhead through sanity checks in memory al-
location routines.
Control-flow integrity provides an effective second line of defence when the write
checks fail to detect an attack, but which attacks violate control-flow integrity also
depends on the precision of the points-to analysis. In the experiments of Section 4.6.5,
the maximum number of indirect call targets with the same colour is 212 for gap, 38
for vortex, and below 7 for all the other applications.
Even if the analysis assigned the same colour to all indirect call targets, an attacker
that corrupted a function pointer could only invoke one of these targets. Furthermore,
these targets do not include functions in dynamically linked libraries that are invoked
indirectly through the IAT. These library functions have colour zero and indirect calls
through the IAT are not checked because the IAT is protected by the write checks.
Therefore, the attacker cannot use a corrupt function pointer to transfer control to
library code, to injected code, or to other addresses in executable memory regions.
This makes it hard to launch attacks that subvert the intended control flow, which are
the most common.
WIT does not prevent out-of-bound reads. These can lead to disclosure of con-
fidential data but are hard to exploit for executing arbitrary code without violating
write integrity or control-flow integrity in the process, because writes and indirect
calls through pointers retrieved by accessing out-of-bounds memory are still checked.
4.5 Interoperability
Interoperability requires running programs containing both instrumented and unin-
strumented code without raising false alerts. Uninstrumented code is not disrupted,
because WIT does not change the programmer visible layout of data structures in
memory. All memory, however, has colour 0 by default, thus instrumented code
would raise an alert when writing to memory allocated by uninstrumented code that
has the default colour. Fortunately, heap allocations by uninstrumented code can
be intercepted to assign them a special colour at runtime and to place guards between allocations.
The data and code sections of uninstrumented modules can also be assigned special
colours at startup. Guards, however, cannot be placed between global variables. With
these in place, an error handler can check for the special colour before raising an alert.
However, the case of instrumented code accessing memory in stack frames of unin-
strumented functions remains. This can happen if an uninstrumented function calls
an instrumented function passing it a reference to a local variable. It can be avoided
to a large extent by providing two versions for each function callable from unin-
strumented code: one callable from instrumented code, and another from uninstru-
mented. All direct function calls in instrumented code can be modified to call the
instrumented version using name mangling, while uninstrumented code defaults to
the uninstrumented version with the unmangled name. Uninstrumented code, how-
ever, can still end up calling an instrumented function through a function pointer,
and pass it an uncoloured local variable argument. To avoid false positives in this case, the error handler can unwind the stack of instrumented functions to decide whether the faulting address belongs to the stack frame of an instrumented function before raising an alert; this has not been implemented in the current prototype.
Figure 4.3: CPU overhead for SPEC benchmarks.
For Olden, the average overhead is 4% and the maximum is 13%. It is hard to do defini-
tive comparisons with previous techniques because they use different compilers, op-
erating systems and hardware, and they prevent different types of attacks. However,
WIT’s overhead can be compared with published overheads of other techniques on
SPEC and Olden benchmarks. For example, CCured [105] reports a maximum over-
head of 87% and an average of 28% for Olden benchmarks, but it slows down some
applications by more than a factor of 9. The bounds-checking technique in [57] has an
average overhead of 12% and a maximum overhead of 69% in the Olden benchmarks.
WIT has three times lower overhead on average and the maximum is five times lower.
Further comparisons are provided in Chapter 6.
Figures 4.5 and 4.6 show WIT’s memory overhead for SPEC and Olden benchmarks.
The overhead is low for all benchmarks. For SPEC, the average memory overhead is
13% and the maximum is 17%. For Olden, the average is 13% and the maximum is
16%. This overhead is in line with expectations: since WIT uses one byte in the colour
table for each 8 bytes of application data, the memory overhead is close to 12.5%. The
overhead can decrease below 12.5% because colour-table entries are not used for safe
objects and most of the code. On the other hand, the overhead can grow above 12.5%
because of guard objects and pads between unsafe objects, but the results show that
this overhead is small. It is interesting to compare this overhead with that of previous
techniques even though they have different coverage. For example, CCured [105]
reports an average memory overhead of 85% for Olden and a maximum of 161%.
Xu et al. [170] report an average increase of memory usage by a factor of 4.31 for the
Olden benchmarks and 1.59 for the SPEC benchmarks.
Figure 4.5: Memory overhead for SPEC benchmarks.
Next, the SPEC benchmarks were used to break down the runtime overhead in-
troduced by WIT. Parts of the instrumentation were removed in sequence. First,
indirect-call checks were removed. Then instructions that perform write checks were
removed, except for the first lea instruction (this maintains the register pressure due
to the checks). Next the lea instructions were removed, which removes the register
pressure added by write checks by freeing the registers they used. After that the in-
structions that set colours for heap objects were removed, and, finally, the setting of
colours for stack objects.
Figure 4.7 shows a breakdown of the CPU overhead. For all benchmarks, the write
checks account for more than half of the CPU overhead and a significant fraction
of their overhead is due to register pressure. For vortex, crafty and bzip2, setting
colours on stack allocations contributes significantly to the overhead because they
have array local variables. For example, bzip2 has large arrays which are not used
in their entirety; nevertheless, colours need to be assigned each time. For gap and
vpr setting colours on heap allocations contributes 12% and 3% of the overhead but
for all other benchmarks this overhead is negligible. The indirect-call checks have a
negligible impact on performance except for gap where they contribute 16% of the
overhead, because one of the main loops in gap looks up function pointers from a
data structure and calls the corresponding functions. The gap overhead may thus be representative of C++ programs, which are not otherwise evaluated in this work. These results
suggest that improving write-safety analysis and eliminating redundant checks for
overlapping memory slots in loops and structures could reduce the overhead even
further.
Figure 4.7: Breakdown of the CPU overhead for the SPEC benchmarks (indirect-call checks, setting heap colours, setting stack colours, register pressure, and store checks).
I also experimented with a version of the WIT runtime that adds 8 bytes of padding
at the end of each heap object to protect heap metadata from hypothetical corrup-
tion by misaligned writes. In addition to wasted space, poor cache utilisation also
increased the execution time. The average time overhead for the SPEC benchmarks increased from 10% to 11% and the average memory overhead increased from 13% to 15%. The average time overhead for the Olden benchmarks increased from 4% to 7% and the average memory overhead increased from 13% to 63%. The overhead
increased significantly in the Olden benchmarks because there are many small alloca-
tions. As discussed earlier, it seems likely that the increased security does not justify
the extra overhead.
SQL Server is a relational database from Microsoft that was infected by the infa-
mous Slammer [102] worm. The vulnerability exploited by Slammer causes sprintf
to overflow a stack buffer. WIT was used to compile the SQL Server library with the
vulnerability. WIT detects Slammer when the sprintf function tries to write over the
guard object inserted after the vulnerable buffer.
Ghttpd is an HTTP server with several vulnerabilities [63]. The vulnerability tested
is a stack-buffer overflow when logging GET requests inside a call to vsprintf. WIT
detects attacks that exploit this vulnerability when vsprintf tries to write over the
guard object at the end of the buffer.
NullHTTPd is another HTTP server. This server has a heap-overflow vulnerability
that can be exploited by sending HTTP POST requests with a negative content length
field [64]. These requests cause the server to allocate a heap buffer that is too small
to hold the data in the request. While calling recv to read the POST data into the
buffer, the server overwrites the heap-management data structures maintained by
the C library. This vulnerability can be exploited to overwrite arbitrary words in
memory. I attacked NullHTTPd using the technique described by Chen et al. [37]. This
attack inspired the example in Section 4.1, and works by corrupting the CGI-BIN
configuration string. This string identifies a directory holding programs that may be
executed while processing HTTP requests. Therefore, by corrupting it, the attacker
can force NullHTTPd to run arbitrary programs. This is a non-control-data attack
because the attacker does not subvert the intended control-flow in the server. WIT
detects the attack when the wrapper for the recv call is about to write to the guard
object at the end of the buffer.
Stunnel is a generic tunnelling service that encrypts TCP connections using SSL.
I studied a format-string vulnerability in the code that establishes a tunnel for
SMTP [65]. An attacker can overflow a stack buffer by sending a message that is
passed as a format string to the vsprintf function. WIT detects the attack when
vsprintf attempts to write the guard object at the end of the buffer.
Libpng is a library for processing images in the PNG file format [26]. Many applica-
tions use libpng to display images. I built a test application distributed with libpng
and attacked it using the vulnerability described in [155]. The attacker can supply
a malformed image file that causes the application to overflow a stack buffer. WIT
detects the attack when a guard object is about to be written.
Figure 4.8: Number of colours for SPEC benchmarks.
Figure 4.8 shows the number of colours used by objects and functions in these
benchmarks, and Figure 4.9 shows a cumulative distribution of the fraction of memory
write instructions versus the upper bound on the number of objects writable by each
instruction. For example, the first graph in Figure 4.9 shows that 88% of the memory
write instructions in bzip can write at most one object at runtime, 99.5% can write
at most two objects, and all instructions can write at most three objects. Therefore,
even in this worst case, the attacker can only use a pointer to one object to write to
another in 12% of the write instructions and in 96% of these instructions it can write
to at most one other object. In practice, the program code and the guards will further
reduce the sets of objects writable by each instruction.
The results in Figure 4.9 show that the precision of the points-to analysis can vary
significantly from one application to the other. For all applications except mcf and
Figure 4.9: Cumulative distribution of the fraction of store instructions versus the upper bound on the number of objects writable by each instruction.
parser, the attacker cannot make the majority of instructions write to incorrect objects.
For bzip, gap, crafty, and gzip, 93% of the instructions can write to at most one
incorrect object in the worst case. The precision is worse for twolf, vpr and vortex
because they allocate many objects dynamically. However, the fraction of instructions
that can write a large number of objects is relatively small. These constraints can
render a fraction of the bugs in a program unexploitable by preventing compromised
write instructions from affecting security critical data. The fraction varies depending
on the precision of the analysis and the location of bugs and security critical data in
the program, but if a bug is covered, the protection cannot be worked around.
4.7 Discussion
WIT improves the efficiency of runtime checks by giving up direct detection of invalid
memory reads and approximating the correctness of memory writes. Nevertheless,
it guarantees the detection of sequential buffer-overflows and prevents branching to
arbitrary control-flow targets, and it can further increase the protection depending
on the precision of a pointer analysis. Interestingly, WIT improves protection over
BBC in three cases: it mitigates some temporal memory errors and memory errors inside structures, in both cases by constraining subsequent uses of hijacked pointers, and it provides control-flow integrity.
The key advantages, however, are that WIT is 6 times faster on average than BBC
for the SPEC benchmarks; achieves consistently good performance by avoiding a slow
path for handling out-of-bounds pointers; and is more immune to false positives.
These benefits justify the required tradeoffs.
BBC and WIT were designed for user-space programs. The next chapter discusses
practical memory safety for kernel extensions.
Chapter 5
Byte-granularity isolation
Bugs in kernel extensions written in C and C++ remain one of the main causes of
poor operating-system reliability despite proposed techniques that isolate extensions
in separate protection domains to contain faults. Previous fault-isolation techniques
for commodity systems either incur high overheads to support unmodified kernel
extensions, or require porting for efficient execution. Low-overhead isolation of ex-
isting kernel extensions on standard hardware is a hard problem because extensions
communicate with the kernel using a complex interface, and they communicate fre-
quently.
This chapter discusses byte-granularity isolation (BGI), a new software fault isola-
tion technique that addresses this problem. BGI is based on WIT, and can also detect
common types of memory errors inside kernel extensions (those that can be detected
by WIT using a single colour for unsafe objects), but extends WIT’s mechanisms to
isolate kernel extensions in separate protection domains that share the same address
space. Like WIT, its threat model does not aim to protect confidentiality or mitigate
malicious kernel extensions, but adds protection for the availability of the kernel in
the presence of memory errors in its extensions. The solution is efficient and offers
fine-grained temporal and spatial protection that makes it possible to support legacy
APIs like the Windows Driver Model (WDM) [114] that is used by the vast majority
of Windows extensions. BGI also ensures type safety for kernel objects by checking
that API functions are passed objects of the expected type at runtime.
5.1 Overview
Figure 5.1 highlights some of the difficulties in isolating existing kernel extensions.
It shows how a simplified file system driver might process a read request in the
Windows Driver Model (WDM) [114]. This example illustrates how BGI works. Error
handling is omitted for clarity.
At load time, the driver registers the function ProcessRead with the kernel. When an
application reads data from a file, the kernel creates an I/O Request Packet (IRP) to
describe the request and calls the driver to process it. ProcessRead sends the request
to a disk driver to read the data from disk and then decrypts the data (by XORing
the data with the key). SetParametersForDisk and DiskReadDone are driver functions
used by ProcessRead that are not shown in the figure.
while they are being used by the kernel. Furthermore, these solutions fail to ensure
that these objects are initialised before they are used. These are serious limitations, as
corrupted kernel objects in driver memory can propagate corruption to the kernel’s
fault domain. For example, they do not prevent writes to event e in the example
above. This is a problem because the event object includes a linked list header that is
used internally by the kernel. Thus, by overwriting or failing to initialise the event,
the driver can cause the kernel to write to arbitrary addresses.
These techniques also perform poorly when extensions interact with the kernel
frequently because they copy objects passed by reference in cross-domain calls. For
example, they would copy the buffer to allow ProcessRead to write to the buffer during
decryption. XFI by Erlingsson et al. [159] avoids the copy but falls back to slower
checks for such cases. Finally, solutions relying on hardware page protection for
isolation such as Nooks by Swift et al. [149, 147] incur additional overhead when
switching domains, which is a frequent operation in device drivers that often perform
little computation and lots of communication.
BGI is designed to have adequate spatial and temporal resolution to avoid these
problems. It can grant access to precisely the bytes that a domain should access, and
can control precisely when the domain is allowed to access these bytes because it
can grant and revoke access efficiently. Therefore, BGI can provide strong isolation
guarantees for WDM drivers with low overhead and no changes to the source code.
BGI assigns an access control list (ACL) to every byte of virtual memory (Sec-
tion 5.2). To store ACLs compactly, they are encoded as small integers, so BGI can
store them in a table with 1 byte for every 8 bytes of virtual memory—as in WIT—
when all bytes in the 8-byte slot share the ACL and the ACL has a single entry (as-
suming domains with default access are not listed). It adds, however, a slow path
to cover the general case when not all bytes in a slot share an ACL, and reserves a
table entry value to indicate such cases. WIT’s compile-time changes to the layout
of data ensure this slow path is rarely taken. Support for the general case, however,
is still necessary in BGI because, unlike in WIT, individual structure fields in kernel
objects can have different ACLs, and fields cannot be aligned to avoid breaking binary
compatibility with the rest of the kernel.
BGI is not designed to isolate malicious code. It assumes that attackers can control
the input to a driver but they do not write the driver code. It is designed to contain
faults due to errors in the driver and to contain attacks that exploit these errors.
Device-driver writers are trusted to use the BGI compiler.
Windows kernel extensions are written in C or C++. To isolate a Windows kernel
extension using BGI, a user compiles the extension with the BGI compiler and links
it with the BGI interposition library, as shown in Figure 5.2. BGI-isolated extensions
can run alongside trusted extensions on the same system, as shown in Figure 5.3.
The instrumentation inserted by the compiler along with the interposition library can
enforce fault isolation as described in the following sections.
the other checks prevent errors due to incorrect reads from propagating outside the
domain.
BGI grants an icall right on the first byte of functions that can be called indirectly or
passed in kernel calls that expect function pointers. This is used to prevent extensions
from bypassing checks and to ensure that control-flow transfers across domains target
only allowed entry points. Cross-domain control transfers through indirect calls need
to check for this right, but are otherwise implemented as simple function calls without
stack switches, extra copies, or page table changes.
Type rights are used to enforce dynamic type safety for kernel objects. There is a
different type right for each type of kernel object, for example, mutex, device, I/O
request packet, and deferred procedure call objects. When an extension initialises a
kernel object stored in the extension’s memory, the interposition library grants the
appropriate type right on the first byte of the object, and grants write access to the
fields that are writable by the extension according to the API rules and revokes write
access to the rest of the object. Access rights change as the extension interacts with the
kernel; for example, rights are revoked when the object is deleted, or the extension
no longer requires access to the object (e.g. when a driver completes processing an
I/O request packet). The interposition library also checks if extensions pass objects of
the expected type in cross-domain calls. This ensures type safety: extensions can only
create, delete, and manipulate kernel objects using the appropriate interface functions.
Type safety prevents many incorrect uses of the kernel interface.
The last type of right, the ownership right, is used to keep track of the domain that should free an allocated object and which deallocation function to use. The ownership
rights for objects are associated with their adjacent guards, rather than objects’ bytes
(these guards were discussed in Section 4.3.2).
The interposition library and the code inserted by the compiler use two primitives
to manipulate ACLs: SetRight, which grants and revokes access rights, and CheckRight,
which checks access rights before memory accesses. When a thread running in do-
main d calls SetRight( p, s, r ) with r ≠ read, BGI grants d access right r to all the bytes in
the range [ p, p + s). For example, SetRight( p, s, write) gives the domain write access to
the byte range. To revoke any other access rights to the byte range [ p, p + s), a thread
running in the domain calls SetRight( p, s, read). To check ACLs before a memory ac-
cess to byte range [ p, p + s), a thread running in domain d calls CheckRight( p, s, r ). If
d does not have right r to all the bytes in the range, BGI raises a software interrupt.
BGI defines variants of SetRight and CheckRight that are used with icall and type rights.
SetType( p, s, r ) marks p as the start of an object of type r and size s that can be used
by the domain, and also prevents writes to the range [ p, p + s). CheckType( p, r ) is
equivalent to CheckRight( p, 1, r ).
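A minimal sketch of the fast path of CheckRight against the one-byte-per-slot table described in Section 5.1 follows; the table base, the byte encoding of rights, the reserved conflict value, and the slow-path and fault entry points are assumptions of the sketch, and the per-domain dimension of the tables (Section 5.4) is elided.

    #include <stdint.h>
    #include <stddef.h>

    #define ACL_TABLE_BASE 0x40000000u   /* placement is an assumption */
    #define ACL_CONFLICT   0xffu         /* reserved entry: bytes in the slot differ */

    void bgi_slow_path(uintptr_t slot, uint8_t right);   /* handles the general case */
    void bgi_fault(void);                                /* raises the software interrupt */

    static inline uint8_t *acl_entry(uintptr_t p)
    {
        return (uint8_t *)((p >> 3) + ACL_TABLE_BASE);
    }

    void CheckRight(const void *p, size_t s, uint8_t right)
    {
        uintptr_t a = (uintptr_t)p;
        for (uintptr_t slot = a & ~(uintptr_t)7; slot < a + s; slot += 8) {
            uint8_t e = *acl_entry(slot);
            if (e == ACL_CONFLICT)
                bgi_slow_path(slot, right);   /* consult the full ACL for this slot */
            else if (e != right)
                bgi_fault();
        }
    }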
Figure 5.4 shows an example partition of five kernel drivers into domains. Driver 1
runs in the trusted domain with the kernel code because it is trusted. The other drivers
are partitioned into two untrusted domains. There is a single driver in domain d2 but
there are three drivers in domain d1 . Frequently, the functionality needed to drive a
device is implemented by multiple driver binaries that communicate directly through
custom interfaces. These drivers should be placed in the same domain. Figure 5.5
shows example ACLs for the domains in Figure 5.4. The greyed boxes correspond to
the default ACL that only allows domains to read the byte. The other ACLs grant
some of the domains more rights on accesses to the corresponding byte; for example,
one of them grants domains d1 and d2 the right to use a shared lock.
rights on a guard (Section 4.3.2) before or after the allocated memory (depending on
whether the allocation is smaller or larger than a page). Ownership rights are used
to identify the allocation function and the domain that allocated the memory. The
wrappers for deallocation functions check that the calling domain has the appropriate
ownership right and that it has write access to the region being freed. If these checks
fail, they signal an error. Otherwise, they revoke the ownership right and write access
to the memory being freed. This ensures that only the domain that owns an object can
free the object, that it must use the correct deallocation function, and that an object
can be freed at most once.
Call checks prevent extension errors from making threads in other domains execute
arbitrary code. Some kernel functions take function pointer arguments. Since the
90 Byte-granularity isolation
kernel may call the functions they point to, the interposition library checks if the
extension has the appropriate icall right to these functions. Kernel wrappers call
CheckType( p, icallN ) on each function pointer argument p before calling the kernel
function they wrap, where N is the number of stack bytes used by the arguments to
an indirect call through p. The stdcall calling convention used in Windows drivers
requires the callee to remove its arguments from the stack before returning (and does
not support the vararg feature). Therefore, the icall rights encode N to prevent stack
corruption when functions with the wrong type are called indirectly. Conversely,
extension wrappers check function pointers returned by extension functions.
The icall rights are also granted and revoked by the interposition library with help
from the compiler. The compiler collects the addresses of all functions whose address
is taken by the extension code and the number of bytes consumed by their arguments
on the stack. This information is stored in a section in the extension binary. The
wrapper for the driver initialisation function calls SetType( p, 1, icallN ) for every pair
of function address p and byte count N in this section to associate an icall right with
the entry point of the function. When kernel functions return function pointers, their
wrappers replace these pointers by pointers to the corresponding kernel wrappers and
grant the appropriate icall rights. Since BGI does not grant icall rights in any other
case, cross-domain calls into the domain can only target valid entry points: functions
whose address was taken in the code of an extension running in the domain and
kernel wrappers whose pointers were returned by the interposition library.
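The load-time grant of icall rights can be sketched as follows; the record layout, the way the table section is exposed, and the ICALL macro mapping a byte count to an icallN right are assumptions of the sketch.

    #include <stddef.h>

    /* Primitive from the BGI runtime (Section 5.2). */
    void SetType(const void *p, size_t s, unsigned right);

    /* One record per address-taken function, emitted by the BGI compiler into a
     * section of the extension binary. */
    struct bgi_icall_record { void *fn; unsigned arg_bytes; };
    extern const struct bgi_icall_record bgi_icall_table[];
    extern const unsigned bgi_icall_count;

    #define ICALL(n) (0x80u + (n) / 4)   /* illustrative encoding only */

    /* Called from the wrapper of the driver initialisation function. */
    static void grant_icall_rights(void)
    {
        for (unsigned i = 0; i < bgi_icall_count; i++)
            SetType(bgi_icall_table[i].fn, 1, ICALL(bgi_icall_table[i].arg_bytes));
    }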
Type checks are used to enforce a form of dynamic type safety for kernel objects.
There is a different type right for each type of kernel object. When a kernel function
allocates or initialises a kernel object with address p, size s, and type t, its wrapper
calls SetType( p, s, t) and grants write access to any fields that can be written directly
by the extension. The wrappers for kernel functions that receive kernel objects as
arguments check if the extension has the appropriate type right to those arguments,
and wrappers for kernel functions that deallocate or uninitialise objects revoke the
type right to the objects. Since many kernel objects can be stored in heap or stack
memory allocated by the extension, BGI also checks if this memory holds active ker-
nel objects when it is freed. Together, these checks ensure that extensions can only
create, delete, and manipulate kernel objects using the appropriate kernel functions.
Moreover, extensions in an untrusted domain can only use objects that were received
from the kernel by a thread running in the domain.
The type checks performed by BGI go beyond traditional type checking because
the type, i.e., the set of operations that are allowed on an object, changes as the ex-
tension interacts with the kernel. BGI implements a form of dynamic typestate [143]
analysis. For example, in Figure 5.1, the extension wrapper for ProcessRead grants
the irp right to the first byte of the IRP. This allows calling IoSetCompletionRoutine and IoCallDriver, whose wrappers check that they receive a valid IRP. But the irp right is re-
voked by the wrapper for IoCallDriver to prevent modifications to the IRP while it
is used by the disk driver. The extension wrapper for DiskReadDone grants the irp
right back after the disk driver is done. Then the right is revoked by the wrapper for
IoCompleteRequest because the IRP is deleted after completion. These checks enforce
interface usage rules that are documented but were not previously enforced.
In addition to using access rights to encode object state, some kernel wrappers use
information in the fields of objects to decide whether a function can be called without
corrupting the kernel. This is safe because BGI prevents the extension from modifying
these fields.
    VOID
    _BGI_KeInitializeDpc(PRKDPC d,
                         PKDEFERRED_ROUTINE routine, PVOID a) {
      CheckRight(d, sizeof(KDPC), write);
      CheckFuncType(routine, PKDEFERRED_ROUTINE);
      KeInitializeDpc(d, routine, a);
      SetType(d, sizeof(KDPC), dpc);
    }

    BOOLEAN
    _BGI_KeInsertQueueDpc(PRKDPC d, PVOID a1, PVOID a2) {
      CheckType(d, dpc);
      return KeInsertQueueDpc(d, a1, a2);
    }
Figure 5.6 shows two example kernel wrappers. The first one wraps the KeIni-
tializeDpc function that initialises a data structure called a deferred procedure call
(DPC). The arguments are a pointer to a memory location supplied by the extension
that is used to store the DPC, a pointer to an extension function that will be later
called by the kernel, and a pointer argument to that function. The wrapper starts
by calling CheckRight(d, sizeof (KDPC), write) to check if the extension has write access
to the memory region where the kernel is going to store the DPC. Then it checks if
the extension has the appropriate icall right to the function pointer argument. Check-
FuncType is a macro that converts the function pointer type into an appropriate icall
right and calls CheckType. In this case, it calls CheckType(routine, icall16), where 16
is the number of stack bytes used by the arguments to an indirect call to a routine
of type PKDEFERRED_ROUTINE. If these checks succeed, the DPC is initialised and the
wrapper grants the dpc right to the extension for the byte pointed to by d; note that
this simultaneously revokes write access to the DPC object. It is critical to prevent
the extension from writing directly to the object because it contains a function pointer
and linked list pointers. If the extension corrupted the DPC object, it could make the
kernel execute arbitrary code or write to an arbitrary location. KeInsertQueueDpc is
one of the kernel functions that manipulate DPC objects. Its wrapper performs a type
check to ensure that the first argument points to a valid DPC. These type checks pre-
vent several incorrect uses of the interface, including preventing a DPC object from
being initialised more than once or being used before it is initialised.
In Figure 5.6, the wrapper for KeInitializeDpc passes the function pointer routine
to the kernel. In some cases, the wrapper replaces the function pointer supplied by the extension with a pointer to an appropriate extension wrapper and replaces its
pointer argument with a pointer to a structure containing the original function and
its argument. This is not necessary in the example because routine returns no values
and it is invoked with arguments that the extension already has the appropriate rights
to (d, a, a1, and a2).
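When such a substitution is needed, the wrapper can pack the original callback and its argument into a small structure and hand the kernel a generic trampoline instead. The following sketch illustrates the idea; the type and function names are hypothetical, and the callback signature is simplified to a single PVOID argument:

typedef VOID (*EXT_ROUTINE)(PVOID arg);

typedef struct _BGI_THUNK {
    EXT_ROUTINE routine;    /* function pointer supplied by the extension */
    PVOID       arg;        /* its original context argument              */
} BGI_THUNK;

/* The kernel is given this extension wrapper in place of the extension routine. */
VOID
_BGI_Trampoline(PVOID context) {
    BGI_THUNK *t = (BGI_THUNK *)context;
    /* a real extension wrapper would grant rights to any arguments supplied
       by the kernel here, call the original routine, revoke the rights, and
       check the return value if there is one */
    t->routine(t->arg);
}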
In collaboration with Miguel Castro, Manuel Costa, Jean-Philippe Martin, Marcus
Peinado, Austin Donnelly, Paul Barham, and Richard Black [33], I implemented 262
kernel wrappers and 88 extension wrappers. These cover the most common WDM,
WDF, and NDIS interface functions [101] and include all interface functions used by
the drivers in the experiments. Most of the wrappers are as simple as the ones in
Figure 5.6 and could be generated automatically from source annotations similar to
those proposed by Hackett et al. [73] and Zhou et al. [174]. There are 16700 lines of
code in the interposition library. Although writing wrappers represents a significant
amount of work, it only needs to be done once for each interface function by the OS
vendor. Driver writers do not need to write wrappers or change their source code.
5.4 Instrumentation
The BGI compiler inserts four types of instrumentation in untrusted extensions. It redirects kernel function calls to their wrappers in the interposition library, adds code to function prologues and epilogues to grant and revoke access rights for automatic variables, adds access-right checks before writes, and adds type checks before indirect calls.
The compiler rewrites all calls to kernel functions to call the corresponding wrap-
pers to ensure that all communication between untrusted extensions and the kernel
is mediated by the interposition library. The compiler also modifies extension code
that takes the address of a kernel function to take the address of the corresponding
kernel wrapper in the interposition library. This ensures that indirect calls to kernel
functions are also redirected to the interposition library.
The compiler inserts calls to SetRight in function prologues to grant the calling
domain write access to local variables on function entry. In the example in Figure 5.1,
it inserts SetRight(&e, sizeof(e), write) in the prologue of the function ProcessRead.
To revoke access to local variables on function exit, the compiler modifies function
epilogues to first verify that local variables do not store active kernel objects and then
call SetRight to revoke access.
The compiler inserts a check before each write in the extension code to verify that the domain has write access to the target memory locations: it inserts CheckRight( p, s, write) before a write of s bytes to address p. The compiler also inserts checks before indirect calls in the extension code by adding CheckType( p, icallN ) before any indirect call through pointer p that uses N bytes for arguments on the stack.
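At the source level, the effect of this instrumentation can be pictured roughly as follows. This is a sketch: the real compiler operates on its intermediate representation, and the example function is hypothetical.

void example(char *p, void (*cb)(void)) {
    char buf[32];
    SetRight(buf, sizeof(buf), write);   /* prologue: grant write to the local  */

    CheckRight(p, 1, write);             /* inserted before the 1-byte write    */
    *p = 0;

    CheckType(cb, icall0);               /* indirect call using 0 stack bytes   */
    cb();

    /* epilogue: verify buf holds no active kernel object, then revoke access */
    SetRight(buf, sizeof(buf), read);
}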
The checks inserted by the compiler and those performed by the interposition li-
brary are sufficient to ensure control-flow integrity: untrusted extensions cannot by-
pass the checks inserted by the compiler; indirect calls (either in the driver code or
on its behalf in the kernel) target functions whose address was taken in the exten-
sion code or whose address was returned by the interposition library; returns transfer control back to the caller; and, finally, exceptions transfer control to the appropriate handler. Like WIT, BGI does not need additional checks on returns or exception handling
because write checks prevent corruption of return addresses and exception handler
pointers.
BGI can also prevent many attacks internal to a domain. These are a subset of the attacks detected by WIT, including the most common ones. Control-flow
integrity prevents the most common attacks that exploit errors in extension code be-
cause these attacks require control flow to be transferred to injected code or to chosen
locations in code that is already loaded. BGI also prevents sequential buffer overflows
and underflows that can be used to mount attacks that do not violate control-flow
integrity. These are prevented, as in WIT, when write checks detect attempts to over-
write guards placed between consecutive dynamic, static, and automatic memory
allocations.
The instrumentation inserted by BGI uses a table similar to those used by WIT and BBC to store ACLs in the form of small integers. This enables an efficient software implementation of byte-granularity memory protection.
untrusted domains in addition to the trusted domain, because many rights are used
only by a particular type of driver (e.g., a filesystem or a network driver) or by drivers
that use a particular version of the interface (e.g., WDF). The number of supported
domains should be sufficient for most scenarios because many drivers developed by
the operating system vendor may run in the trusted domain, and each untrusted
domain may run several related drivers. Moreover, a 64-bit implementation could use 2-byte d-rights with similar space overhead by using 16-byte slots, raising the number of supported d-rights to 65,536 and making the solution future-proof.
Figure 5.7: Kernel and user ACL tables in an x86 Windows address space.
The first challenge is addressed by reserving virtual address space for the kernel-
space table in kernel space and reserving virtual address space for a user-space table
in the virtual address space of each process. Therefore, there is a single kernel table
and a user table per process that is selected automatically by the virtual memory hard-
ware. Figure 5.7 shows the location of the user and kernel tables in the address space
of an x86 Windows operating system. The kernel reserves virtual address space for
the kernel table at address 0xe0000000 when the system boots, and reserves virtual
address space in every process at address 0x10000000 when the process is created.
The kernel allocates physical pages to the tables on demand when they are first ac-
cessed and zeroes them to set access rights to read for all domains. This prevents
incorrect accesses by default, and also protects the tables themselves from being over-
written. Since some extension code cannot take page faults, the kernel was modified
to preallocate physical pages to back kernel table entries that correspond to pinned
virtual memory pages.
The same strategy could be used to implement BGI on the x64 architecture. Even
though it is necessary to reserve a large amount of virtual memory for the tables
in a 64-bit architecture, only top-level table entries need to be allocated to do this.
Additional page metadata and physical pages only need to be allocated to the tables
on demand.
The second challenge is addressed by making multiple d-rights for a slot rare, and
handling the remaining cases using a special d-right value conflict in the table. A check
that encounters the conflict right fails, and invokes an error handler. This handler
checks whether the table value is conflict, and in that case, consults an auxiliary sparse
data structure—a conflict table—storing access right information for every byte of the
8-byte slot in question. A conflict table is a splay tree that maps the address of a slot to
a list of arrays with 8 d-rights. Each array in the list corresponds to a different domain
and each d-right in an array corresponds to a byte in the slot. A kernel conflict-table
is used for slots in kernel space and a user conflict-table per process is used for slots
in user space. Conflict tables are allocated in kernel space and each process object
includes a pointer to its user conflict-table.
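The data structure implied by this description might be sketched as follows; the field and type names, and the per-entry domain identifier, are assumptions rather than BGI's actual definitions:

#include <stdint.h>

typedef unsigned char dright_t;           /* a d-right encoded as a small integer   */

typedef struct conflict_entry {           /* one per domain with rights in the slot */
    struct conflict_entry *next;
    int      domain;
    dright_t per_byte[8];                 /* one d-right for each byte of the slot  */
} conflict_entry;

typedef struct conflict_node {            /* splay-tree node, keyed by slot address */
    struct conflict_node *left, *right;
    uintptr_t       slot;                 /* address of the 8-byte slot             */
    conflict_entry *entries;              /* list of per-domain d-right arrays      */
} conflict_node;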
SetType( p, s, r ) checks if p is 8-byte aligned and s is at least 8. If this check succeeds, it uses the
first 8 bytes to store the type of the object by executing SetRight( p, 8, r ); SetRight( p +
8, s − 8, read), which avoids the access to the conflict table. Otherwise, it executes
SetRight( p, 1, r ); SetRight( p + 1, s − 1, read) as before. Similarly, CheckType( p, r ) checks
if p is 8-byte aligned and the d-right in the kernel table corresponds to access right
r for the domain. Only if this check fails, does it access the conflict table to check if
the byte pointed to by p has the appropriate d-right. To further reduce the number
of accesses to the conflict table, local variables and fields in local driver structs are
aligned on 8-byte boundaries if they have a kernel object type (this can be achieved using appropriate compiler attributes on the definitions of the kernel object structures in header files). Functions are 16-byte aligned.
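In code, the fast path just described amounts to the following sketch; right_t and the bare read and write right names stand in for BGI's actual encodings:

void SetType(void *p, size_t s, right_t r) {
    if (((uintptr_t)p & 7) == 0 && s >= 8) {
        /* aligned object of at least one slot: store the type in the first slot */
        SetRight(p, 8, r);
        SetRight((char *)p + 8, s - 8, read);
    } else {
        /* unaligned or small object: type right for the first byte only,
           which may require an entry in the conflict table */
        SetRight(p, 1, r);
        SetRight((char *)p + 1, s - 1, read);
    }
}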
A final optimisation avoids accesses to the conflict table while allowing a domain
to have different access rights to the bytes in a slot in two common cases: when a
domain has right write to the first half of the bytes in the slot and read to the second
half of the bytes, and, conversely, when a domain has right read to the first half of the
bytes in the slot and write to the second half of the bytes. BGI reserves two additional
d-rights per domain to encode these cases. This optimisation is effective in avoiding
accesses to conflict tables when a domain is granted write access to individual fields
in a kernel object whose layout cannot be modified. Such fields are typically 4-byte
aligned.
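A hypothetical per-domain d-right encoding along these lines could look like the following. Only two values are grounded in the text: read is 0x00 (freshly zeroed tables grant read to all domains) and write is 0x02 (it appears in the code sequences discussed below); the remaining values are illustrative assumptions.

enum dright {
    dright_read     = 0x00,  /* default: read-only (freshly zeroed table)      */
    dright_conflict = 0x01,  /* per-byte rights live in the conflict table     */
    dright_write    = 0x02,  /* write access to the whole slot                 */
    dright_write_lo = 0x03,  /* write to the first half of the slot, read rest */
    dright_write_hi = 0x04,  /* read the first half of the slot, write rest    */
    dright_icall16  = 0x05,  /* indirect-call right, 16 argument bytes         */
    dright_dpc      = 0x06,  /* an example type right, here for KDPC objects   */
    /* further type rights, icall variants, and other domains' values follow   */
};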
In 64-bit architectures most fields in kernel objects are 8-byte aligned, so accesses to the conflict table are minimised if 8-byte slots are used. These techniques,
however, remain relevant if the slot size on 64-bit architectures is increased to 16 bytes
(to allow encoding d-rights using 2 bytes, or to reduce memory overhead).
Figure 5.8: Code sequence that implements SetRight( p, 32, write) for the x86.
Initially, pointer p is in register ebx and it can point either to kernel or user space.
The first instruction moves p into register eax. Next, the sar and btc instructions
compute the address of the entry in the right table without checking if p points to
kernel or user space and without using the base addresses of the tables. Figure 5.7
helps clarify how this works. Addresses in user space have the most significant bit
set to 0 and addresses in kernel space have the most significant bit set to 1. The sar
instruction shifts eax right by 3 bits and makes the 3 most significant bits equal to the
most significant bit originally in eax. (Note the use of arithmetic shift instead of the
logical shift used in the user-space WIT solution.) After executing sar, the four most
significant bits in eax will be 1111 for a kernel address and 0000 for a user address.
The btc instruction complements the least significant of these 4 bits. So the most
significant 4 bits in the result are 0xe when p is a kernel address and 0x1 when it is
a user address. The remaining bits in eax are the index of the table entry for the slot
pointed to by p. The final mov instruction sets four entries in the table to 0x02, which
grants the domain write access to [ p, p + 32).
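In C, the address computation and the update described above correspond roughly to the following sketch. It assumes 32-bit pointers with an arithmetic right shift, 8-byte slots, the table placement of Figure 5.7, and 0x02 as the domain's write d-right; it is not the instrumented code itself, which is emitted as x86 instructions.

#include <stdint.h>

static inline uint8_t *acl_entry(const void *p) {
    uintptr_t e = (uintptr_t)((intptr_t)p >> 3);   /* sar eax, 3 (arithmetic shift) */
    e ^= (uintptr_t)1 << 28;                       /* btc eax, 0x1C                 */
    return (uint8_t *)e;                           /* 0xe... kernel, 0x1... user    */
}

static inline void set_right_32_write(const void *p) {
    *(uint32_t *)acl_entry(p) = 0x02020202;        /* grant write to [p, p+32)      */
}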
A similar code sequence can be used on the x64 architecture by replacing 32-bit by
64-bit registers (because pointers are 64 bits long), shifting by 4 instead of 3 (to use
16-byte slots), and complementing a different bit (because the table bases would be at
different addresses).
push eax
lea eax, [ebp-38h]
sar eax, 3
btc eax, 0x1C
add eax, 5
xor dword ptr [eax-4], 0x02020202
jne L1
L2: pop eax
    ...
    ret 4
    ...
L1: push eax
    lea eax, [ebp-38h]
    push eax
    push 6
    call _BGI_slowRevokeAccess
    jmp L2
Figure 5.9: Code sequence that revokes access to local variables on function epi-
logues.
The BGI compiler inserts a code sequence similar to the one in Figure 5.8 in the
prologues of instrumented functions to grant write access to local variables. However,
the code sequence to revoke write access to local variables on function exit is more
complicated because it must check if a local variables stores an active kernel object.
Figure 5.9 shows an example. The intuition is to use xor instead of mov to zero the table entries for the local variables in a fast path, and to check the xor instruction's result to determine whether the values were not the expected ones. If type rights have been granted to the bytes in
question the check will fail and a slow path is invoked to deal with kernel objects on
the stack. The slow path must be able to restore the original d-right values, so the code
sequence is modified to ensure eax points just after the last XORed byte. In detail, the
sequence works as follows. The code stores the address of the guard before the first
local variable in eax (after saving eax) and the sar and btc instructions compute the
address of the kernel table entry for the guard. The add instruction updates eax to
point to the table entry right after the last table entry modified by the xor. It adds 5
to eax to account for the guard slot before the local variable and the 4 slots occupied
by the variable. If the local variable does not store any kernel object, the xor revokes
access to the local variable and the branch is not taken. Otherwise, the branch is taken
and the _BGI_slowRevokeAccess function is called. This function undoes the failed
xor and checks if the table entries for the slots occupied by the local variable have
d-rights corresponding to kernel objects. If it finds an active kernel object, it signals
an error. When functions have more local variables, the compiler adds another add,
xor, and jne for each variable. The address of the guard before the first local variable,
the current value of eax, and the number of slots for the stack frame are passed to
_BGI_slowRevokeAccess. _BGI_slowRevokeAccess works out whether the failed xor was a four-, two-, or one-byte variant, undoes all xors up to (but not including) the byte pointed to by eax, and then scans all the slots for kernel objects.
The BGI compiler also inserts checks before writes and indirect calls. Figure 5.10
shows an efficient code sequence that implements CheckRight( p, 1, write). Initially, p
is in ebx. The code computes the address of the table entry for the slot pointed to
by p in the same way as SetRight. Then, the cmp instruction checks if the entry has
d-right 0x02. If the check fails, the code calls one of the _BGI_slowCheck functions.
These functions receive a pointer to the memory range being checked and their name
encodes the size of the range. In this case, the code calls _BGI_slowCheck1, which
checks if the table entry contains a d-right that encodes write access to the half slot
being accessed and, if this fails, checks the d-right for the appropriate byte in the
conflict table. The indirect-call check is similar but it also checks the number of stack
bytes stored before the function and it does not have a slow path because functions
are always 16-byte aligned.
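The corresponding fast path can be pictured in C as follows, reusing the acl_entry computation sketched earlier; 0x02 is again assumed to encode write access for the domain, and _BGI_slowCheck1 is the slow-path function described above.

static inline void check_right_1_write(const void *p) {
    if (*acl_entry(p) != 0x02)         /* cmp byte ptr [entry], 0x02                */
        _BGI_slowCheck1((void *)p);    /* half-slot rights, then the conflict table */
}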
The BGI compiler uses simple static analysis to eliminate certain SetRight and
CheckRight sequences. It does not add SetRight for local variables that are not ar-
rays or structs and whose address is not taken. It also eliminates CheckRight before
the writes to these variables. This analysis is based on the one used in Chapters 3
and 4, but is more conservative, to avoid eliminating checks for memory that may
have its access rights change.
The interposition library implements SetRight( p, s, r ) similarly to the compiler, but
uses memset to set d-rights when s is not known statically or is large. Additionally,
it must deal with the case where p or p + s are not 8-byte aligned. In this case, the
interposition library sets d-rights as before for slots that are completely covered by
the byte range, but calls a function to deal with the remaining slots; this function sets
the corresponding table entries to the d-rights that encode write access to half a slot,
when possible. Otherwise, it records d-rights for individual bytes in the appropriate
conflict table. The interposition library implements CheckRight as in Figure 5.10, but
it iterates the check for larger memory ranges. To improve performance, it compares
four d-rights at a time when checking write access to large memory ranges.
Figure 5.10: Code sequence that implements CheckRight( p, 1, write) for the x86.
5.4.5 Synchronisation
BGI avoids synchronisation on table accesses as much as possible to achieve good
performance. It uses enough synchronisation to avoid false positives but may fail to
isolate errors in some uncommon racy schedules. It should be hard for attackers to
exploit these races to escape containment given the assumption that they do not write
the extension code.
In particular, no synchronisation is used for the common case of granting or revok-
ing write access to all the bytes in a slot. Synchronisation is not necessary because:
(1) by design, it is an error for an untrusted domain to have non-read rights to a byte
of memory that is writable by another untrusted domain and (2) if threads running
in the same domain attempt to set conflicting d-rights on the same byte, this means
there is a pre-existing race in the code. Synchronisation is required, however, when
granting and revoking type rights and when granting and revoking write access to
half a slot. BGI uses atomic compare-and-swap on table entries to prevent false pos-
itives due to competing writes. Similarly, synchronisation is required when granting
or revoking rights involves an access to a conflict table. An atomic swap is used to
record the conflict in the appropriate table entry and a spin lock is used per conflict
table. Since these tables are rarely used, contention for the lock is not a problem.
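For example, granting a type right over a table entry might use a compare-and-swap along the following lines. This is a sketch using C11 atomics; a kernel-mode implementation would use an equivalent interlocked primitive, and the function name is hypothetical.

#include <stdatomic.h>

/* Succeeds only if the entry still holds the expected d-right (e.g. write);
   on failure the caller retries or reports a rights violation. */
static int try_set_dright(_Atomic unsigned char *entry,
                          unsigned char expected, unsigned char desired) {
    return atomic_compare_exchange_strong(entry, &expected, desired);
}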
There is no synchronisation in the fast path when checking rights. To prevent false
positives, the slow path retries the check after a memory barrier and uses spin locks to
synchronise accesses to the conflict tables when needed. Furthermore, the right checks
are not atomic with the access they check. This can never lead to false positives but
may fail to prevent an invalid access in some schedules, because when the right is
revoked there is a race between the check and the access.
5.5 Experimental evaluation
Table 5.1 lists the kernel extensions used to evaluate BGI. The classpnp driver im-
plements common functionality required by storage class drivers like disk. The last
five drivers sit at the bottom of the USB driver stack in Windows Vista. The usbehci,
usbuhci, usbohci, and usbwhci drivers implement different USB host controller inter-
faces and usbport implements common functionality shared by these drivers. These
16 extensions have a total of more than 400,000 lines of code and use 350 different
functions from WDM, IFS, NDIS, and KMDF interfaces [101]. The source code for the
first 8 extensions is available in the Windows Driver Kit [101].
All experiments ran on HP xw4600 workstations with an Intel Core2 Duo CPU at
2.66 GHz and 4GB of RAM, running Windows Vista Enterprise SP1. The workstations
were connected with a Buffalo LSW100 100 Mbps switching hub for the Intel PRO/100
driver tests and with an HP 10-GbE CX4 cable for the Neterion Xframe driver tests.
Experiments with the FAT file system and disk drivers ran on a freshly formatted
Seagate Barracuda ST3160828AS 160GB disk. Experiments with the USB drivers used
a SanDisk 2GB USB2.0 Flash drive. The extensions were compiled, with and without
BGI instrumentation, using the Phoenix [99] compiler with the -O2 (maximise speed)
option. All the performance results are averages of at least three runs.
5.5.1 Effectiveness
BGI’s effectiveness at detecting faults before they propagate outside a domain was
measured by injecting faults into the fat and intelpro drivers, following a method-
ology similar to the one described in [174].
Bugs from the five types in Table 5.2 were injected into the source code of the
drivers. Empirical evidence indicates that these types of bugs are common in op-
erating systems [145, 39]. The random increment to loop upper bounds and to the
number of bytes copied was 8 with 50% probability, 8 to 1K with 44% probability and
1K to 2K with 6% probability.
Buggy drivers were produced by choosing a bug type and injecting five bugs of
that type in random locations in the driver source. After excluding the buggy drivers
that failed to compile without BGI, 304 buggy fat drivers were tested by formatting
a disk, copying two files onto it, checking the file contents, running the Postmark
benchmark [83], and finally running chkdsk. Then 371 buggy intelpro drivers were
tested by downloading a large file over http and checking its contents.
There were 173 fat tests and 256 intelpro tests that failed without BGI. There
were different types of failure: blue screen inside driver, blue screen outside driver,
operating system hang, test program hang, and test program failure. A blue screen
is inside the driver if the faulting thread was executing driver code. Injected faults
escape when the test ends with a blue screen outside the driver or an operating system
hang. These are faults that affected the rest of the system. Injected faults are internal
when the test ends with one of the other types of failure.
Buggy drivers with escaping faults were isolated in a separate BGI domain and
tested again. Table 5.3 reports the number of faults that BGI was able to contain
before they caused an operating system hang or a blue screen outside the driver.
The faults that caused hangs were due to infinite loops and resource leaks. These
faults affect the rest of the system not by corrupting memory but by consuming an
excessive amount of CPU or other resources. Even though BGI does not contain
explicit checks for this type of fault, it surprisingly contains 60% of the hangs in fat
and 47% in intelpro. BGI is able to contain some hangs because it can detect some
internal driver errors before they cause the hang; for example, it can prevent buffer
overflows from overwriting a loop control variable. Containment could be improved
by modifying extension wrappers to impose upper bounds on the time to execute
driver functions, and by modifying the kernel wrappers that allocate and deallocate
resources to impose an upper bound on resources allocated to the driver [133, 167].
BGI can contain more than 98% of the faults before they cause a blue screen outside
the drivers. These faults corrupt state: they violate some assumption about the state
in code outside the driver. These are the faults that BGI was designed to contain
and they account for 90% of the escaping faults in fat and 80% in intelpro. BGI
can contain a number of interesting faults; for example, three buggy drivers complete
IRPs incorrectly. This bug is particularly hard to debug without isolation because it
can corrupt an IRP that is now being used by the kernel or a different driver. BGI
contains these faults and pinpoints the call to the IoCompleteRequest that causes the
problem.
One of the two faults that cause a blue screen outside the intelpro driver is due to
a bug in another driver in the stack that is triggered by the modified behaviour in this
variant of intelpro.
BGI’s ability to detect internal faults was also evaluated by isolating and testing
buggy drivers with internal faults in a separate BGI domain. Table 5.4 shows the
number of internal errors that BGI can detect before they are detected by checks in
the test program or checks in the driver code. The results show that BGI can simplify
debugging by detecting many internal errors early.
5.5.2 Performance
The overhead introduced by BGI was measured using benchmarks. For the disk, file
system, and USB drivers, performance was measured using the PostMark [83] file-
system benchmark that simulates the workload of an email server.
For fat, PostMark was configured to use cached I/O with 10,000 files and 1 mil-
lion transactions. For the other drivers, it was configured for synchronous I/O with
100 files and 10,000 transactions. Caching was disabled for these drivers because,
otherwise, most I/O is serviced from the cache, masking the BGI overhead.
The disk and classpnp drivers were run in the same protection domain because
classpnp provides services to disk. This is common in Windows: a port driver im-
plements functionality common to several miniport drivers to simplify their imple-
mentation. Similarly, usbport and usbehci were run in the same domain. The other
drivers were tested separately.
Throughput is measured in transactions per second (Tx/s) as reported by PostMark,
and kernel CPU time as reported by kernrate, the Windows kernel profiler [101]. Ta-
ble 5.5 shows the percentage difference in kernel CPU time and throughput. BGI
increases kernel CPU time by a maximum of 10% in these experiments. The through-
put degradation is negligible in the 4 test cases that use synchronous I/O because
the benchmark is I/O bound. For fat, the benchmark is CPU bound and throughput
decreases by only 12%.
Table 5.5: Overhead of BGI on disk, file system, and USB drivers, when running
PostMark.
These results are significantly better than those reported for Nooks [150]; for ex-
ample, Nooks increased kernel time by 185% in a FAT file-system benchmark. BGI
performs better because it does not change page tables or copy objects in cross-domain
calls.
The next results measure the overhead introduced by BGI to isolate the network
card drivers. For the TCP tests, socket buffers of 256KB and 32KB messages were
used with intelpro and socket buffers of 1MB and 64KB messages with xframe. IP
and TCP checksum offloading was enabled with xframe. For the UDP tests, 16-byte
packets were used with both drivers. Throughput was measured with the ttcp utility
and kernel CPU time with kernrate. These tests are similar to those used to evaluate
SafeDrive [174]. Table 5.6 shows the percentage difference in kernel CPU time and
throughput due to BGI.
The results show that isolating the drivers with BGI has little impact on throughput.
There is almost no throughput degradation with intelpro because this is a driver for
a slow 100Mb/s Ethernet card. There is a maximum degradation of 10% with xframe.
BGI reduces UDP throughput with xframe by less than SafeDrive [174]: 10% versus
11% for sends and 0.2% versus 17% for receives. But it reduces TCP throughput by
more than SafeDrive: 2.5% versus 1.1% for sends and 3.7% versus 1.3% for receives.
The SafeDrive results, however, were obtained with a Broadcom Tigon3 1Gb/s card
while xframe is a driver for a faster 10Gb/s Neterion Xframe E.
The average CPU overhead introduced by BGI across the network benchmarks is 8%
and the maximum is 16%. For comparison, Nooks [150] introduced a CPU overhead
of 108% on TCP sends and 46% on TCP receives using similar benchmarks with 1Gb/s
Ethernet. Nexus [167] introduced a CPU overhead of 137% when streaming a video
using an isolated 1Gb/s Ethernet driver. Nexus runs drivers in user space, which increases the cost of cross-domain switches. BGI's CPU overhead is similar to the CPU overhead reported for SafeDrive in the network benchmarks [174] but BGI provides stronger isolation.

                          ∆ Tx/s (%)
Buffer size    BGI      XFI fast path    XFI slow path
1              -8.76    -6.8             -5.9
512            -7.08    -5.3             -4.8
4K             -2.48    -2.7             -2.6
64K            -1.14    -1.4             -1.7

Table 5.7: Comparison of BGI's and XFI's overhead on the kmdf driver.
Finally, the kmdf benchmark driver was used to compare the performance of BGI
and XFI [159]. Table 5.7 shows the percentage change in transaction rate for differ-
ent buffer sizes for BGI and for two XFI variants offering write protection. In each
transaction, a user program stores and retrieves a buffer from the driver. BGI and XFI
have similar performance in this benchmark but XFI cannot provide strong isolation
guarantees for drivers that use the WDM, NDIS, or IFS interfaces (as discussed in
Section 5.1 when examining the limitations of previous solutions). XFI was designed
to isolate drivers that use the new KMDF interfaces. KMDF is a library that simplifies
development of Windows drivers. KMDF drivers use the simpler interfaces provided
by the library and the library uses WDM interfaces to communicate with the kernel.
However, many KMDF drivers have code that uses WDM interfaces directly because
KMDF does not implement all the functionality in the WDM interfaces. For example,
the KMDF driver serial writes to some fields in IRPs directly and calls WDM func-
tions that manipulate IRPs. Therefore, it is unclear whether XFI can provide strong
isolation guarantees for real KMDF drivers.
5.6 Discussion
Faults in kernel extensions are the primary cause of unreliability in commodity oper-
ating systems. Previous fault-isolation techniques cannot be used with existing kernel
extensions, or incur significant performance penalties. BGI’s fine-grained isolation
can support the existing Windows kernel extension API, and the experimental results show that the overhead introduced by BGI is low enough for it to be used in practice to isolate Windows drivers in production systems, detecting internal kernel errors before they cause a system hang.
The Windows kernel API proved very amenable to fault isolation through API in-
terposition, raising the question of whether such interposition can be successfully
applied to other legacy operating systems with complicated interfaces. For this, a
well-defined API with strict usage rules is key. On a practical note, support for
Windows alone may have significant impact due to the popularity of this operat-
ing system. Nevertheless, it should be possible to use similar mechanisms for other
operating systems, at least for some kernel extensions, and it will be worthwhile to
consider support for such interposition when designing future operating-system ex-
tension APIs.
Finally, it should be noted that BGI's trusted computing base includes the compiler as well as the interposition library, both of which may contain bugs of their own. Future
work could simplify the implementation of the interposition library by using API
annotations to generate wrappers automatically.
Chapter 6
Related work
Mitigating the lack of memory safety in C programs is an old and important re-
search problem that has generated a huge body of literature. Unfortunately, previous
solutions have not been able to curb the problem in practice. This is because com-
prehensive solutions incur prohibitive performance overheads or break backwards
compatibility at the source or binary level. Efficient and backwards-compatible solu-
tions, on the other hand, have raised the bar, but remain vulnerable to advanced exploitation techniques.
6.2.2 Tweaking C
Rather than porting programs to a totally new language, programmers might find it
easier to use a C dialect [81], a C subset amenable to analysis [60], or optional code
annotations [107, 42]. Making annotations optional permits unmodified C programs
to be accepted. Missing annotations can either fail safe at a performance cost [107],
or fall back to unsafe execution [42]. Annotations may have to be trusted, introducing
opportunities for errors, and may interact in complicated ways with the type system
(e.g. in C++). The approach of this dissertation, on the other hand, works on unmod-
ified programs, does not require annotations, and the runtime mechanisms can be
readily used with C++.
Cyclone [81, 71, 146] is a new dialect designed to minimise the changes required
to port over C programs. But it still makes significant changes: C-style unions are
replaced by ML-style sum types, and detailed pointer type declarations are required
to support separate compilation. For common programs, about 10% of the program
code must still be rewritten—an important barrier to adoption. Furthermore, Cyclone
uses garbage collection, which can introduce unpredictable overheads. The perfor-
mance problems of garbage collection were partially alleviated using region-based
memory management [71], at the expense of further increasing the required porting
effort. Cyclone’s overhead is acceptable for I/O bound applications, but is consider-
able for computationally-intensive benchmarks (up to a factor of 3 according to [81]).
An average slowdown by a factor of 1.4 is independently reported in [171].
Other approaches [88, 60] aim for memory safety without runtime checks or anno-
tations for a subclass of type-safe C programs. They cannot, however, address large,
general purpose programs, limiting their applicability to embedded and real-time
control systems.
CCured [107, 43, 105] is a backwards-compatible dialect of C using static analy-
sis with runtime checks for completeness. It uses type inference to conservatively
separate pointers into several classes of decreasing safety, taking advantage of anno-
tations when available. Most pointers are in fact found to be safe, and do not require
bound checks. It uses fat pointers for the rest, and inserts any necessary runtime
bounds checks. Unfortunately, fat pointers have backwards-compatibility problems.
In follow-up work [43], metadata for some pointers is split into a separate data struc-
ture whose shape mirrors that of the original user data, enabling backwards compati-
bility with libraries, as long as changes to pointers in data do not invalidate metadata.
CCured, like Cyclone, addresses temporal safety by changing the memory manage-
ment model to garbage collection. Overhead is up to 150% [107] (26% average, 87%
maximum, for the Olden benchmarks). Modifications to the source code are still re-
quired in some places, and unmodified programs may trigger false positives. With
the Cyclone benchmarks minimally modified to eliminate compile-time errors and
runtime false positives, CCured incurs an average slowdown of 4.7 according to [171].
Deputy [42] is a backwards-compatible dependent type system for C that lets pro-
grammers annotate bounded pointers and tagged unions. It has 20% average over-
head for a number of annotated real-world C programs, but offers no temporal pro-
tection and, in its authors’ tests, 3.4% of the lines of code had to be modified. A
version of Deputy for device drivers is discussed in Section 6.4.
BCC expanded each pointer to three times its size to include upper and lower bounds. The BCC-generated
code was supposed to be about 30 times slower than normal, but the programs ran too
slowly to be useful [141]. Rtcc [141] was a similar system adding run-time checking
of array subscripts and pointer bounds to the Portable C Compiler (PCC), resulting in
about ten-times slower program execution. Safe-C [11] was also based on fat pointers
but could detect both spatial and temporal errors, with execution overheads ranging
from 130% to 540%. Fail-Safe C [112] is a new, completely memory-safe compiler
for the C language that is fully compatible with the ANSI C specification. It uses
fat pointers, as well as fat integers to allow casts between pointers and integers. It
reportedly slows down programs by a factor of 2–4, and it does not support programs
using custom memory management.
These solutions have several problems caused by the use of fat pointers: assump-
tions about unions with integer and pointer members are often violated when using
fat pointers, and memory accesses to fat pointers are not atomic. Programs relying
on these assumptions, for example concurrent programs, may break unless modified.
In some cases, in-band metadata like fat pointers can be compromised. Furthermore,
memory overhead is proportional to the number of pointers, which can reach 200%
in pointer intensive programs. The most important problem caused by fat pointers,
however, is binary incompatibility: the resulting binaries cannot be linked with existing
binary code such as libraries, because fat pointers change the pointer representation
and the memory layout of data structures.
and weaken protection. The memory overhead of CRED was not studied, but should
be similar to [82].
Xu et al. [170] describe a technique to improve backwards compatibility and reduce
the overhead of previous techniques. Their solution provides spatial and temporal
protection. It separates metadata from pointers using data structures that mirror those
in the program. The metadata for each pointer p include bounds and a capability used
to detect temporal errors, as well as a pointer to a structure that contains metadata for
pointers that are stored within *p. This is one of the most comprehensive techniques,
but its average runtime overhead when preventing only spatial errors is 63% for Olden
and 97% for SPEC CINT—higher than BBC and WIT. Moreover, it has significant
memory overhead for pointer intensive programs, up to 331% on average for the
Olden benchmarks.
The system by Dhurjati et al. [57] partitions objects into pools at compile time and
uses a splay tree for each pool. These splay trees can be looked up more efficiently
than the single splay tree used by previous approaches, and each pool has a cache for
even faster lookups. This technique has an average overhead of 12% and a maximum
overhead of 69% for the Olden benchmarks. BBC’s average overhead for the same
benchmarks is 6% with a maximum of less than 20% (even if measured against the
buddy system baseline), while WIT’s average overhead for the same benchmarks is
4% with a maximum of 13%. Unfortunately, this system has not been evaluated for
large CPU-bound programs beyond the Olden benchmarks and the memory overhead
has not been studied. Interestingly, this technique can be combined with BBC easily.
SoftBound [104] is a recent proposal that records base and bound information for
every pointer to a separate memory area, and updates these metadata when loading
or storing pointer values. SoftBound’s full-checking mode provides complete spatial
protection with 67% average runtime overhead. To further reduce overheads, Soft-
Bound has a store-only checking mode, similar to WIT, that incurs only 22% average
runtime overhead. Similarly to the solutions presented in this dissertation, the key to
the efficiency of this system is its very simple code sequence for runtime checks: a
shift, a mask, an add, and two loads. This is still longer than either WIT (shift, load)
or BBC (shift, load, shift). The big difference is that its simple code sequence comes at
a high memory cost. Storing metadata per pointer can have memory overhead of up
to 200% for the linear-table version (16 bytes per entry) and 300% for the hashtable
version (24 bytes per entry). These theoretical worst-case overheads were encountered
in practice for several pointer-intensive programs. SoftBound deals with memcpy in a special way to ensure that metadata is propagated, but custom memcpy-like functions
will break metadata tracking. One advantage, however, is that it can protect sub-
objects. Unfortunately, program-visible memory layout enables programs to legally
navigate across sub-objects, causing false positives. Finally, as with some other solu-
tions, library code modifying pointers in shared data structures will leave metadata
in an inconsistent state.
SafeCode [59] maintains the soundness of the compile-time points-to graph, the
call graph, and of available type information at runtime, despite memory errors. It
requires no source-code changes, allows manual memory management, and does not
use metadata on pointers or memory. The run-time overheads are less than 10% in
nearly all cases and 30% in the worst case encountered, but it cannot prevent many array-bounds errors relevant to protecting programs (e.g. those accessing objects of the same type).
Similarly to WIT, DFI [32] combines static points-to analysis with runtime instru-
mentation to address non-control data attacks (these evade CFI, as discussed in Sec-
tion 6.2.9). It computes a data-flow graph at compile time, and instruments the pro-
gram to ensure that the flow of data at runtime follows the data-flow graph. To
achieve this it maintains a table with the identifier of the last instruction to write
to each memory location. The program is instrumented to update this table before
every write with the identifier of the instruction performing the write, and to check
before every read whether the identifier of the instruction that wrote the value being
read matches one of those prescribed by the static data-flow graph. DFI can detect
many out-of-bounds reads and reads-after-free, but it does not have guards to im-
prove coverage when the analysis is imprecise and its average overhead for the SPEC
benchmarks, where it overlaps with WIT, is 104%.
The technique described by Yong et al. [171] also has similarities with WIT: it assigns
colours to objects and checks a colour table on writes. However, it has worse coverage
than WIT because it uses only two colours, does not insert guards, and does not
enforce control-flow integrity. The two colours distinguish between objects that can
be written by an unsafe pointer and those that cannot. Yong et al. incur an average
overhead ten times larger than WIT on the SPEC benchmarks where they overlap.
Many debugging tools use a new virtual page for each allocation and rely on
page-protecting freed objects to catch dangling pointer dereferences. Dhurjati and
Adve [58] make this technique usable in practice by converting its memory overhead
to address space overhead: separate virtual memory pages for each object are backed
by the same physical memory. The run-time overhead for five Unix servers is less than
4%, and for other Unix utilities less than 15%. For allocation-intensive benchmarks,
however, the time overhead increases up to a factor of 11.
HeapSafe [70] uses manual memory management with reference counting to detect
dangling pointers. Its overhead over a number of CPU-bound benchmarks has a geometric mean of 11%, but it requires source-code modifications and annotations, e.g. to deal with deallocation issues and memcpy-style functions that may transparently modify pointers.
Conservative garbage collection [24] can also be used to address temporal errors,
but requires source-code tweaks, and has unpredictable overheads.
execution. This thwarts the attacker’s ability to specify a malicious pointer value,
which is crucial to the success of memory corruption attacks. The approach, however,
has been criticised for producing excessive false positives [87, 138].
Clause et al. [41] describe a technique to detect illegal memory accesses using dy-
namic taint tracking with many colours. It assigns a random colour to memory ob-
jects when they are allocated and, when a pointer to an object is created, it assigns the
colour of the object to the pointer. Then it propagates pointer colours on assignment
and arithmetic. On reads and writes to memory, it checks if the colour of the pointer
and the memory match. Their software-only version slows down SPEC CINT by a
factor of 100 or more. With special hardware and 256 colours, their average overhead
for SPEC CINT is 7%. This technique is similar to WIT, which has similar overhead
without special hardware support.
SafeMem [123] makes novel use of existing ECC memory technology to insert
guards around objects with reasonable overhead (1.6%-14.4%). The necessary hard-
ware, however, is not readily available on commodity systems.
HardBound [56] is a hardware-based bounds checking system. Moreover, many
proposals described earlier, e.g. [50, 84, 41] can use hardware support to improve
their performance. No comprehensive solutions, however, have been implemented in
commodity hardware to date.
The only hardware-based solution that is seeing wide adoption is data execution
prevention (DEP) [98] through the use of the NX bit in page table entries. DEP can efficiently prevent the execution of attacker code smuggled into the program as data. It does not protect, however, against attacks executing code already present in the program [53,
119, 134, 28, 89] and non-control-data attacks [37]. Moreover, the combination of DEP
and ASLR is not immune to attacks either [129].
check for incorrect uses of the extension interface, and it only supports 4-byte granu-
larity (which is insufficient without major changes to commodity operating systems).
Software-based fault-isolation techniques, like SFI [163], PittSFIeld [97] and
XFI [159], can isolate kernel extensions with low overhead but they do not deal with
the complex extension interfaces of commodity operating systems (as discussed in
Section 5.1). Other software-based fault-isolation techniques have similar problems.
SafeDrive [174] implements a bounds-checking mechanism for C that requires pro-
grammers to annotate extension source code. It has low overhead but it provides
weak isolation guarantees. It does not provide protection from temporal errors, and,
while it can prevent out-of-bounds reads, it does not distinguish between read and
write accesses; for example, SafeDrive would allow the driver to write to fields in the
IRP that can be legally read but not modified by extensions. Furthermore, it requires
programmers to annotate the extension source code. The Ivy project aims to combine
SafeDrive with Shoal [9] and HeapSafe [70], which should provide improved isolation
when completed. However, this combination would still lack BGI’s dynamic typestate
checking, would require annotations, and it would be likely to perform worse than
BGI because both Shoal and HeapSafe can introduce a large overhead.
Finally, SVA [51] provides a safe execution environment for an entire operating
system, and, similarly to BGI, it can enforce some safety properties to prevent many
memory errors. It does not isolate extensions, however, and incurs higher overhead
than BGI.
Chapter 7
Conclusions
Table 7.1: Summary of the three proposals. All solutions are backwards compatible, but protection and performance vary.
Unlike many previous solutions, I focused more on making runtime checks effi-
cient, rather than reducing their frequency through static analysis. Approaches rely-
ing on static analysis to eliminate redundant checks may have significant variation in
performance, depending on the precision of the analysis. Lightweight checks, on the
other hand, provide consistently good performance. This is especially true for WIT, which does not have BBC's slow path for handling out-of-bounds pointers.
At the implementation level, a simple but efficient data structure is used at the
core of all solutions: a linear table mapping memory ranges to their metadata, which
can be object sizes, colours, or rights. A key technique used in all solutions to im-
prove performance without breaking binary compatibility was to modify the layout
of objects in memory, for example, by padding objects to powers of two or organising
memory in slots.
The three systems were implemented using the Microsoft Phoenix compiler frame-
work [99], but it should be possible to re-implement them using the GNU GCC or
LLVM [92] frameworks. Preliminary experiments with partial implementations of
BBC and WIT using LLVM showed performance comparable to the Phoenix-based
implementations.
Bibliography
[1] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow in-
tegrity: Principles, implementations, and applications. In Proceedings of the 12th
ACM Conference on Computer and Communications Security (CCS). ACM, 2005.
Cited on pages 20, 27, 61, and 114.
[2] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. A theory of
secure control flow. In Proceedings of the 7th IEEE International Conference on
Formal Engineering Methods (ICFEM), 2005. Cited on pages 20, 27, 61, and 114.
[3] Jonathan Afek and Adi Sharabani. Dangling pointer: Smashing the pointer for
fun and profit. http://www.blackhat.com/presentations/bh-usa-07/Afek/
Whitepaper/bh-usa-07-afek-WP.pdf, 2007. Cited on page 24.
[4] Dave Ahmad. The rising threat of vulnerabilities due to integer errors. IEEE
Security and Privacy Magazine, 1(4), July 2003. Cited on pages 23 and 24.
[5] Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, and Miguel
Castro. Preventing memory error exploits with WIT. In Proceedings of the
2008 IEEE Symposium on Security and Privacy. IEEE Computer Society, 2008.
Cited on pages 19 and 45.
[6] Periklis Akritidis, Manuel Costa, Miguel Castro, and Steven Hand. Baggy
bounds checking: An efficient and backwards-compatible defense against out-
of-bounds errors. In Proceedings of the 18th USENIX Security Symposium. USENIX
Association, 2009. Cited on page 19.
[7] Lars Ole Andersen. Program Analysis and Specialization for the C Programming Lan-
guage. PhD thesis, DIKU, University of Copenhagen, 1994. Cited on pages 20,
21, and 65.
[9] Zachary R. Anderson, David Gay, and Mayur Naik. Lightweight annotations
for controlling sharing in concurrent data structures. In Proceedings of the ACM
SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
ACM, 2009. Cited on page 118.
[10] Ken Ashcraft and Dawson Engler. Using programmer-written compiler exten-
sions to catch security holes. In Proceedings of the 2002 IEEE Symposium on Secu-
rity and Privacy. IEEE Computer Society, 2002. Cited on page 108.
[11] Todd M. Austin, Scott E. Breach, and Gurindar S. Sohi. Efficient detection of
all pointer and array access errors. In Proceedings of the ACM SIGPLAN Con-
ference on Programming Language Design and Implementation (PLDI). ACM, 1994.
Cited on pages 27, 29, 58, and 110.
[12] Dzintars Avots, Michael Dalton, V. Benjamin Livshits, and Monica S. Lam.
Improving software security with a C pointer analysis. In Proceedings of
the 27th International Conference on Software Engineering (ICSE). ACM, 2005.
Cited on pages 65, 72, and 113.
[13] Michael Bailey, Evan Cooke, Farnam Jahanian, David Watson, and Jose Nazario.
The Blaster worm: Then and now. IEEE Security and Privacy Magazine, 3(4), 2005.
Cited on page 23.
[14] Arash Baratloo, Navjot Singh, and Timothy Tsai. Transparent run-time defense
against stack smashing attacks. In Proceedings of the USENIX Annual Technical
Conference. USENIX Association, 2000. Cited on page 112.
[15] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho,
Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtual-
ization. In Proceedings of the 19th ACM Symposium on Operating System Principles
(SOSP). ACM, 2003. Cited on page 117.
[16] Elena Gabriela Barrantes, David H. Ackley, Stephanie Forrest, and Darko Ste-
fanović. Randomized instruction set emulation. ACM Transactions on Information
and System Security (TISSEC), 8(1), 2005. Cited on page 115.
[17] Elena Gabriela Barrantes, David H. Ackley, Trek S. Palmer, Darko Stefanovic,
and Dino Dai Zovi. Randomized instruction set emulation to disrupt binary
code injection attacks. In Proceedings of the 10th ACM Conference on Computer and
Communications Security (CCS). ACM, 2003. Cited on page 115.
[18] Emery D. Berger and Benjamin G. Zorn. DieHard: probabilistic memory safety
for unsafe languages. In Proceedings of the ACM SIGPLAN Conference on Program-
ming Language Design and Implementation (PLDI). ACM, 2006. Cited on pages 115
and 116.
[19] Brian N. Bershad, Stefan Savage, Przemysław Pardyak, Emin Gün Sirer, Marc E.
Fiuczynski, David Becker, Craig Chambers, and Susan Eggers. Extensibil-
ity safety and performance in the SPIN operating system. In Proceedings of
the 15th ACM Symposium on Operating System Principles (SOSP). ACM, 1995.
Cited on page 117.
[20] Sandeep Bhatkar and R. Sekar. Data space randomization. In Proceedings of the
5th international Conference on Detection of Intrusions and Malware, and Vulnerability
Assessment (DIMVA). Springer-Verlag, 2008. Cited on pages 115 and 116.
[21] Sandeep Bhatkar, R. Sekar, and Daniel C. DuVarney. Efficient techniques for
comprehensive protection from memory error exploits. In Proceedings of the 14th
USENIX Security Symposium. USENIX Association, 2005. Cited on page 115.
[22] Andrea Bittau, Petr Marchenko, Mark Handley, and Brad Karp. Wedge: Split-
ting applications into reduced-privilege compartments. In Proceedings of the
5th USENIX Symposium on Networked Systems Design and Implementation (NSDI).
USENIX Association, 2008. Cited on page 114.
[25] Herbert Bos and Bart Samwel. Safe kernel programming in the OKE. In Pro-
ceedings of the 5th IEEE Conference on Open Architectures and Network Programming
(OPENARCH). IEEE Computer Society, 2002. Cited on page 117.
[26] Thomas Boutell and Tom Lane. Portable network graphics (PNG) specification
and extensions. http://www.libpng.org/pub/png/spec/. Cited on pages 51
and 80.
[27] Danilo Bruschi, Lorenzo Cavallaro, and Andrea Lanzi. Diversified process
replicæ for defeating memory error exploits. In 3rd International Workshop on
Information Assurance (WIA). IEEE Computer Society, 2007. Cited on page 116.
[28] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When good
instructions go bad: Generalizing return-oriented programming to RISC. In
Proceedings of the 15th ACM Conference on Computer and Communications Security
(CCS). ACM, 2008. Cited on pages 27 and 117.
[29] Bulba and Kil3r. Bypassing StackGuard and StackShield. Phrack, 10(56), 2000.
http://phrack.com/issues.html?issue=56&id=10. Cited on page 107.
[30] Cristian Cadar, Periklis Akritidis, Manuel Costa, Jean-Phillipe Martin, and
Miguel Castro. Data randomization. Technical Report TR-2008-120, Microsoft
Research, 2008. Cited on page 115.
[31] Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Struc-
tures on Distributed-Memory Machines. PhD thesis, Princeton University, 1996.
Cited on pages 44 and 74.
[32] Miguel Castro, Manuel Costa, and Tim Harris. Securing software by enforc-
ing data-flow integrity. In Proceedings of the 7th USENIX Symposium on Op-
erating Systems Design and Implementation (OSDI). USENIX Association, 2006.
Cited on pages 21, 28, 50, 65, and 112.
[33] Miguel Castro, Manuel Costa, Jean-Philippe Martin, Marcus Peinado, Perik-
lis Akritidis, Austin Donnelly, Paul Barham, and Richard Black. Fast byte-
granularity software fault isolation. In Proceedings of the 22nd ACM Symposium
on Operating System Principles (SOSP). ACM, 2009. Cited on pages 19 and 92.
[34] Walter Chang, Brandon Streiff, and Calvin Lin. Efficient and extensible secu-
rity enforcement using dynamic data flow analysis. In Proceedings of the 15th
ACM Conference on Computer and Communications Security (CCS). ACM, 2008.
Cited on page 113.
[35] Karl Chen and David Wagner. Large-scale analysis of format string vulnerabil-
ities in Debian Linux. In Proceedings of the Workshop on Programming Languages
and Analysis for Security (PLAS). ACM, 2007. Cited on pages 72, 108, and 113.
[36] Shuo Chen, Jun Xu, Nithin Nakka, Zbigniew Kalbarczyk, and Ravishankar K.
Iyer. Defeating memory corruption attacks via pointer taintedness detection.
In Proceedings of the International Conference on Dependable Systems and Networks
(DSN). IEEE Computer Society, 2005. Cited on page 113.
[37] Shuo Chen, Jun Xu, Emre C. Sezer, Prachi Gauriar, and Ravishankar K. Iyer.
Non-control-data attacks are realistic threats. In Proceedings of the 14th USENIX
Security Symposium. USENIX Association, 2005. Cited on pages 23, 26, 30, 31,
79, 114, and 117.
[38] Jim Chow, Ben Pfaff, Tal Garfinkel, and Mendel Rosenblum. Shredding
your garbage: Reducing data lifetime through secure deallocation. In Pro-
ceedings of the 14th USENIX Security Symposium. USENIX Association, 2005.
Cited on page 41.
[40] Peter Chubb. Get more device drivers out of the kernel! In Proceedings of the
2002 Ottawa Linux Symposium (OLS), 2004. Cited on pages 16 and 117.
[41] James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic. Ef-
fective memory protection using dynamic tainting. In Proceedings of the 22nd
IEEE/ACM International Conference on Automated Software Engineering (ASE).
ACM, 2007. Cited on pages 114 and 117.
[42] Jeremy Condit, Matthew Harren, Zachary Anderson, David Gay, and George C.
Necula. Dependent types for low-level programming. In Proceedings of the 16th
European Symposium on Programming (ESOP), 2007. Cited on pages 108 and 109.
[43] Jeremy Condit, Matthew Harren, Scott McPeak, George C. Necula, and Westley
Weimer. CCured in the real world. In Proceedings of the ACM SIGPLAN Con-
ference on Programming Language Design and Implementation (PLDI). ACM, 2003.
Cited on pages 16 and 109.
[44] Matt Conover and w00w00 Security Team. w00w00 on heap overflows.
http://www.w00w00.org/files/articles/heaptut.txt, 1999. Cited on page 23.
[45] Manuel Costa, Jon Crowcroft, Miguel Castro, and Antony Rowstron. Can we
contain Internet worms? In Proceedings of the 3rd ACM Workshop on Hot Topics in
Networks (HotNets). ACM, 2004. Cited on page 113.
[46] Manuel Costa, Jon Crowcroft, Miguel Castro, Antony Rowstron, Lidong Zhou,
Lintao Zhang, and Paul Barham. Vigilante: End-to-end containment of Internet
worms. In Proceedings of the 20th ACM Symposium on Operating System Principles
(SOSP). ACM, 2005. Cited on page 113.
[47] Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike
Frantzen, and Jamie Lokier. FormatGuard: Automatic protection from printf
format string vulnerabilities. In Proceedings of the 10th USENIX Security Sympo-
sium. USENIX Association, 2001. Cited on pages 72 and 113.
[48] Crispin Cowan, Calton Pu, Dave Maier, Heather Hintony, Jonathan Walpole,
Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. Stack-
Guard: Automatic adaptive detection and prevention of buffer-overflow attacks.
In Proceedings of the 7th USENIX Security Symposium. USENIX Association, 1998.
Cited on pages 17, 27, and 107.
[49] Benjamin Cox, David Evans, Adrian Filipi, Jonathan Rowanhill, Wei Hu, Jack
Davidson, John Knight, Anh Nguyen-Tuong, and Jason Hiser. N-variant
systems: A secretless framework for security through diversity. In Pro-
ceedings of the 15th USENIX Security Symposium. USENIX Association, 2006.
Cited on page 116.
[50] Jedidiah R. Crandall and Frederic T. Chong. Minos: Control data attack preven-
tion orthogonal to memory model. In Proceedings of the 37th Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO). IEEE Computer Society,
2004. Cited on pages 113 and 117.
[51] John Criswell, Andrew Lenharth, Dinakar Dhurjati, and Vikram Adve. Secure
virtual architecture: A safe execution environment for commodity operating
systems. In Proceedings of the 21st ACM Symposium on Operating System Principles
(SOSP). ACM, 2007. Cited on pages 42, 84, and 118.
[53] Solar Designer. “return-to-libc” attack. Bugtraq security mailing list, 1997.
Cited on pages 23 and 117.
[55] David Detlefs, Al Dosser, and Benjamin Zorn. Memory allocation costs in
large C and C++ programs. Software—Practice and Experience, 24(6), 1994.
Cited on page 58.
[56] Joe Devietti, Colin Blundell, Milo M. K. Martin, and Steve Zdancewic. Hard-
Bound: Architectural support for spatial safety of the C programming lan-
guage. In Proceedings of the 13th International Conference on Architectural Sup-
port for Programming Languages and Operating Systems (ASPLOS). ACM, 2008.
Cited on page 117.
[58] Dinakar Dhurjati and Vikram Adve. Efficiently detecting all dangling pointer
uses in production servers. In Proceedings of the International Conference
on Dependable Systems and Networks (DSN). IEEE Computer Society, 2006.
Cited on page 113.
[59] Dinakar Dhurjati, Sumant Kowshik, and Vikram Adve. SAFECode: Enforc-
ing alias analysis for weakly typed languages. SIGPLAN Notices, 41(6), 2006.
Cited on pages 27, 58, and 111.
[60] Dinakar Dhurjati, Sumant Kowshik, Vikram Adve, and Chris Lattner. Mem-
ory safety without runtime checks or garbage collection. In Proceedings of the
ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Sys-
tems (LCTES), 2003. Cited on pages 27, 58, 108, and 109.
[61] Petros Efstathopoulos, Maxwell Krohn, Steve VanDeBogart, Cliff Frey, David
Ziegler, Eddie Kohler, David Mazières, Frans Kaashoek, and Robert Morris.
Labels and event processes in the Asbestos operating system. In Proceedings
of the 20th ACM Symposium on Operating System Principles (SOSP). ACM, 2005.
Cited on page 118.
[62] Hiroaki Etoh and Kunikazu Yoda. ProPolice. In IPSJ SIGNotes Computer Security
(CSEC), volume 14, 2001. Cited on page 107.
[63] Security Focus. Ghttpd Log() function buffer overflow vulnerability.
http://www.securityfocus.com/bid/5960. Cited on page 79.
[64] Security Focus. Null HTTPd remote heap overflow vulnerability.
http://www.securityfocus.com/bid/5774. Cited on page 79.
[65] Security Focus. STunnel client negotiation protocol format string vulnerability.
http://www.securityfocus.com/bid/3748. Cited on page 79.
[66] Alessandro Forin, David Golub, and Brian Bershad. An I/O system for Mach
3.0. In Proceedings of the USENIX Mach Symposium (MACHNIX). USENIX Asso-
ciation, 1991. Cited on page 117.
[67] Stephanie Forrest, Anil Somayaji, and David H. Ackley. Building diverse com-
puter systems. In Proceedings of the 6th Workshop on Hot Topics in Operating Sys-
tems (HotOS). IEEE Computer Society, 1997. Cited on page 28.
[68] Keir Fraser, Steven Hand, Rolf Neugebauer, Ian Pratt, Andrew Warfield, and
Mark Williamson. Safe hardware access with the Xen virtual machine monitor.
In Proceedings of the 1st Workshop on Operating System and Architectural Support
for the on demand IT InfraStructure (OASIS-1), 2004. Cited on page 117.
[70] David Gay, Rob Ennals, and Eric Brewer. Safe manual memory management.
In Proceedings of the 6th International Symposium on Memory Management (ISMM).
ACM, 2007. Cited on pages 27, 58, 113, and 118.
[71] Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and
James Cheney. Region-based memory management in Cyclone. In Proceedings
of the ACM SIGPLAN Conference on Programming Language Design and Implemen-
tation (PLDI). ACM, 2002. Cited on pages 27, 58, and 109.
[72] Samuel Z. Guyer and Calvin Lin. Client-driven pointer analysis. In Proceedings
of the 10th International Static Analysis Symposium (SAS). Springer-Verlag, 2003.
Cited on page 108.
[73] Brian Hackett, Manuvir Das, Daniel Wang, and Zhe Yang. Modular checking
for buffer overflows in the large. In Proceedings of the 28th International Conference
on Software Engineering (ICSE). ACM, 2006. Cited on page 92.
[74] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, and
Jean Wolter. The performance of µ-kernel-based systems. In Proceedings of
the 16th ACM Symposium on Operating System Principles (SOSP). ACM, 1997.
Cited on page 117.
[75] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access
errors. In Proceedings of the USENIX Winter Technical Conference. USENIX
Association, 1992. Cited on page 112.
[76] Nevin Heintze and Olivier Tardieu. Ultra-fast aliasing analysis using CLA:
A million lines of C code in a second. SIGPLAN Notices, 36(5), 2001.
Cited on pages 21, 61, and 62.
[77] Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanen-
baum. MINIX 3: A highly reliable, self-repairing operating system. ACM
SIGOPS Operating Systems Review, 40(3), 2006. Cited on page 117.
[78] Jason D. Hiser, Clark L. Coleman, Michele Co, and Jack W. Davidson. MEDS:
The memory error detection system. In Proceedings of the 1st International Sympo-
sium on Engineering Secure Software and Systems (ESSoS). Springer-Verlag, 2009.
Cited on page 112.
[79] Alex Ho, Michael Fetterman, Christopher Clark, Andrew Warfield, and Steven
Hand. Practical taint-based protection using demand emulation. In Proceed-
ings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems
(EuroSys). ACM, 2006. Cited on page 113.
[80] Galen C. Hunt and James R. Larus. Singularity: Rethinking the software stack.
ACM SIGOPS Operating Systems Review, 41(2), 2007. Cited on page 117.
[81] Trevor Jim, Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and
Yanling Wang. Cyclone: A safe dialect of C. In Proceedings of the USENIX Annual
Technical Conference. USENIX Association, 2002. Cited on pages 108 and 109.
[82] Richard W. M. Jones and Paul H. J. Kelly. Backwards-compatible bounds check-
ing for arrays and pointers in C programs. In Proceedings of the 3rd Inter-
national Workshop on Automated and Algorithmic Debugging (AADEBUG), 1997.
Cited on pages 22, 29, 30, 33, 34, 36, 38, 44, 45, 48, 50, 72, 110, and 111.
[83] Jeffrey Katcher. PostMark: A new file system benchmark. Technical Report
TR3022, Network Appliance Inc., 1997. Cited on pages 101 and 102.
[84] Gaurav S. Kc, Angelos D. Keromytis, and Vassilis Prevelakis. Countering code-
injection attacks with instruction-set randomization. In Proceedings of the 10th
ACM Conference on Computer and Communications Security (CCS). ACM, 2003.
Cited on pages 115 and 117.
[85] Samuel C. Kendall. BCC: Runtime checking for C programs. In Proceed-
ings of the USENIX Summer Technical Conference. USENIX Association, 1983.
Cited on page 109.
[86] Vladimir Kiriansky, Derek Bruening, and Saman P. Amarasinghe. Secure ex-
ecution via program shepherding. In Proceedings of the 11th USENIX Security
Symposium. USENIX Association, 2002. Cited on pages 20, 27, and 114.
[87] Jingfei Kong, Cliff C. Zou, and Huiyang Zhou. Improving software security
via runtime instruction-level taint checking. In Proceedings of the 1st Workshop
on Architectural and System Support for Improving Software Dependability (ASID).
ACM, 2006. Cited on page 114.
[88] Sumant Kowshik, Dinakar Dhurjati, and Vikram Adve. Ensuring code safety
without runtime checks for real-time control systems. In Proceedings of the inter-
national conference on Compilers, Architecture, and Synthesis for Embedded Systems
(CASES). ACM, 2002. Cited on page 109.
[89] Sebastian Krahmer. x86-64 buffer overflow exploits and the borrowed
code chunks exploitation technique. www.suse.de/~krahmer/no-nx.pdf, 2005.
Cited on page 117.
[90] Maxwell Krohn, Alexander Yip, Micah Brodsky, Natan Cliffer, M. Frans
Kaashoek, Eddie Kohler, and Robert Morris. Information flow control for stan-
dard OS abstractions. In Proceedings of the 21st ACM Symposium on Operating
System Principles (SOSP). ACM, 2007. Cited on page 118.
[91] David Larochelle and David Evans. Statically detecting likely buffer overflow
vulnerabilities. In Proceedings of the 10th USENIX Security Symposium. USENIX
Association, 2001. Cited on page 108.
[92] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong
program analysis & transformation. In Proceedings of the International Symposium
on Code Generation and Optimization: Feedback-Directed and Runtime Optimization
(CGO). IEEE Computer Society, 2004. Cited on page 120.
[93] Ben Leslie, Peter Chubb, Nicholas Fitzroy-Dale, Stefan Götz, Charles Gray,
Luke Macpherson, Daniel Potts, Yue-Ting Shen, Kevin Elphinstone, and Gernot
Heiser. User-level device drivers: Achieved performance. Journal of Computer
Science and Technology (JCS&T), 20(5), 2005. Cited on page 117.
[94] Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and Stefan Götz. Unmodified
device driver reuse and improved system dependability via virtual machines.
In Proceedings of the 6th USENIX Symposium on Operating Systems Design and
Implementation (OSDI). USENIX Association, 2004. Cited on page 117.
[95] Vitaliy B. Lvin, Gene Novark, Emery D. Berger, and Benjamin G. Zorn.
Archipelago: Trading address space for reliability and security. In Proceedings
of the 13th International Conference on Architectural Support for Programming Lan-
guages and Operating Systems (ASPLOS). ACM, 2008. Cited on page 115.
[96] Jean-Philippe Martin, Michael Hicks, Manuel Costa, Periklis Akritidis, and
Miguel Castro. Dynamically checking ownership policies in concurrent C/C++
programs. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Prin-
ciples Of Programming Languages (POPL). ACM, 2010. Cited on page 21.
[97] Stephen McCamant and Greg Morrisett. Evaluating SFI for a CISC architec-
ture. In Proceedings of the 15th USENIX Security Symposium. USENIX Association,
2006. Cited on pages 16, 84, and 118.
[98] Microsoft. A detailed description of the Data Execution Prevention (DEP) fea-
ture in Windows XP Service Pack 2, Windows XP Tablet PC Edition 2005, and
Windows Server 2003. http://support.microsoft.com/kb/875352/EN-US/.
Cited on pages 16 and 117.
[102] David Moore, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford,
and Nicholas Weaver. Inside the slammer worm. IEEE Security and Privacy
Magazine, 1(4), 2003. Cited on pages 23 and 79.
[103] Andrew C. Myers and Barbara Liskov. Protecting privacy using the decen-
tralized label model. ACM Transactions on Software Engineering and Methodology
(TOSEM), 9, 2000. Cited on page 118.
[104] Santosh Nagarakatte, Jianzhou Zhao, Milo Martin, and Steve Zdancewic. Soft-
Bound: Highly compatible and complete spatial memory safety for C. In Pro-
ceedings of the ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI). ACM, 2009. Cited on pages 16, 37, 44, 50, 72, and 111.
[105] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley
Weimer. CCured: Type-safe retrofitting of legacy software. ACM Transactions on
Programming Languages and Systems (TOPLAS), 27(3), 2005. Cited on pages 16,
75, 76, and 109.
[106] George C. Necula and Peter Lee. Safe kernel extensions without run-time check-
ing. ACM SIGOPS Operating Systems Review, 30(SI), 1996. Cited on page 117.
[107] George C. Necula, Scott McPeak, and Westley Weimer. CCured: Type-
safe retrofitting of legacy code. In Proceedings of the 29th ACM SIGPLAN-
SIGACT Symposium on Principles Of Programming Languages (POPL). ACM, 2002.
Cited on pages 16, 108, and 109.
[109] James Newsome and Dawn Xiaodong Song. Dynamic taint analysis for auto-
matic detection, analysis, and signature generation of exploits on commodity
software. In Proceedings of the Network and Distributed System Security Symposium
(NDSS). The Internet Society, 2005. Cited on pages 28 and 113.
[110] Edmund B. Nightingale, Daniel Peek, Peter M. Chen, and Jason Flinn. Par-
allelizing security checks on commodity hardware. In Proceedings of the 13th
International Conference on Architectural Support for Programming Languages and
Operating Systems (ASPLOS). ACM, 2008. Cited on page 116.
[113] Elias Levy (Aleph One). Smashing the stack for fun and profit. Phrack, 7(49),
1996. http://phrack.com/issues.html?issue=49&id=14. Cited on page 23.
[114] Walter Oney. Programming the Microsoft Windows Driver Model. Microsoft
Press, second edition, 2002. Cited on pages 22 and 83.
[116] Hilarie Orman. The Morris worm: A fifteen-year perspective. IEEE Security and
Privacy Magazine, 1(5), 2003. Cited on page 23.
[117] Harish Patil and Charles N. Fischer. Efficient run-time monitoring using shadow
processing. In Proceedings of the 2nd International Workshop on Automated and
Algorithmic Debugging (AADEBUG), 1995. Cited on pages 110 and 116.
[118] Harish Patil and Charles N. Fischer. Low-cost, concurrent checking of pointer
and array accesses in C programs. Software—Practice and Experience, 27(1), 1997.
Cited on pages 110 and 116.
[119] Jonathan Pincus and Brandon Baker. Beyond stack smashing: Recent advances
in exploiting buffer overruns. IEEE Security and Privacy Magazine, 2(4), 2004.
Cited on pages 23 and 117.
[120] Georgios Portokalidis and Herbert Bos. Eudaemon: Involuntary and on-
demand emulation against zero-day exploits. In Proceedings of the 3rd ACM
SIGOPS/EuroSys European Conference on Computer Systems (EuroSys). ACM, 2008.
Cited on page 113.
[121] Georgios Portokalidis, Asia Slowinska, and Herbert Bos. Argos: An emu-
lator for fingerprinting zero-day attacks for advertised honeypots with auto-
matic signature generation. ACM SIGOPS Operating Systems Review, 40(4), 2006.
Cited on page 113.
[122] Niels Provos, Markus Friedl, and Peter Honeyman. Preventing privilege escala-
tion. In Proceedings of the 12th USENIX Security Symposium. USENIX Association,
2003. Cited on page 114.
[123] Feng Qin, Shan Lu, and Yuanyuan Zhou. SafeMem: Exploiting ECC-memory
for detecting memory leaks and memory corruption during production runs.
In Proceedings of the 11th International Symposium on High-Performance Computer
Architecture (HPCA). IEEE Computer Society, 2005. Cited on page 117.
[124] Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan Zhou, and
Youfeng Wu. LIFT: A low-overhead practical information flow tracking sys-
tem for detecting security attacks. In Proceedings of the 39th Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO). IEEE Computer Society,
2006. Cited on page 113.
[125] Paruj Ratanaworabhan, Benjamin Livshits, and Benjamin Zorn. Nozzle: A de-
fense against heap-spraying code injection attacks. In Proceedings of the 18th
USENIX Security Symposium, 2009. Cited on page 26.
[128] William Robertson, Christopher Kruegel, Darren Mutz, and Fredrik Valeur.
Run-time detection of heap-based overflows. In Proceedings of the 17th USENIX
Large Installation Systems Administration Conference (LISA). USENIX Association,
2003. Cited on pages 27 and 108.
[129] Giampaolo Fresi Roglia, Lorenzo Martignoni, Roberto Paleari, and Danilo Br-
uschi. Surgically returning to randomized lib(c). In Proceedings of the 25th An-
nual Computer Security Applications Conference (ACSAC). IEEE Computer Society,
2009. Cited on page 117.
[130] Olatunji Ruwase and Monica S. Lam. A practical dynamic buffer overflow de-
tector. In Proceedings of the Network and Distributed System Security Symposium
(NDSS). The Internet Society, 2004. Cited on pages 16, 32, 33, 34, 36, 37, 38, 44,
45, 48, 50, 72, and 110.
[131] Babak Salamat, Todd Jackson, Andreas Gal, and Michael Franz. Orchestra:
Intrusion detection using parallel execution and monitoring of program variants
in user-space. In Proceedings of the 4th ACM SIGOPS/EuroSys European Conference
on Computer Systems (EuroSys). ACM, 2009. Cited on page 116.
[133] Margo I. Seltzer, Yasuhiro Endo, Christopher Small, and Keith A. Smith. Dealing
with disaster: Surviving misbehaved kernel extensions. In Proceedings of the
2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI),
1996. Cited on page 101.
[134] Hovav Shacham. The geometry of innocent flesh on the bone: Return-into-libc
without function calls (on the x86). In Proceedings of the 14th ACM Conference
on Computer and Communications Security (CCS). ACM, 2007. Cited on pages 27
and 117.
[135] Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu,
and Dan Boneh. On the effectiveness of address-space randomization. In Pro-
ceedings of the 11th ACM Conference on Computer and Communications Security
(CCS). ACM, 2004. Cited on pages 28 and 115.
[136] Umesh Shankar, Kunal Talwar, Jeffrey S. Foster, and David Wagner. Detect-
ing format string vulnerabilities with type qualifiers. In Proceedings of the 10th
USENIX Security Symposium. USENIX Association, 2001. Cited on pages 72
and 113.
[137] SkyLined. Internet Explorer IFRAME src&name parameter BoF remote compromise.
http://skypher.com/wiki/index.php/Www.edup.tudelft.nl/~bjwever/advisory_iframe.html.php, 2004. Cited on page 23.
[138] Asia Slowinska and Herbert Bos. Pointless tainting?: Evaluating the practicality
of pointer tainting. In Proceedings of the 4th ACM SIGOPS/EuroSys European
Conference on Computer Systems (EuroSys). ACM, 2009. Cited on page 114.
[139] Ana Nora Sovarel, David Evans, and Nathanael Paul. Where’s the FEEB? The
effectiveness of instruction set randomization. In Proceedings of the 14th USENIX
Security Symposium. USENIX Association, 2005. Cited on page 115.
[140] Manu Sridharan and Stephen J. Fink. The complexity of Andersen’s analy-
sis in practice. In Proceedings of the International Static Analysis Symposium
(SAS). Springer-Verlag, 2009.
Cited on page 65.
[141] Joseph L. Steffen. Adding run-time checking to the portable C compiler.
Software—Practice and Experience, 22(4), 1992. Cited on page 110.
[142] Raoul Strackx, Yves Younan, Pieter Philippaerts, Frank Piessens, Sven Lach-
mund, and Thomas Walter. Breaking the memory secrecy assumption. In Pro-
ceedings of the Second European Workshop on System Security (EuroSec). ACM, 2009.
Cited on pages 28 and 116.
[143] Robert E. Strom and Shaula Yemini. Typestate: A programming language con-
cept for enhancing software reliability. IEEE Transactions on Software Engineering
(TSE), 12(1), 1986. Cited on page 90.
[144] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing
I/O devices on VMware Workstation’s hosted virtual machine monitor. In Pro-
ceedings of the USENIX Annual Technical Conference. USENIX Association, 2001.
Cited on page 117.
[145] Mark Sullivan and Ram Chillarege. Software defects and their impact on system
availability—a study of field failures in operating systems. In Proceedings of the
21st Annual International Symposium on Fault-Tolerant Computing (FTCS). IEEE
Computer Society, 1991. Cited on page 100.
[146] Nikhil Swamy, Michael Hicks, Greg Morrisett, Dan Grossman, and Trevor Jim.
Safe manual memory management in Cyclone. Science of Computer Programming
(SCP), 62(2), 2006. Cited on page 109.
[147] Michael M. Swift, Muthukaruppan Annamalai, Brian N. Bershad, and Henry M.
Levy. Recovering device drivers. In Proceedings of the 6th USENIX Symposium on
Operating Systems Design and Implementation (OSDI). USENIX Association, 2004.
Cited on pages 84, 85, and 117.
[148] Michael M. Swift, Muthukaruppan Annamalai, Brian N. Bershad, and Henry M.
Levy. Recovering device drivers. ACM Transactions on Computer Systems (TOCS),
24(4), 2006. Cited on page 117.
[149] Michael M. Swift, Brian N. Bershad, and Henry M. Levy. Improving the relia-
bility of commodity operating systems. In Proceedings of the 19th ACM Sympo-
sium on Operating System Principles (SOSP). ACM, 2003. Cited on pages 84, 85,
and 117.
[150] Michael M. Swift, Brian N. Bershad, and Henry M. Levy. Improving the relia-
bility of commodity operating systems. ACM Transactions on Computer Systems
(TOCS), 23(1), 2005. Cited on page 103.
[152] The Apache Software Foundation. The Apache HTTP Server Project.
http://httpd.apache.org. Cited on page 50.
[155] The MITRE Corporation. Multiple buffer overflows in libpng 1.2.5. CVE-2004-
0597, 2004. Cited on page 80.
[158] Nathan Tuck, Brad Calder, and George Varghese. Hardware and binary modi-
fication support for code pointer protection from buffer overflow. In Proceedings
of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (MI-
CRO). IEEE Computer Society, 2004. Cited on page 116.
[159] Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu,
and George C. Necula. XFI: Software guards for system address spaces. In
Proceedings of the 7th USENIX Symposium on Operating Systems Design and Imple-
mentation (OSDI). USENIX Association, 2006. Cited on pages 16, 84, 85, 104,
and 118.
[161] D. Wagner and R. Dean. Intrusion detection via static analysis. In Proceedings of
the 2001 IEEE Symposium on Security and Privacy. IEEE Computer Society, 2001.
Cited on page 114.
[162] David Wagner, Jeffrey S. Foster, Eric A. Brewer, and Alexander Aiken. A first
step towards automated detection of buffer overrun vulnerabilities. In Proceed-
ings of the Network and Distributed System Security Symposium (NDSS). The Inter-
net Society, 2000. Cited on page 108.
[163] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Effi-
cient software-based fault isolation. In Proceedings of the 14th ACM Symposium on
Operating System Principles (SOSP). ACM, 1993. Cited on pages 16, 84, and 118.
[164] Yoav Weiss and Elena Gabriela Barrantes. Known/chosen key attacks against
software instruction set randomization. In Proceedings of the 22nd Annual Com-
puter Security Applications Conference (ACSAC). IEEE Computer Society, 2006.
Cited on page 115.
[165] John Wilander and Mariam Kamkar. A comparison of publicly available tools
for static intrusion prevention. In Proceedings of the 7th Nordic Workshop on Secure
IT Systems (NordSec), 2002. Cited on page 108.
[167] Dan Williams, Patrick Reynolds, Kevin Walsh, Emin Gün Sirer, and Fred Schnei-
der. Device driver safety through a reference validation mechanism. In Proceed-
ings of the 8th USENIX Symposium on Operating Systems Design and Implementation
(OSDI). USENIX Association, 2008. Cited on pages 101, 103, and 117.
[168] Emmett Witchel, Junghwan Rhee, and Krste Asanović. Mondrix: Memory
isolation for Linux using Mondriaan memory protection. In Proceedings of
the 20th ACM Symposium on Operating System Principles (SOSP). ACM, 2005.
Cited on page 117.
[169] Wei Xu, Sandeep Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: A
practical approach to defeat a wide range of attacks. In Proceedings of the 15th
USENIX Security Symposium. USENIX Association, 2006. Cited on page 113.
[170] Wei Xu, Daniel C. DuVarney, and R. Sekar. An efficient and backwards-
compatible transformation to ensure memory safety of C programs. ACM SIG-
SOFT Software Engineering Notes (SEN), 29(6), 2004. Cited on pages 16, 27, 44,
45, 48, 50, 58, 72, 76, and 111.
[171] Suan Hsi Yong and Susan Horwitz. Protecting C programs from attacks via
invalid pointer dereferences. ACM SIGSOFT Software Engineering Notes (SEN),
28(5), 2003. Cited on pages 109 and 112.
[172] Suan Hsi Yong and Susan Horwitz. Pointer-range analysis. In Proceedings of the
11th International Static Analysis Symposium (SAS), volume 3148 of Lecture Notes
in Computer Science. Springer-Verlag, 2004. Cited on page 38.
[173] Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, and David Mazières.
Making information flow explicit in HiStar. In Proceedings of the 7th USENIX
Symposium on Operating Systems Design and Implementation (OSDI). USENIX As-
sociation, 2006. Cited on page 118.
[174] Feng Zhou, Jeremy Condit, Zachary Anderson, Ilya Bagrak, Rob Ennals,
Matthew Harren, George Necula, and Eric Brewer. SafeDrive: Safe and re-
coverable extensions using language-based techniques. In Proceedings of the 7th
USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX
Association, 2006.
[175] Pin Zhou, Wei Liu, Long Fei, Shan Lu, Feng Qin, Yuanyuan Zhou, Samuel
Midkiff, and Josep Torrellas. AccMon: Automatically detecting memory-related
bugs via program counter-based invariants. In Proceedings of the 37th Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE Com-
puter Society, 2004. Cited on page 116.
[176] Misha Zitser, Richard Lippmann, and Tim Leek. Testing static analysis tools
using exploitable buffer overflows from open source code. In Proceedings of the
12th ACM SIGSOFT International Symposium on Foundations of Software Engineer-
ing (FSE). ACM, 2004. Cited on page 108.
[178] Cliff Changchun Zou, Weibo Gong, and Don Towsley. Code Red worm propaga-
tion modeling and analysis. In Proceedings of the 9th ACM Conference on Computer
and Communications Security (CCS). ACM, 2002. Cited on page 23.