Foundational Proof-Carrying Code
Andrew W. Appel
Princeton University
Abstract
Proof-carrying code is a framework for the mechanical verification of safety properties of machine language programs, but the problem arises of quis custodiet ipsos custodes: who will verify the verifier itself? Foundational proof-carrying code is verification from the smallest possible set of axioms, using the simplest possible verifier and the smallest possible runtime system. I will describe many of the mathematical and engineering problems to be solved in the construction of a foundational proof-carrying code system.
1 Introduction
When you obtain a piece of software (a shrinkwrapped application, a browser plugin, an applet, an OS kernel extension) you might like to ascertain that it's safe to execute: it accesses only its own memory and respects the private variables of the API to which it's linked. In a Java system, for example, the byte-code verifier can make such a guarantee, but only if there's no bug in the verifier itself, or in the just-in-time compiler, or the garbage collector, or other parts of the Java virtual machine (JVM).
If a compiler can produce Typed Assembly Language (TAL) [14], then just by type-checking the low-level representation of the program we can guarantee safety, but only if there's no bug in the typing rules, or in the type-checker, or in the assembler that translates TAL to machine language. Fortunately, these components are significantly smaller and simpler than a Java JIT and JVM.
Proof-carrying code (PCC) [15] constructs and verifies a mathematical proof about the machine-language program itself, and this guarantees safety, but only if there's no bug in the verification-condition generator, or in the logical axioms, or the typing rules, or the proof checker.
This research was supported in part by DARPA award F30602-99-1-0519 and by National Science Foundation grant CCR-9974553.
To appear in LICS '01, 16th Annual IEEE Symposium on Logic in Computer Science, June 16, 2001.
The verification-condition generator (VCgen) examines the machine instructions of the program, expands the substitutions of its machine-code Hoare logic, examines the formal parameter declarations to derive function preconditions, and examines result declarations to derive postconditions. A bug in the VCgen will lead to the wrong formula being proved and checked.
The soundness of a PCC system's typing rules and VCgen can, in principle, be proved as a metatheorem. Human-checked proofs of type systems are almost tractable; the appendices of Necula's thesis [16] and Morrisett et al.'s paper [14] contain such proofs, if not of the actual type systems used in PCC systems, then of their simplified abstractions. But constructing a mechanically checkable correctness proof of a full VCgen would be a daunting task.
tp : type.
tm : tp -> type.

o     : tp.
num   : tp.
arrow : tp -> tp -> tp.  %infix right 14 arrow.
pair  : tp -> tp -> tp.

pf : tm o -> type.
The trick of using lam and @ to coerce between metalogical functions tm T1 -> tm T2 and object-logic
functions tm (T1 arrow T2) is described by Harper,
Honsell, and Plotkin [10]. We need object-logic functions
so that we can quantify over them using forall; that is,
the type of F in forall [F] predicate(F) must
be tm T for some T such as num arrow num, but cannot be tm T1 -> tm T2.
We have introduction and elimination rules for these
constructors (rules for pairing omitted here):
beta_e: {P: tm T -> tm o}
        pf (P (lam F @ X)) -> pf (P (F X)).
beta_i: {P: tm T -> tm o}
        pf (P (F X)) -> pf (P (lam F @ X)).
imp_i:  (pf A -> pf B) -> pf (A imp B).
imp_e:  pf (A imp B) -> pf A -> pf B.
forall_i: ({X: tm T} pf (A X)) -> pf (forall A).
forall_e: pf (forall A) -> {X: tm T} pf (A X).
not_not_e: pf ((B imp forall [A] A) imp forall [A] A) -> pf B.
We start by modeling a specific von Neumann machine, such as the Sparc or the Pentium. A machine state comprises a register bank and a memory, each of which is a function from integers (addresses) to integers (contents). Every register of the instruction-set architecture (ISA) must be assigned a number in the register bank: the general registers, the floating-point registers, the condition codes, and the program counter. Where the ISA does not specify a number (such as for the PC) we use an arbitrary index:

[Figure: the register bank r maps indices 0-31 to r0-r31, indices 32-63 to fp0-fp31, index 64 to cc, and index 65 to PC (remaining indices unused); the memory m maps addresses 0, 1, 2, ... to their contents.]

A single step of the machine is the execution of one instruction. We can specify instruction execution by giving a step relation (r, m) ↦ (r′, m′) that describes the relation between the prior state (r, m) and the state (r′, m′) of the machine after execution.

For example, to describe the instruction r1 ← r2 + r3 we might start by writing,

    (r, m) ↦ (r′, m′)  whenever
    r′(1) = r(2) + r(3)  ∧  (∀x ≠ 1. r′(x) = r(x))  ∧  m′ = m

Instruction words are decoded by a relation decode(w, instr) relating a word w to the instruction it encodes, following the bit-field layout of the ISA (fields beginning at bits 26, 21, and 16):

    decode(w, instr) ≡
        (∃i, j, k.
            0 ≤ i < 2^5  ∧  0 ≤ j < 2^5  ∧  0 ≤ k < 2^5
          ∧ w = 3·2^26 + i·2^21 + j·2^16 + k·2^0
          ∧ instr = add(i, j, k))
      ∨ (∃i, j, c.
            0 ≤ i < 2^5  ∧  0 ≤ j < 2^5  ∧  0 ≤ c < 2^16
          ∧ w = 12·2^26 + i·2^21 + j·2^16 + c·2^0
          ∧ instr = load(i, j, sign-extend(c)))
      ∨ ...
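As an informal aside (not part of the paper's formal development), the two clauses of the decode relation above can be rendered as an executable sketch. The encoding is the paper's illustrative one (opcode 3 for add, 12 for load), not a real Sparc or Pentium encoding, and the function names below are invented:

```python
# Toy decoder for the illustrative instruction encoding above.
# Fields: opcode in bits 26+, i in bits 21-25, j in bits 16-20,
# then either a register k (bits 0-4) or a 16-bit constant c.

def sign_extend(c, bits=16):
    """Interpret a `bits`-wide field as a signed two's-complement value."""
    return c - (1 << bits) if c >= (1 << (bits - 1)) else c

def decode(w):
    """Return the instruction denoted by word w, or None if w is illegal."""
    op = w >> 26
    i = (w >> 21) & 0x1F
    j = (w >> 16) & 0x1F
    if op == 3:
        k = w & 0x1F
        # The relational spec requires the unused bits 5-15 to be zero;
        # reconstructing w checks exactly that.
        if w == 3 * 2**26 + i * 2**21 + j * 2**16 + k:
            return ('add', i, j, k)
    elif op == 12:
        c = w & 0xFFFF
        return ('load', i, j, sign_extend(c))
    return None   # no instruction is related to this word

w = 3 * 2**26 + 1 * 2**21 + 2 * 2**16 + 3
print(decode(w))   # ('add', 1, 2, 3)
```

Because decode is a relation, words that match no clause simply decode to nothing; the sketch models that by returning None.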
4 Specifying safety

Our step relation (r, m) ↦ (r′, m′) is deliberately partial; some states have no successor state. In these states the machine is stuck; we arrange the step relation so that exactly the unsafe operations are stuck. A simple safety policy is that loads must be from readable addresses (see Appel and Felten [2] for descriptions of security policies that are more interesting than this one).

We can add a new conjunct to the semantics of the load instruction,

    load(i, j, c) = λr, m, r′, m′.
        r′(i) = m(r(j) + c)
      ∧ readable(r(j) + c)
      ∧ (∀x ≠ i. r′(x) = r(x))  ∧  m′ = m.

Now, in a machine state where the program counter points to a load instruction that violates the safety policy, our step relation ↦ does not relate this state to any successor state (even though the real machine knows how to execute it).

Using this partial step relation, we can define safety; a given state is safe if, for any state reachable in the Kleene closure of the step relation, there is a successor state:

    safe-state(r, m) ≡
      ∀r′, m′. (r, m ↦* r′, m′) ⇒ ∃r″, m″. (r′, m′ ↦ r″, m″)

A program is just a sequence of integers (representing machine instructions); we say that a program p is loaded at a location start in memory m if

    loaded(p, m, start) ≡ ∀i ∈ dom(p). m(i + start) = p(i)

Finally (assuming that programs are written in position-independent code), a program is safe if, no matter where we load it in memory, we get a safe state:

    safe(p) ≡
      ∀r, m, start. loaded(p, m, start) ∧ r(PC) = start ⇒ safe-state(r, m)

The important thing to notice about this formulation is that there is no verification-condition generator. The syntax and semantics of machine instructions, implicit in a VCgen, have been made explicit and much more concise in the step relation. But the Hoare logic of machine instructions and typing rules for function parameters, also implicit in a VCgen, must now be proved as lemmas, about which more later.
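To make the role of partiality concrete, here is an executable toy (not the paper's formalism) of the step relation: the readable conjunct simply removes successor states, so an illegal load leaves the machine stuck. The readable() policy and all names are stand-ins invented for illustration:

```python
# Toy partial step relation: returns the successor state, or None when the
# state is stuck (the relational spec relates it to no successor).

def readable(addr):
    return 0 <= addr < 100   # hypothetical safety policy

def step(r, m, instr):
    """One machine step on register bank r and memory m; None if stuck."""
    kind = instr[0]
    if kind == 'add':
        _, i, j, k = instr
        r2 = dict(r); r2[i] = r[j] + r[k]
        return r2, m
    if kind == 'load':
        _, i, j, c = instr
        addr = r[j] + c
        if not readable(addr):   # the added conjunct: no successor state
            return None
        r2 = dict(r); r2[i] = m.get(addr, 0)
        return r2, m
    return None                  # undecodable instruction: also stuck

r = {1: 0, 2: 5, 3: 7}
print(step(r, {}, ('add', 1, 2, 3)))     # ({1: 12, 2: 5, 3: 7}, {})
print(step(r, {}, ('load', 1, 2, 995)))  # None: load from unreadable 1000
```

A state is then safe exactly when no reachable state maps to None, which is the content of safe-state above.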
5 Proving safety
In a sufficiently expressive logic, as we all know, proving theorems can be a great deal more difficult than merely stating them, and higher-order logic is certainly expressive. For guidance in proving safety of machine-language programs we should not particularly look to previous work in formal verification of program correctness.
Instead, we should think more of type checking: automatic proofs of decidable safety properties of programs.
The key advances that make it possible to generate proofs automatically are typed intermediate languages
[11] and typed assembly language [14]. Whereas conventional compilers type-check the source program, then
throw away the types (using the lambda-calculus principle
of erasure) and then transform the program through progressively lower-level intermediate representations until
they reach assembly language and then machine language, a type-preserving compiler uses typed intermediate languages at each level. If the program type-checks
at a low level, then it is safe, regardless of whether the
previous (higher-level) compiler phases might be buggy
on some inputs. As the program is analyzed into smaller
pieces at the lower levels, the type systems become progressively more complex, but the type theory of the
1990s is up to the job of engineering the type systems.
Typing rules for machine language. An important insight in the development of PCC is that one can write type-inference rules for machine language and machine states. For example, Necula [15] used rules such as
[Figure: two compiler pipelines. A conventional compiler type-checks the source program's IR (or byte codes) once, then passes through an optimizer (lower-level IR), code generator (assembly-level IR), and register allocator to native machine code. A type-preserving compiler type-checks at every stage: front-end IR (or byte codes), type-preserving optimizer (typed lower-level IR), type-preserving code generator (typed assembly language), and type-preserving register allocator, producing proof-carrying native machine code whose proof is checked.]
    m ⊢ x : τ1 ⊗ τ2
    ─────────────────────────────────────
    m ⊢ m(x) : τ1      m ⊢ m(x + 1) : τ2

meaning that if x has type τ1 ⊗ τ2 in memory m (meaning that it is a pointer to a boxed pair) then the contents of location x will have type τ1 and the contents of location x + 1 will have type τ2.
Proofs of safety in PCC use the local induction hypotheses at each point in the program to prove that the
program is typable. This implies, by a type-soundness argument, that the program is therefore safe.
If the type system is given by syntactic inference rules, the proof of type soundness is typically done by syntactic subject reduction: one proves that each step of computation preserves typability and that typable states are safe. The proof involves structural induction over typing
derivations. In conventional PCC, this proof is done in the
metatheory, by humans.
In foundational PCC we wish to include the type-soundness proof inside the proof that is transmitted to the code consumer, because (1) it's more secure to avoid reliance on human-checked proofs and (2) that way we avoid restricting the protocol to a single type system. But in order to do a foundational subject-reduction theorem, we would need to build up the mathematical machinery to manipulate typing derivations as syntactic objects, all represented inside our logic using foundational mathematical concepts: sets, pairs, and functions. We would need to do case analyses over the different ways that a given type judgement might be derived. While this can all be done, we take a different approach to proving that typability implies safety.
We take a semantic approach. In a semantic proof one
assigns a meaning (a semantic truth value) to type judgements. One then proves that if a type judgement is true
then the typed machine state is safe. One further proves
that the type inference rules are sound, i.e., if the premises
are true then the conclusion is true. This ensures that
derivable type judgements are true and hence typable machine states are safe.
The semantic approach avoids formalizing syntactic type expressions. Instead, one formalizes a type as a set of semantic values. One defines the ⊗ operator as a function taking two sets as arguments and returning a set.
TAL was originally designed to be used in a certifying compiler, but one that certifies the assembly code and uses a trusted assembler to translate to machine code. But we can use TAL to help generate proofs in a PCC system that directly verifies the machine code. In such a system, the proofs are typically by induction, with induction hypotheses such as, "whenever the program counter reaches location l, register 3 will be a pointer to a pair of integers." These local invariants can be generated from the TAL formulation of the program, but in a PCC system they can be checked in machine code without needing to trust the assembler.
To represent a function value, we let x be the entry address of the function; here is the function f(x) = x + 1, assuming that arguments and return results are passed in register 1:

[Figure: a memory m containing the instruction r1 := r1 + 1 at address 200 and jump(r7) at address 201; the function value is (m, 200).]

    ⊨ x :m τ1 ⊗ τ2
    ─────────────────────────────────────
    ⊨ m(x) :m τ1      ⊨ m(x + 1) :m τ2
Although the two forms of the type-inference rule look very similar, they are actually significantly different. In the second rule τ1 and τ2 range over semantic sets rather than type expressions. The relation ⊨ in the second version is defined directly in terms of a semantics for assertions of the form x :m τ. The second rule is actually a lemma to be proved, while the first rule is simply a part of the definition of the syntactic relation ⊢. For the purposes of foundational PCC, we view the semantic proofs as preferable to syntactic subject-reduction proofs because they lead to shorter and more manageable foundational proofs. The semantic approach avoids the need for any formalization of type expressions and avoids the formalization of proofs or derivations of type judgements involving type expressions.
5.1

In that first model [3] we could not handle recursions where the type being defined occurs in a negative (contravariant) position, as in

datatype exp = APP of exp * exp
             | LAM of exp -> exp

where the argument occurrence of exp in LAM is a negative occurrence. Contravariant recursion is occasionally useful in ML, but it is the very essence of object-oriented programming, so these limitations (no mutable fields, no contravariant recursion) are quite restrictive.
5.2
Building semantic models for type systems is interesting and nontrivial. In a first attempt, Amy Felty and
I [3] were able to model a pure-functional (immutable
datatypes) call-by-value language with records, address
arithmetic, polymorphism and abstract types, union and
intersection types, continuations and function pointers,
and covariant recursive types.
Our simplest semantics is set-theoretic: a type is a set
of values. But what is a value? It is not a syntactic construct, as in lambda-calculus; on a von Neumann machine
we wish to use a more natural representation of values that
corresponds to the way procedures and data structures are
represented in practice. This way, our type theory can
match reality without a layer of simulation in between.
We can represent a value as a pair (m, x), where m is a memory and x is an integer (typically representing an address).
To represent a pointer data structure that occupies a certain portion of the machine's memory, we let x be the root address of that structure. For example, the boxed pair of integers ⟨5, 7⟩ represented at address 108 would be represented as the value ({108 ↦ 5, 109 ↦ 7}, 108).
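As an executable illustration (a toy, not the paper's model), one can read "a type is a set of values (m, x)" as a predicate on memory/address pairs; the boxed-pair type then checks its components through memory, exactly as the lemma-style rule promises. All names below are invented:

```python
# Types as predicates on values (m, x): m a memory (dict from addresses to
# contents), x an integer. Membership in a boxed-pair type guarantees
# membership of the components fetched from memory.

def t_int(m, x):
    return isinstance(x, int)

def boxed_pair(t1, t2):
    """The type of pointers x to a boxed pair: m(x) : t1 and m(x+1) : t2."""
    def t(m, x):
        return (x in m and (x + 1) in m
                and t1(m, m[x]) and t2(m, m[x + 1]))
    return t

# The boxed pair <5, 7> at address 108: the value ({108: 5, 109: 7}, 108).
m = {108: 5, 109: 7}
t = boxed_pair(t_int, t_int)
assert t(m, 108)                              # x :m int (x) int
assert t_int(m, m[108]) and t_int(m, m[109])  # hence both components
```

Here the semantic typing rule is not an axiom but a consequence of the definition of boxed_pair, mirroring the paper's point that the semantic rule is a lemma.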
5.3

An expression e belongs to a type τ to approximation k, written e :k τ, when

    e :k τ ≡ ∀j < k. ∀e′. (e ↦^j e′ ∧ nf(e′)) ⇒ ⟨k − j, e′⟩ ∈ τ

where nf(e′) means that e′ is a normal form: it has no successor in the call-by-value small-step evaluation relation.
We start with definitions for the sets that represent the types:
    ⊥        =  { }
    ⊤        =  {⟨k, v⟩ | k ≥ 0}
    int      =  {⟨k, 0⟩, ⟨k, 1⟩, . . . | k ≥ 0}
    τ1 × τ2  =  {⟨k, (v1, v2)⟩ | ∀j < k. ⟨j, v1⟩ ∈ τ1 ∧ ⟨j, v2⟩ ∈ τ2}
    τ1 → τ2  =  {⟨k, λx.e⟩ | ∀j < k. ∀v. ⟨j, v⟩ ∈ τ1 ⇒ e[v/x] :j τ2}
    μF       =  {⟨k, v⟩ | ⟨k, v⟩ ∈ F^{k+1}(⊥)}
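The indexed sets above lend themselves to a small executable toy (an illustration under invented names, not the paper's formal model): a type is a predicate on pairs ⟨k, v⟩, and μF really is computed as F applied k+1 times to the empty type. A finite value inhabits an infinite recursive type only up to a small index, which is exactly the approximation at work:

```python
# Toy step-indexed type model: member(t, k, v) is written t(k, v).

def bot(k, v):                 # the empty type
    return False

def t_int(k, v):               # integers at every index
    return isinstance(v, int)

def t_pair(t1, t2):            # pairs, components checked at indices j < k
    def t(k, v):
        if not (isinstance(v, tuple) and len(v) == 2):
            return False
        v1, v2 = v
        return all(t1(j, v1) and t2(j, v2) for j in range(k))
    return t

def t_mu(F):                   # mu F at index k = F^(k+1) applied to bot
    def t(k, v):
        approx = bot
        for _ in range(k + 1):
            approx = F(approx)
        return approx(k, v)
    return t

# mu a. int x a: "infinite lists" of ints. A finite nesting like (3, (4, 5))
# is accepted only at small indices, illustrating approximation:
inflist = t_mu(lambda a: t_pair(t_int, a))
print(inflist(1, (3, (4, 5))))   # True  (index too small to see the tail)
print(inflist(2, (3, (4, 5))))   # False (the tail 5 is not a pair)
```

Note how each constructor consumes at least one index, which is what the well-founded-constructor property below formalizes.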
A type environment Γ is a finite map from variables to types; a substitution σ satisfies Γ to approximation k, written σ :k Γ, when σ(x) :k Γ(x) for each x in the domain of Γ. Then

    Γ ⊨k e : τ ≡ ∀σ. σ :k Γ ⇒ σ(e) :k τ

where σ(e) is the result of replacing the free variables in e with their values under substitution σ. To drop the index k, we define

    Γ ⊨ e : τ ≡ ∀k. Γ ⊨k e : τ
Soundness theorem: It is trivial to prove from these definitions that if ⊨ e : τ and e ↦* e′, then e′ is not stuck; that is, either e′ is a normal form or e′ ↦ e″ for some e″.
Well-founded type constructors. We define the notion of a well-founded type constructor. Here I will not give the formal definition, but state the informal property: if F is well founded and x : F(τ), then to extract from x a value of type τ, or to apply x to a value of type τ, must take at least one execution step. The constructors × and → are well founded.
From these definitions one can prove, as lemmas, typing rules such as

    ⊨ e : τ1 × τ2
    ─────────────
    ⊨ π1(e) : τ1

    ⊨ e1 : τ1 → τ2      ⊨ e2 : τ1
    ──────────────────────────────
    ⊨ e1 e2 : τ2

5.4 Mutable fields
dependent types. This will make type-checking of TML difficult; we will need to assume that each compiler will have
a source language with a decidable type system, and that
translation of terms (and types) will yield a witness to the
type-checking of the resultant TML representation.
Abstract machine instructions. One can view machine instructions at many levels of abstraction:
1. At the lowest level, an instruction is just an integer, an opcode encoding.
2. At the next level, it implements a relation on raw machine states (r, m) ↦ (r′, m′).
3. At a higher level, we can say that the Sparc add instruction implements a machine-independent notion of add, and similarly for other instructions.
4. Then we can view add as manipulating not just registers, but local variables (which may be implemented in registers or in the activation record).
5. We can view this instruction as one of various typed instructions on typed values; in the usual view, add has type int × int → int, but the address-arithmetic add has type

    (σ0 ⊗ σ1 ⊗ ... ⊗ σn) → const(i) → (σi ⊗ σi+1 ⊗ ... ⊗ σn)

for any i.
[Figure: a pointer x (here 108) to a record of typed fields y0 : t0, y1 : t1, y2 : t2, y3 : t3 in memory m; x + 2 (here 110) points to the suffix of the record beginning at y2.]
6. Finally, we can specialize this typed add to the particular context where some instance of it appears, for example by instantiating the σi, n, and i in the previous example.
Abstraction level 1 is used in the statement of the theorem (safety of a machine-language program p). Abstraction level 5 is implicitly used in conventional proof-carrying code [15]. Our ongoing research involves finding semantic models for each of these levels, and then proving lemmas that can convert between assertions at the different levels.
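The address-arithmetic typing of add at level 5 can also be sketched executably (a toy under invented names, in the same style as the earlier value-model sketch): adding a constant i to a pointer to a record of fields σ0..σn yields a pointer to the record σi..σn.

```python
# Toy record types as predicates on (m, x): a pointer x has type
# record(t0, ..., tn) when field i, at address x + i, has type ti.

def record(*field_types):
    def t(m, x):
        return all(ft(m, m.get(x + off))
                   for off, ft in enumerate(field_types))
    return t

def t_int(m, v):
    return isinstance(v, int)

# Four int fields y0..y3 at address 108; x + 2 points at the suffix y2, y3.
m = {108: 1, 109: 2, 110: 3, 111: 4}
assert record(t_int, t_int, t_int, t_int)(m, 108)   # x : s0 (x) ... (x) s3
assert record(t_int, t_int)(m, 108 + 2)             # x + 2 : s2 (x) s3
```

The second assertion is the semantic content of the address-arithmetic add type: it is a lemma about the sets, not an extra axiom.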
Type systems can also be applied to the problem of garbage collection, as noticed in important recent work by Walker, Crary, and Morrisett [21]; to traverse objects of unknown type, the intensional type calculi originally developed by Harper and Morrisett [11] can be applied. Wang's work on type-preserving garbage collectors [22] covers the region operators and management of pointer sharing; related work by Monnier, Saha, and Shao [13] covers the intensional type system.
Other potentially unsafe parts of the runtime system are ad hoc implementations of polytypic functions, those that work by induction over the structure of data types, such as polymorphic equality testers, debuggers, and marshallers (a.k.a. serializers or picklers). Juan Chen and I have developed an implementation of polytypic primitives as a transformation on the typed intermediate representation in the SML/NJ compiler [6]. Like the R transformation of Crary and Weirich [8] it allows these polytypic functions to be typechecked, but unlike their calculus, ours does not require dependent types in the typed intermediate language and is thus simpler to implement.
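To illustrate what "by induction over the structure of data types" means for such a polytypic function, here is a toy structural-equality tester driven by a runtime type representation. The representation scheme is invented for this sketch and is unrelated to the SML/NJ implementation discussed above:

```python
# A polytypic equality tester: recursion follows the structure of the
# (hypothetical) type representation, not the structure of any one datatype.

def poly_eq(ty, a, b):
    if ty == 'int':                       # base case: integer equality
        return a == b
    if ty[0] == 'pair':                   # inductive case: componentwise
        _, t1, t2 = ty
        return poly_eq(t1, a[0], b[0]) and poly_eq(t2, a[1], b[1])
    raise ValueError('unknown type representation')

print(poly_eq(('pair', 'int', 'int'), (3, 4), (3, 4)))   # True
```

An ad hoc version of this function written against raw memory would be unsafe; the point of the typed-intermediate-representation approach is that such traversals can themselves be typechecked.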
7 Conclusion
Our goal is to reduce the size of the trusted computing base of systems that run machine code from untrusted
sources. This is an engineering challenge that requires
work on many fronts. We are fortunate that during the
last two decades, many talented scientists have built the
mathematical infrastructure we need the theory and implementation of logical frameworks and automated theorem provers, type theory and type systems, compilation
and memory management, and programming language
design. The time is ripe to apply all of these advances
as engineering tools in the construction of safe systems.
References
[1] Andrew W. Appel. Hints on proving theorems in Twelf. www.cs.princeton.edu/appel/twelf-tutorial, February 2000.
[2] Andrew W. Appel and Edward W. Felten. Models for security policies in proof-carrying code. Technical Report TR-636-01, Princeton University, March 2001.
[14] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. ACM Trans. on Programming Languages and Systems, 21(3):527-568, May 1999.
[16] George Ciprian Necula. Compiling with Proofs. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, September 1998.
[17] Frank Pfenning. Elf: A meta-language for deductive systems. In A. Bundy, editor, Proceedings of the 12th International Conference on Automated Deduction, pages 811-815, Nancy, France, June 1994. Springer-Verlag LNAI 814.
[18] Frank Pfenning and Carsten Schürmann. System description: Twelf, a meta-logical framework for deductive systems. In The 16th International Conference on Automated Deduction. Springer-Verlag, July 1999.
[19] Norman Ramsey and Mary F. Fernández. Specifying representations of machine instructions. ACM Trans. on Programming Languages and Systems, 19(3):492-524, May 1997.
[20] Mads Tofte and Jean-Pierre Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Twenty-first ACM Symposium on Principles of Programming Languages, pages 188-201. ACM Press, January 1994.
[21] David Walker, Karl Crary, and Greg Morrisett. Typed memory management via static capabilities. ACM Trans. on Programming Languages and Systems, 22(4):701-771, July 2000.
[22] Daniel C. Wang and Andrew W. Appel. Type-preserving garbage collectors. In POPL 2001: The 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 166-178. ACM Press, January 2001.