Acid: A Debugger Built From A Language
Phil Winterbottom
ABSTRACT
1. Introduction
The size and complexity of programs have increased in proportion to processor
speed and memory but the interface between debugger and programmer has changed
little. Graphical user interfaces have eased some of the tedious aspects of the
interaction. A graphical interface is a convenient means for navigating through source and
data structures but provides little benefit for process control. The introduction of a new
concurrent language, Alef [Win93], emphasized the inadequacies of the existing Plan 9
[Pike90] debugger db, a distant relative of adb, and made it clear that a new debugger
was required.
Current debuggers like dbx, sdb, and gdb are limited to answering only the
questions their authors envisage. As a result, they supply a plethora of specialized
commands, each attempting to anticipate a specific question a user may ask. When a
debugging situation arises that is beyond the scope of the command set, the tool is
useless. Further, it is often tedious or impossible to reproduce an anomalous state of the
program, especially when the state is embedded in the program's data structures.
Acid applies some ideas found in CAD software used for hardware test and
simulation. It is based on the notion that the state and resources of a program are best
represented and manipulated by a language. The state and resources, such as memory,
registers, variables, type information and source code are represented by variables in the
language. Expressions provide a computation mechanism and control statements allow
repetitive or selective interpretation based on the result of expression evaluation. The
heart of the Acid debugger is an interpreter for a small typeless language whose
operators mirror the operations of C and Alef, which in turn correspond well to the basic
operations of the machine. The interpreter itself knows nothing of the underlying hardware;
it deals with the program state and resources in the abstract. Fundamental routines to
control processes, read files, and interface to the system are implemented as builtin
__________________
Originally appeared in Proc. of the Winter 1994 USENIX Conf., pp. 211-222, San Francisco, CA
2. Related Work
DUEL [Gol93], an extension to gdb [Stal91], proposes using a high level expression
evaluator to solve some of these problems. The evaluator provides iterators to loop over
data structures and conditionals to control evaluation of expressions. The author shows
that complex state queries can be formulated by combining concise expressions but this
only addresses part of the problem. A program is a dynamic entity; questions asked
when the program is in a static state are meaningful only after the program has been
caught in that state. The framework for manipulating the program is still as primitive
as the underlying debugger. While DUEL provides a means to probe data structures it
entirely neglects the most beneficial aspect of debugging languages: the ability to
control processes. Acid is structured around a thread of control that passes between the
interpreter and the target program.
The NeD debugger [May92] is a set of extensions to TCL [Ous90] that provide
debugging primitives. The resulting language, NeDtcl, is used to implement a portable
interface between a conventional debugger, pdb [May90], and a server that executes
NeDtcl programs operating on the target program. Execution of the NeDtcl programs
implements the debugging primitives that pdb expects. NeD is targeted at
multi-process debugging across a network, and proves the flexibility of a language as a means
of communication between debugging tools. Whereas NeD provides an interface
between a conventional debugger and the process it debugs, Acid is the debugger itself.
While NeD has some of the ideas found in Acid, it is targeted toward a different purpose.
Acid seeks to integrate the manipulation of a program's resources into the debugger
while NeD provides a flexible interconnect between components of the debugging
environment. The choice of TCL is appropriate for its use in NeD but is not suitable for Acid.
Acid relies on the coupling of the type system with expression evaluation, which are the
root of its design, to provide the debugging primitives.
Dalek [Ols90] is an event based language extension to gdb. State transitions in the
target program cause events to be queued for processing by the debugging language.
Acid has many of the advantages of same process or local agent debuggers, like
Parasight [Aral], without the need for dynamic linking or shared memory. Acid improves
on the ideas of these other systems by completely integrating all aspects of the
debugging process into the language environment. Of particular importance is the relationship
between Acid variables, program symbols, source code, registers and type information.
This integration is made possible by the design of the Acid language.
Interpreted languages such as Lisp and Smalltalk are able to provide richer
debugging environments through more complete information than their compiled
counterparts. Acid is a means to gather and represent similar information about compiled
programs through cooperation with the compilation tools and library implementers.
5. Control Flow
The while and loop statements implement looping. The former is similar to the
same statement in C. The latter evaluates starting and ending expressions yielding
integers and iterates while an incrementing loop index is within the bounds of those
expressions.
acid: i = 0; loop 1,5 do print(i=i+1)
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
acid:
6. Addressing
Two indirection operators allow Acid to access values in the program being
debugged. The * operator fetches a value from the memory image of an executing
process; the @ operator fetches a value from the text file of the process. When either
operator appears on the left side of an assignment, the value is written rather than read.
The indirection operator must know the size of the object referenced by a variable.
The Plan 9 compilers neglect to include this information in the program symbol table, so
Acid cannot derive this information implicitly. Instead Acid variables have formats. The
format is a code letter specifying the printing style and the effect of some of the
operators on that variable. The indirection operators look at the format code to determine the
number of bytes to read or write. The format codes are derived from the format letters
used by db. By default, symbol table variables and numeric constants are assigned the
format code 'X' which specifies 32-bit hexadecimal. Printing such a variable yields
output of the form 0x00123456. An indirect reference through the variable fetches
32 bits of data at the address indicated by the variable. Other formats specify various
data types, for example i an instruction, D a signed 32-bit decimal, s a null-terminated
string. The fmt function allows the user to change the format code of a variable to
control the printing format and operator side effects. This function evaluates the
expression supplied as the first argument, attaches the format code supplied as the second
argument to the result and returns that value. If the result is assigned to a variable, the
new format code applies to that variable. For convenience, Acid provides the \ operator
as a shorthand infix form of fmt. For example:
acid: x=10
acid: x // print x in hex
0x0000000a
acid: x = fmt(x, 'D') // make x type decimal
acid: print(x, fmt(x, 'X'), x\X) // print x in decimal & hex
10 0x0000000a 0x0000000a
acid: x // print x in decimal
10
acid: x\o // print x in octal
000000000012
Here, main is the address of the function of the same name in the program under test.
The loop retrieves the five instructions beginning at that address and then prints the
address and the assembly language representation of each. Notice that the stride of the
increment operator varies with the size of the instruction: the MOVL at 0x0000223a is
a two byte instruction while all others are four bytes long.
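The format-code mechanism described in this section can be modeled outside Acid. The following is a minimal Python sketch, not Acid's implementation: the Value class, the deref helper, the little-endian byte order, and the one-byte code 'b' are assumptions made for illustration. Only the 32-bit 'X' and 'D' codes and their printed forms are taken from the text.

```python
from dataclasses import dataclass

# Bytes fetched by an indirection, keyed by format code. 'X' and 'D' are
# 32-bit per the text; the one-byte 'b' entry is an assumption for contrast.
SIZES = {"X": 4, "D": 4, "b": 1}


@dataclass(frozen=True)
class Value:
    """A number paired with the format code that governs printing and size."""
    n: int
    code: str = "X"  # default format for constants and symbols, per the text

    def __str__(self) -> str:
        if self.code == "X":
            return f"0x{self.n & 0xFFFFFFFF:08x}"
        if self.code == "D":
            return str(self.n)
        if self.code == "b":
            return f"0x{self.n & 0xFF:02x}"
        raise ValueError(f"format code {self.code!r} is not modeled")


def fmt(v: Value, code: str) -> Value:
    """Return the same number with a different format code attached."""
    return Value(v.n, code)


def deref(memory: bytes, addr: Value) -> Value:
    """Model of '*addr': the format code of addr selects how many bytes to read."""
    raw = memory[addr.n:addr.n + SIZES[addr.code]]
    signed = addr.code == "D"
    return Value(int.from_bytes(raw, "little", signed=signed), addr.code)


memory = bytes([0x78, 0x56, 0x34, 0x12])
x = Value(10)
print(x, fmt(x, "D"))                      # hex form, then decimal form
print(deref(memory, Value(0)))             # reads four bytes
print(deref(memory, fmt(Value(0), "b")))   # reads one byte
```

The point of the design is that the size of a fetch travels with the value rather than with a symbol table entry, so changing the format code changes both the printed form and the width of the next indirection.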
Registers are treated as normal program variables referenced by their symbolic
assembler language names. When a process stops, the register set is saved by the
kernel at a known virtual address in the process memory map. The Acid variables
associated with the registers point to the saved values and the * indirection operator can then
be used to read and write the register set. Since the registers are accessed via Acid
variables they may be used in arbitrary expressions.
acid: PC // addr of saved PC
0xc0000f60
acid: *PC
0x0000623c // contents of PC
acid: *PC\a
main
acid: *R1=10 // modify R1
acid: asm(*PC+4) // disassemble @ PC+4
main+0x4 0x00006240 MOVW R31,0x0(R29)
main+0x8 0x00006244 MOVW $setR30(SB),R30
main+0x10 0x0000624c MOVW R1,_clock(SB)
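The idea that a register name is simply a variable holding the address of a saved value can be sketched in a few lines of Python. This is an illustrative model only: the address assigned to R1 and the dictionary-backed memory are assumptions, while the address and contents of PC are taken from the transcript above.

```python
# Addresses of the saved registers. PC's address comes from the transcript;
# the address used for R1 is hypothetical.
REGS = {"PC": 0xC0000F60, "R1": 0xC0000F64}

# Sparse model of the stopped process's memory: address -> 32-bit word.
memory = {REGS["PC"]: 0x0000623C, REGS["R1"]: 0}


def read(addr: int) -> int:
    """Model of '*addr' used as a value."""
    return memory[addr]


def write(addr: int, value: int) -> None:
    """Model of '*addr = value'."""
    memory[addr] = value & 0xFFFFFFFF


write(REGS["R1"], 10)            # like *R1=10
next_pc = read(REGS["PC"]) + 4   # like *PC+4, an ordinary expression
print(hex(REGS["PC"]), hex(read(REGS["PC"])), hex(next_pc))
```

Because registers need no special syntax in this scheme, the same read and write paths serve both memory and the register set.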
7. Process Interface
A program executing under Acid is monitored through the proc file system
interface provided by Plan 9. Textual messages written to the ctl file control the execution
of the process. For example, writing waitstop to the control file causes the write to
block until the target process enters the kernel and is stopped. When the process is
stopped the write completes. The startstop message starts the target process and
then does a waitstop action. Synchronization between the debugger and the target
process is determined by the actions of the various messages. Some operate
asynchronously to the target process and always complete immediately; others block until
the action completes. The asynchronous messages allow Acid to control several
processes simultaneously.
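The control-file protocol described above amounts to writing a short text message to a per-process file. The Python sketch below shows only the shape of that interaction and is not how Acid itself is written: the helper names, the temporary directory standing in for the proc file system, and the process id are all assumptions, and the blocking behaviour that Plan 9 provides is not modeled.

```python
import os
import tempfile


def ctl(procroot: str, pid: int, message: str) -> None:
    """Write one textual control message to <procroot>/<pid>/ctl."""
    with open(os.path.join(procroot, str(pid), "ctl"), "w") as f:
        f.write(message)


def startstop(procroot: str, pid: int) -> None:
    """Send startstop; on Plan 9 the write would block until the process stops."""
    ctl(procroot, pid, "startstop")


# A temporary directory stands in for the proc file system; the pid is made up.
procroot = tempfile.mkdtemp()
os.mkdir(os.path.join(procroot, "42"))
startstop(procroot, 42)
with open(os.path.join(procroot, "42", "ctl")) as f:
    sent = f.read()
print(sent)
```

Since synchronization is carried by whether the write blocks, a debugger built this way needs no separate notification channel for the simple case.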
The interpreter has builtin functions named after each of the control messages.
The functions take a process id as argument. Any time a control message causes the
program to execute instructions the interpreter performs two actions when the control
operation has completed. The Acid variables pointing at the register set are fixed up to
point at the saved registers, and then the user defined function stopped is executed.
The stopped function may print the current address, line of source or instruction and
return to interactive mode. Alternatively it may traverse a complex data structure, gather
In this example, the three primitives are combined in an expression to print a line of
source code associated with an address. The src function prints a few lines of source
around the address supplied as its argument. A companion routine, Bsrc,
communicates with the external editor sam. Given an address, it loads the corresponding source
file into the editor and highlights the line containing the address. This simple interface
is easily extended to more complex functions. For example, the step function can
select the current file and line in the editor each time the target program stops, giving
the user a visual trace of the execution path of the program. A more complete interface
allowing two way communication between Acid and the acme user interface [Pike93] is
under construction. A filter between the debugger and the user interface provides
interpretation of results from both sides of the interface. This allows the programming
environment to interact with the debugger and vice-versa, a capability missing from the
sam interface. The src and Bsrc functions are both written in Acid code using the
file and line primitives. Acid provides library functions to step through source level
statements and functions. Furthermore, addresses in Acid expressions can be specified
by source file and line. Source code is manipulated in the Acid list data type.
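A function in the spirit of src can be assembled from a few address-to-source primitives. The Python sketch below is a model under stated assumptions: the helper names file_of, line_of, and read_file are hypothetical stand-ins for the primitives, and the line table and source text are invented for the example.

```python
# Hypothetical symbol information: address -> (source file, line number).
LINE_TABLE = {0x1000: ("ls.c", 3), 0x1004: ("ls.c", 4)}

# Source files held as lists of lines, mirroring the use of a list type.
FILES = {"ls.c": ["void", "main(void)", "{", "\toutput();", "\texits(0);", "}"]}


def file_of(addr: int) -> str:
    """Name of the source file containing the address."""
    return LINE_TABLE[addr][0]


def line_of(addr: int) -> int:
    """Line number (1-based) of the source line containing the address."""
    return LINE_TABLE[addr][1]


def read_file(name: str) -> list:
    """Contents of a source file as a list of lines."""
    return FILES[name]


def src(addr: int, context: int = 1) -> list:
    """Return a few lines of source around addr, marking the current line."""
    name, line = file_of(addr), line_of(addr)
    lines = read_file(name)
    first = max(1, line - context)
    last = min(len(lines), line + context)
    out = []
    for n in range(first, last + 1):
        marker = ">" if n == line else " "
        out.append(f"{name}:{n}{marker}\t{lines[n - 1]}")
    return out


listing = src(0x1004)
print("\n".join(listing))
```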
The bpset function plants a break point in memory. The function starts by using the
match builtin to search the breakpoint list to determine if a breakpoint is already set at
the address. The indirection operator, controlled by the format code returned by the
fmt primitive, is used to plant the breakpoint in memory. The variables bpfmt and
bpinst are Acid global variables containing the format code specifying the size of the
breakpoint instruction and the breakpoint instruction itself. These variables are set by
architecture-dependent library code when the debugger first attaches to the executing
image. Finally the address of the breakpoint is appended to the breakpoint list,
bplist.
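The sequence of steps attributed to bpset above can be sketched as a Python model. It is not the Acid library code: the breakpoint instruction value, its four-byte size, the little-endian encoding, and the boolean return value are assumptions chosen so the example runs on its own.

```python
BPINST = 0x0000000D  # hypothetical breakpoint instruction word
BPSIZE = 4           # hypothetical size in bytes, the role played by bpfmt


def bpset(memory: bytearray, bplist: list, addr: int) -> bool:
    """Plant a breakpoint at addr unless one is already recorded there."""
    if addr in bplist:  # the role played by the match builtin
        return False
    # The sized write models indirection controlled by the format code.
    memory[addr:addr + BPSIZE] = BPINST.to_bytes(BPSIZE, "little")
    bplist.append(addr)  # remember the address for later removal
    return True


memory = bytearray(16)
bplist = []
first = bpset(memory, bplist, 4)
second = bpset(memory, bplist, 4)  # duplicate request, ignored
print(first, second, bplist, memory.hex())
```

Keeping the instruction and its size in variables is what lets the same routine serve every architecture; only the values differ.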
defn step() // single step
{
local lst, lpl, addr, bput;
lpl = lst;
while lpl do { // place breakpoints
*(head lpl) = bpinst;
lpl = tail lpl;
}
text file. The startstop builtin writes the startstop message to the proc control file
for the process named pid. The target process executes until some condition causes it
to enter the kernel, in this case, the execution of a breakpoint. When the process
blocks, the debugger regains control and invokes the Acid library function stopped
which reports the address and cause of the blockage. The startstop function
completes and returns to the step function where the follow-set is used to replace the
breakpoints placed earlier. Finally, if the address of the original PC contained a
breakpoint, it is replaced.
Notice that this approach to process control is inherently portable; the Acid code is
shared by the debuggers for all architectures. Acid variables and builtin functions
provide a transparent interface to architecture-dependent values and functions. Here the
breakpoint value and format are referenced through Acid variables and the follow
primitive masks the differences in the underlying instruction set.
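The single-step procedure described above can be exercised on a toy machine. The Python sketch below is a model, not the Acid source: the Target class, its three-instruction set (nop, jmp, beq), and the tuple used as a breakpoint are invented for illustration. The guard that keeps a user breakpoint planted when it coincides with a follow-set address is a choice made in this sketch rather than something stated in the text.

```python
BPINST = ("break", None)  # stand-in for the architecture's breakpoint instruction


class Target:
    """Toy process with a pristine text image and a writable memory image."""

    def __init__(self, text, pc=0, flag=False):
        self.text = dict(text)  # text file image, the source for '@' reads
        self.mem = dict(text)   # memory image, read and written through '*'
        self.pc = pc
        self.flag = flag        # condition tested by the toy 'beq' instruction

    def follow(self, pc):
        """Addresses that may execute after the instruction at pc."""
        op, arg = self.text[pc]
        if op == "jmp":
            return [arg]
        if op == "beq":
            return [pc + 1, arg]
        return [pc + 1]

    def startstop(self):
        """Run until the program counter reaches a planted breakpoint."""
        while self.mem[self.pc] != BPINST:
            op, arg = self.mem[self.pc]
            if op == "jmp" or (op == "beq" and self.flag):
                self.pc = arg
            else:
                self.pc += 1


def step(t, bplist):
    """Execute one instruction by breakpointing every possible successor."""
    bput = None
    if t.pc in bplist:               # sitting on a user breakpoint
        bput = t.pc
        t.mem[bput] = t.text[bput]   # restore the real instruction
    lst = t.follow(t.pc)
    for addr in lst:                 # place temporary breakpoints
        t.mem[addr] = BPINST
    t.startstop()
    for addr in lst:                 # restore from the text image
        if addr not in bplist:       # sketch's choice: keep user breakpoints
            t.mem[addr] = t.text[addr]
    if bput is not None:
        t.mem[bput] = BPINST         # replant the user breakpoint


t = Target({0: ("nop", None), 1: ("beq", 3), 2: ("nop", None), 3: ("nop", None)})
bplist = [1]
t.mem[1] = BPINST                    # as a breakpoint-setting routine would
step(t, bplist)
first = t.pc
step(t, bplist)
second = t.pc
print(first, second)
```

Only follow and the breakpoint value depend on the instruction set in this model, which mirrors the portability argument made in the text.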
The next function, similar to the dbx command of the same name, is a simpler
example. This function steps through a single source statement but steps over function
calls.
defn next()
{
local sp, bound;
The next function starts by saving the current stack pointer in a local variable. It then
uses the Acid library function fnbound to return the addresses of the first and last
instructions in the current function in a list. The stmnt function executes a single
source statement and then uses src to print a few lines of source around the new PC.
If the new value of the PC remains in the current function, next returns. When the
executed statement is a function call or a return from a function, the new value of the
PC is outside the bounds calculated by fnbound and the test of the while loop is
evaluated. If the statement was a return, the new value of the stack pointer is greater
than the original value and the loop completes without execution. Otherwise, the loop
is entered and instructions are continually executed until the value of the PC is between
the bounds calculated earlier. At that point, execution ceases and a few lines of source
in the vicinity of the PC are printed.
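The control logic of next, as described above, reduces to one comparison of the program counter against the function bounds and one comparison of stack pointers. The Python sketch below replays that logic over a scripted sequence of stop states. The Trace class and the sample addresses are assumptions, and a downward-growing stack is assumed, consistent with the statement that a return leaves the stack pointer greater than its original value.

```python
class Trace:
    """Scripted target: a fixed sequence of (pc, sp) stop states."""

    def __init__(self, states):
        self.states = list(states)
        self.i = 0

    @property
    def pc(self):
        return self.states[self.i][0]

    @property
    def sp(self):
        return self.states[self.i][1]

    def advance(self):
        """Move to the next stop state (stands in for stmnt and step)."""
        self.i += 1


def next_statement(t, bound):
    """Execute one statement, running through any function it calls."""
    sp = t.sp            # save the starting stack pointer
    lo, hi = bound       # first and last instruction of the current function
    t.advance()          # execute a single source statement
    # Outside the function with the stack no higher than before: in a callee.
    while (t.pc < lo or t.pc > hi) and sp >= t.sp:
        t.advance()      # keep stepping until control comes back
    return t.pc


bound = (100, 120)
plain = next_statement(Trace([(104, 1000), (108, 1000)]), bound)
call = next_statement(Trace([(104, 1000), (200, 992), (204, 992), (108, 1000)]), bound)
ret = next_statement(Trace([(116, 1000), (300, 1008)]), bound)
print(plain, call, ret)
```

The three traces correspond to the three cases in the text: an ordinary statement, a call that is stepped over, and a return to the caller.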
Acid provides concise and elegant expression for control and manipulation of
target programs. These examples demonstrate how a few well-chosen primitives can be
combined to create a rich debugging environment.
The support for parallel debugging in Acid depends on a crucial kernel
modification: when the text segment of a program is written (usually to place a breakpoint), the
segment is cloned to prevent other threads from encountering the breakpoint. Although
this incurs a slight performance penalty, it is of little importance while debugging.
defn
Bitmap(addr) {
	complex Bitmap addr;
	print("Rectangle r {\n");
	Rectangle(addr.r);
	print("}\n");
	print("Rectangle clipr {\n");
	Rectangle(addr.clipr);
	print("}\n");
	print(" ldepth ", addr.ldepth, "\n");
	print(" id ", addr.id, "\n");
	print(" cache ", addr.cache, "\n");
};
The struct declaration specifies decoding instructions for the complex type named
Bitmap. Although the syntax is superficially similar to a C structure declaration, the
semantics differ markedly: the C declaration specifies a layout, while the Acid
declaration tells how to decode it. The declaration specifies a type, an offset, and name for
each member of the complex object. The type is either the name of another complex
declaration, for example, Rectangle, or a format code. The offset is the number of
bytes from the start of the object to the member and the name is the member's name in
the Alef or C declaration. This type description is a close match for C and Alef, but is
simple enough to be language independent.
The Bitmap function expects the address of a Bitmap as its only argument. It
uses the decoding information contained in the Bitmap structure declaration to
extract, format, and print the value of each member of the complex object pointed to by
the argument. The Alef compiler emits code to call other Acid functions where a
member is another complex type; here, Bitmap calls Rectangle to print its
contents.
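The decoding scheme described above, in which each member carries a type, an offset, and a name, can be modeled with a small table-driven decoder. The Python sketch below is illustrative only: the Point and Rectangle layouts, all offsets, the little-endian byte order, and the sample values are assumptions. The member names of Bitmap follow the function shown above, with integers given the decimal code and the pointer the hexadecimal code.

```python
import struct

# Each complex type is a list of (type, offset, name). A member type is
# either a format code or the name of another complex type. The layouts
# and offsets here are assumptions made for the sketch.
TYPES = {
    "Point": [("D", 0, "x"), ("D", 4, "y")],
    "Rectangle": [("Point", 0, "min"), ("Point", 8, "max")],
    "Bitmap": [
        ("Rectangle", 0, "r"),
        ("Rectangle", 16, "clipr"),
        ("D", 32, "ldepth"),
        ("D", 36, "id"),
        ("X", 40, "cache"),
    ],
}


def decode(memory: bytes, addr: int, typename: str) -> dict:
    """Decode the complex object at addr using its member descriptions."""
    out = {}
    for mtype, offset, name in TYPES[typename]:
        at = addr + offset
        if mtype in TYPES:                    # nested complex type: recurse
            out[name] = decode(memory, at, mtype)
        elif mtype == "D":                    # signed 32-bit decimal
            out[name] = struct.unpack_from("<i", memory, at)[0]
        elif mtype == "X":                    # 32-bit hexadecimal
            word = struct.unpack_from("<I", memory, at)[0]
            out[name] = f"0x{word:08x}"
        else:
            raise ValueError(f"format code {mtype!r} is not modeled")
    return out


# Ten signed words followed by one pointer-sized word: 44 bytes in all.
image = struct.pack("<10iI", 0, 0, 640, 480, 0, 0, 640, 480, 1, 7, 0x8000)
bitmap = decode(image, 0, "Bitmap")
print(bitmap)
```

The recursion on nested types plays the role that the call from Bitmap to Rectangle plays in the generated Acid function.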
The complex declarations associate Alef variables with complex types. In the
example, darkgrey is the name of a global variable of type Bitmap in the program
being debugged. Whenever the name darkgrey is evaluated by Acid, it automatically
calls the Bitmap function with the address of darkgrey as the argument. The
second complex declaration associates a local variable or parameter named b in function
Window_settag with the Bitmap complex data type.
Acid borrows the C operators . and -> to access the decoding parameters of a
member of a complex type. Although this representation is sufficiently general for
describing the decoding of both C and Alef complex data types, it may prove too
restrictive for target languages with more complicated type systems. Further, the assumption
that the compiler can select the proper Acid format code for each basic type in the
language is somewhat naive. For example, when a member of a complex type is a pointer,
it is assigned a hexadecimal type code; integer members are always assigned a decimal
type code. This heuristic proves inaccurate when an integer field is a bit mask or set of
bit flags which are more appropriately displayed in hexadecimal or octal.
The go command creates a process and plants breakpoints at the entry to malloc and
free. The program is then started and continues until it exits or stops. If the reason
for stopping is anything other than the breakpoints in malloc and free, Acid prints
the usual status information and returns to the interactive prompt.
When the process stops on entering malloc, the debugger must capture and save
the address that malloc will return. After saving a stack trace so the calling routine
can be identified, it places a breakpoint at the return address and restarts the program.
When malloc returns, the breakpoint stops the program, allowing the debugger to
grab the address of the new memory block from the return register. The address and
stack trace are added to the list of outstanding memory blocks, the breakpoint is
The presence of a block in the allocation list does not imply it is there because of a leak;
for instance, it may have been in use when the program terminated. The refs()
library function scans the data, bss, and stack segments of the process looking for
pointers into the allocated blocks. When one is found, the block is deleted from the
outstanding block list. The leak function is used again to report the blocks remaining
allocated and unreferenced. This strategy proves effective in detecting disconnected
(but non-circular) data structures.
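The reference scan described above can be sketched as a filter over the list of outstanding blocks. This Python model makes several assumptions: blocks are represented as a mapping from start address to size, the scanned segments are supplied as a flat list of word values, and any word falling inside a block counts as a pointer into it. It is not the refs() library code.

```python
def unreferenced(blocks: dict, words: list) -> list:
    """Return start addresses of blocks that no scanned word points into.

    blocks maps a block's start address to its size in bytes; words holds
    the values found while scanning the data, bss, and stack segments.
    """
    outstanding = dict(blocks)
    for w in words:
        for addr, size in list(outstanding.items()):
            if addr <= w < addr + size:   # pointer into the block
                del outstanding[addr]     # referenced, so not a leak
    return sorted(outstanding)


blocks = {0x1000: 16, 0x2000: 32, 0x3000: 8}
words = [0, 0x1008, 0x3000, 0x5000]
leaks = unreferenced(blocks, words)
print([hex(a) for a in leaks])
```

A scan of this kind is conservative: an integer that happens to look like an address inside a block will hide a real leak, which is one reason the result is a report to be examined rather than a proof.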
The leak detection process is entirely passive. The program is not specially
compiled and the source code is not required. As with the Acid support functions for the
Alef runtime environment, the author of the library routines has encapsulated the
functionality of the library interface in Acid code. Any programmer may then check a
program's use of the library routines without knowledge of either implementation. The
performance impact of running leak detection is great (about 10 times slower), but it
has not prevented interactive programs like sam and the 8½ window system from being
tested.
Acid can perform the same function in a language independent manner without
modifying the source, object or binary of the program. The following example shows ls
being run under the control of the Acid coverage library.
philw-helix% acid -l coverage /bin/ls
/bin/ls: mips plan 9 executable
/lib/acid/port
/lib/acid/mips
/lib/acid/coverage
acid: coverage()
acid
newstime
profile
tel
wintool
2: (error) msg: pid=11419 startstop: process exited
acid: analyse(ls)
ls.c:102,105
102: return 1;
103: }
104: if(db[0].qid.path&CHDIR && dflag==0){
105: output();
ls.c:122,126
122: memmove(dirbuf+ndir, db, sizeof(Dir));
123: dirbuf[ndir].prefix = 0;
124: p = utfrrune(s, '/');
125: if(p){
126: dirbuf[ndir].prefix = s;
The coverage function begins by looping through the text segment placing
breakpoints at the entry to each basic block. The start of each basic block is found using the
Acid builtin function follow. If the list generated by follow contains more than one
element, then the addresses mark the start of basic blocks. A breakpoint is placed at
each address to detect entry into the block. If the result of follow is a single address
then no action is taken, and the next address is considered. Acid maintains a list of
breakpoints already in place and avoids placing duplicates (an address may be the
destination of several branches).
After placing the breakpoints the program is set running. Each time a breakpoint
is encountered Acid deletes the address from the breakpoint list, removes the
breakpoint from memory and then restarts the program. At any instant the breakpoint list
contains the addresses of basic blocks which have not been executed. The analyse
function reports the lines of source code bounded by basic blocks whose addresses
have not been deleted from the breakpoint list. These are the basic blocks which have
not been executed. Program performance is almost unaffected since each breakpoint is
executed only once and then removed.
The library contains a total of 128 lines of Acid code. An obvious extension of this
algorithm could be used to provide basic block profiling.
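The coverage algorithm described in this section can be tried on a toy program. The Python sketch below is a model rather than the coverage library: the instruction set (nop, jmp, beq, exit), the sample program, and the function names are assumptions. Breakpoints are represented only by their presence in the list, so planting and removing them in memory is not modeled.

```python
def follow(text: dict, pc: int) -> list:
    """Addresses that may execute after the instruction at pc."""
    op, arg = text[pc]
    if op == "exit":
        return []
    if op == "jmp":
        return [arg]
    if op == "beq":
        return [pc + 1, arg]
    return [pc + 1]


def coverage(text: dict) -> list:
    """List basic block entries: successors of every multi-way instruction."""
    bplist = []
    for pc in sorted(text):
        targets = follow(text, pc)
        if len(targets) > 1:          # a branch: each target starts a block
            for addr in targets:
                if addr not in bplist:  # avoid duplicates
                    bplist.append(addr)
    return bplist


def run(text: dict, bplist: list, pc: int = 0, flag: bool = False) -> None:
    """Execute the program, deleting each breakpoint the first time it is hit."""
    while True:
        if pc in bplist:
            bplist.remove(pc)         # block has now been executed
        op, arg = text[pc]
        if op == "exit":
            return
        if op == "jmp" or (op == "beq" and flag):
            pc = arg
        else:
            pc += 1


text = {
    0: ("nop", None),
    1: ("beq", 4),    # branch not taken in this run
    2: ("nop", None),
    3: ("jmp", 5),
    4: ("nop", None),
    5: ("exit", None),
}
planted = coverage(text)
remaining = list(planted)
run(text, remaining)
print(planted, remaining)   # addresses left over mark unexecuted blocks
```

Counting hits instead of deleting the entry on the first one would turn the same structure into a basic block profiler, at the cost of stopping on every entry.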
16. Conclusion
Acid has two areas of weakness. As with other language-based tools like awk, a
programmer must learn yet another language to step beyond the normal debugging
functions and use the full power of the debugger. Second, the command line interface
supplied by the yacc parser is inordinately clumsy. Part of the problem relates directly
to the use of yacc and could be circumvented with a custom parser. However, structural
problems would remain: Acid often requires too much typing to execute a simple
command. A debugger should prostitute itself to its users, doing whatever is wanted with a
minimum of encouragement; commands should be concise and obvious. The language
interface is more consistent than an ad hoc command interface but is clumsy to use.
Most of these problems are addressed by an Acme interface which is under
construction. This should provide the best of both worlds: graphical debugging and access to the
underlying Acid language when required.
The name space clash between Acid variables, keywords, program variables, and
functions is unavoidable. Although it rarely affects a debugging session, it is annoying
when it happens and is sometimes difficult to circumvent. The current renaming
scheme is too crude; the new names are too hard to remember.
Acid has proved to be a powerful tool whose applications have exceeded
expectations. Of its strengths, portability, extensibility and parallel debugging support were by
design and provide the expected utility. In retrospect, its use as a tool for code test and
verification and as a medium for communicating type information and encapsulating
interfaces has provided unanticipated benefits and altered our view of the debugging
process.
17. Acknowledgments
Bob Flandrena was the first user and helped prepare the paper. Rob Pike endured
three buggy Alef compilers and a new debugger in a single sitting.
18. References
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey, Plan 9 from Bell Labs, UKUUG
Proc. of the Summer 1990 Conf., London, England, 1990, reprinted, in a different form,
in this volume.
[Gol93] M. Golan, D. Hanson, DUEL -- A Very High-Level Debugging Language,
USENIX Proc. of the Winter 1993 Conf., San Diego, CA, 1993.
[Lin90] M. A. Linton, The Evolution of DBX, USENIX Proc. of the Summer 1990 Conf.,
Anaheim, CA, 1990.
[Stal91] R. M. Stallman, R. H. Pesch, Using GDB: A guide to the GNU source level
debugger, Technical Report, Free Software Foundation, Cambridge, MA, 1991.
[Win93] P. Winterbottom, Alef Reference Manual, this volume.
[Pike93] Rob Pike, Acme: A User Interface for Programmers, USENIX Proc. of the
Winter 1994 Conf., San Francisco, CA, reprinted in this volume.
[Ols90] Ronald A. Olsson, Richard H. Crawford, and W. Wilson Ho, Dalek: A GNU,
improved programmable debugger, USENIX Proc. of the Summer 1990 Conf., Anaheim,
CA.
[May92] Paul Maybee, NeD: The Network Extensible Debugger, USENIX Proc. of the
Summer 1992 Conf., San Antonio, TX.
[Aral] Ziya Aral, Ilya Gertner, and Greg Schaffer, Efficient debugging primitives for
multiprocessors, Proceedings of the Third International Conference on Architectural
Support for Programming Languages and Operating Systems, SIGPLAN notices Nr. 22,
May 1989.