
Testing using Log File Analysis: Tools, Methods, and Issues

James H. Andrews
Dept. of Computer Science
University of Western Ontario
London, Ontario, Canada N6A 5B7
[email protected]

Abstract
Large software systems often keep log files of events. Such log files can be analyzed to check whether a run of a program reveals faults in the system. We discuss how such log files can be used in software testing. We present a framework for automatically analyzing log files, and describe a language for specifying analyzer programs and an implementation of that language. The language permits compositional, compact specifications of software, which act as test oracles; we discuss the use and efficacy of these oracles for unit- and system-level testing in various settings. We explore methodological issues such as efficiency and logging policies, and the scope and limitations of the framework. We conclude that testing using log file analysis constitutes a useful methodology for software verification, somewhere between current testing practice and formal verification methodologies.

1. Introduction
It is clear that many aspects of the desired behaviour of software can be given formal specifications. What is not so clear is how those specifications can be connected to actual programs in an automated way for practical software engineering purposes. Most formal verification or development methods assume languages with well-defined semantics, or consider only well-defined subsets of languages. However, most programs today are still developed using languages without well-defined semantics. In such languages, we cannot formally and reliably predict observables (e.g. program outputs) from program code; the only observables we can get hold of are those obtained by individual runs of the program itself. Yet even these observables are valuable. After all, programmers have always depended on them in order to test and debug programs, following established, practical testing techniques [4, 15].

Of particular value is a kind of output file generally referred to as a debug file or log file. Such a file records events such as inputs and outputs, values of variables, and parameters and returns of function calls, sequentially as the program is running. Programmers inspect log files to identify and diagnose problems in code. We may refer to this practice as (informal) log file analysis.

A way that formal methods can help in program testing using log file analysis naturally suggests itself. First, specify what the program should log to the log file and how. Next, or concurrently, formally specify the format of an acceptable log file. Run the program repeatedly on different inputs; see whether the log files meet the specification by doing a formal log file analysis. In cases in which they do not, correct the error and re-run.

This paper explores some tools and methods for doing this testing using log file analysis, and the issues arising from doing it. Section 2 gives a summary of a formal framework for log file analysis. It defines the notion of a log file analyzer using state machine concepts, and discusses a prototype implementation. Sections 3 and 4 study by examples the use of this framework in unit testing and system testing, respectively. Section 5 discusses some issues to do with development methodology, such as the avoidance of false negatives and positives, and how and when logging should be turned off. Section 6 explores the advantages, scope and limitations of the framework. Section 7 discusses related work, and Section 8 gives conclusions and prospects for future work.

2. The Analysis Framework


In this section we summarize a framework for log file analysis which is presented in more detail in [1]. First we give a definition of the format of a log file which will be used in this paper. We then argue that the most appropriate and useful form for a formal log file analyzer is as a set of parallel state machines making transitions based on lines from the log file, and we define a log file analyzer informally in this way. We then describe a textual language for specifying log file analyzers, and a prototype implementation of the language.

2.1. Log Files

For simplicity, we impose some restrictions on the format of log files. A keyword is a sequence of alphanumeric characters and underscores beginning with a lower-case letter. A string is a sequence of characters enclosed in double quotes. A number is a real number in standard ASCII representation. A log file line is a sequence of keywords, strings and numbers, beginning with a keyword, which are separated by blanks and terminated by a new-line character sequence. A log file is a sequence of log file lines.

The left-hand side of Figure 1 shows part of a log file from a hypothetical program, one of whose tasks is to control a heater in a room (compare [6]). The program gets input every five seconds from a digital thermometer, printing a message of the form temp n every time a room temperature of n degrees Celsius is reported. The program is supposed to switch on the heater whenever the temperature drops below 20, and switch it off as soon as the temperature rises to 20 or more again; it reports on these events with log file messages of the form heater on or heater off. Note that the log file in the figure does not conform to the informal specification given above; the heater is not turned off until at least five seconds after the temperature has returned to 20 or above. This is not necessarily obvious by inspection. It is possible, however, to check such a log file using a simple grammar.

Things are not so straightforward for more complex log files. The right-hand side of Figure 1 is an example. Here, the system not only controls the temperature in the room, but also does tasks which involve the allocation and deallocation of memory. The system reports on each call to the C functions malloc and free, to aid in detecting problems like memory leaks. The result is two separate threads of log file reports which are arbitrarily interleaved.
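As a small illustration of this format (not part of the paper's tools; the function name and regular expression are ours), the following Python sketch splits one log file line into its keyword, string and number fields under the restrictions just described.

import re

# Tokens per the restrictions above: keywords, double-quoted strings, numbers.
TOKEN = re.compile(r'[a-z][A-Za-z0-9_]*|"[^"]*"|-?\d+(?:\.\d+)?')

def parse_log_line(line):
    """Split one log file line into its fields; the first field must be a keyword."""
    fields = TOKEN.findall(line)
    if not fields or not fields[0][0].islower():
        raise ValueError("log file line must begin with a keyword: %r" % line)
    return fields

# Example: the first line of Figure 1 parses as ['temp', '21'].
print(parse_log_line("temp 21"))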

2.2. Log File Analyzers

The considerations in the last section suggest a view of a log file analyzer as a set of parallel state machines, each state machine analyzing one thread of events, which reports errors if transitions cannot be taken. Like the Statecharts formalism [10], this view allows parallel components to be expressed succinctly without state space explosion, and builds on a fairly simple, intuitive model of computation (state machines). We therefore define a log file machine as consisting of:

- An identifying name;
- A (possibly infinite) set of machine states, of which one is defined as the initial state, and a subset of which is defined as the final states;
- A set of log file lines which the machine notices; and
- A transition relation between source states, log file lines and destination states (which may be the same as the source state).

Informally, a log file machine processes a log file as follows. It starts in its initial state, and reacts to each line of the log file in sequence. If the line is not one which the machine notices, it discards the line and stays in the same state. If it does notice the line, and it can make a transition on that line to a destination state, it does so; if, however, no transition is possible, it stays in the same state and reports an error. If, at the end of the log file, the machine is not in one of its final states, it also reports an error. (A more formal presentation appears in [1].)

As an example, see Figure 2. This figure shows a log file machine, named heatermonitor, which accepts correct log files for the heater monitor specification from the Introduction. (It correctly does not accept the log file at the left of Figure 1, because it cannot make a transition from the should_be_off state at line 7 of the log file.) heatermonitor notices lines of the form temp N, heater on, and heater off. The usual conventions for depicting state machines are used; final states are indicated by double circles, the initial state is indicated by a small arrow, and conditions on transitions appear in square brackets.

A log file analyzer consists of a (possibly infinite) set of log file machines, which process a log file in parallel. Typically, each machine in an analyzer will notice a different (and possibly disjoint) set of log file lines, but the union of all the sets of lines noticed will cover all the lines we expect in a log file.
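To make the processing rule concrete, the following Python sketch (illustrative only; the class and function names are ours and are not part of the prototype described later) shows one machine reacting to a sequence of log file lines, and a driver that runs a set of machines in parallel over a log file.

class LogFileMachine:
    """One log file machine: an initial state, final states, and a transition function."""

    def __init__(self, name, initial, finals, notices, transition):
        self.name = name
        self.state = initial
        self.finals = finals          # set of acceptable final states
        self.notices = notices        # predicate: does this machine notice the line?
        self.transition = transition  # (state, line) -> next state, or None if impossible
        self.errors = []

    def step(self, line):
        if not self.notices(line):
            return                    # lines the machine does not notice are discarded
        next_state = self.transition(self.state, line)
        if next_state is None:
            # noticed, but no transition possible: report an error and keep the state
            self.errors.append("%s: no transition from %r on %r" % (self.name, self.state, line))
        else:
            self.state = next_state

    def finish(self):
        if self.state not in self.finals:
            self.errors.append("%s: ended in non-final state %r" % (self.name, self.state))
        return self.errors

def run_analyzer(machines, log_lines):
    """An analyzer is a set of machines processing the same log file in parallel."""
    for line in log_lines:
        for m in machines:
            m.step(line)
    return [err for m in machines for err in m.finish()]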

2.3. A Language for Specifying Analyzers

Along with the abstract notion of a log file analyzer, we must have some concrete syntax for specifying analyzers. We have developed a simple language, referred to here as LFAL (Log File Analysis Language), for doing this [1]. The upper-level syntax of LFAL is a fairly straightforward syntactic representation of state machines; for each machine, we specify the name, initial and final states, and transition relation. An analyzer specification is a sequence of machine specifications. The transition relation is specified by clauses indicating the source, destination, and triggering line of individual transitions, possibly with limiting conditions. As an example, consider the log file machine from Figure 2; it is specified in LFAL as the machine heatermonitor at the left of Figure 3.

In the LFAL syntax, names, states and log file lines are represented by first order terms over keywords, strings, numbers and variables, with keywords being used as function symbols. (A variable is a sequence of alphanumeric characters and underscores beginning with an upper-case letter.) Each log file line is represented by a term in the language; for instance, the line temp 20 is represented by the term temp(20). In heatermonitor, note the use of the variable N to match the current temperature read from the log file. Similarly, the declaration final_state Any has the effect of declaring any state to be a final state, since any term can match the variable Any.

At the right of Figure 3 is an LFAL specification of a class of log file machines named memcheck(Ptr), where the variable Ptr can be replaced by any term. memcheck(t), for any term t, notices lines of the form malloc t and free t. It changes its state from alloc to unalloc and back, depending on whether it has deduced that t is a pointer to a block of memory which is currently allocated or unallocated. The fact that only unalloc is allowed to be a final state signifies that at the end of a log file run, every block of memory that was allocated must have been freed. The log file analyzer specification consisting of the heatermonitor specification plus the memcheck specification accepts correct log files of the format shown on the right of Figure 1.

Left:

temp 21
temp 20
temp 19
heater on
temp 19
temp 20
temp 21
heater off
temp 21

Right:

temp 21
malloc 2096
temp 19
malloc 2088
malloc 1016
heater on
temp 19
temp 21
free 2088
heater off
temp 21

Figure 1. Left: a simple log file. Right: a more complex log file.

[State diagram not reproduced in this text rendering. It depicts the machine heatermonitor with states off (the initial state), should_be_on, on and should_be_off, with transitions triggered by temp N lines (under the conditions N >= 20 or N < 20) and by heater on and heater off lines, corresponding to the LFAL text at the left of Figure 3.]

Figure 2. A log file machine for the heater monitor system.

machine heatermonitor;
initial_state off;
from off, on temp(N), if (N >= 20), to off;
from off, on temp(N), if (N < 20), to should_be_on;
from should_be_on, on heater(on), to on;
from on, on temp(N), if (N < 20), to on;
from on, on temp(N), if (N >= 20), to should_be_off;
from should_be_off, on heater(off), to off;
final_state Any.

machine memcheck(Ptr);
initial_state unalloc;
from unalloc, on malloc(Ptr), to alloc;
from alloc, on free(Ptr), to unalloc;
final_state unalloc.

Figure 3. The definition of the heatermonitor and memcheck log file machines in the log file analyzer language.


2.4. The Implementation

We have developed a prototype implementation of LFAL, in the form of a translator and an auxiliary library. To use the implementation, we write an analyzer, translate it, compile the translated program into an executable oracle, and run it on a log file of interest. The oracle displays a report stating that the log file conforms to our specification, or explaining why it does not. The target programming language for the translation in the prototype is Prolog. We use a modern implementation of Prolog which can compile source into efficient executable code. A transition condition is defined in the prototype language as simply a Prolog goal; programmers may also include any Prolog program text that they like, in order to provide auxiliary definitions used in the conditions.

Recall that an analyzer may consist of an infinite class of log file machines. This may seem to pose implementation difficulties, but for practical purposes it does not. At any one time, there will typically be only a finite number of machines in the analyzer which are in non-initial states; we need to represent internally only those machines.
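To make the last point concrete, here is an illustrative Python sketch (ours, not the paper's Prolog implementation) of how an analyzer can handle the infinite class of memcheck(Ptr) machines from Figure 3: an instance for a given pointer is represented only once it leaves its initial state.

def check_memory(log_lines):
    """Illustrative memcheck: one implicit machine per pointer, created on demand."""
    state = {}   # pointer -> "alloc"; pointers not present are implicitly in "unalloc"
    errors = []
    for n, line in enumerate(log_lines, start=1):
        fields = line.split()
        if fields[0] == "malloc":
            ptr = fields[1]
            if state.get(ptr) == "alloc":
                errors.append("line %d: %s allocated twice" % (n, ptr))
            state[ptr] = "alloc"
        elif fields[0] == "free":
            ptr = fields[1]
            if state.get(ptr) != "alloc":
                errors.append("line %d: %s freed while unallocated" % (n, ptr))
            else:
                del state[ptr]   # back to the initial state, so stop representing it
    # final-state check: only unalloc is a final state of memcheck
    errors.extend("%s never freed" % ptr for ptr in state)
    return errors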

3. Unit Testing using Log File Analysis

In this section, we explore the use of log file analysis for unit testing. We take as an example the testing of an object class which implements a simple dictionary data structure. The class contains methods for adding, deleting and finding information associated with given keys. A key cannot appear more than once in the dictionary. There are a wide variety of approaches to implementing dictionaries. The task we face in testing such a class for correctness is always the same, however: we must confirm that no key is added twice, that no key can be deleted without having been added, and that keys can be found in the dictionary if and only if they have been added and not yet deleted.

3.1. Traditional Approaches

A traditional practical approach to testing such a class is as follows. We write a test harness program which creates an instance object of the class and calls the object's methods in sequence according to some test cases. We then run the harness on the test cases, and evaluate the output either visually, or within the harness code itself.

This traditional approach suffers from some problems. Evaluating the harness output visually is tedious and error-prone. However, evaluating the output within the harness requires us to write a duplicate dictionary class for comparison purposes. The duplicate dictionary class may itself be erroneous or inefficient, or may interact with the class under test in some undesirable way. Hoffman and Strooper [13] address these problems with their ClassBench framework, in which the harness moves an object through a small, easy-to-check subset of its potential state space (e.g. full, empty, all even keys, or all odd keys), checking results at each point in the subset. Unfortunately, in general this approach interferes with the selection of test cases, and may not allow us to satisfy functional, structural or boundary coverage criteria.


3.2. A Log File Analysis Approach

An approach using log file analysis can be adapted from the traditional approach as follows:

- Construct the test harness program so that it logs the results of each method call. After each call, it should write a log file line of the form method key result, where method is add, delete, or find, key is the key in question, and result is either succ or fail.
- Run the harness on test cases (see Discussion below).
- Compile and run the log file analyzer in Figure 4 on the log files produced.
- If any errors are found, correct them and iterate.

The log file analyzer describes succinctly the situations in which a key is in or not in the dictionary, and the situations in which the various methods should or should not succeed. Almost as important, it does not need to mention the equal number of cases in which errors should be reported. The compiled analyzer will handle this itself.
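As an illustration of this harness structure, the following Python sketch is our own; the Dictionary methods add, find and delete returning success flags are assumptions, as is the log file name. It writes one log file line per method call in the method key result format described above.

def logged_call(log, method, key):
    """Call one dictionary method and log 'method key result' as a log file line."""
    ok = method(key)
    log.write("%s %s %s\n" % (method.__name__, key, "succ" if ok else "fail"))
    return ok

def run_tests(dictionary, keys, log_path="dict_test.log"):
    # A hypothetical test case: add every key, look it up, delete it, look it up again.
    with open(log_path, "w") as log:
        for key in keys:
            logged_call(log, dictionary.add, key)
            logged_call(log, dictionary.find, key)
            logged_call(log, dictionary.delete, key)
            logged_call(log, dictionary.find, key)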

3.3. Discussion

A specification for an analyzer is abstract enough to be taken as a formal specification of the behaviour of the dictionary class itself, as in algebraic approaches [25]. Thus not only can the analyzer be used for testing a specific package, it can be used as a standard by which any implementation of the class can be judged, regardless of the source language of the implementation. Note also that given a log file analyzer, we are free to explore any automated or non-automated methods for running test cases, such as randomized testing, testing to satisfy coverage criteria, or generation of test cases from another formal specification [8]. We are limited in our analysis only by what we have chosen to log.

Does this approach to unit testing scale up? There seems to be no inherent barrier to testing larger and more complex units. Instances of similar techniques have been used for almost a decade in the specific domain of conformance testing for communications protocol software [27], where specifications of units are large and complex. See [1] for an analyzer which can be used to test implementations of the OSI ACSE protocol.

4. System Testing using Log File Analysis

Here we consider the use of log file analysis for system-level testing, that is, for testing software systems for properties which cannot be tested at the level of an individual module or object class. Consider a library system which processes the checking in and checking out of books and the management of charges to borrowers, among other things. Two examples of system-level requirements we might want to check are the following:

- When a borrower returns a book, the book shall have its barcode scanned and the book shall be reshelved.
- When a borrower returns a book late, they shall be charged the current late charge per day times the number of days the book is late.

We are unlikely to be able to check in unit testing whether the system as a whole satisfies these requirements; for example, the barcode scanning, late-fee charging and book-status code may be in separate modules. Furthermore, there may be hundreds of such requirements in the requirements document of a typical library system.

4.1. A Log File Analysis Approach

A possible approach to the problem using log file analysis is as follows. Add logging code to the system that makes the following kinds of entries in the log file:

- checkin Barcode, when a book with Barcode is scanned as being returned.
- checkout_record Barcode Borrower Duedate, immediately after the book with Barcode has been returned, giving information about the last checkout of the book.
- charge Borrower Amount, when the Borrower has been charged Amount in overdue fines.
- bookstatus Barcode Status, where Status is either shelved or unshelved, whenever the status of the book changes.

Run the system on test cases, and run the resulting log files through a log file analyzer containing the machine in Figure 5. This analyzer machine, book_return, moves through a simple sequence of parameterized states which track the effects of the book's return until both the borrower has been charged any fine accruing, and the book has been shelved. Note that the charging of the fine and the shelving of the book can be done in either order when the machine is in one of the processing states.

The methodology by which these requirements were handled may be repeated many times during the course of building a complete log file analyzer for checking system-level requirements. As new requirements are added, however, there will typically have to be fewer additional events logged, since some of the log file entries can be re-used for checking further requirements.
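For concreteness, here is a hypothetical log file fragment that an analyzer containing the book_return machine of Figure 5 would accept. The barcode, borrower name, date encoding and amounts are invented for illustration; we assume the analyzer's current date works out to two days after the due date and a fine of one unit per day, so the expected charge is 2.

checkin 1234
checkout_record 1234 smith 19980301
charge smith 2
bookstatus 1234 shelved

The charge and bookstatus lines could equally appear in the other order, since the machine allows the fine to be charged and the book to be shelved in either order.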

4.2. Discussion

We have already seen another simple example of system testing using log file analysis, in the heater monitor program with memory leak checking. There, we are checking global properties of a program which cannot necessarily be checked by unit testing alone, since memory allocation might be done by many units. One of the strengths of the log file analysis approach is its uniformity, as illustrated by the similarity in structure between the unit-level and system-level log file analyzers.

Note also that when we move to system testing, it is not necessary to remove all the logging code that may remain in the assembled units from unit testing. Log file lines generated by this logging code will be interspersed with other log file lines, but the unit testing machines can be appended to the machines for the system to produce an analyzer which confirms both unit-level and system-level tests.


machine key(Key);
initial_state not_in;
from not_in, on add(Key,succ), to in;
from not_in, on delete(Key,fail), to not_in;
from not_in, on find(Key,fail), to not_in;
from in, on add(Key,fail), to in;
from in, on delete(Key,succ), to not_in;
from in, on find(Key,succ), to in;
final_state Any.

Figure 4. A log file analyzer for checking a dictionary object.

machine book_return(Barcode);
initial_state idle;
from idle, on checkin(Barcode), to returned;
from returned, on checkout_record(Barcode, Borrower, Duedate),
  if (current_date(Today), Today > Duedate),
  to processing(Borrower, fine(Amount), unshelved),
  where (fine_per_day_is(X), Amount is X * (Today - Duedate));
from returned, on checkout_record(Barcode, Borrower, Duedate),
  if (current_date(Today), Duedate >= Today),
  to processing(Borrower, done, unshelved);
from processing(Borrower, fine(Amount), Status),
  on charge(Borrower, Amount),
  to processing(Borrower, done, Status);
from processing(Borrower, Fine, unshelved),
  on bookstatus(Barcode, shelved),
  to processing(Borrower, Fine, shelved);
final_state processing(Borrower, done, shelved).

Figure 5. A log file machine for testing two library system requirements.



5. Methodological Issues
Here we discuss some issues to do with the methodology of testing using log file analysis. We concentrate on three main groups of issues: those concerning logging policies, those concerning false negative and positive results, and those concerning the so-called probe effect by which logging may interfere with the software under test.

5.1. Logging Policies and Instrumentation

The program verification methodology we have been describing is partly formal and partly informal. The formal part is the analyzer specification; the informal part is the policies that are followed for writing lines to the log file. While the presence of this informal part means that the methodology is not as rigorous as formal verification, we argue that the loss of rigour is not significant in practice.

Simple log file analysis done as part of a personal software process or by an individual tester does not need explicit logging policies. However, as reliance on log file analysis increases within a software development organization, logging policies should be stated more clearly and explicitly in design documents. Consider the following text from a hypothetical design document for the heater monitor: "Immediately after receiving a report of the current temperature, the system should log a line of the form temp N, where N is the temperature received in degrees Celsius." The reader may agree that it would be difficult to misunderstand this text, and easy to review whether an implementation actually meets this design criterion (by inspecting the code near the message-handling code). Many software development organizations already implement some form of document review procedure; to maximize reliability, reviews of logging code should be integrated with these existing procedures.

The instrumentation of code to perform logging would normally be done by hand, so logging policies should be as straightforward as possible; for instance, to log inputs and outputs, changes in key internal variables, and parameters and return values of key procedure calls. However, instrumentation errors are not as serious as in distributed debugging [17], since such errors will be caught as log file analysis failures (unless there happen to be complementary errors in analyzer specifications; see Section 5.2).

The logging policy may therefore be seen as the informal link between the implementation and the formal part of the methodology. Log file analysis decreases the risk of informal specifications by simplifying this informal link, while still providing formal tools which can be used directly by developers.
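As an illustration only, a fragment of heater monitor code instrumented according to the policy text above might look like the following Python sketch; the heater object and its methods are hypothetical.

def handle_temperature_report(temp_celsius, heater, log):
    # Logging policy: immediately after receiving a report of the current
    # temperature, log a line of the form "temp N".
    log.write("temp %d\n" % temp_celsius)
    if temp_celsius < 20 and not heater.is_on():
        heater.switch_on()
        log.write("heater on\n")
    elif temp_celsius >= 20 and heater.is_on():
        heater.switch_off()
        log.write("heater off\n")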

5.2. False Negatives and False Positives

Every verification method, whether formal or informal, must guard against the possibility of false negatives (programs being judged as incorrect which are actually correct) and false positives (the converse). Here we discuss these possibilities for log file analysis. There are four main reasons why a log file analyzer may not accept a log file:

- The analyzer specification may be incorrect.
- The logging policy may be incorrect.
- The implementation may not implement the logging policy correctly.
- The implementation may be functionally incorrect.

Of these, the first three are all to some degree false negatives. When a log file is rejected, users should keep in mind that any of these four situations may be the case; the correction of the error may require changes in the analyzer, the logging policy, the logging code or the non-logging code.

A more serious problem is that of false positives. If a log file is accepted by its analyzer, this does not necessarily mean that the run reveals no errors. There could be complementary errors in the implementation per se, instrumentation of the implementation, logging policy, and/or analyzer specification which cancel each other out. Only a thorough system of document reviews (see last section), incorporating the analyzer specification, logging policy, and logging code, can mitigate the risk of these false positives.

5.3. Efficiency and the Probe Effect

The probe effect [19] is the effect that test harnesses or monitoring software have on the behaviour of the system under test. Obviously, we want to minimize the probe effect as much as possible. In log file analysis, we have to deal with two main kinds of probe effects: those to do with efficiency and those to do with recompilation. This section discusses the tradeoffs between these probe effects.

It may not be feasible to let logging code remain in production software. The main reason for this is efficiency: logging involves expensive write operations, typically to disk, and log files on disk may take up valuable space. The radical solution to this problem is to recompile software for production in such a way as to compile out all logging code. However, some clients insist (correctly) that delivered production code be the exact code which was tested. This is because recompilation in a different manner may exacerbate subtle bugs, such as those due to word-alignment problems.

Maker [18] has studied the use of (informal) logging in software. His work suggests some possible solutions to this paradox. Global boolean variables may be set, either at compile time or run time, which cause logging routines to do nothing, or to output the assembled message to an internal circular buffer. This would allow logging code to remain in production while cutting down on I/O inefficiency. The price is some remaining compute-time inefficiency. This may still not be satisfactory in all cases. However, with the continuing speed and efficiency gains in processors, it may be more feasible now than in the past. The Pine mail client, for instance [2], allows users to set a debug level which, when on its highest level, produces copious output to a log file. This presumably means that many users are running software every day which frequently tests the value of an integer to decide whether to perform logging.
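The following Python sketch illustrates one way such a switchable logging scheme could look; the names and levels are our own invention, and neither Nana [18] nor Pine [2] necessarily implements exactly this.

from collections import deque

DEBUG_LEVEL = 1                      # 0: no logging; 1: log to ring buffer; 2: log to disk
RING = deque(maxlen=4096)            # circular buffer of recent log file lines

def log(line, logfile=None):
    if DEBUG_LEVEL == 0:
        return                       # production default: only an integer test is paid for
    if DEBUG_LEVEL == 1:
        RING.append(line)            # cheap in-memory logging; dump the buffer on failure
    elif logfile is not None:
        logfile.write(line + "\n")   # full logging for test runs analysed by the oracle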

6. Scope and Limitations

In this section, we summarize the advantages we see to the methodology of testing using log file analysis. We then explore some of its limitations, and finally summarize the scope of the methodology.

6.1. Advantages

The main advantages of the methodology that we see are the following.

- Formal log file analysis allows us to achieve a greater level of rigour in testing practice, without requiring a programming language with a rigorous semantics; in fact it is applicable to all programming languages capable of file input/output.
- It casts formal specifications in the form of relatively simple state-machine-based agents (the analyzers) which process concrete computational artifacts (the log files). These analyzers are compositional (the concatenation of two analyzers performs the functions of both its components) and apply uniformly to all levels of testing.
- It adds to, but does not significantly disturb, current practice in testing.
- It appears to be flexible in the extent of its application; for instance, either a single individual or a large organization can use it, it can be used for either unit or system testing, and either a small number of simple properties (e.g. avoidance of memory leaks) or a large number of complex properties (e.g. system safety properties) can be checked for.

6.2. Limitations

From a formal methods point of view, the methodology fails in that it does not verify the program as a whole, but only confirms or denies that particular runs of the program reveal faults. This is the inherent failing of software testing, and is our inevitable fate as long as we keep using languages, packages and operating systems with ill-defined semantics not amenable to formal verification. Related to this objection is the fact that it is not possible to confirm or deny every program property we might want to specify. This is certainly true of temporal properties involving "eventually" and "infinitely often" operators, for instance, since no log file can give more than a finite snapshot of a portion of the run of a program. Techniques such as Dillon et al.'s Graphical Interval Logic [7] are more appropriate here, although their application to testing would again require rigorous languages. However, if our logging policy includes that we log each input and output, we can confirm or deny all static input/output properties of the program for individual runs.

The simple state machine structure of analyzers does not appear to be a limitation. Each analyzer can be a possibly infinite set of machines, each of which may have an infinite number of states, theoretically allowing them even to fully simulate the programs under test.

From a traditionalist point of view, the methodology has the disadvantage of requiring programmers and testers to learn a new language (LFAL or something like it) and construct more carefully the logging code they add to their programs. We plan to study LFAL and its possible alternatives with a view to maximizing its acceptability. Finally, we have no guarantee that a log file reports faithfully and efficiently on the behaviour of the program. We have discussed these limitations in Sections 5.1 and 5.3.

6.3. Scope

The preceding discussion suggests that log file analysis can best be used at present in environments with some or all of the following properties:

- Higher reliability or greater rigour in testing are desired.
- Programmers are familiar with state machine concepts and pattern-matching languages such as Prolog, or are able to learn.
- There is some leeway to experiment with new elements to the development methodology.
- Either real-time performance constraints are not heavy, or the risk of removing or compiling out logging code is deemed to be low.

We can only speculate at present how often these properties obtain. We wish to explore these ideas further in a joint university/industry research project now getting under way.




7. Related Work

A log file analyzer can be seen as a specification of a test oracle, or of the system under test itself. Hence it is appropriate to compare this work with work on test oracles. Several researchers [21, 22, 5, 24, 9, 20] have worked on generating test oracles from formal specifications. Such specifications are oriented toward describing the behaviour of the system, and can be used for many other purposes, such as model-checking or formal verification. Luckham et al. [17], in particular, deal specifically with Anna, an extension of Ada which includes specification text; they describe tools to transform an Anna program into one which auto-detects errors and falls into a debugger if any are detected. Here, in contrast, we have proposed direct specification of (essentially) test oracles, in order to reduce the conceptual complexity of the task.

A formal specification of a program must also handle the problem of mapping between the I/O name space of the program and the name space of the specification [20]. Here, we have proposed a language-independent methodology in which programs are instrumented to generate distinguished outputs (the log files) with a trivial name space mapping to the oracles. We also deal with the verification and tracking of internal implementation details (such as memory allocation), a task that is difficult to do using formal specifications without generating log-file-like output.

Customized test oracle specifications have been used in some application areas, such as protocol testing. Bochmann et al. [27] report on one scheme in which ESTELLE specifications are translated into Pascal oracles. This paper can be seen as a generalization and formal definition of such techniques.

The use of log files in debugging of distributed software systems also has a long history. Bates and Wileden [3] deal with questions of filtering logged output and clustering low-level events into higher-level events. Several researchers [11, 14, 26, 16] extend such work to deal with issues such as the detection of specific error classes, the automatic instrumentation of program code, the use of debugging output to deterministically re-execute programs, and more sophisticated filtering and presentation of debugging output. Here, our focus is on user instrumentation of code, and on a more general, formal analysis of log files, independent of source language, which informs the user only of errors detected. However, ideas from the earlier work (such as the use of timestamps for merging logging data from different processes) will be essential if we are to extend this work to distributed systems.

Some of the tasks we perform here with log files can also be performed by assertion code in programs. Maker's work on Nana [18] and Rosenblum's on practical programming with assertions [23] are relevant here. However, note that for tasks which involve the coordination of an arbitrarily large amount of data, such as a memory leak checker, the use of assertions within program code may involve extra programming and a greater risk of probe effects.

Finally, Chechik and Gannon [6] consider C programs annotated with expressions in a symbolic language related to SCR-style specifications [12]. Some of the annotations are comparable to the write statements needed to produce a log file, and others put conditions on the states of variables; the result is a general technique for formal program verification. Their work requires information about source code structure obtained from such tools as the Gnu C Compiler, whereas we are concerned with techniques for analyzing log files produced by programs regardless of source language.

8. Conclusions and Future Work

We have described some tools for the automatic, formal analysis of log files, and studied the methodology of applying them and the associated issues. The software specifications which we write with these tools are relatively compact, compositional, applicable in a uniform way at both unit- and system-level testing, independent of the source language of the software under test, and directly usable in verifying test runs. We conclude that testing using formal log file analysis is a potentially useful methodology for obtaining greater software reliability in many practical settings. Because it uses both formal components (analyzers) and informal components (logging policies), its level of rigour sits somewhere between that of formal verification and development and that of traditional methods.

We are embarking on a project with industrial partners to study these ideas. We first plan to study current practices of log file use, and identify the most approachable technologies on which to build a log file analyzer language; we then plan to study log file analysis on a pilot project, in order to refine the methodology, identify tradeoffs and improve tool support. We would also like to study the use of log file analysis in wider areas, such as the testing of distributed systems.

9. Acknowledgments
Thank you to Martin Stanmore, Phil Maker, Jeff Joyce of Hughes Aircraft, Jack Morrison of Sun BOS Product Assurance, and Janette Wong of IBM Toronto for informal discussion of the use of log files in their respective organizations and projects. The ideas reported on in this paper were formulated while the author was working on the FormalWare project at the University of British Columbia, Hughes Aircraft of Canada Limited Systems Division (HCSD), and MacDonald Dettwiler (MDA). Thanks especially to Richard Yates of MDA for our many discussions concerning these ideas. Thanks also to Mike Donat, Phil Gray, Dan Hoffman, Jeff Joyce, Hanan Lutfiyya, Phil Maker, and Gail Murphy for helpful comments and suggestions. The FormalWare project is financially supported by the BC Advanced Systems Institute (BCASI), HCSD, and MDA. The author is currently supported by a grant from NSERC and by his startup grant from the Faculty of Science, University of Western Ontario.

References
[1] J. H. Andrews. Theory and practice of log file analysis. Technical Report 524, Department of Computer Science, University of Western Ontario, May 1998.
[2] Anonymous. Pine information center. Home page for the Pine mailer software, by the Pine Development Team. URL www.washington.edu/pine/, 1998.
[3] P. Bates and J. C. Wileden. An approach to high-level debugging of distributed systems. ACM SIGPLAN Notices, 18(8):107-111, August 1983.
[4] B. Beizer. Software Testing Techniques. Van Nostrand Reinhold, New York, 2nd edition, 1990.
[5] D. B. Brown, R. F. Roggio, J. H. Cross II, and C. L. McCreary. An automated oracle for software testing. IEEE Transactions on Reliability, 41(2), June 1992.
[6] M. Chechik and J. Gannon. Automatic verification of requirements implementation. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), 1994.
[7] L. K. Dillon, G. Kutty, L. E. Moser, P. M. Melliar-Smith, and Y. S. Ramakrishna. A graphical interval logic for specifying concurrent systems. ACM Transactions on Software Engineering and Methodology, 3(2):131-165, April 1994.
[8] M. R. Donat. Automating formal specification-based testing. In TAPSOFT: 7th International Joint Conference on Theory and Practice of Software Engineering, April 1997.
[9] R.-K. Doong and P. G. Frankl. The ASTOOT approach to testing object-oriented programs. ACM Transactions on Software Engineering and Methodology, 3(2):101-130, April 1994.
[10] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8:231-274, 1987.
[11] D. Heimbold and D. Luckham. Debugging Ada tasking programs. IEEE Software, 2(2):47-57, March 1985.
[12] K. Heninger. Specifying software requirements for complex systems: New techniques and their applications. IEEE Transactions on Software Engineering, SE-6(1):2-12, January 1980.
[13] D. Hoffman and P. Strooper. ClassBench: A framework for automated class testing. Software Practice and Experience, 27(5):573-597, May 1997.
[14] J. Joyce, G. Lomow, K. Slind, and B. Unger. Monitoring distributed systems. ACM Transactions on Computer Systems, 5(2):121-150, 1987.
[15] C. Kaner, J. Falk, and H. Q. Nguyen. Testing Computer Software. Van Nostrand Reinhold, New York, 2nd edition, 1993.
[16] T. Kunz, J. Black, D. Taylor, and T. Basten. Poet: Target-system-independent visualizations of complex distributed-application executions. The Computer Journal, 40(8):499-512, September 1997.
[17] D. Luckham, S. Sankar, and S. Takahashi. Two-dimensional pinpointing: Debugging with formal specifications. IEEE Software, 8(1):74-84, January 1991.
[18] P. Maker. Nana: Improved support for assertions and logging in C and C++. Technical Report 12-95, School of Information Technology, Northern Territory University, Darwin, NT, Australia, September 1995.
[19] C. E. McDowell and D. P. Helmbold. Debugging concurrent programs. ACM Computing Surveys, 21(4):593-622, December 1989.
[20] T. O. O'Malley, D. J. Richardson, and L. K. Dillon. Efficient specification-based oracles for critical systems. In Proceedings of the California Software Symposium, 1996.
[21] D. K. Peters and D. L. Parnas. Using test oracles generated from program documentation. In Proceedings of the International Symposium on Software Testing and Analysis, 1984.
[22] D. J. Richardson, S. L. Aha, and T. O. O'Malley. Specification-based test oracles for reactive systems. In Proceedings of the 14th International Conference on Software Engineering, Melbourne, Australia, May 1992.
[23] D. S. Rosenblum. A practical approach to programming with assertions. IEEE Transactions on Software Engineering, 21(1):19-31, January 1995.
[24] S. Sankar, A. Goyal, and P. Sikchi. Software testing using algebraic specification based test oracles. Technical Report CSL-TR-93-566, Computer Systems Laboratory, Stanford, April 1993.
[25] D. Sannella and A. Tarlecki. Toward formal development of programs from algebraic specifications: Implementations revisited. Acta Informatica, 25(3):233-281, 1988.
[26] K.-C. Tai, R. H. Carver, and E. E. Obaid. Debugging concurrent Ada programs by deterministic execution. IEEE Transactions on Software Engineering, 17(1):45-63, January 1991.
[27] G. von Bochmann, R. Dssouli, and J. R. Zhao. Trace analysis for conformance and arbitration testing. IEEE Transactions on Software Engineering, 15(11), November 1989.
