Testing Using Log File Analysis: Tools, Methods, and Issues
James H. Andrews Dept. of Computer Science University of Western Ontario London, Ontario, Canada N6A 5B7 [email protected]
Abstract
Large software systems often keep log files of events. Such log files can be analyzed to check whether a run of a program reveals faults in the system. We discuss how such log files can be used in software testing. We present a framework for automatically analyzing log files, and describe a language for specifying analyzer programs and an implementation of that language. The language permits compositional, compact specifications of software, which act as test oracles; we discuss the use and efficacy of these oracles for unit- and system-level testing in various settings. We explore methodological issues such as efficiency and logging policies, and the scope and limitations of the framework. We conclude that testing using log file analysis constitutes a useful methodology for software verification, somewhere between current testing practice and formal verification methodologies.
1. Introduction
It is clear that many aspects of the desired behaviour of software can be given formal specifications. What is not so clear is how those specifications can be connected to actual programs in an automated way for practical software engineering purposes. Most formal verification or development methods assume languages with well-defined semantics, or consider only well-defined subsets of languages. However, most programs today are still developed using languages without well-defined semantics. In such languages, we cannot formally and reliably predict observables (e.g. program outputs) from program code; the only observables we can get hold of are those obtained by individual runs of the program itself. Yet even these observables are valuable. After all, programmers have always depended on them in order to test
and debug programs, following established, practical testing techniques [4, 15]. Of particular value is a kind of output file generally referred to as a debug file or log file. Such a file records events such as inputs and outputs, values of variables, and parameters and returns of function calls, sequentially as the program is running. Programmers inspect log files to identify and diagnose problems in code. We may refer to this practice as (informal) log file analysis.

A way that formal methods can help in program testing using log file analysis naturally suggests itself. First, specify what the program should log to the log file and how. Next, or concurrently, formally specify the format of an acceptable log file. Run the program repeatedly on different inputs; see whether the log files meet the specification by doing a formal log file analysis. In cases in which they do not, correct the error and re-run. This paper explores some tools and methods for doing this testing using log file analysis, and the issues arising from doing it.

Section 2 gives a summary of a formal framework for log file analysis. It defines the notion of a log file analyzer using state machine concepts, and discusses a prototype implementation. Sections 3 and 4 study by examples the use of this framework in unit testing and system testing, respectively. Section 5 discusses some issues to do with development methodology, such as the avoidance of false negatives and positives, and how and when logging should be turned off. Section 6 explores the advantages, scope and limitations of the framework. Section 7 discusses related work, and Section 8 gives conclusions and prospects for future work.
2. A Framework for Log File Analysis

The key concept of our framework is that of a set of parallel state machines making transitions based on lines from the log file, and we define a log file analyzer informally in this way. We then describe a textual language for specifying log file analyzers, and a prototype implementation of the language.
Each log file machine in an analyzer consists of:
- An identifying name;
- A (possibly infinite) set of machine states, of which one is defined as the initial state, and a subset of which is defined as the final states;
- A set of log file lines which the machine notices; and
- A set of transitions, each giving a source state, a noticed line, an optional condition, and a target state.
Left log file:
temp 21
temp 20
temp 19
heater on
temp 19
temp 20
temp 21
heater off
temp 21

Right log file:
temp 21
malloc 2096
temp 19
malloc 2088
malloc 1016
heater on
temp 19
temp 21
free 2088
heater off
temp 21

Figure 1. Left: a simple log file. Right: a more complex log file.
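To make the logging side concrete, the following is a minimal sketch, in Python, of how a heater monitor program might be instrumented to emit log files in the left-hand format of Figure 1. It is our illustration only, not code from the paper; the names log_event and control_step are invented here.

import sys

def log_event(*fields, out=sys.stdout):
    # One event per log file line; e.g. log_event("temp", 19) emits "temp 19".
    print(*fields, file=out)

def control_step(temp, heater_on):
    # One pass of a hypothetical control loop: log the reading, then any actuation.
    log_event("temp", temp)
    if temp < 20 and not heater_on:
        log_event("heater", "on")
        return True
    if temp >= 20 and heater_on:
        log_event("heater", "off")
        return False
    return heater_on

# Feeding the readings 21, 20, 19, 19, 20, 21 through the loop produces
# a log in the same format as the left of Figure 1.
heater_on = False
for reading in (21, 20, 19, 19, 20, 21):
    heater_on = control_step(reading, heater_on)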
Figure 2. The heatermonitor machine drawn as a state diagram, with states off, should_be_on, on and should_be_off, and transitions labelled temp N [N>=20], temp N [N<20], heater on and heater off.
machine heatermonitor;
initial_state off;
from off, on temp(N), if (N >= 20), to off;
from off, on temp(N), if (N < 20), to should_be_on;
from should_be_on, on heater(on), to on;
from on, on temp(N), if (N < 20), to on;
from on, on temp(N), if (N >= 20), to should_be_off;
from should_be_off, on heater(off), to off;
final_state Any.

machine memcheck(Ptr);
initial_state unalloc;
from unalloc, on malloc(Ptr), to alloc;
from alloc, on free(Ptr), to unalloc;
final_state unalloc.

Figure 3. The definition of the heatermonitor and memcheck log file machines in the log file analyzer language.
This machine is specified in LFAL as the machine heatermonitor at the left of Figure 3. In the LFAL syntax, names, states and log file lines are represented by first-order terms over keywords, strings, numbers and variables, with keywords being used as function symbols. (A variable is a sequence of alphanumeric characters and underscores beginning with an uppercase letter.) Each log file line is represented by a term in the language; for instance, the line temp 20 is represented by the term temp(20). In heatermonitor, note the use of the variable N to match the current temperature read from the log file. Similarly, the declaration final_state Any has the effect of declaring any state to be a final state, since any term can match the variable Any.

At the right of Figure 3 is an LFAL specification of a class of log file machines named memcheck(Ptr), where the variable Ptr can be replaced by any term. memcheck(t), for any term t, notices lines of the form malloc t and free t. It changes its state from alloc to unalloc and back, depending on whether it has deduced that t is a pointer to a block of memory which is currently allocated or unallocated. The fact that only unalloc is allowed to be a final state signifies that at the end of a log file run, every block of memory that was allocated must have been freed. The log file analyzer specification consisting of the heatermonitor specification plus the memcheck specification accepts correct log files of the format shown on the right of Figure 1.
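To make the analyzer semantics concrete, here is a minimal sketch of the parallel-machine reading of an analyzer, in Python. It is illustrative only, not the prototype implementation of LFAL; the names Machine, analyze and parse are ours, and we assume the informal semantics described above: a machine ignores lines it does not notice, fails if it notices a line but has no applicable transition, and must end in a final state.

def parse(line):
    # "temp 21" -> ("temp", (21,)); "heater on" -> ("heater", ("on",))
    head, *rest = line.split()
    return head, tuple(int(a) if a.lstrip("-").isdigit() else a for a in rest)

class Machine:
    def __init__(self, name, initial, is_final, rules, notices):
        self.name = name
        self.state = initial
        self.is_final = is_final   # predicate on states (final_state Any => always true)
        self.rules = rules         # (from_state, keyword, guard, to_state) tuples
        self.notices = notices     # predicate: does this machine notice (head, args)?

    def step(self, head, args):
        if not self.notices(head, args):
            return True            # un-noticed lines are ignored entirely
        for from_state, keyword, guard, to_state in self.rules:
            if from_state == self.state and keyword == head and guard(args):
                self.state = to_state
                return True
        return False               # noticed line, but no applicable transition

def analyze(lines, machines):
    # Run all machines in parallel over the log; report the first failure.
    for lineno, line in enumerate(lines, 1):
        head, args = parse(line)
        for m in machines:
            if not m.step(head, args):
                return "line %d: %s cannot accept '%s' in state %s" % (
                    lineno, m.name, line, m.state)
    stuck = [m.name for m in machines if not m.is_final(m.state)]
    return "accepted" if not stuck else "not in a final state: %s" % stuck

# The heatermonitor machine of Figure 3, transcribed rule for rule.
heatermonitor = Machine(
    "heatermonitor", "off",
    lambda s: True,                                        # final_state Any
    [("off",           "temp",   lambda a: a[0] >= 20,    "off"),
     ("off",           "temp",   lambda a: a[0] < 20,     "should_be_on"),
     ("should_be_on",  "heater", lambda a: a[0] == "on",  "on"),
     ("on",            "temp",   lambda a: a[0] < 20,     "on"),
     ("on",            "temp",   lambda a: a[0] >= 20,    "should_be_off"),
     ("should_be_off", "heater", lambda a: a[0] == "off", "off")],
    lambda head, args: head in ("temp", "heater"))

print(analyze(["temp 21", "temp 19", "heater on", "temp 21", "heater off"],
              [heatermonitor]))                            # -> accepted

A memcheck(t) machine could be built in the same style for each pointer value t appearing in the log, with a notices predicate testing args[0] == t and an is_final predicate requiring the state unalloc.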
If any errors are found, correct them and iterate. The log file analyzer describes succinctly the situations in which a key is in or not in the dictionary, and the situations in which the various methods should or should not succeed. Almost as important, it does not need to mention the equal number of cases in which errors should be reported. The compiled analyzer will handle this itself.
code may be in separate modules. Furthermore, there may be hundreds of such requirements in the requirements document of a typical library system.
3.3. Discussion
A specification for an analyzer is abstract enough to be taken as a formal specification of the behaviour of the dictionary class itself, as in algebraic approaches [25]. Thus not only can the analyzer be used for testing a specific package, it can be used as a standard by which any implementation of the class can be judged, regardless of the source language of the implementation. Note also that given a log file analyzer, we are free to explore any automated or non-automated methods for running test cases, such as randomized testing, testing to satisfy coverage criteria, or generation of test cases from another formal specification [8]. We are limited in our analysis only by what we have chosen to log.

Does this approach to unit testing scale up? There seems to be no inherent barrier to testing larger and more complex units. Instances of similar techniques have been used for almost a decade in the specific domain of conformance testing for communications protocol software [27], where specifications of units are large and complex. See [1] for an analyzer which can be used to test implementations of the OSI ACSE protocol.
4.2. Discussion
We have already seen another simple example of system testing using log file analysis, in the heater monitor program with memory leak checking. There, we are checking global properties of a program which cannot necessarily be checked by unit testing alone, since memory allocation might be done by many units. One of the strengths of the log file analysis approach is its uniformity, as illustrated by the similarity in structure between the unit-level and system-level log file analyzers. Note also that when we move to system testing, it is not necessary to remove all the logging code that may remain in the assembled units from unit testing. Log file lines generated by this logging code will be interspersed with other log file lines, but the unit testing machines can be appended to the machines for the system to produce an analyzer which confirms both unit-level and system-level tests.

machine key(Key);
initial_state not_in;
from not_in, on add(Key,succ), to in;
from not_in, on delete(Key,fail), to not_in;
from not_in, on find(Key,fail), to not_in;
from in, on add(Key,fail), to in;
from in, on delete(Key,succ), to not_in;
from in, on find(Key,succ), to in;

Figure 4. The key(Key) log file machine for testing the dictionary class.

machine book_return(Barcode);
initial_state idle;
from idle, on checkin(Barcode), to returned;
from returned, on checkout_record(Barcode, Borrower, Duedate),
  if (current_date(Today), Today > Duedate),
  to processing(Borrower, fine(Amount), unshelved),
  where (fine_per_day_is(X), Amount is X * (Today - Duedate));
from returned, on checkout_record(Barcode, Borrower, Duedate),
  if (current_date(Today), Duedate >= Today),
  to processing(Borrower, done, unshelved);
from processing(Borrower, fine(Amount), Status), on charge(Borrower, Amount),
  to processing(Borrower, done, Status);
from processing(Borrower, Fine, unshelved), on bookstatus(Barcode, shelved),
  to processing(Borrower, Fine, shelved);
final_state processing(Borrower, done, shelved).

Figure 5. A log file machine for testing two library system requirements.
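Continuing the illustrative Python sketch from Section 2 (again ours, not the paper's LFAL tools), the key(Key) machines of Figure 4 can serve directly as a test oracle for a randomized unit test of the dictionary: the driver logs each call with its outcome, and the analyzer replays the log with one key(k) machine per key.

import random

def key_machine(k):
    # Transcription of the key(Key) rules of Figure 4 for one key value k.
    # The extracted figure omits a final_state declaration, so any final
    # state is accepted here. The notices predicate already filters by key,
    # so the guards need only check the succ/fail outcome.
    outcome = lambda res: (lambda a: a[1] == res)
    rules = [("not_in", "add",    outcome("succ"), "in"),
             ("not_in", "delete", outcome("fail"), "not_in"),
             ("not_in", "find",   outcome("fail"), "not_in"),
             ("in",     "add",    outcome("fail"), "in"),
             ("in",     "delete", outcome("succ"), "not_in"),
             ("in",     "find",   outcome("succ"), "in")]
    return Machine("key(%s)" % k, "not_in", lambda s: True, rules,
                   lambda head, args: head in ("add", "delete", "find")
                                      and args[0] == k)

def random_run(n=1000, keys=("alpha", "beta", "gamma")):
    d, log = set(), []            # the set d plays the dictionary under test
    for _ in range(n):
        op, k = random.choice(("add", "delete", "find")), random.choice(keys)
        if op == "add":
            ok = k not in d
            d.add(k)
            log.append("add %s %s" % (k, "succ" if ok else "fail"))
        elif op == "delete":
            ok = k in d
            d.discard(k)
            log.append("delete %s %s" % (k, "succ" if ok else "fail"))
        else:
            log.append("find %s %s" % (k, "succ" if k in d else "fail"))
    return log

keys = ("alpha", "beta", "gamma")
print(analyze(random_run(keys=keys), [key_machine(k) for k in keys]))

As noted above, the same analyze call would accept a concatenated list of unit-level and system-level machines, since each machine simply ignores the lines it does not notice.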
The logging policy may therefore be seen as the informal link between the implementation and the formal part of the methodology. Log file analysis decreases the risk of informal specifications by simplifying this informal link, while still providing formal tools which can be used directly by developers.
5. Methodological Issues
Here we discuss some issues to do with the methodology of testing using log file analysis. We concentrate on three main groups of issues: those concerning logging policies, those concerning false negative and positive results, and those concerning the so-called probe effect, by which logging may interfere with the software under test.
One apparent solution is to build the software for production in such a way as to compile out all logging code. However, some clients insist (correctly) that delivered production code be the exact code which was tested. This is because recompilation in a different manner may exacerbate subtle bugs, such as those due to word-alignment problems.

Maker [18] has studied the use of (informal) logging in software. His work suggests some possible solutions to this paradox. Global boolean variables may be set, either at compile time or run time, which cause logging routines to do nothing, or to output the assembled message to an internal circular buffer. This would allow logging code to remain in production while cutting down on I/O inefficiency. The price is some remaining compute-time inefficiency. This may still not be satisfactory in all cases. However, with the continuing speed and efficiency gains in processors, it may be more feasible now than in the past. The Pine mail client, for instance [2], allows users to set a debug level which, at its highest level, produces copious output to a log file. This presumably means that many users are running software every day which frequently tests the value of an integer to decide whether to perform logging.
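As a sketch of the switching scheme just described (our rendering in Python; Maker's Nana itself targets C and C++, and the variable names here are invented), a global level decides per call whether an event is dropped, kept only in a bounded circular buffer, or also written out:

import sys
from collections import deque

LOG_LEVEL = 1               # 0: logging off, 1: circular buffer only, 2: buffer and file
RING = deque(maxlen=4096)   # circular buffer of the most recent events

def log_event(line):
    if LOG_LEVEL == 0:
        return              # production default: the cost is one integer test
    RING.append(line)       # cheap in-memory record, inspected after a failure
    if LOG_LEVEL >= 2:
        print(line, file=sys.stderr)   # full I/O only at the highest level

At level 0 the cost per suppressed call is a single integer test, which is the overhead the Pine example above suggests users already tolerate.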
6. Advantages, Limitations and Scope

6.1. Advantages

The main advantages of the methodology that we see are the following.
- Formal log file analysis allows us to achieve a greater level of rigour in testing practice, without requiring a programming language with a rigorous semantics; in fact it is applicable to all programming languages capable of file input/output.
- It casts formal specifications in the form of relatively simple state-machine-based agents (the analyzers) which process concrete computational artifacts (the log files). These analyzers are compositional (the concatenation of two analyzers performs the functions of both its components) and apply uniformly to all levels of testing.
- It adds to, but does not significantly disturb, current practice in testing.
- It appears to be flexible in the extent of its application; for instance, either a single individual or a large organization can use it, it can be used for either unit or system testing, and either a small number of simple properties (e.g. avoidance of memory leaks) or a large number of complex properties (e.g. system safety properties) can be checked for.

6.2. Limitations

From a formal methods point of view, the methodology fails in that it does not verify the program as a whole, but only confirms or denies that particular runs of the program reveal faults. This is the inherent failing of software testing, and is our inevitable fate as long as we keep using languages, packages and operating systems with ill-defined semantics not amenable to formal verification.

Related to this objection is the fact that it is not possible to confirm or deny every program property we might want to specify. This is certainly true of temporal properties involving "eventually" and "infinitely often" operators, for instance, since no log file can give more than a finite snapshot of a portion of the run of a program. Techniques such as Dillon et al.'s Graphical Interval Logic [7] are more appropriate here, although their application to testing would again require rigorous languages. However, if our logging policy includes that we log each input and output, we can confirm or deny all static input/output properties of the program for individual runs.

The simple state machine structure of analyzers does not appear to be a limitation. Each analyzer can be a possibly infinite set of machines, each of which may have an infinite number of states, theoretically allowing them even to fully simulate the programs under test.

From a traditionalist point of view, the methodology has the disadvantage of requiring programmers and testers to learn a new language (LFAL or something like it) and to construct more carefully the logging code they add to their programs. We plan to study LFAL and its possible alternatives with a view to maximizing its acceptability.

Finally, we have no guarantee that a log file reports faithfully and efficiently on the behaviour of the program. We have discussed these limitations in Sections 5.1 and 5.3.
6.3. Scope
The preceding discussion suggests that log file analysis can best be used at present in environments with some or all of the following properties.
- Higher reliability or greater rigour in testing is desired.
- Programmers are familiar with state machine concepts and pattern-matching languages such as Prolog, or are able to learn them.
- There is some leeway to experiment with new elements of the development methodology.
- Either real-time performance constraints are not heavy, or the risk of removing or compiling out logging code is deemed to be low.

We can only speculate at present how often these properties obtain. We wish to explore these ideas further in a joint university/industry research project now getting under way.
7. Related Work

A log file analyzer can be seen as a specification of a test oracle, or of the system under test itself. Hence it is appropriate to compare this work with work on test oracles. Several researchers [21, 22, 5, 24, 9, 20] have worked on generating test oracles from formal specifications. Such specifications are oriented toward describing the behaviour of the system, and can be used for many other purposes, such as model-checking or formal verification. Luckham et al. [17], in particular, deal specifically with Anna, an extension of Ada which includes specification text; they describe tools to transform an Anna program into one which auto-detects errors and falls into a debugger if any are detected. Here, in contrast, we have proposed direct specification of (essentially) test oracles, in order to reduce the conceptual complexity of the task.

A formal specification of a program must also handle the problem of mapping between the I/O name space of the program and the name space of the specification [20]. Here, we have proposed a language-independent methodology in which programs are instrumented to generate distinguished outputs (the log files) with a trivial name space mapping to the oracles. We also deal with the verification and tracking of internal implementation details (such as memory allocation), a task that is difficult to do using formal specifications without generating log-file-like output.

Customized test oracle specifications have been used in some application areas, such as protocol testing. Bochmann et al. [27] report on one scheme in which ESTELLE specifications are translated into Pascal oracles. This paper can be seen as a generalization and formal definition of such techniques.

The use of log files in debugging of distributed software systems also has a long history. Bates and Wileden [3] deal with questions of filtering logged output and clustering low-level events into higher-level events. Several researchers [11, 14, 26, 16] extend such work to deal with issues such as the detection of specific error classes, the automatic instrumentation of program code, the use of debugging output to deterministically re-execute programs, and more sophisticated filtering and presentation of debugging output. Here, our focus is on user instrumentation of code, and on a more general, formal analysis of log files, independent of source language, which informs the user only of errors detected. However, ideas from the earlier work (such as the use of timestamps for merging logging data from different processes) will be essential if we are to extend this work to distributed systems.

Some of the tasks we perform here with log files can also be performed by assertion code in programs. Maker's work on Nana [18] and Rosenblum's on practical programming with assertions [23] are relevant here. However, note that for tasks which involve the coordination of an arbitrarily large amount of data, such as a memory leak checker, the use of assertions within program code may involve extra programming and a greater risk of probe effects.

Finally, Chechik and Gannon [6] consider C programs annotated with expressions in a symbolic language related to SCR-style specifications [12]. Some of the annotations are comparable to the write statements needed to produce a log file, and others put conditions on the states of variables; the result is a general technique for formal program verification. Their work requires information about source code structure obtained from such tools as the Gnu C Compiler, whereas we are concerned with techniques for analyzing log files produced by programs regardless of source language.
8. Conclusions

We have described some tools for the automatic, formal analysis of log files, and studied the methodology of applying them and the associated issues. The software specifications which we write with these tools are relatively compact, compositional, applicable in a uniform way at both unit- and system-level testing, independent of the source language of the software under test, and directly usable in verifying test runs.

We conclude that testing using formal log file analysis is a potentially useful methodology for obtaining greater software reliability in many practical settings. Because it uses both formal components (analyzers) and informal components (logging policies), its level of rigour sits somewhere between that of formal verification and development and that of traditional methods.

We are embarking on a project with industrial partners to study these ideas. We first plan to study current practices of log file use, and identify the most approachable technologies on which to build a log file analyzer language; we then plan to study log file analysis on a pilot project, in order to refine the methodology, identify tradeoffs and improve tool support. We would also like to study the use of log file analysis in wider areas, such as the testing of distributed systems.
9. Acknowledgments
Thank you to Martin Stanmore, Phil Maker, Jeff Joyce of Hughes Aircraft, Jack Morrison of Sun BOS Product Assurance, and Janette Wong of IBM Toronto for informal discussion of the use of log files in their respective organizations and projects. The ideas reported on in this paper were formulated while the author was working on the FormalWare project at the University of British Columbia, Hughes Aircraft of Canada Limited Systems Division (HCSD), and MacDonald Dettwiler (MDA). Thanks especially to Richard Yates of MDA for our many discussions concerning these ideas. Thanks also to Mike Donat, Phil Gray, Dan Hoffman, Jeff Joyce, Hanan Lutfiyya, Phil Maker, and Gail Murphy for helpful comments and suggestions. The FormalWare project is financially supported by the BC Advanced Systems Institute (BCASI), HCSD, and MDA. The author is currently supported by a grant from NSERC and by his startup grant from the Faculty of Science, University of Western Ontario.
References
[1] J. H. Andrews. Theory and practice of log file analysis. Technical Report 524, Department of Computer Science, University of Western Ontario, May 1998.
[2] Anonymous. Pine information center. Home page for the Pine mailer software, by the Pine Development Team. URL www.washington.edu/pine/, 1998.
[3] P. Bates and J. C. Wileden. An approach to high-level debugging of distributed systems. ACM SIGPLAN Notices, 18(8):107-111, August 1983.
[4] B. Beizer. Software Testing Techniques. Van Nostrand Reinhold, New York, 2nd edition, 1990.
[5] D. B. Brown, R. F. Roggio, J. H. Cross II, and C. L. McCreary. An automated oracle for software testing. IEEE Transactions on Reliability, 41(2), June 1992.
[6] M. Chechik and J. Gannon. Automatic verification of requirements implementation. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), 1994.
[7] L. K. Dillon, G. Kutty, L. E. Moser, P. M. Melliar-Smith, and Y. S. Ramakrishna. A graphical interval logic for specifying concurrent systems. ACM Transactions on Software Engineering and Methodology, 3(2):131-165, April 1994.
[8] M. R. Donat. Automating formal specification-based testing. In TAPSOFT: 7th International Joint Conference on Theory and Practice of Software Engineering, April 1997.
[9] R.-K. Doong and P. G. Frankl. The ASTOOT approach to testing object-oriented programs. ACM Transactions on Software Engineering and Methodology, 3(2):101-130, April 1994.
[10] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8:231-274, 1987.
[11] D. Heimbold and D. Luckham. Debugging Ada tasking programs. IEEE Software, 2(2):47-57, March 1985.
[12] K. Heninger. Specifying software requirements for complex systems: New techniques and their applications. IEEE Transactions on Software Engineering, SE-6(1):2-12, January 1980.
[13] D. Hoffman and P. Strooper. Classbench: A framework for automated class testing. Software Practice and Experience, 27(5):573-597, May 1997.
[14] J. Joyce, G. Lomow, K. Slind, and B. Unger. Monitoring distributed systems. ACM Transactions on Computer Systems, 5(2):121-150, 1987.
[15] C. Kaner, J. Falk, and H. Q. Nguyen. Testing Computer Software. Van Nostrand Reinhold, New York, 2nd edition, 1993.
[16] T. Kunz, J. Black, D. Taylor, and T. Basten. Poet: Target-system-independent visualizations of complex distributed-application executions. The Computer Journal, 40(8):499-512, September 1997.
[17] D. Luckham, S. Sankar, and S. Takahashi. Two-dimensional pinpointing: Debugging with formal specifications. IEEE Software, 8(1):74-84, January 1991.
[18] P. Maker. Nana: Improved support for assertions and logging in C and C++. Technical Report 12-95, School of Information Technology, Northern Territory University, Darwin, NT, Australia, September 1995.
[19] C. E. McDowell and D. P. Helmbold. Debugging concurrent programs. ACM Computing Surveys, 21(4):593-622, December 1989.
[20] T. O. O'Malley, D. J. Richardson, and L. K. Dillon. Efficient specification-based oracles for critical systems. In Proceedings of the California Software Symposium, 1996.
[21] D. K. Peters and D. L. Parnas. Using test oracles generated from program documentation. In Proceedings of the International Symposium on Software Testing and Analysis, 1994.
[22] D. J. Richardson, S. L. Aha, and T. O. O'Malley. Specification-based test oracles for reactive systems. In Proceedings of the 14th International Conference on Software Engineering, Melbourne, Australia, May 1992.
[23] D. S. Rosenblum. A practical approach to programming with assertions. IEEE Transactions on Software Engineering, 21(1):19-31, January 1995.
[24] S. Sankar, A. Goyal, and P. Sikchi. Software testing using algebraic specification based test oracles. Technical Report CSL-TR-93-566, Computer Systems Laboratory, Stanford, April 1993.
[25] D. Sannella and A. Tarlecki. Toward formal development of programs from algebraic specifications: Implementations revisited. Acta Informatica, 25(3):233-281, 1988.
[26] K.-C. Tai, R. H. Carver, and E. E. Obaid. Debugging concurrent Ada programs by deterministic execution. IEEE Transactions on Software Engineering, 17(1):45-63, January 1991.
[27] G. von Bochmann, R. Dssouli, and J. R. Zhao. Trace analysis for conformance and arbitration testing. IEEE Transactions on Software Engineering, 15(11), November 1989.