Advanced Unit Testing, Part IV - Fixture Setup - Teardown, Test Repetition and Performance Tests
First Posted: 8 Oct 2003
This article extends the unit testing framework, adding fixture setup/teardown capability and performance (time and
memory) measurement/testing.
Contents
Introduction
Drawbacks
General Issues
Just In Time Compiler / Assemblies
Other Execution Time Problems
Garbage Collection
Collections Such As ArrayList And Hashtable
Extending MUTE
Step 1: Define An Attribute
Step 2: Define The Attribute Functionality
Accessing Attribute Parameters
Step 3: Implement The Runner, Class, And Method Extensions
Runner Extensions
Class Extensions
Method Extensions
Fixture Setup And Tear Down
Why Would You Use A Fixture Setup?
A Simple Case Study
Why Not Just Use The Test Fixture's Class Constructor?
An Example: Measuring Processing Time
Processing Time
Testing Processing Time
Test Repetition
What Other Uses Are There?
Memory Utilization
New/Delete vs. Garbage Collection
Unit Testing As Documentation
Manual Cleanup
Directed Cleanup
Automatic Cleanup
Memory Testing
Conclusion
Part I
Part II
Part III
Introduction
Part IV of this series introduces the final set of extensions to the basic unit testing application. These extensions are:
fixture-level setup and teardown;
test repetition;
execution time measurement and testing;
memory utilization measurement and testing.
I've worked on a lot of applications that interface with hardware and other applications that require optimizing
analysis algorithms such as network tracing and real time image processing. I think this has given me a different
perspective with regards to unit testing that you won't find in the mainstream discussions. There's certainly an
argument to be made as to whether functionality like this should even be part of a unit test attribute suite, instead
implemented as assertions in the test code. My argument for including this functionality as part of unit test
attributes is the following:
Drawbacks
There are several drawbacks with execution time and memory testing.
General Issues
The execution and memory tests are implemented around the delegate call to the unit test (utd):
startMem=GC.GetTotalMemory(true);
startTime=HiResTimer.Ticks;
try
{
utd();
}
catch(Exception e)
{
throw(e);
}
finally
{
stopTime=HiResTimer.Ticks;
endMem=GC.GetTotalMemory(true);
executionTime=(stopTime==startTime ? 1 : stopTime-startTime);
}
This means that what's really being measured includes not only the function under test, but the wrapper that calls
that function. Obviously, this has unintentional side-effects, especially when the unit test itself executes memory
and/or time consuming code not part of the actual code under test. The best way to handle this situation is to
implement a two stage test sequence--the first stage does the setup and the second stage invokes the method(s)
under test.
Just In Time Compiler / Assemblies
Code that activates the JIT compiler results in a first-time performance hit (for example, with generics, from what I've read, the JIT compiler will replace the generic IL with the specific type and compile it to native code the first time the generic type is constructed). This is true for all Microsoft intermediate language (MSIL) code--it gets translated to native processor instructions by the JIT compiler the first time the code is loaded. You can see this performance difference using MUTE--the first time you run the tests, the performance is notably slower than on subsequent runs.
Other Execution Time Problems
Obviously, issues such as other worker threads, other applications and services, network performance, server
performance, etc., all affect execution time. Most of the execution time issues are addressed by the ability to specify
a test repetition count. The test runner throws out the best and worst time samples and reports the average time.
Garbage Collection
Environments that implement garbage collection (GC) make it nearly impossible to accurately track the memory
allocated by a function within a thread. Calling the GC.Collect() method or other functions also does not
guarantee a correct value because the garbage collection runs on an internal CLR thread. You can see this
happening in MUTE. If you change the code to call GC.Collect() before the delegate call:
...
GC.Collect();
startMem=GC.GetTotalMemory(true);
startTime=HiResTimer.Ticks;
try
{
utd();
}
...
the performance of the tests degrades considerably, in proportion to the number of allocations that have been made.
Collections Such As ArrayList And Hashtable
As the size of a collection grows, the space used to maintain the list's elements (its capacity, in other words) is increased. When the Clear() function is used, the objects contained by the list are de-referenced (let's assume for the sake of argument that nothing else is referencing the objects in the list) and the GC can reclaim them. However, the internal buffers used by the collection are not reclaimed. In the case of the ArrayList collection, you can manually reclaim the buffer space by setting the Capacity property to zero, which shrinks the internal buffer back to the default size of 16 elements. The Hashtable collection (and any collection implementing IDictionary) has no corresponding mechanism.
For example:
Vendor vendor;
[Test]
public void AllocationTest()
{
vendor=new Vendor();
for (int i=0; i<100000; i++)
{
Part p=new Part();
p.Number=i.ToString();
vendor.Add(p);
}
vendor.Clear();
}
the above code allocates about 10MB of memory. When vendor.Clear() is called, about 2.7MB of allocated memory still remains! The Vendor class maintains both an ArrayList and a Hashtable (sort of like the way a SortedList works). When the ArrayList's Capacity property is reset with parts.Capacity=0, the allocated memory is further reduced to 2.2MB. Unfortunately, there is no way to reclaim the buffers used by the Hashtable.
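For what it's worth, a directed cleanup inside Vendor.Clear() could at least trim the ArrayList; the field names below (parts for the ArrayList, partsByNumber for the Hashtable) are my guesses at the Vendor internals, not the actual implementation:
public void Clear()
{
// De-reference the Part objects held by the list.
parts.Clear();
// Shrink the ArrayList's internal buffer back to the default size (16 slots).
// Setting Capacity below Count throws, so Clear() must be called first.
parts.Capacity=0;
// The Hashtable's buckets are emptied, but its internal buffer cannot be reclaimed.
partsByNumber.Clear();
}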
Personally, I think this points to a problem with the way collections are implemented in the .NET framework. It
should be possible to reclaim the buffers. Let's say that the next list of parts that the vendor object manages
contains 10 parts (perhaps because the part list has been filtered). If the first list contained 100,000 parts, there's
2MB being wasted on maintaining a collection of 10 parts. Now, you all say "woohoo" because you've got 1G of RAM
on your system. Well, I come from the days when memory was expensive, both in physical dollars and in usage.
One of the reasons we have so much bloat in our applications is because of sloppy implementations like
Hashtable. Time to write my own collection classes, I say.
Extending MUTE
Extending the unit test framework is very simple, as demonstrated in this section.
Step 1: Define An Attribute
Define the new attribute in the UnitTest assembly, Attributes.cs file. If the attribute (let's say it is the "CodeProject"
attribute) is a class attribute, define it as follows:
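The listing itself isn't reproduced in this extract; a minimal sketch, assuming the standard AttributeUsage pattern, looks like this:
[AttributeUsage(AttributeTargets.Class, AllowMultiple=false, Inherited=true)]
public sealed class CodeProjectAttribute : Attribute
{
}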
Conversely, if it is an attribute associated with a method (let's say it is the "Bob" attribute), it is defined as follows:
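Again as a sketch of the pattern rather than the original listing, the method-targeted version simply changes the attribute target:
[AttributeUsage(AttributeTargets.Method, AllowMultiple=false, Inherited=true)]
public sealed class BobAttribute : Attribute
{
}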
In both cases, the above example assumes that only one instance of the attribute is associated with any given class
or method. This is obviously the case for attributes that don't have any parameters. If your attribute takes
parameters, then you may want to set AllowMultiple to true, as is the case with the Requires attribute. This attribute also demonstrates managing attribute parameters; its constructor simply stores the parameter:
public RequiresAttribute(string methodName)
{
priorTestMethod=methodName;
}
}
Step 2: Define The Attribute Functionality
Define what the attribute does in the UTCore assembly, TestUnitAttribute.cs file. All attributes are derived from the
TestUnitAttribute class. (Yes, this should be refactored, splitting the implementation into a base class and
some interfaces, I think). For example, the "CodeProject" class attribute would be created as:
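The listing is missing from this extract, so the following is only a sketch of the shape of such a class; in particular, the parameter list of SelfRegister is an assumption, not the actual UTCore signature:
public class CodeProjectAttribute : TestUnitAttribute
{
// Signature assumed for illustration; the real SelfRegister receives the framework
// objects (test fixture and/or method item) whose state the attribute can set.
public override void SelfRegister(TestFixture tf, MethodItem mi)
{
// A class attribute would typically record its option on the test fixture (see rule 1 below).
}
}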
The SelfRegister method provides the attribute with the means to set state on the test fixture (the TestFixture object) and/or on the method (the MethodItem object).
Obviously, if the attribute is associated with a class, then the method item is not valid. The following two rules apply
(and also indicate where some refactoring would make things a bit easier to use):
1. Since there's a one-to-one correlation between a test fixture and a class, I usually put class attribute options
in the TestFixture object
2. Since there's a many-to-one correlation between method attributes and the method, I put method attribute
options in the MethodItem object.
This is a cheap and dirty way of handling new attributes, and should really be refactored so that there's more of a
messaging mechanism used. The class and method attributes could then be independently managed, and the
messaging could be used to provide custom extensions without changing the core fixture and method classes. Any
takers?
Accessing Attribute Parameters
The TestUnitAttribute class already has the attribute object initialized before the framework calls
SelfRegister. To access the attribute, cast it to the appropriate UnitTest assembly attribute and extract the
desired information. For example:
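For instance, assuming the Requires attribute exposes the method name captured by its constructor through a property (called PriorTestMethod here purely for illustration):
// attr is the attribute instance the framework has already initialized
RequiresAttribute ra=(RequiresAttribute)attr;
string priorTest=ra.PriorTestMethod;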
As I just said, this is a cheap and dirty way of doing things. But hey, isn't that the XP approach?
Step 3: Implement The Runner, Class, And Method Extensions
Runner Extensions
This step is only necessary if you want to change the way in which the tests inside the fixture are run. For example,
in the previous article, I discussed running tests in order as part of a test process. Typically, though, tests are run in an order that is unpredictable but consistent from run to run. A runner extension might truly randomize the test order. Other
extensions might support multithreaded testing, in which several test fixtures are run simultaneously in order to test
semaphores, mutexes, etc. Anyways...
In TestFixture.cs, there is a call to the test runner factory, which creates the appropriate test runner depending on
the fixture (class) attributes:
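The call site itself isn't shown in this extract; conceptually it is a couple of lines along these lines (apart from CreateTestFixtureRunner, which is named in the text, the identifiers are my own):
// Ask the factory for a runner that matches the fixture's attributes, then hand the fixture over.
TestFixtureRunner runner=TestFixtureRunner.CreateTestFixtureRunner(this);
runner.RunTests(this);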
Modify the CreateTestFixtureRunner factory if necessary. The current implementation supports running a process (a
sequence of tests) and running tests independently of each other. This is a bare-bones implementation:
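As a sketch of what such a bare-bones factory amounts to (the IsProcessTest flag and the concrete runner class names are illustrative, not the framework's actual names):
public static TestFixtureRunner CreateTestFixtureRunner(TestFixture tf)
{
// A fixture marked as a process runs its tests as an ordered sequence;
// otherwise each test runs independently of the others.
if (tf.IsProcessTest)
{
return new ProcessTestRunner();
}
return new IndividualTestRunner();
}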
All custom test runners must be derived from TestFixtureRunner and implement two functions (hmmm, do you smell an interface here instead???). The TestFixtureRunner class implements the RunTest method, which should always be used to run the actual unit test. It requires an instance of the class containing the unit test, constructed by calling:
object instance=tf.SetUpClass();
and the TestAttribute of the method under test. This is the [Test] attribute associated with the method,
regardless of any other attributes that may also be associated with the method.
Iterating through all the tests in the test fixture is straightforward.
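At minimum, the loop looks something like this (the member names are guessed rather than copied from TestFixtureRunner.cs):
public override void RunTests(TestFixture tf)
{
// Instantiate the fixture class and run its fixture setup method, if any.
object instance=tf.SetUpClass();
foreach (MethodItem mi in tf.TestMethods)
{
// RunTest wraps the actual invocation with the timing/memory measurements.
RunTest(instance, mi.TestAttribute);
}
tf.TearDownClass(instance);
}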
Class Extensions
Attributes that extend a class (and therefore the test fixture) are handled in the TestFixture.cs file, in the SetUpClass and TearDownClass methods:
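In outline (and only in outline--the field names here are assumptions), SetUpClass does something like the following; TearDownClass is the mirror image, invoking the fixture teardown method instead:
public object SetUpClass()
{
// Instantiate the test fixture class...
object instance=Activator.CreateInstance(fixtureType);
// ...and invoke the fixture setup method, if one was registered by its attribute.
if (fixtureSetUpMethod != null)
{
fixtureSetUpMethod.Invoke(instance, null);
}
return instance;
}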
Currently, these simply instantiate the class and invoke the fixture set up and tear down methods, if defined. Again,
this code should be refactored to use a messaging or event mechanism to allow for easy extension of the fixture
attributes.
Method Extensions
Additional functionality specified by method attributes is either handled in the MethodItem.cs file or as part of a
new test runner. If you're extending the method invocation directly, this would be done in the Invoke method.
Note however that attributes that test for a certain condition, such as memory usage, processing time, handles
used, etc., are actually implemented as part of the RunTest method found in the TestFixtureRunner.cs file. Tests
should set the method's TestAttribute state and result message so that the GUI can properly display the
results:
ta.State=TestAttribute.TestState.Fail;
ta.Result="... your message ...";
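As an illustration of the pattern, the MinOperationsPerSecond check inside RunTest boils down to something like this (the attribute's property name, and a TicksPerSecond property on HiResTimer, are assumptions):
// executionTime is the elapsed HiResTimer tick count captured around the utd() call shown earlier.
long opsPerSecond=HiResTimer.TicksPerSecond/executionTime;
if (opsPerSecond < minOpsAttribute.MinOperations)
{
ta.State=TestAttribute.TestState.Fail;
ta.Result="Measured "+opsPerSecond+" operations per second; expected at least "+
minOpsAttribute.MinOperations+".";
}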
Fixture Setup And Tear Down
Why Would You Use A Fixture Setup?
A fixture-level setup and teardown is useful whenever:
The test fixture contains a suite of tests, each of which operates on a common set of data;
The set of data is very large and might take a long time to load for each test;
When interfacing to hardware, there will be, most likely, a setup and teardown process that is independent of
tests being performed;
Any time a common setup and/or teardown is required that is independent of the test functions;
Starting a separate process is required to support the tests.
A Simple Case Study
An application that I've developed for one of my clients involves interfacing to different hardware modules using TCP/
IP. There are usually 30 to 60 of these modules sitting on the network, each configured to do different things--
handle bill acceptors, unlock turnstiles and doors, report alarms, provide punch-clock services, report on system
status, etc. Instead of having all this hardware lying around at home, I have a simulator that I wrote that runs as a separate application, either locally or on a separate computer. The unit tests that verify the packet I/O between the application and the modules require starting up the simulator and shutting it down when the tests are complete.
This is easily handled in the test fixture setup and teardown functions, and saves a lot of time as compared to doing
this for each test in the fixture.
Why Not Just Use The Test Fixture's Class Constructor?
There is some merit to this approach, but there are several problems with it:
It breaks the model of using attributes to designate special code to be run at certain times by the test runner
It breaks the symmetry with regards to the test setup and tear down functions
In C#, there is no deterministic destructor (finalizers run only when the GC decides to run them), so there is no corresponding place to put the test fixture tear down
An Example: Measuring Processing Time
Performance measurement illustrates the usefulness of this feature. For this example, I'll be extending the case study I've developed in the previous articles.
[FixtureSetUp]
public void TestFixtureSetup()
{
vendor=new Vendor();
for (int i=0; i<100000; i++)
{
Part p=new Part();
p.Number=i.ToString();
vendor.Add(p);
}
}
The above function creates 100,000 parts and associates them with a vendor. The remainder of the test fixture measures the performance of three different ways of looking up a part:
[Test]
public void RandomAccessTestByPart()
{
int n=rnd.Next(0, 100000);
Part p=new Part();
p.Number=n.ToString();
bool found=vendor.Contains(p);
Assertion.Assert(found==true, "Expected to find the part!");
}
[Test]
public void RandomAccessTestByIndex()
{
int n=rnd.Next(0, 100000);
Part p=vendor.Parts[n];
Assertion.Assert(p.Number==n.ToString(),
"Parts not in the same order as when added!");
}
[Test]
public void RandomAccessTestByNumber()
{
int n=rnd.Next(0, 100000);
bool found=vendor.Contains(n.ToString());
Assertion.Assert(found==true, "Expected to find the part!");
}
The example above illustrates that a fixture-level setup and tear down capability has its uses, in addition to the per-test setup and tear down capability.
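For symmetry, the matching fixture teardown (assuming the attribute is named FixtureTearDown, mirroring FixtureSetUp) would simply release the shared data once all of the fixture's tests have run:
[FixtureTearDown]
public void TestFixtureTeardown()
{
// Drop the 100,000 shared parts so they can be reclaimed.
vendor.Clear();
vendor=null;
}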
Processing Time
Measuring the processing time of a function is not straightforward. First off, you can't use the DateTime.Now.Ticks property because it doesn't have the necessary resolution. While TimeSpan.TicksPerSecond implies a tick interval of 100ns, this is not the actual resolution of the DateTime.Now.Ticks property. A simple test illustrates this fact.
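The original listing isn't included in this extract, but the idea is simple: synchronize to a tick boundary, wait for the next observable change, and record the difference:
// q ends up holding the smallest observable change in DateTime.Now.Ticks (in 100ns units).
long t1=DateTime.Now.Ticks;
long t2=t1;
while (t2 == t1) {t2=DateTime.Now.Ticks;} // sync to a tick boundary
long t3=t2;
while (t3 == t2) {t3=DateTime.Now.Ticks;} // wait for the next change
long q=t3-t2;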
The result is that q=156250 ticks, giving a resolution of 15.625 milliseconds. Instead, the QueryPerformanceCounter and QueryPerformanceFrequency functions have to be used:
[DllImport("Kernel32.dll")]
private static extern bool
QueryPerformanceFrequency(out long lpFrequency);
This results in a resolution of about 569ns (at least, on my computer). Much better!
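The HiResTimer class used earlier (HiResTimer.Ticks) presumably wraps these two calls; a minimal version would look something like this:
using System.Runtime.InteropServices;

public class HiResTimer
{
[DllImport("Kernel32.dll")]
private static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

[DllImport("Kernel32.dll")]
private static extern bool QueryPerformanceFrequency(out long lpFrequency);

// Current value of the high-resolution counter, in counter ticks.
public static long Ticks
{
get
{
long ticks;
QueryPerformanceCounter(out ticks);
return ticks;
}
}

// Number of counter ticks per second on this machine (the counter's frequency).
public static long TicksPerSecond
{
get
{
long frequency;
QueryPerformanceFrequency(out frequency);
return frequency;
}
}
}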
Testing Processing Time
Validating the processing time is dubious because processing time varies so much depending on the machine, what it's doing, and the other technologies with which the unit tests are interfacing. However, this does not mean that testing the processing time of a function is without merit when used appropriately. Several appropriate applications come to mind, such as:
verifying the minimum throughput of an analysis algorithm;
verifying that a simulation keeps up with real time;
verifying that status information can be written to a database within a fixed interval.
I have dealt with unit testing in each of these cases--a network analysis application for satellite switch rings, bit rate
degradation resulting from rain fade in an Internet over satellite simulator, and real time updating of status
information to a database. While the performance of an algorithm varies from machine to machine, having a
minimum "operations per second" criteria is very useful, especially when tweaking some low-level code that ends up
having major repercussions in the performance of an algorithm.
The MinOperationsPerSecond attribute can be applied to any unit test to validate performance. For example:
[Test]
[MinOperationsPerSecond(150000)]
public void RandomAccessTestByPart()
{
int n=rnd.Next(0, 100000);
Part p=new Part();
p.Number=n.ToString();
bool found=vendor.Contains(p);
Assertion.Assert(found==true, "Expected to find the part!");
}
The above unit test verifies that a random access test can be performed at a rate of at least 150,000 operations per
second.
Test Repetition
Performance testing can definitely benefit from repeat testing to average out the vagaries of measuring time in a
multi-tasking operating system. The Repeat attribute informs the test runner that a test should be repeated the specified number of times, optionally with a delay between each repetition. For example:
[Test]
[MinOperationsPerSecond(150000)]
[Repeat(100)]
public void RandomAccessTestByNumber()
{
int n=rnd.Next(0, 100000);
bool found=vendor.Contains(n.ToString());
Assertion.Assert(found==true, "Expected to find the part!");
}
The above code will run 100 times. From the test results, it's pretty clear that the implementation has a severe problem (in this case, I implemented a really dumb function that walks through each element in the collection of parts until a match is found).
What Other Uses Are There?
As I mentioned in the introduction, I do a lot of work with hardware, and there's simply no other way to test
hardware than to repeat something over and over again. More times than I'd like to remember, I've had problems
in my code because one out of every thousand times, there would be a hardware glitch that reported erroneous
values. Other uses abound--there's nothing like physically unplugging the network cable or pulling the power plug
on the server to see how your software on the client side handles the fault. Monitoring network loading is another
application which requires repetition. The uses abound if one stops thinking in terms of rigid test-once analysis.
Memory Utilization
As I discussed in the introduction, memory allocation is pretty much impossible to track in a garbage collecting
environment. A GC environment also creates a dilemma when monitoring memory, and a little analysis of the
problem is helpful at this point so we can select the appropriate solution.
In a classical memory management scheme, where the programmer is required to free allocations, memory has only
two states:
allocated
unallocated
In systems that use garbage collection, the programmer doesn't need to free allocations. Memory still has two
states:
referenced
unreferenced
but these states are not the same as the allocated/unallocated states. In terms of physical memory, a GC system
has three states:
allocated (referenced)
allocated (unreferenced)
unallocated (unreferenced)
It is the allocated-but-unreferenced state that causes so much confusion when determining how much memory is "in use" at any given time. This memory is allocated but awaiting reclamation by the GC. Does this memory count toward the unallocated total or toward the allocated total? Depending on what the intent of monitoring memory is, the answer is different--is a memory test supposed to check what the code still references, or what the process has physically allocated?
The problem with attempting to get a true count of the allocated memory in a GC system is that the test, by its very
nature, interferes with the very thing we're trying to test! Like Schroedinger's cat, neither alive nor dead until we
open the box and look, allocated but unreferenced memory is in this quasi-state of being neither allocated nor
unallocated. Once we call GC.GetTotalMemory(true), any unreferenced memory is (ideally) reclaimed and we have a true (again ideally) count of the allocated memory (so, I guess the cat is always dead after we open the box). Therefore, in an ideal world, this code:
...
startMem=GC.GetTotalMemory(true);
startTime=HiResTimer.Ticks;
try
{
utd();
}
catch(Exception e)
{
throw(e);
}
finally
{
stopTime=HiResTimer.Ticks;
endMem=GC.GetTotalMemory(true);
...
would measure how much memory still remains referenced after the utd() delegate call. However, this doesn't tell
us anything about the memory utilization while the function was running, in terms of the amount of memory that it
allocated, referenced, and subsequently de-referenced. Also, the world is not ideal. Rather than reclaiming all unreferenced memory synchronously, the GC runs on a separate thread, and GetTotalMemory(true) merely waits a short interval for that collection to finish. So even the "after" number is only an approximation.
Another problem is that the GC reports only on the memory that it manages. The GC is oblivious to unmanaged
memory such as bitmaps, COM objects, etc. In my article on IDisposable, I demonstrate this using a 3MB JPG
image. The GC reports zero memory utilization while the object is referenced! And worse, without properly
disposing the object, physical memory will continue to be utilized until none is left and the GC finally starts
reclaiming it. Bitmaps and the like are an interesting problem in themselves though. They're sort of a "quasi-managed" resource since the wrapping class implements the IDisposable interface and therefore the unmanaged resources are cleaned up when the managed resource is reclaimed. This binding between managed/unmanaged
resources makes the issue of resource management yet again more confusing.
It becomes clear that using the GC to test memory allocations is pointless: it is inaccurate, it is incomplete, and it skews other performance measurements.
Unit Testing As Documentation
Remember that part of the purpose of a unit test is to guide the programmer to properly implement the functionality
under test. With regards to memory utilization, the unit test needs to consider the nature of the GC and the nature
of the object under test. What really needs to be determined is whether the implementation:
needs to support a manual cleanup in cases where the resources are allocated completely externally from
the .NET framework, such as in a COM object;
needs to support "directed" cleanup in cases where manually cleaning up managed resources improves overall
performance;
can rely entirely on the GC to eventually get around to performing cleanup.
These criteria give us a clearer picture of what the purpose of memory testing is within the context of a unit test.
Manual Cleanup
Manual cleanup is needed for resources that are allocated completely outside of the domain of the .NET framework.
This typically means COM objects or other third party programs which allocate resources and require the application
to specifically free these resources. Since the GC functions are useless in tracking this kind of memory, we have to
rely on system diagnostics to tell us how much memory is being used by these functions. Because these resources
are completely unmanaged by the GC, there is no binding managed resource which implements IDisposable, and therefore the programmer must wrap the resource in a class that either implements IDisposable or provides some
other mechanism to free up the resources. The unit test should include whatever code is necessary to ensure that
the application interfaces with the third party functionality so that resources are reclaimed.
Directed Cleanup
Directed cleanup handles cases where unmanaged resources are already wrapped by classes in the .NET framework
(or by the application), thus becoming "managed". A bitmap or other GDI resource is an example of this. It is often
necessary to manually direct the reclamation of the unmanaged portion of the managed resource so that memory
and/or handles do not continue to be allocated without limit. Waiting for all physical memory plus all virtual memory
to be consumed before the GC starts reclaiming resources results in very poor performance of not only your
application, but the entire system. The unit test needs to be written in such a manner as to "document" the need
for this implementation.
It is important to recognize that for directed cleanup unit tests, we do not want the GC to run. If the GC were to start reclaiming memory, then the unmanaged resources, being wrapped in a managed object, would be freed by the collector rather than by the code under test, and the test would no longer prove anything about the directed cleanup. Rather, the unit test should ensure that the directed cleanup implementation itself is correct.
Automatic Cleanup
In this case, the application is going to rely on the GC to perform all cleanup whenever it decides to start collection.
The unit test does not need to measure memory or resource utilization. This "don't test" approach should only be
taken when the resources are fully managed by the GC--there are no objects that interface to or wrap unmanaged
resources. The only exception to this that I can think of has to do with managing large collections. In this case,
directed cleanup of the collection would improve memory utilization. However, because the .NET collection classes
don't provide for a complete reclamation of memory in a manual way, this is sort of pointless for now. Hopefully,
when generics are implemented and we can migrate to an STL approach for containers, the .NET collection classes
can be thrown away.
Memory Testing
If you buy into the three cases (manual, directed, and automatic) that I described above, then it should be clear that
the memory functions the GC provides are not appropriate, since the only thing we're really interested in is tracking
unmanaged resources, whether wrapped by a managed object or not. To do this, we simply need to watch the
process memory using a simple helper class:
using System.Diagnostics;
public class ProcessMemory
{
public static int WorkingSet
{
get {return Process.GetCurrentProcess().WorkingSet;}
}
}
which returns our process' physically allocated memory (we're going to ignore virtual memory allocations). The
MaxKMemory attribute is used to specify the maximum amount of memory, in kilobytes, that a function is allowed to allocate on the
process heap (not the GC pool) without failing. For example:
[Test]
[MaxKMemory(1000)]
public void UnmanagedTest()
{
ClassBeingTested cbt=new ClassBeingTested();
cbt.LoadImage("fish.jpg");
cbt.Dispose();
}
The above code verifies that after the image has been loaded and disposed of, less than 1,000K (roughly 1MB) of additional process memory remains allocated. What this test is really doing is documenting that ClassBeingTested wraps an unmanaged resource and that a directed cleanup (the Dispose call) is required to release it. (To keep the download size small, I have not included the fish.jpg file.)
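Inside the test runner, the MaxKMemory check presumably amounts to a working-set delta around the test invocation; a sketch (the attribute's property name is an assumption):
// Snapshot the process working set before and after the test, outside of the GC's view.
int startWS=ProcessMemory.WorkingSet;
utd();
int endWS=ProcessMemory.WorkingSet;
int usedK=(endWS-startWS)/1024;
if (usedK > maxKMemAttribute.MaxKilobytes)
{
ta.State=TestAttribute.TestState.Fail;
ta.Result="Allocated "+usedK+"K of process memory; the limit is "+maxKMemAttribute.MaxKilobytes+"K.";
}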
Conclusion
I completely agree that the usefulness of some of these tests is dubious for most applications. In my little corner of
the world however, I find them very helpful. And the real point here is that the intent of unit testing should be to
provide the programmer with a suite of tools to choose from that try to automate as best as possible different
testing requirements. I believe MUTE does this, and provides a good framework (albeit in need of some refactoring!)
for programmers to continue extending it for their own needs.
License
This article has no explicit license attached to it but may contain usage terms in the article text or the download files
themselves. If in doubt please contact the author via the discussion board below.
Marc Clifton
Marc is the creator of two open source projects, MyXaml, a declarative (XML) instantiation engine, and the Advanced Unit Testing framework, and of Interacx, a commercial n-tier RAD application suite. Visit his website, www.marcclifton.com, where you will find many of his articles.