Automated System Testing using Visual GUI Testing
Abstract—Software companies are under continuous pressure to shorten time to market, raise quality and lower costs. More automated system testing could be instrumental in achieving these goals, and in recent years testing tools have been developed to automate the interaction with software systems at the GUI level. However, there is a lack of knowledge on the usability and applicability of these tools in an industrial setting. This study evaluates two tools for automated visual GUI testing on a real-world, safety-critical software system developed by the company Saab AB. The tools are compared based on their properties as well as how they support automation of system test cases that have previously been conducted manually. The time to develop and the size of the automated test cases as well as their execution times have been evaluated. Results show that there are only minor differences between the two tools, one commercial and one open-source, but, more importantly, that visual GUI testing is an applicable technology for automated system testing with effort gains over manual system test practices. The study results also indicate that the technology has benefits over alternative GUI testing techniques and that it can be used for automated acceptance testing. However, visual GUI testing still has challenges that must be addressed, in particular the script maintenance costs and how to support robust test execution.

Keywords-Visual GUI testing; Empirical; Industrial Study; Tool Comparison

I. INTRODUCTION

Market trends with demands for faster time-to-market and higher quality software continue to pose challenges for software companies that often work with manual test practices that cannot keep up with increasing market demands. Companies are also challenged by their own systems, which are often Graphical User Interface (GUI) intensive and therefore complex and expensive to test [1], especially since software is prone to changing requirements, maintenance, refactoring, etc., which requires extensive regression testing. Regression testing should be conducted with configurable frequency [2], e.g. after system modification or before software release, on all levels of a system, from unit tests, on small components, to system and acceptance tests, with complex end user scenario input data [3], [4]. However, due to the market imposed time constraints many companies are compelled to focus or limit their manual regression testing with ad hoc test case selection techniques [5] that do not guarantee testing of all modified parts of a system and cause faults to slip through.

Automated testing has been proposed as one solution to the problems with manual regression testing since automated tests can run faster and more often, decreasing the need for test case selection and thereby raising quality, while reducing manual effort. However, most automated test techniques, e.g. unit testing [6], [7], Behavioral Driven Development [8], etc., approach testing on a lower system level, which has spurred an ongoing discussion regarding whether these techniques can, with certainty, be applied to high-level tests, e.g. system tests [9], [10]. This uncertainty has resulted in the development of automated test techniques explicitly for system and acceptance tests, e.g. Record and Replay (R&R) [11]–[13]. R&R is a tool-supported technique where user interaction with a System Under Test's (SUT) GUI components is captured in a script that can later be replayed automatically. User interaction is captured either on a GUI component level, e.g. via direct references to the GUI components, or on a GUI bitmap level, with coordinates to the location of the component on the SUT's GUI. The limitation of this technique is that the scripts are fragile to GUI component change [14], e.g. API, code, or GUI layout change, which in the worst case can render entire automated test suites inept [15]. Hence, the state-of-practice automated test techniques suffer from limitations and there is a need for a more robust technique for automation of system and acceptance tests.

In this paper, we investigate a novel automated testing technique, which we in the following call visual GUI testing, with characteristics that could lead to more robust system test automation [16]. Visual GUI testing is a script-based testing technique that is similar to R&R but uses image recognition, instead of GUI component code or coordinates, to find and interact with GUI bitmap components, e.g. images and buttons, in the SUT's GUI. GUI bitmap interaction based on image recognition allows visual GUI testing to mimic user behavior and treat the SUT as a black box, whilst being more robust to GUI layout change. It is therefore a prime candidate for better system and acceptance test automation. However, the body of knowledge regarding visual GUI testing is small and contains no industrial experience reports or other studies to support the technique's industrial applicability. Realistic evaluation on industrial scale testing problems is key in understanding and refining this technique.
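To make the technique concrete, the following is a minimal sketch of what a visual GUI testing script can look like in Sikuli-style Python (Jython). The image file names, the tested application and the expected outcome are illustrative assumptions, not details of the studied system; the point is that GUI components are located by their on-screen appearance rather than by code references or coordinates.

# Hedged sketch of a visual GUI testing script (Sikuli-style Python).
# Assumes screenshots such as "start_button.png" were captured beforehand.
from sikuli import *   # provides click(), wait(), type(), exists(), Pattern

setAutoWaitTimeout(10)                 # wait up to 10 s for images to appear

click("start_button.png")              # find the button by image and click it
wait("login_dialog.png")               # block until the login dialog is visible
type("username_field.png", "tester")   # click the field image, then type text
click("ok_button.png")

# Verify the expected result by checking that a GUI state (an image) appears.
assert exists("main_view.png", 15), "Main view did not appear within 15 s"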
automation, including tools for GUI-interaction [22], but to the authors' knowledge there is no research using visual GUI testing for acceptance testing.

III. CASE STUDY DESCRIPTION

The empirical study presented in this paper was conducted in a real-world, industrial context, in one business area of the company Saab AB, in the continuation of this paper referred to as Saab. Saab develops safety critical air traffic control systems that consist of several individual subsystems, of which a key one was chosen as the subject of this study. The subsystem has in the order of 100K Lines of Code (LOC), constituting roughly one third of the functionality of the system it is part of, and is tested with different system level tests, including 50 manual scenario based system test cases. At the time of the study the subsystem was in the final phase for a new customer release, which was one reason why it was chosen. Other reasons for the choice included the subsystem size in LOC, the number of manual test cases, and because it had a non-animated GUI. With non-animated we mean that there are no moving graphical components, only components that, when interacted with, change face, e.g. color. Decision support information for what subsystem to include in the study was gathered through document analysis, interviews and discussions with different software development roles at Saab.

[Fig. 1. Overview of research methodology (square nodes show activities/steps and rounded ones outcomes): the Pre-Study comprises initial tool analysis, industrial context analysis and experiments, yielding comparable tool properties, company and system context information, and experimental results; the Industrial Study comprises manual system test participation, test case selection and automation, and data analysis, yielding classified test cases, collected development and execution metrics, and conclusions.]
CommercialTool was selected for this study because Saab had been contacted by the tool's vendor and been provided with a trial license for the tool that made it accessible. It is a mature product for visual GUI testing, having been on the market for more than 5 years. The second tool, Sikuli, was chosen since it seemed to have similar functionality as CommercialTool and, if applicable, would be easier to refine and adapt further to the company context. The company was also interested in the relative cost benefits of the tools, i.e. if the functionality or support of CommercialTool would justify its increased up-front cost.

The methodology used in the study was divided into two main phases, shown in Figure 1, with three steps in each phase. Phase one of the study was a pre-study with three different steps. An initial tool analysis compared the tools based on their static properties as evaluated through ad hoc script development and review of the tools' documentation. This was followed by a series of experiments with the goal of collecting quantitative metrics on the strengths and weaknesses of the tools. The experiments also served to provide information about visual GUI testing's applicability for different types of GUIs, e.g. animated with moving objects and non-animated with static buttons and images, which would provide decision support for, and possibly rule out, what type of system to study at Saab in the second phase of the study. In parallel with these experiments an analysis of the industrial context at Saab was also conducted. Phase two of the study was conducted at Saab and started with a complete manual system test of all the 50 test cases of the studied subsystem. This took 40 hours, spread over five days, during which the manual test cases were categorized based on their level of possible automation with the visual GUI testing tools. Both of the visual GUI testing tools were then used to automate five carefully selected, representative test case scenarios (ten percent) of the manual test suite, during which metrics on script development time, script LOC and script execution time were collected.

In the following sections the two phases of the methodology will be described in more detail.

A. Pre-study

Knowledge about the industrial context at Saab was acquired through document analysis, interviews and discussions with different roles at the company. The company's support made it possible to identify a suitable subsystem for the study, based on subsystem size, number of manual test cases, GUI properties, criticality, etc., and to identify the manual test practices conducted at the company.

In parallel with the industrial context analysis, static properties of the studied tools were collected, through explorative literature review of the tools' documentation and ad hoc script development. The collected properties were then analyzed according to the quality criteria proposed by Illes et al. [23], derived from the ISO/IEC 9126 standard supplemented with criteria to define tool vendor qualifications. The criteria refer to tool quality and are defined as Functionality, Reliability, Usability, Efficiency, Maintainability, Portability, General vendor qualifications, Vendor support, and Licensing and pricing.

The tools were also analyzed in four structured experiments where scripts were written in both tools, with equivalent instructions to make the scripts comparable, and then executed against controlled GUI input. The GUI input was classified into two groups, animated GUIs and non-animated GUIs, chosen to cover and evaluate how the tools perceivably performed for different types of industrial systems.
[Fig. 2. The pre-study experimental setup: the tools (Sikuli with a third-party VNC viewer, and CommercialTool) run under one user account on the MacBook Pro and connect over a VNC connection to a VNC server and the GUI input applications under a second user account; the screen of user account 2 is shown to, and commands are sent from, user account 1.]
The ability to handle animated GUIs is critical for visual GUI testing tools since they apply compute-intensive image recognition algorithms that might not be able to cope with highly dynamic GUIs. Eight scripts were written in total, four in each tool, and each one was executed in 30 runs for each experiment. The experiments are summarized in the following list:

• Experiment 1: Aimed to determine how well the tools could differentiate between alpha-numerical symbols by adding the numbers six and nine in a non-animated desktop calculator by locating and clicking on the calculator's buttons.
• Experiment 2: Aimed to determine how the tools could handle small graphical changes on a large surface, tested by repeated search of the computer desktop for a specific icon to appear that was controlled by the researcher.
• Experiment 3: Aimed to test the tools' image recognition algorithms in an animated context by locating the back fender of a car driving down a street in a video clip in which the sought target image was only visible for a few video frames.
• Experiment 4: Also in an animated context, aimed to identify how well the tools could track a moving object over a multi-colored surface in a video clip of an aircraft, represented by its textual call-sign, moving across a radar screen.

The four experiments cover typical functionality and behavior of most software system GUIs, e.g. interaction with static objects such as buttons or images, timed events and objects in motion, to provide a broad view of the applicability of the tools for different systems. Experiment 4 was selected since it is similar to one of the systems developed by the company.

The experiments were run on a MacBook Pro computer, with a 2.8GHz Intel Core 2 Duo processor, using virtual network computing (VNC) [24], which was a requirement for CommercialTool. CommercialTool is designed to be non-intrusive, meaning that it should not affect the performance of the SUT, and to support testing of distributed software systems. This is achieved by performing all testing over VNC, and support for it is built into the tool. Sikuli does not have VNC support, so to equalize the experiment conditions Sikuli was paired with a third party VNC viewer application. The VNC viewer application was run on one user account connected to a VNC server on a second user account on the experiment computer, visualized in Figure 2.

Finally, the visual GUI testing tools were also analyzed in terms of learnability, since this aspect affects the technique's acceptance, e.g. if the tool has a steep learning curve it is less likely to be accepted by users [25]. The learnability was evaluated in two ad hoc experiments using Sikuli, where two individuals with novice programming knowledge, at two different occasions, had to automate a simple computer desktop task with the tool.
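As an illustration of the scripted experiments listed above, the following is a hedged sketch of what the Experiment 1 calculator script could look like in Sikuli-style Python. The image file names and the expected display image are assumptions; the actual experiment scripts are not published in this paper.

# Hedged sketch of an Experiment 1 style script: add 6 and 9 in a desktop
# calculator by image-recognition clicks, then verify the displayed result.
# Images such as "btn_6.png" are assumed to be captured from the calculator GUI.
from sikuli import *

setAutoWaitTimeout(5)

click("btn_6.png")        # the tool must tell "6" apart from "9" and other digits
click("btn_plus.png")
click("btn_9.png")
click("btn_equals.png")

# One run is counted as successful if the expected result is shown on screen.
if exists("display_15.png", 5):
    print("PASS: calculator shows 15")
else:
    print("FAIL: expected result not found")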
B. Industrial Study

The studied subsystem at Saab consisted of two computers with the Windows XP operating system, connected through a local area network (LAN). The LAN also included a third computer running simulators, used during manual testing to emulate domain hardware controlled by the subsystem's GUI. The GUI consisted primarily of custom-developed GUI components, such as buttons and other bitmap graphics, and was non-animated. During the study a fourth computer was also added to the LAN to run the visual GUI testing tools and VNC, visualized in Figure 3. VNC is scalable for distributed systems, so the level of complexity of the industrial test system setup, Figure 3, was directly comparable to the complexity of the experimental setup used during the pre-study, Figure 2.

[Fig. 3. Visualization of the test system setup: the visual GUI testing tools on Computer 1 connect over VNC to subsystem part A and part B on Computers 2 and 3 and to the simulators on Computer 4, all connected through a LAN.]

In the first step of the industrial study the researchers conducted a complete manual system test of the chosen subsystem with two goals. The first goal was to categorize the manual test cases as fully scriptable, partially scriptable or not scriptable based on the tool properties collected during the pre-study. The categorization provided input for the selection of representative manual test cases to automate and showed if enough of the manual test suite could be automated for the automation to be valuable for Saab.

All the subsystem's manual test cases were scenario based, written in natural language, including pre- and post-conditions for each test case, and were organized in tables with three columns. Column one described what input to manually give to the subsystem, e.g. click on button x, set property y, etc. Column two described the expected result of the input, e.g. button x changes face, property y is observed on object z, etc. The last column was a check box where the tester should report if the expected result was observed or not. The test case table rows described the test scenario steps, e.g. after giving input x, observing output y and documenting the result in the checkbox on row k, the scenario proceeded on row k+1, etc., until reaching the final result checkbox on row n. Hence, the test scenarios were well defined and documented in a way suitable as input for the automation.

The second research purpose of conducting the manual system test was to acquire information on how the different parts of the subsystem worked together and which test cases provided test coverage for which part(s) of the subsystem. Test coverage information was vital in the manual test case selection process to ensure that the selected test cases were representative for the entire test suite so that the results could be generalized. Generalization of the results was required since it was not feasible to automate all 50 of the subsystem's manual test cases during the study.

Five test cases were selected for automation with the goal of capturing as many mutually exclusive GUI interaction types as possible, e.g. clicks, sequences of clicks, etc., to ensure that these GUI interaction types, and in turn test cases including these GUI interaction types, could be automated. GUI interaction types with properties that added complexity to the automation were especially important to cover in the five automated test cases; the most complex properties are listed below:

1) The number of physical computers in the subsystem the test case required access to.
2) Which of the available simulators for the subsystem the test case required access to.
3) The number of run-time reconfigurations of the subsystem the test case included.

The number of physical computers would impose complexity by requiring additional VNC control code and interaction with a broader variety of GUI components, e.g. interaction with custom GUI components in subsystem parts A and B and the simulators. Simulator interaction was also important to cover in the automated test cases since if some simulator interaction could not be automated, neither could the manual test cases using that simulator. Run-time reconfiguration in turn added complexity by requiring the scripts to read and write XML files. In Table I the five chosen test cases are summarized together with which of the three properties they automate. The minimum number of physical computers required in any test case was two and the maximum three, whilst the maximum number of run-time configurations in any test case was also three. There were four simulators, referred to as A, B, C and D, but only simulators A and B were automated in any script because they were the most commonly used in the manual test cases and also had the most complex GUIs. In addition, simulators C and D had very similar functionality to A and B and had no unique GUI components not present in A or B, and were therefore identified as less important, yet possible, to automate.

TABLE I. Properties of the manual test cases selected for automation. The number of physical computers does not include the computer used to run the visual GUI testing tools.

Test case | Physical computers | Run-time config. | Simulator
Test case 1 | 2 | 3 | A
Test case 2 | 2 | 0 | B
Test case 3 | 2 | 2 | A
Test case 4 | 2 | 0 | A
Test case 5 | 3 | 0 | A

Once the representative test cases had been selected from the manual test suite they were automated in both of the studied tools, during which metrics were collected for comparison of the tools and the resulting scripts. Metrics that were collected included script development time, script LOC and script execution time.
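Run-time reconfiguration, as described above, required the scripts to read and write XML files. The following is a minimal sketch, in the Python used for Sikuli scripting, of what such a helper could look like; the file name config.xml and the mode setting are illustrative assumptions rather than details of Saab's actual configuration format.

# Hedged sketch of the XML read/write step used for run-time reconfiguration.
# The structure of "config.xml" and the "mode" setting are assumptions.
import xml.etree.ElementTree as ET

def set_subsystem_mode(config_path, new_mode):
    """Read the subsystem configuration, change one setting and write it back."""
    tree = ET.parse(config_path)
    root = tree.getroot()
    setting = root.find("./settings/mode")   # locate the setting to reconfigure
    old_mode = setting.text
    setting.text = new_mode
    tree.write(config_path)
    return old_mode                           # returned so the script can restore it later

# Example use inside a test script: switch mode before a scenario step,
# then restore the original value afterwards.
previous = set_subsystem_mode("config.xml", "training")
# ... perform the image-recognition GUI steps of the scenario here ...
set_subsystem_mode("config.xml", previous)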
IV. RESULTS

Below, the results gathered during the study are presented, divided into the results gathered during the pre-study and the results gathered during the industrial phase of the study.

A. Results of the Pre-study

The pre-study started with a review of the studied visual GUI testing tools' documentation, from which 12 comparable static tool properties relevant for Saab were collected. The 12 properties are summarized in Table II, which shows which property had impact on what tool quality criteria defined by Illes et al. [23], described in Section III. The table also shows what tool was the most favorable to Saab in terms of a given property, e.g. CommercialTool was more favorable in terms of real-time feedback than Sikuli. The favored tool is represented in the table with an S for Sikuli, CT for CommercialTool and (-) if the tools were equally favorable.

TABLE II. Results of the property comparison between CommercialTool and Sikuli. Column Impacts: F - Functionality, R - Reliability, U - Usability, E - Efficiency, M - Maintainability, P - Portability, GVQ - General vendor qualifications, VS - Vendor support, LP - Licensing and pricing. Column Favored tool: S - Sikuli, CT - CommercialTool, (-) - equal between the tools.

Property | CommercialTool | Sikuli | Impacts | Favored tool
Developed in | C# | Jython | F/P/VS | S
Script language syntax | Custom | Python | F/U/M | S
Supports imports | No | Java and Python | F/U/E/VS | S
Image representation in tool IDE | Text strings | Images | F/U/M/P | S
Real-time script execution feedback | Yes | No | U/M | CT
Image recognition sweeps per second | 7 | 5 | F/R/U | CT
Image recognition failure mitigation | Multiple algorithms to choose from | Image similarity configuration | F/R/U/E/M/P | CT
Test suite support | Yes | Unit tests only | F/U/M/P | -
Remote SUT connection support | Yes | No | F/U/P | -
Remote SUT connection requirement | Yes | No | F/U/P | S
Cost | 10.000 Euros per license per computer | Free | U/LP | S
Backwards compatibility | Guaranteed | Uncertain | F/M/GVQ | CT

In the following section each of the 12 tool properties is discussed in more detail, compared between the tools and related to what tool quality criteria it impacts.

Developed in. CommercialTool is developed in C#, whilst Sikuli is developed in Jython (a Python version in Java), which is relevant for the portability of the tools since CommercialTool only works on certain software platforms whilst Sikuli is platform independent. Sikuli, being open source, also allows the user to expand the tool with new functionality, written in Jython, whilst users of CommercialTool must rely on vendor support to add tool functionality.

Script language syntax. The script language in Sikuli is based on Python, extended with functions specific for GUI interaction, e.g. clicking on GUI objects, writing text in a GUI, waiting for GUI objects, etc. Sikuli scripts are written in the tool's Integrated Development Environment (IDE), and because of the commonality between Python and other imperative/object-oriented languages the tool has both high usability and learnability, with perceived positive impact on script maintainability. The learnability of Sikuli is also supported by the learnability experiments conducted during the pre-study, described in Section III, where novice programmers were able to develop simple Sikuli scripts after only 10 minutes of Sikuli experience and advanced scripts after an hour.

CommercialTool has a custom scripting language, modelled to resemble natural language, that the user writes in the tool's IDE, which has a lot of functionality, but the tool's custom language has a higher learning curve than Sikuli script. The usability of CommercialTool is however strengthened by a script language instruction-set that is more extensive than the instruction-set in Sikuli, e.g. including functionality to analyze audio output, etc. Both Sikuli and CommercialTool do however support all the most common GUI interaction functions and programming constructs, e.g. loops, switch statements, exception handling, etc.

Supports imports. Additional functionality can be added to Sikuli by user-defined imports written in either Java or Python code to extend the tool's usability and efficiency. CommercialTool does not support user-defined imports and again users must rely on vendor support to add tool functionality.

Image representation in tool IDE. Scripts in CommercialTool refer to GUI interaction objects (such as images) through textual names whilst Sikuli's IDE shows the GUI interaction objects as images in the script itself. The image presentation in Sikuli's IDE makes Sikuli scripts very intuitive to understand, also for non-developers, which positively affects the usability, maintainability and portability of the scripts between versions of a system. In particular this makes a difference for large scripts with many images.

Real-time script execution feedback. CommercialTool provides the user with real-time feedback, e.g. what function of the script is currently being executed and success or failure of the script. Sikuli on the other hand executes the script and then presents the user with feedback, i.e. post script execution feedback. This lowers the usability and maintainability of test suites in Sikuli since it becomes harder to identify faults.

Image recognition sweeps per second. Sikuli has one image recognition algorithm that can be run five times every second whilst the image recognition algorithm in CommercialTool runs seven times every second. CommercialTool is therefore potentially more robust, e.g. to GUI timing constraints, and has higher reliability and usability, at least in theory, than Sikuli for this property.

Image recognition failure mitigation. CommercialTool has several image recognition algorithms with different search criteria that give the tool higher reliability, usability, efficiency, maintainability and portability by providing automatic script failure mitigation. Script failure mitigation in Sikuli requires manual effort, e.g. by additional failure mitigation code or by setting the similarity, 1 to 100 percent, of a bitmap interaction object required for the image recognition algorithm to find a match in the GUI. Hence, Sikuli has less failure mitigation functionality, which can have negative effects on usability, reliability, etc.

Test suite support. Sikuli does not have built-in support to create, execute or maintain test suites with several test scripts, only single unit tests. CommercialTool has such support built in. A custom test suite solution was therefore developed during the study that uses Sikuli's import ability to run several test scripts in sequence, providing Sikuli with the same functionality, usability, perceived maintainability and portability.

Remote SUT connection support / requirement. Sikuli does not have built-in VNC support, a property that is not only supported by CommercialTool but also required by the tool to operate. Sikuli was therefore paired with a third party VNC application, as described in Section III, to provide Sikuli with the same functionality, usability and portability as CommercialTool.

Cost. The studied tools differ in terms of cost since Sikuli is open source with no up-front cost whilst CommercialTool costs around 10.000 Euros per 'floating license' per year. A floating license means that it is not connected to any one user or computer, but only one user can use the tool at a time; hence the Licensing and pricing quality criterion in this case affects the usability of CommercialTool.
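As a concrete illustration of the custom test suite solution described under Test suite support above, the following is a hedged sketch of a Sikuli-style Python runner that executes several test scripts in sequence. The module names and the convention that each test module exposes a run() function are assumptions made for the sketch, not a description of the actual solution built at Saab.

# Hedged sketch of a simple test suite runner built on Sikuli's import ability.
# Each test script is assumed to be importable and to expose a run() function
# that raises an exception (e.g. FindFailed) when the test fails.
import traceback

import atc_test_1   # hypothetical test modules; the names are illustrative only
import atc_test_2
import atc_test_3

def run_suite(test_modules):
    results = {}
    for module in test_modules:
        name = module.__name__
        try:
            module.run()
            results[name] = "PASS"
        except Exception:
            traceback.print_exc()      # keep the failure details in the log
            results[name] = "FAIL"
    return results

for name, verdict in run_suite([atc_test_1, atc_test_2, atc_test_3]).items():
    print("%s: %s" % (name, verdict))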
[Table: success rates (%) of CommercialTool (CT) and Sikuli in the four pre-study experiments.]

Experiment | Type | Desc. | CT success rate (%) | Sikuli success rate (%)
1 | non-animated | Calculator | 100 | 50
2 | non-animated | Icon finder | 100 | 100
3 | animated | Car finder | 3 | 25
4 | animated | Radar trace | 0 | 100

… success rate in the experiments with animated GUIs and showed to be easier to adapt, only requiring small efforts to be extended with additional functionality. In addition, Sikuli was considered marginally favored according to the tool quality criteria defined by Illes et al. and is therefore perceived as a better candidate for future research.
TABLE IV. Metrics collected during test case automation. CT stands for CommercialTool, ATC for automated test case and TC steps for the number of test steps in the scenario of the manual test case.

Test case | CT dev. time (min) | CT exec. time (sec) | CT LOC | Sikuli dev. time (min) | Sikuli exec. time (sec) | Sikuli LOC | TC steps
ATC-2 | 195 | 405 | 233 | 200 | 390 | 228 | 4
ATC-3 | 285 | 390 | 368 | 260 | 338 | 345 | 16
ATC-4 | 205 | 80 | 80 | 180 | 110 | 92 | 9
ATC-5 | 120 | 90 | 115 | 150 | 154 | 169 | 8
Total | 17 hours 40 minutes | 17.93 minutes | 899 LOC | 15 hours 55 minutes | 18.00 minutes | 1046 LOC |

[Fig. 4. Boxplot showing development time of the five scripts in each tool.]
was determined as unsuitable for the automation of the subsystem test cases since they had to interact with components not developed by Saab, e.g. interaction with custom and OS GUI components. These interactions required access to GUI component references that could not be acquired. The GUI components in the SUT, e.g. the simulators, windows in the OS, etc., did not always appear in the same place on the screen when launched. This behavior also ruled out R&R with coordinate interaction as an alternative for the study. Evaluation of visual GUI testing showed that it does not suffer from R&R's limitations and therefore works in contexts where R&R cannot be applied. Visual GUI testing is applicable on different types of GUIs, evaluated in the pre-study experiments and in industry, which showed that both studied tools had high success rates with non-animated GUIs and that Sikuli had a good success rate on animated GUIs as well. Hence, this study shows that visual GUI testing works for tests on non-animated GUIs and perceivably also for animated GUIs. Animated GUI applicability is however a subject for future, deeper research.

The purpose of automation of manual tests is to make the regression testing more cost-efficient by increasing the execution speed and frequency and lowering the required manual effort of executing the test cases. Estimations based on the collected data show that a complete automatic test suite for the studied subsystem would execute in three and a half hours, which constitutes a 78 percent reduction compared to manual test execution with an experienced tester. Hence, the automated test suite could be run daily, eliminating the need for partial manual system tests, reducing cost, increasing test frequency and lowering the risk of slip-through of faults. Mitigation of slip-through of faults is however limited with this technique by the test scenarios, since faulty functionality not covered by the test scripts would be overlooked, whilst a human tester could still detect it through visual inspection. Hence, the automated scripts cannot replace human testers and should rather be a complement to other test practices, such as manual free-testing. The benefit of visual GUI testing scripts compared to a human tester in terms of test execution is that the scripts are guaranteed to run according to the same sequence every time, whilst human testers are prone to take detours and make mistakes during testing, e.g. click on the wrong GUI object, etc., which can cause faults to slip through.

Scenario based system tests are very similar to acceptance tests, and based on the results of this study it should therefore be concluded as plausible to automate acceptance tests with visual GUI testing. This conclusion is supported by research on similar GUI testing techniques, e.g. R&R, which has been shown to work for acceptance test automation [12], [22]. Further support is provided by the fact that some of the manual test cases for the studied subsystem, categorized as fully scriptable, had been developed with customer specific data. The results of this study therefore provide initial support that visual GUI testing can be used for automated acceptance testing in industry.

During the study it was established that the primary cost of writing visual GUI testing scripts was related to the effort required to make the scripts robust to unexpected system behavior. Unexpected system behavior can be caused by faults in the system, related or unrelated to the script, and must be handled to avoid that these faults are overlooked or break the test execution. Other unexpected behavior can be caused by events triggered by the system's environment, e.g. warning messages displayed by the OS: events that may appear anywhere on the screen. These events can be handled with visual GUI testing but are a challenge for R&R since the events' location, the coordinates, is usually nondeterministic. Script robustness in visual GUI testing can be achieved through ad hoc failure mitigation, but this is a time-consuming practice. A new approach, e.g. a framework or guidelines, is therefore required to make robust visual GUI test script development more efficient; hence, this is another subject for future research.

The cost of automating the manual test suite for the studied subsystem was estimated at 20 business days, which is a considerable investment, and to ensure that it is cost-beneficial the maintenance costs of the suite therefore have to be small. Small is in this context measured against the cost of manual regression testing; hence the initial investment and the maintenance costs have to break even with the cost of the manual testing within a reasonable amount of time. The maintenance costs of visual GUI testing scripts when the system changes are however unknown and future research is needed.

Our results show that visual GUI testing is applicable for system regression testing of the type of industrial safety critical GUI based systems in use at Saab. The technique is however limited to finding faults defined in the scripted scenarios. Hence, visual GUI testing cannot replace manual testing, but it can minimize it for customer delivery. Visual GUI testing also allows tests to be run more often and is more flexible than other GUI testing techniques, e.g. coordinate based R&R, because of image recognition that can find a GUI component regardless of its position in the GUI. Furthermore, R&R tools that require access to the GUI components, in contrast to visual GUI testing, are not easily applicable at this company since their systems have custom-developed GUIs as required in their domain. We have also seen that visual GUI testing can be applied for automated acceptance testing. Being able to continuously test the system with user-supplied test data could have very positive effects on quality.

Evaluating a technique's applicability in a real-world context is a complex task. We have opted for a multi-step case study that covers multiple different criteria, which gives the company better decision support on which to proceed. Even though the test automation comparison is based on a limited number of test cases, the research was designed so that these test cases are representative of the rest of the manual test suite. Still, this is a threat to the validity of our results. Our industrial partner is more concerned with the amount of maintenance that will be needed as the system evolves. If these costs are high they will seriously limit the long-term applicability of visual GUI testing.
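To make the ad hoc failure mitigation discussed above concrete, the following is a hedged sketch in Sikuli-style Python of how a script might be hardened against unexpected events: it retries an image-recognition click with a lowered similarity threshold and dismisses a known OS warning dialog if one appears. The image names, thresholds and retry counts are illustrative assumptions, not the mitigation code written during the study.

# Hedged sketch of ad hoc failure mitigation around a single GUI interaction.
# Image file names and thresholds are assumptions for illustration only.
from sikuli import *

def dismiss_known_popups():
    # An OS warning can appear anywhere on screen; close it if it is visible.
    if exists("os_warning_ok.png", 0):
        click("os_warning_ok.png")

def robust_click(image, retries=3, similarity=0.9):
    """Click an image, lowering the similarity threshold on each retry."""
    for attempt in range(retries):
        dismiss_known_popups()
        target = Pattern(image).similar(similarity - attempt * 0.1)
        if exists(target, 5):
            click(target)
            return True
    return False

if not robust_click("confirm_button.png"):
    print("FAIL: could not find the confirm button")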
VI. CONCLUSION

In this paper we have shown that visual GUI testing tools are applicable to automate system and acceptance tests for industrial systems with non-animated GUIs, with both cost and potentially quality gains over state-of-practice manual testing. Experiments also showed that the open source tool that was evaluated can successfully interact with dynamically changing, animated GUIs, which would broaden the number and type of systems it can be successfully applied to.

We present a comparative study of two visual GUI testing script tools, one commercial and one open source, at the company Saab AB. The study was conducted in multiple steps involving both static and dynamic evaluation of the tools. One of the company's safety critical subsystems, distributed over two physical computers, with a non-animated GUI, was chosen, and 10 percent, 5 out of 50, representative, manual, scenario-based test cases were automated in both tools. A pre-study helped select the relevant test cases to automate as well as evaluate the strengths and weaknesses of the two tools on key criteria relevant for the company.

Analysis of the tools' properties shows differences in the tools' functionality, but overall results show that both studied tools work equally well in the industrial context, with no statistically significant differences in either development time, run time or LOC of the test scripts. Analysis of the subsystem test suite shows that up to 98 percent of the test cases can be fully or partially automated using visual GUI testing, with gains to both cost and quality of the testing. Execution times of the automated test cases are 78% lower than running the same test cases manually, and the execution requires no manual input.

Our analysis shows that visual GUI testing can overcome the obstacles of other GUI testing techniques, e.g. Record and Replay (R&R). R&R either requires access to the code in order to interact with the System Under Test (SUT) or is tied to specific physical placement of GUI components on the display. Visual GUI testing is more flexible, interacting with GUI bitmap components through image recognition, and robust to changes and unexpected behavior during testing of the SUT. Both of these advantages were important in the investigated subsystem since it had custom GUI components and GUI components that changed position between test executions. However, more work is needed to extend the tools with ways to specify and handle unexpected system events in a robust manner; the potential for this in the technique is not currently well supported in the available tools. For testing of safety-critical software systems there is also a concern that the automated tools are not able to find defects that are outside the scope of the test scenarios, such as safety defects. Thus any automated system testing will still have to be combined with manual system testing before delivery, but the main concern for future research is the maintenance costs of the scripts as a system evolves.

REFERENCES

[1] P. Li, T. Huynh, M. Reformat, and J. Miller, "A practical approach to testing GUI systems," Empirical Software Engineering, vol. 12, no. 4, pp. 331–357, 2007.
[2] R. Miller and C. Collins, "Acceptance testing," Proc. XPUniverse, 2001.
[3] P. Hsia, D. Kung, and C. Sell, "Software requirements and acceptance testing," Annals of Software Engineering, vol. 3, no. 1, pp. 291–317, 1997.
[4] P. Hsia, J. Gao, J. Samuel, D. Kung, Y. Toyoshima, and C. Chen, "Behavior-based acceptance testing of software systems: a formal scenario approach," in Computer Software and Applications Conference, 1994. COMPSAC 94. Proceedings., Eighteenth Annual International. IEEE, 1994, pp. 293–298.
[5] T. Graves, M. Harrold, J. Kim, A. Porter, and G. Rothermel, "An empirical study of regression test selection techniques," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 10, no. 2, pp. 184–208, 2001.
[6] M. Olan, "Unit testing: test early, test often," Journal of Computing Sciences in Colleges, vol. 19, no. 2, pp. 319–328, 2003.
[7] E. Gamma and K. Beck, "JUnit: A cook's tour," Java Report, vol. 4, no. 5, pp. 27–38, 1999.
[8] D. Chelimsky, D. Astels, Z. Dennis, A. Hellesoy, B. Helmkamp, and D. North, "The RSpec book: Behaviour driven development with RSpec, Cucumber, and friends," Pragmatic Bookshelf, 2010.
[9] E. Weyuker, "Testing component-based software: A cautionary tale," Software, IEEE, vol. 15, no. 5, pp. 54–59, 1998.
[10] S. Berner, R. Weber, and R. Keller, "Observations and lessons learned from automated testing," in Proceedings of the 27th International Conference on Software Engineering. ACM, 2005, pp. 571–579.
[11] A. Adamoli, D. Zaparanuks, M. Jovic, and M. Hauswirth, "Automated GUI performance testing," Software Quality Journal, pp. 1–39, 2011.
[12] J. Andersson and G. Bache, "The video store revisited yet again: Adventures in GUI acceptance testing," Extreme Programming and Agile Processes in Software Engineering, pp. 1–10, 2004.
[13] A. Memon, "GUI testing: Pitfalls and process," IEEE Computer, vol. 35, no. 8, pp. 87–88, 2002.
[14] M. Jovic, A. Adamoli, D. Zaparanuks, and M. Hauswirth, "Automating performance testing of interactive Java applications," in Proceedings of the 5th Workshop on Automation of Software Test. ACM, 2010, pp. 8–15.
[15] E. Sjösten-Andersson and L. Pareto, "Costs and benefits of structure-aware capture/replay tools," SERPS'06, p. 3, 2006.
[16] T. Chang, T. Yeh, and R. Miller, "GUI testing using computer vision," in Proceedings of the 28th International Conference on Human Factors in Computing Systems. ACM, 2010, pp. 1535–1544.
[17] R. Potter, Triggers: Guiding automation with pixels to achieve data access. University of Maryland, Center for Automation Research, Human/Computer Interaction Laboratory, 1992, pp. 361–382.
[18] L. Zettlemoyer and R. St. Amant, "A visual medium for programmatic control of interactive applications," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: the CHI is the limit. ACM, 1999, pp. 199–206.
[19] A. Memon, M. Pollack, and M. Soffa, "Hierarchical GUI test case generation using automated planning," Software Engineering, IEEE Transactions on, vol. 27, no. 2, pp. 144–155, 2001.
[20] P. Brooks and A. Memon, "Automated GUI testing guided by usage profiles," in Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering. ACM, 2007, pp. 333–342.
[21] A. Memon, "An event-flow model of GUI-based applications for testing," Software Testing, Verification and Reliability, vol. 17, no. 3, pp. 137–157, 2007.
[22] C. Lowell and J. Stell-Smith, "Successful automation of GUI driven acceptance testing," Extreme Programming and Agile Processes in Software Engineering, pp. 1011–1012, 2003.
[23] T. Illes, A. Herrmann, B. Paech, and J. Rückert, "Criteria for software testing tool evaluation. A task oriented view," in Proceedings of the 3rd World Congress for Software Quality, vol. 2, 2005, pp. 213–222.
[24] T. Richardson, Q. Stafford-Fraser, K. Wood, and A. Hopper, "Virtual network computing," Internet Computing, IEEE, vol. 2, no. 1, pp. 33–38, 1998.
[25] L. Fowler, J. Armarego, and M. Allen, "Case tools: Constructivism and its application to learning and usability of software engineering tools," Computer Science Education, vol. 11, no. 3, pp. 261–272, 2001.