A Metaheuristic Approach to Automatic Test Case Generation for GUI-Based Applications

Diploma Thesis

Sebastian Bauersfeld
bauersfeld (at) informatik.hu-berlin.de

Reviewers: Prof. Dr. Klaus Bothe, Dr. Joachim Wegener
Date of submission: 22 August 2011

Acknowledgements

It is a pleasure to thank the people who helped me to accomplish this thesis. I would like to express my sincere gratitude to Dr. Joachim Wegener, who inspired and encouraged me at the beginning and gave valuable input throughout the course of this work. I would like to thank Dr. Stefan Wappler, who helped me with his experience and incredibly detailed feedback. I wish to thank my parents for their support throughout the last weeks of writing. They made my life much easier during that period of time.

Contents

1. Introduction
   1.1. Abstract
   1.2. Motivation
   1.3. Objectives
   1.4. Outline
2. Introduction to Metaheuristics
   2.1. Overview
   2.2. Ant Colony Optimization
3. Related Work
4. The Approach
   4.1. Overview
   4.2. Description of the individual Steps
   4.3. Applying the Approach
5. Sequence Generation with ACO
   5.1. The Concept
   5.2. Motivation
   5.3. Adjusting the Metaheuristic Optimization
6. The Fitness Function
   6.1. Definition
   6.2. Motivation
7. Operating the SUT
   7.1. Scanning the GUI
   7.2. Deriving Actions
   7.3. Executing Actions
        7.3.1. Simulating Input versus Invoking Event Handlers
        7.3.2. Naming Scheme
8. Implementation
   8.1. The Framework
   8.2. Java Agents
   8.3. Implementation of the Fitness Function
        8.3.1. The Concept
        8.3.2. Bytecode Instrumentation
        8.3.3. The ASM Framework
   8.4. Operating the SUT
        8.4.1. Accessing the SWT Classes
        8.4.2. Generating Inputs
        8.4.3. Replaying Sequences
9. Experiment
   9.1. Setup and Results
   9.2. Threats to Validity
10. Conclusion and Future Work
    10.1. Conclusion
    10.2. Future Work
Appendices
A. Java Bytecode Instructions
B. Miscellaneous
Bibliography
Declarations
    Declaration of Authorship
    Declaration of Consent

1. Introduction

1.1. Abstract

Testing applications with a Graphical User Interface (GUI) is an important, though challenging and time-consuming task. The state of the art in industry is still scripting and capture-and-replay tools, which simplify the recording and execution of input sequences but do not support the tester in finding fault-sensitive test cases. While search-based test case generation strategies, such as Evolutionary Testing, are well-researched for various areas of testing, relatively little work has been done on applying these techniques to an entire GUI of an application. This work presents an approach to finding input sequences, using a metaheuristic algorithm named Ant Colony Optimization.

1.2. Motivation

Software testing is an important and widely used quality assurance technique in industry. Modern software systems comprise various components which interact with each other to accomplish tasks. The correct behaviour of these components is often verified through unit, integration and system tests. Many of today's applications have a special component in the form of a Graphical User Interface (GUI). A GUI is an interface which consists of control elements called widgets, for example buttons, menu items and text boxes. The GUI is often the only part of the software that the user interacts with. It is thus necessary to thoroughly test this interface in order to ensure the quality of the product. This is done by creating test cases in the form of input sequences. An input sequence for a GUI application is a sequence of actions, like a click on a button control or a drag and drop operation. Figure 1.1 shows an input sequence for Microsoft Word which causes the currently opened document to be printed out.

In current industrial practice, a typical testing scenario performed throughout the development process of a GUI application can look as follows: The testers start by designing an initial test suite with several test cases. Often these suites comprise common scenarios, like printing a document, filling in forms and committing the content to the database, and so forth. The test cases are created with the help of scripting or capture-and-replay tools. With a scripting tool the testers have to write a script consisting of explicit actions to be executed on the System Under Test (SUT). A capture-and-replay tool facilitates the creation of such a script by recording the actions that the human tester performs.

    clickMenu("File"), clickMenu("Print"), pressKey(Tab), type("22"),
    pressKey(Tab), type("44"), clickButton("OK")

Figure 1.1.: Input sequence that causes Microsoft Word to print pages 22 to 44 of the current document.

Recorded or scripted sequences may then be replayed, for example to perform daily regression tests.
However, because the interface of the SUT undergoes various changes throughout the development process, many of these scripts will break: they rely on widgets whose name or position has changed, or which have been removed. This means the testers have to constantly repair the scripts in order to maintain the test suite. This is labour-intensive and consequently costly [Mem01]. Considering these difficulties, techniques for automatic test case generation are quite desirable.

One way of dealing with the task of automatically generating test cases is to transform it into an optimization problem. The idea is to define a quality criterion or fitness function and search for test cases which maximize this function. Since the search space of all possible test cases is often large and has a complex structure, one could try to exploit metaheuristic techniques. There has been a lot of research on this idea in a field commonly known as Evolutionary Testing [McM04]. For example: Wegener [Weg01] performs temporal testing with Genetic Algorithms. He tries to find input data with extreme execution times (either high or low). Wappler [Wap07] generates unit tests for classes by employing strongly-typed Genetic Programming to find method call sequences causing high code coverage of the classes under test. Windisch et al. [WWW07] perform structural testing, employing Particle Swarm Optimization to find arguments to functions so that branch coverage gets maximized. Recently, metaheuristic techniques have also been applied to GUI testing [MT11, HCM10], but the research is still quite sparse.

Automatic testing of GUI applications poses several difficult challenges, among which are

CH1 the huge amount of possible sequences. At each state of the SUT there are many alternative actions to choose from, which leads to an exceptionally large search space. In addition, it is computationally expensive to generate and evaluate sequences, since the SUT needs to be started and all the actions in the sequence need to be executed. This requires efficient algorithms which explore the search space in an intelligent manner to find good sequences.

CH2 the lack of well-studied quality criteria. What characterizes a "good" and fault-sensitive test sequence in the context of GUI testing?

CH3 the technical difficulty of generating inputs. In order to click buttons, perform drag and drop operations or input text, one needs to be able to

1. scan the GUI to determine the visible widgets and their properties (e.g. the positions of buttons, menu items etc.),
2. derive a set of reasonable actions at each execution stage (e.g. a visible, enabled button is clickable)
3. and execute, record and replay these actions later on.

1.3. Objectives

The vision of a future framework for GUI testing could look as follows: Given a GUI application and a test oracle – which determines whether a sequence has been properly executed by the application – this framework automatically generates a fault-sensitive test suite and returns the list of detected errors, without human intervention. The development of such a framework is an ambitious task. This work contributes a first step towards accomplishing this task by presenting an approach for the automatic generation of single input sequences for GUI-based applications. To achieve this goal, it addresses the aforementioned challenges by
1. introducing and motivating a metaheuristic algorithm named Ant Colony Optimization, suitable for finding an input sequence with a high fitness value,

2. introducing and motivating a fitness function for input sequences, based on the Call Tree Size metric [MM08],

3. presenting techniques for executing, recording and replaying complex actions on applications' GUIs.

All of the abovementioned objectives are implemented in a framework which is presented and tested in a first experiment. This framework focuses on Java applications based on the Standard Widget Toolkit. The SUT utilized in the experiment is the Classification Tree Editor (CTE) (see Figure 1.3), a graphical editor for classification trees, developed by Berner & Mattner Systemtechnik GmbH.

Figure 1.2 shows an abstract version of the optimization process used by the presented approach. It sports the fitness function and the optimization algorithm working together in order to generate and improve sequences on each iteration. Ideally, it eventually finds a sequence with the optimal fitness value. This process, which will be explained in detail throughout the next chapters, is the main contribution of this work. Contrary to previous approaches, it requires neither a model of the GUI¹ nor existing human input sequences or similar handcrafted artifacts, and is thus completely automatic.

¹ For example in the form of a finite state machine, which provides a list of possible actions that may be executed in each state, etc.

Figure 1.2.: Optimization Process (the optimization algorithm generates and executes sequences on the SUT, the fitness function rates them, and the ratings feed back into learning until an "optimal" sequence is found).

Figure 1.3.: The Classification Tree Editor, which is the SUT for this work.

1.4. Outline

The next chapter explains the concepts behind metaheuristic optimization techniques and introduces the Ant Colony Optimization algorithm. Chapter 3 gives an overview of existing approaches to automatic GUI testing. Chapter 4 presents the approach applied in this work. It explains the sequence generation process and the particular steps involved, which are elaborated in the following three chapters. Chapter 5 discusses how Ant Colony Optimization is used in this process in order to find sequences with high fitness values. Chapter 6 motivates and defines the fitness function, and chapter 7 presents the techniques used to scan and operate the GUI, e.g. how to perform clicks, type text into text fields or perform drag and drop operations. Chapter 8 explains the implementation of the features presented throughout chapters 4 to 7 and introduces the framework which has been developed during the course of this work. Chapter 9 presents the results of an experiment, where the framework is applied to the Classification Tree Editor and compared to a random sequence generation strategy. Chapter 10 reviews the approach and discusses future work.

2. Introduction to Metaheuristics

This chapter gives a short introduction to metaheuristic techniques and introduces the Ant Colony Optimization algorithm.

2.1. Overview

Optimization is the process of finding a solution with the highest value according to a given criterion. For example: Figure 2.1 shows the two-dimensional sinc function with its local and global optima. One could try to find the global maximum (x, y)* = arg max_{(x,y)∈S} sinc(x, y), with S = [−20, 20] × [−20, 20].
The set S is the search space, the tuples (x, y) ∈ S are the candidate solutions or individuals, and the function itself is the objective or fitness function. To solve this problem one could make use of classic optimization algorithms like Gradient Descent or Newton's Method, which employ the gradient ∇f = (∂f/∂x, ∂f/∂y) of a fitness function f to direct their search process. They usually expect a start position s_0 = (x_0, y_0) ∈ S and use ∇f to create new and better candidate solutions in the neighbourhood of s_0. After a number of iterations, and depending on the quality of s_0, they will eventually find a local or global optimum. In order to achieve this they make assumptions about the function, in particular that it is possible to calculate its gradient. Unfortunately, problems exist where, contrary to the abovementioned example, the fitness function is discontinuous, nondifferentiable and lacks a closed-form expression. In these cases one cannot make use of classical search algorithms, but has to resort to different techniques.

Metaheuristics belong to the subfield of stochastic optimization [Luk09] and make few or no assumptions about the problem at hand. They employ a certain degree of randomness to find optimal or near-optimal solutions to hard problems. Algorithm 1 presents the skeleton of a simplistic strategy named Hill-Climbing (HC), which is the metaheuristic equivalent of Gradient Descent. It starts with a given individual and employs a mutation operator which makes small, random modifications to individuals. In case the fitness of the modified individual exceeds that of the original, HC accepts it as the current solution; otherwise it keeps the original. This process is repeated until certain stopping criteria are met. The essential parts of metaheuristic algorithms are now described using the example of the Knapsack Problem.

Figure 2.1.: The sinc function with local and global optima.

Algorithm 1: Hill-Climbing
    Input: start  /* initial candidate solution */
    Output: best individual found (local optimum)
    begin
        current ← start
        repeat
            ind ← mutate(current)
            if fitness(ind) > fitness(current) then
                current ← ind
        until stopping criteria met
        return current

The Knapsack Problem. This is a well-known optimization problem and has been shown to be NP-hard: Given a set I of items, a weight function w : I → ℝ, a value function v : I → ℝ and a limit l ∈ ℝ, find a subset K ⊆ I with Σ_{u∈K} w(u) ≤ l that maximizes Σ_{u∈K} v(u). Intuitively, the problem is about filling a bag with items, so that the value of the bag is maximal and its weight stays below a limit l.

Representation. In order to apply metaheuristics, one first needs to define what the candidate solutions look like. This is an important step, since other parts of the metaheuristic depend upon the structure of the solutions, like for example the fitness function or the search operators. In the above example a candidate solution is a set s = {u_1, u_2, ..., u_n} ∈ P(I) of items. Many metaheuristics work with vector representations, but more complex structures are possible. In Genetic Programming, for example, the candidate solutions are often represented as trees.

Fitness Function. The fitness function determines the quality of a candidate solution and is usually of the form f : S → ℝ, where S is the search space. This function plays a central role in the optimization process, since it guides the algorithm towards the interesting regions of the search space [BSS02].
For the Knapsack Problem one could define f as

    f(s) := Σ_{u∈s} v(u)   if Σ_{u∈s} w(u) ≤ l
    f(s) := p              otherwise

where p ∈ ℝ would be a small value to penalize infeasible solutions, i.e. solutions where the bag's weight exceeds l. As we can see here, the fitness function for this problem does not have a closed mathematical form, and in contrast to, for example, the sinc function in Figure 2.1, it is not possible to calculate a gradient.

Good fitness functions often satisfy the smoothness criterion [Luk09]: solutions that lie close to each other in the search space tend to have similar fitness values. This does not mean that the function needs to be as smooth as the one depicted in the upper left of Figure 2.2, but it should not exhibit an extremely "hilly" character like the one depicted in the lower left. This criterion is not sufficient for a metaheuristic to perform well, since "deceptive" or "needle in a haystack" landscapes are highly smooth, yet can be very challenging for algorithms like HC: they either lead it away from the optimum or do not give enough information to direct the optimization. The search landscape defined by the fitness function usually dictates the applicability of certain classes of metaheuristics. For example: a simple local optimization algorithm like HC would probably perform poorly on a multimodal landscape like the one generated by the sinc function. Unfortunately, it is generally not possible to visualize the entire landscape, for example due to high dimensionality. Hence, a lot of experience is involved in choosing the appropriate algorithm.

Figure 2.2.: Four example search landscapes: Unimodal, Needle in a Haystack, Noisy ("hilly" or "rocky"), and Deceptive [Luk09].

Operators. A metaheuristic usually employs one or more operators. Operators are functions that create, modify or select individuals. To apply HC, one needs to define a mutation operator, which is also used by many other metaheuristics. This operator usually makes small random changes to individuals. Intuitively, it generates similar solutions in the environment of the given one. Algorithm 2 shows a possible implementation of this operator for the Knapsack Problem.

Algorithm 2: Mutation operator for the Knapsack Problem.
    Input: s  /* candidate solution represented as a fixed list */
    Output: slightly modified copy of s
    begin
        s′ ← copy(s)
        index ← random index within [0, length(s) − 1]
        s′[index] ← random item u ∈ I
        return s′

Metaheuristics like Genetic Algorithms often make use of additional operators like crossover, selection and an operator for creating initial candidate solutions.
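To make these definitions concrete, the following minimal Java sketch implements the penalized fitness function and the mutation operator of Algorithm 2. It is purely illustrative and not part of the thesis framework; the Item type, the penalty constant and all other names are assumptions.

    import java.util.Arrays;
    import java.util.Random;

    final class Knapsack {
        record Item(double weight, double value) {}

        static final double PENALTY = -1.0; // small value p for infeasible solutions
        static final Random RND = new Random();

        // Fitness: total value if the weight limit is respected, penalty p otherwise.
        static double fitness(Item[] s, double limit) {
            double weight = Arrays.stream(s).mapToDouble(Item::weight).sum();
            if (weight > limit)
                return PENALTY;
            return Arrays.stream(s).mapToDouble(Item::value).sum();
        }

        // Mutation (Algorithm 2): copy s and replace one randomly chosen
        // position with a random item from the item set I.
        static Item[] mutate(Item[] s, Item[] items) {
            Item[] copy = s.clone();
            copy[RND.nextInt(copy.length)] = items[RND.nextInt(items.length)];
            return copy;
        }
    }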
Termination Criterion. Since one only has a limited amount of computational resources, one needs to define a termination criterion, that is, a function that determines when to stop the optimization process. This can be quite difficult, since often the fitness of the best possible individual is unknown. Examples of termination criteria are:

1. Maximum amount of time reached.
2. Maximum number of generated candidate solutions reached.
3. Maximum number of bad moves reached. A bad move is when the algorithm generates a new individual whose fitness does not exceed that of the current maximal solution. For example: when HC reaches a local optimum, it will not be able to improve any further.

The choice of the termination criterion can have a big impact on the efficiency and effectiveness of the optimization process. If the process terminates prematurely, the resulting candidate solution will not be of high quality. If the process runs too long without making any improvements, it wastes resources.

Exploration versus Exploitation. HC is a local optimization algorithm. This is because it searches for new candidate solutions only in a very small area, namely the environment of the current solution. In addition, it only accepts solutions if they are better than the current one, which means it cannot go "down the hill". Such a strategy is said to be exploitative and tends to get stuck in local optima. The other extreme are explorative algorithms. A representative of this class is Random Search (RS). RS generates candidate solutions completely at random with a uniform distribution over the search space. Both algorithms have their advantages and downsides. For example: RS might be more successful in a "needle in a haystack" environment, whereas HC would perform better in a unimodal environment as depicted in Figure 2.2. But quite often fitness landscapes like the one in Figure 2.1 require a hybrid optimization algorithm that exposes both properties. This is where algorithms like Simulated Annealing (SA) or Genetic Algorithms come into play. SA, for example, is a version of HC that is allowed to go downhill for a certain number of steps². This way it does not get stuck in local optima as much as HC does and is considered to be a global optimization algorithm. There is always a tradeoff between exploration and exploitation in the design process of a metaheuristic. This tradeoff depends on the problem to be solved and is often difficult to figure out [Luk09].

² Depending on the current temperature τ.

2.2. Ant Colony Optimization

This work adopts a metaheuristic technique named Ant Colony Optimization (ACO), which has been shown to be effective in solving hard combinatorial optimization tasks like the Traveling Salesman Problem [DCG99]. The algorithm is inspired by the foraging behaviour of ants. Figure 2.3 shows the double bridge experiment [GADP89], which sports an ant nest and a food source connected by paths of distinct length.

Figure 2.3.: Double bridge experiment with ants and a food source [DCG99].
It can be observed that after a short transitional phase, in which the ants use all paths equally (a), eventually the majority of the ants carries the food along the shortest path (b). It was found that ants deposit a pheromone on the ground while walking. Their walking direction, in turn, is influenced by other ants' pheromone: the higher the pheromone density on a certain path, the more likely it is to be travelled. When the experiment starts, no pheromone is on the ground and consequently the paths are travelled roughly equally. The ants which took the shorter path arrive earlier at the food source. Once they have picked up the food and prepare for return, they have to decide which path to take back to the nest. Since there is already some pheromone on the short path, they tend to favor this one and hence further increase its pheromone density. This eventually makes the majority of the ants follow the shorter path.

The Ant Colony Optimization algorithm uses a similar strategy, though without a direct equivalent for the ants. Algorithm 3 shows the basic approach. The metaheuristic is population-oriented, which means that it works with an entire pool of candidate solutions. The solutions are called trails, and a trail t = (c_1, c_2, ..., c_n) ∈ C^n consists of components c ∈ C from a component set (in the double bridge experiment these components would relate to the edges of the paths). Each component c_i is associated with a pheromone value p_i. In ACO the trails are constructed step by step: the algorithm iteratively selects components from the component set. A Selection Rule determines how this is done. Usually, the components are selected proportionate to their pheromone values.

The overall procedure is as follows: The algorithm first creates a certain amount of trails, i.e. a population. After the population has been generated, each trail is assessed with the help of the fitness function. Then the pheromones of the components are updated. The Pheromone Update Rule determines how this is done. Usually, components that are part of high-rated trails obtain better pheromone values than the ones that appear in low-rated trails. This leads to a higher utilization of those components within subsequent generations.

The optimization process usually starts with equal pheromone values for each component. Thus, at the beginning of the optimization, it produces random trails. Over time, certain components obtain higher pheromone values than others, so that the optimization focuses on a certain area within the search space, hopefully the one containing the best trail. It is important to understand that this does not mean that toward the end only high-rated components are employed. In contrast to algorithms such as Hill-Climbing, ACO is a global metaheuristic, which means that it always samples from the entire search space and may always generate each possible trail. However, the likelihood of generating trails with low-rated components decreases over time. So essentially, the algorithm learns a probability distribution over the search space [DS09], and ideally the area with the best trails has the highest density.
Algorithm 3: Skeleton of the Ant Colony Optimization algorithm.
    Input: C ← {c_1, c_2, ..., c_n}  /* component set */
    Input: p ← (p_1, p_2, ..., p_n)  /* initial pheromone values */
    Input: popsize  /* number of trails in a population */
    Output: best trail found
    begin
        best ← ∅
        repeat
            for i ← 1 to popsize do
                t_i ← generateTrail()  /* select components based on p */
                if fitness(t_i) > fitness(best) then
                    best ← t_i
            update the pheromone p_x of each component c_x, based on the fitness
            values of the trails t_i ∈ T in which the component appears
        until stopping criteria met
        return best

3. Related Work

This chapter presents brief descriptions of contributions related to the subject of sequence generation for GUI applications.

Kasik and George: Toward Automatic Generation of Novice User Test Scripts. Kasik and George [KG96] strive to generate novice user sequences by employing Genetic Algorithms. Their implementation scans the GUI to determine the set of alternative actions, so that it can generate arbitrary feasible input sequences. They reward sequences that stay on the same dialog, based on the observation that novice users learn the behaviour of a GUI's functions through experimentation with different parameters within the same dialog. Their program takes existing sequences as input, into which the tester may insert a deviate command at the beginning, at the end or somewhere in between. The sequence then gets extended with new actions at the command's index. The goal is to make the inserted subsequence look like it was created by a novice user. Their implementation is also able to generate sequences entirely from scratch. However, according to the authors this leads to quite random results which do not resemble novice user sequences. Their implementation offers two possible modes: meander and pullback. Meander mode replays an existing sequence and turns control over to the Genetic Algorithm whenever it encounters a deviate command. It does not return to the remainder of the sequence that follows the deviate command. In pullback mode, the authors give reward for returning to the sequence's tail. In order for the GUI scanning process to work, slight modifications need to be applied to the SUT's source code. The implementation works for applications that employ Motif 1.2 on X11 to display their GUI. The authors do not mention which widget types they support. However, they state that the GUI is operated with keystrokes only, so they seem not to consider mouse operations. Mutation and crossover operators are not explained in the paper, but the user may provide his own implementations for them. Since the type of test applications used is not mentioned, it is hard to tell how well their implementation would perform on real-world subject applications.

Huang et al.: Repairing GUI Test Suites Using a Genetic Algorithm. Huang et al. [HCM10] use Genetic Algorithms to fix broken test suites. Their work consists of two steps: 1. generating a test suite and 2. repairing the suite in case it contains infeasible sequences. They work with an approximate model of the GUI called an Event Flow Graph (EFG). An EFG is a directed graph whose nodes are the actions that a user can perform (e.g. clicks on menu items, etc.). A transition between action x and action y means: y is available after the execution of x. Figure 3.1 shows an EFG for the main menu of a typical GUI application.

Figure 3.1.: An Event Flow Graph of a typical main menu (File → Open, Save; Edit → Cut, Copy, Paste; Help → Contents, About). The nodes correspond to clicks on menu items.

By traversing the edges of this graph one can generate sequences offline. For example: when clicking on the menu entry "Help", a drop-down menu appears which contains the "About" entry, which in turn can be clicked.
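To illustrate how such a model permits offline generation, an EFG like the one in Figure 3.1 can be stored as an adjacency map and traversed at random. This sketch is not taken from [HCM10]; the simplified adjacency-map representation and all names are assumptions.

    import java.util.*;

    final class EfgWalk {
        // A simplified EFG of Figure 3.1: action -> actions available afterwards.
        static final Map<String, List<String>> EFG = Map.of(
                "File", List.of("Open", "Save"),
                "Edit", List.of("Cut", "Copy", "Paste"),
                "Help", List.of("Contents", "About"));

        // Generate a sequence offline by following outgoing edges at random.
        static List<String> randomSequence(String start, int length, Random rnd) {
            List<String> seq = new ArrayList<>(List.of(start));
            String current = start;
            for (int i = 1; i < length; i++) {
                List<String> next = EFG.getOrDefault(current, List.of());
                if (next.isEmpty())
                    break; // no outgoing edges, e.g. after "About"
                current = next.get(rnd.nextInt(next.size()));
                seq.add(current); // e.g. yields [Help, About]
            }
            return seq;
        }
    }

Note that such offline walks only approximate the GUI, which is exactly why infeasible sequences like the one discussed next can arise.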
In the first step they try to find a covering array to sample from the sequence space in order to generate their initial test suite. A covering array CA(N, t, k, v) is an N × k array (N sequences of length k over v symbols) in which all t-tuples over the v symbols are contained within every N × t sub-array. So, instead of trying all permutations of actions (of which there are exponentially many), only the set of sequences that contains all length-t tuples in every position is used as the test suite. The parameter t determines the strength of the sampling process. Their array is constrained by the EFG, meaning that certain combinations of actions are not permitted. Since it is hard to find such a constrained covering array, they employ a special metaheuristic based on Simulated Annealing [GCD09]. This way they get their initial test suite which, due to the fact that the EFG is only an approximation of the GUI, contains infeasible input sequences. For example: in Figure 3.1 we could generate s = (Edit, Paste). However, since in most applications the Paste menu entry is disabled or invisible until a copy operation has occurred, the execution of s is likely to fail.

In step two they identify and discard the infeasible sequences. By doing that they lose coverage with respect to the covering array. Hence, they use a Genetic Algorithm which utilizes the EFG to generate new sequences offline, which will then be executed and rewarded depending on how many of their actions are executable and on how much coverage they restore. Infeasible sequences are penalized with a static value. The authors employ the GUITAR³ framework to execute their sequences. The EFG is generated automatically with the help of a GUI-Ripper [MBN03], but requires human verification, because it might be incomplete. The approach has been tested on small synthetic subject applications, where the set of considered actions has been restricted to clicks on button controls.

³ http://sourceforge.net/projects/guitar/

Andrews et al.: Testing Web Applications by Modeling with FSMs. Andrews et al. [AOA05] test web applications with the help of hierarchical finite state machines (FSMs). They model large web applications by building hierarchies of FSMs for their subsystems, which they annotate with input constraints to reduce the amount of possible inputs. They then derive sequences from the individual FSMs and combine these to form complete test sequences. They created a simple web application that they use as the SUT and generate test suites that satisfy node or edge coverage. The annotated FSM hierarchy has to be modeled by hand prior to using their framework. They provide a rough description of how this might be achieved in an automatic way, but generally leave this problem for future research.

Artzi et al.: A Framework for Automated Testing of JavaScript Web Applications. Artzi et al. [ADJ+11] perform feedback-directed test generation for JavaScript web applications. Their objectives are to find test suites with high code coverage as well as sequences that exhibit programming errors, like invalid HTML or runtime exceptions. They developed a framework called Artemis which is able to trigger sequences of events by calling the appropriate event handlers and supplying them with the necessary arguments. For the generation of the suites, they use prioritization functions to focus on event handlers with low coverage.
Their framework requires access to the SUT's source code, including any server-side components. In their experiments they used small web applications.

Marchetto and Tonella: Using search-based algorithms for Ajax event sequence generation during testing. Marchetto and Tonella [MT11] generate test suites for AJAX web applications using Hill-Climbing and Simulated Annealing. They execute the applications to obtain an approximate model in the form of a finite state machine. The states in this machine are instances of the application's DOM tree (Document Object Model) and the transitions are events (messages from the server, user input). From this FSM they can obtain the set of semantically interacting events. Two events e_1 and e_2 interact semantically if states s_0, s_1, s_2 exist such that swapping the order of the events upon execution brings the system to a different state, i.e. s_0 →(e_1;e_2) s_1 and s_0 →(e_2;e_1) s_2, where s_1 ≠ s_2. Their goal is to generate test suites that consist of maximally diverse event interaction sequences, that is, sequences where each pair of consecutive events is semantically interacting. Hence, they define several fitness functions to describe the diversity of a test suite. They start with suites consisting of short (length-2) event interaction sequences and use Hill-Climbing or Simulated Annealing and their fitness functions to extend these sequences and thus the test suites. For the construction of their FSM they employ execution traces generated by humans, as well as static code analysis. Therefore, they need to instrument the source code of the test applications. Since the resulting FSM is not guaranteed to be complete or correct, it needs additional verification. They perform a case study with two medium-sized AJAX applications with injected faults. For the execution of their generated sequences they use the web testing tool Selenium⁴. In order to provide input for text boxes and the like, they use a database of input values generated from the input traces.

⁴ http://seleniumhq.org/

4. The Approach

This chapter presents the central idea of this thesis, the sequence generation process. It is the starting point for the following chapters, which discuss the individual steps of this process in detail.

4.1. Overview

Figure 4.1 shows the process applied in this work in order to generate test sequences. It works as follows: 1) The SUT is started and 2) instrumented. This step is necessary to obtain the fitness value of the generated sequence later on. 3) In order to be able to generate actions, it is necessary to find the visible widgets and determine their properties. Without this information we would not know where to click, where to type text and so on. So the process gathers the widgets' positions, their size and their state, i.e. whether they are enabled and focused, etc. 4) From this information one can derive a set of "reasonable" actions. For example: If there is a visible and enabled button, placed on a foreground window and not covered by any other control, then this button is clickable. And since the button's coordinates are known, one could perform a click on its center. But of course there may be various other controls which could be clicked, right-clicked or dragged. There might be a text control which is currently focused, so that text may be typed into it, etc. This leads to an entire set of alternative actions which can be performed.
5) At this point the process needs to make a decision about the action to be executed. This is a very important step, because it is desirable to select promising actions which are likely to produce sequences with high fitness values. 6) After an action has been selected, it is executed, that is, a click on a menu item is performed or text is typed or a scrollbar is dragged, etc. Every time this happens, the state of the GUI possibly changes, meaning that new controls appear, other controls disappear or change their position or other attributes. Hence, the process needs to go through steps 3) to 6) again in order to execute the next action. This way it is possible to generate sequences of arbitrary length. Once a certain amount of actions has been executed, the process continues with step 7), where it determines the quality of the generated sequence with the help of the fitness function. 8) Now the SUT is stopped. Up to this point we have successfully generated an input sequence and obtained its fitness value. 9) This information can be exploited to learn about promising and less promising actions. For example: If the generated sequence obtained a high fitness value, then the actions within this sequence might be suitable for the generation of other high-rated sequences. On the other hand, if the sequence obtained a low fitness value, then its actions might be less likely to produce good sequences. This information can be saved to make better decisions in step 5) in the future. Now the whole process is repeated, i.e. steps 1) to 9) are executed again. On each iteration the process learns more about the SUT's actions. Ideally, the resulting sequences get better and better, so that the best sequence will be found.

Figure 4.1.: The sequence generation process: 1) start SUT, 2) instrument SUT, 3) scan GUI, 4) derive actions, 5) select action, 6) execute action (repeating 3–6 until the desired length is reached), 7) rate sequence, 8) stop SUT, 9) learn.

4.2. Description of the individual Steps

1) Preparing and Starting the SUT. In order for the sequence generation process to work properly, it is necessary to ensure that the SUT is always in the same initial state when started. If this is not the case, the execution of identical sequences might lead to different results, which would disturb the optimization process. Therefore, it is necessary to delete all temporary, setup or document files created during the last run. For example: The CTE saves settings files in its ./workspace directory, which contain information about the last edited file or the size and position of editor windows, etc. It will use this data to restore the settings from the last run. Furthermore, it saves newly created files in the user's home directory by default. The default name for new files is default.cte, but if this file already exists – for example because it was created during an earlier run – the name will be changed to something like default1.cte. This will have an effect on the caption property of tabs which contain newly created files, and so forth. Step 1) takes care of deleting all of these files before starting the SUT's executable.

2) Instrumentation. In this step the SUT is instrumented, which is necessary for the fitness function used in this work. This function is based on the dynamic CTS criterion [MM08], which considers internal activities within the SUT. Therefore, the SUT's bytecode needs to be modified.
Since this work targets Java applications, the instrumentation can be accomplished at runtime. It would also be possible to perform this step only once, before the start of the sequence generation process; to do so, all of the SUT's modules would need to be modified. However, Java is a dynamic language and the SUT might create classes at runtime or download them from the internet. Consequently, these classes would not be instrumented with that approach. Moreover, the runtime overhead associated with the dynamic instrumentation technique used in this work turned out to be small. In case the applied fitness function does not require the SUT to be instrumented, this step is not necessary. Once the SUT has been initialized and the main window has popped up, the process continues with step 3).

3) Scan GUI. In this step all visible widgets and their properties need to be determined. The technical feasibility of this step depends on the GUI framework that the SUT is based on. The CTE uses the Standard Widget Toolkit, which provides all necessary methods to access the used widgets. However, custom widgets – e.g. graphics – which are not maintained by any framework might not be accessible and cannot be included in the testing process.
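To give an impression of this step, the following is a minimal sketch of a widget scan for an SWT-based SUT such as the CTE. It is a simplified stand-in for the techniques described in chapters 7 and 8; the WidgetInfo helper type and all names are assumptions, and menus and menu items (which are not Controls in SWT) would need separate handling.

    import org.eclipse.swt.graphics.Rectangle;
    import org.eclipse.swt.widgets.*;
    import java.util.ArrayList;
    import java.util.List;

    final class GuiScanner {
        // Snapshot of one widget's properties (hypothetical helper type).
        record WidgetInfo(Control control, Rectangle screenBounds,
                          boolean enabled, boolean focused) {}

        // Collect all visible controls of all shells, together with their
        // screen coordinates and state. Must run on the SWT UI thread.
        static List<WidgetInfo> scan(Display display) {
            List<WidgetInfo> result = new ArrayList<>();
            for (Shell shell : display.getShells())
                collect(display, shell, result);
            return result;
        }

        private static void collect(Display display, Control c, List<WidgetInfo> out) {
            if (!c.isVisible())
                return;
            // Map the control's bounds from parent-relative to display coordinates.
            Rectangle bounds = display.map(c.getParent(), null, c.getBounds());
            out.add(new WidgetInfo(c, bounds, c.isEnabled(), c.isFocusControl()));
            if (c instanceof Composite composite)
                for (Control child : composite.getChildren())
                    collect(display, child, out);
        }
    }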
4) Derive Actions. The information gathered in the previous step is used to derive the set of alternative actions. Basically, it is always possible to perform clicks anywhere within the GUI of the SUT, or to simulate arbitrary keystrokes. However, it is desirable to form a set of "reasonable" actions. For example: It is probably rather ineffective to click disabled menu or tool items, and a click on the upper right of a button is quite likely equivalent to a center click on this button. The larger the set of alternative actions, the larger the search space (i.e. the amount of possible input sequences) and the harder the optimization problem. Hence, this set should be as small as possible. Ideally, it only contains the actions which expose the faults of the SUT.

5) Select Action. This is where the optimization algorithm comes into play. It considers the information learned from the fitness values of earlier sequences and selects a promising action. Usually, steps 5) and 9) cooperate in order to accomplish this goal. This work makes use of the Ant Colony Optimization algorithm, though other algorithms are conceivable.

6) Execute Action. The selected action is executed and saved as part of the generated sequence. In order to identify, save and replay an action, it must be given a unique name. Steps 3), 4), 5) and 6) are repeated until the desired sequence length is reached.

7) Rate Sequence. This step uses the fitness function to assign a fitness value to the sequence that has been generated.

8) Stop. In this step the SUT is terminated. In case the generated sequence caused the SUT to crash or hang, it may be saved to a special directory for "suspicious" sequences. The implementation presented in this work is also able to parse the standard output and standard error streams of the SUT. In case these streams contain "abnormal" output – where the definition of abnormal needs to be specified by the tester – the sequence is also considered suspicious. These facilities provide a coarse-grained test oracle.

9) Learn. This step is optional and its application depends on the optimization algorithm used during sequence generation. The dotted lines in Figure 4.1 indicate that the step may be performed after each iteration. However, it is also possible to first generate several sequences before performing this step. This is what the proposed ACO algorithm, which will be introduced in the next chapter, does. As stated above, steps 5) and 9) work together to improve the generated sequences over time. Step 9) usually takes the rated sequences and performs a learn operation to make better action selections in the future.

Steps 1) through 9) are repeated until a certain termination criterion is met. At the end of the process the best sequence found will be returned.

4.3. Applying the Approach

The presented approach may be applied to different SUTs, independent of the operating system, the programming language that the SUT was developed with and the GUI framework that it is based on. In order to apply the process to the SUT used in this work, i.e. the Classification Tree Editor, it is necessary to specify certain parts more precisely, namely

1. the optimization algorithm, which corresponds to steps 5) and 9),
2. the fitness function, which corresponds to step 7)⁵
3. and the steps 3), 4) and 6), used to operate the GUI of the SUT.

⁵ and step 2) for its implementation, which will be presented in chapter 8.

The following three chapters will elaborate these steps. The resulting implementation will be presented in chapter 8 and targets Java applications based on the Standard Widget Toolkit, running on Microsoft Windows XP.

5. Sequence Generation with ACO

This chapter explains and motivates the application of the ACO algorithm in the context of the sequence generation process.

5.1. The Concept

When used in the context of sequence generation, a possible application of ACO could look as follows: The ACO component set C corresponds to the set of all actions that can be performed on the SUT. The ACO trails, in turn, correspond to the input sequences of the SUT. The idea is to assign a pheromone value to each action and prefer actions with a high pheromone value during trail construction. Figure 5.1 highlights the steps of the sequence generation process where ACO is applied. Step 5) relates to the Selection Rule mentioned in chapter 2. In step 9) the Pheromone Update Rule is applied. The optimization process starts with equal pheromone values for each action; hence, at the beginning it effectively produces random sequences. Over time, certain actions obtain higher pheromone values than others, so that the optimization focuses on certain areas within the search space, hopefully the ones that contain the best sequences.

Representation. Let A be the set of all actions that are executable on the SUT. The ACO component set is C = A. A trail is a tuple t = (a_1, a_2, ..., a_n) ∈ A^n which corresponds to a length-n input sequence.

Selection Rule (Step 5). There are many selection rules available for the ACO metaheuristic [DB05]. A common strategy is the proportionate random selection rule, where the components c_i are selected at random, but proportionate to their pheromone p_i. Essentially, this strategy samples from a univariate probability distribution over the available actions. Another option is to always pick the action with the highest p_i, which is an extremely exploitative strategy. This work adopts a policy called pseudo-random proportionate selection [DB05], which is a combination of the two abovementioned strategies: with probability ρ it selects the best action available, and with probability 1 − ρ it performs a random proportionate selection.
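A minimal Java sketch of this policy could look as follows. It is purely illustrative, not the framework implementation of chapter 8, and the names are assumptions.

    import java.util.List;
    import java.util.Random;

    final class PseudoRandomProportionateSelection {
        // Select an index into `pheromones` (one value per available action).
        // With probability rho: pick the action with the highest pheromone value.
        // With probability 1 - rho: pick proportionate to the pheromone values.
        static int select(List<Double> pheromones, double rho, Random rnd) {
            if (rnd.nextDouble() < rho) {
                int best = 0;
                for (int i = 1; i < pheromones.size(); i++)
                    if (pheromones.get(i) > pheromones.get(best))
                        best = i;
                return best;
            }
            double total = pheromones.stream().mapToDouble(Double::doubleValue).sum();
            double r = rnd.nextDouble() * total; // roulette wheel selection
            for (int i = 0; i < pheromones.size(); i++) {
                r -= pheromones.get(i);
                if (r <= 0)
                    return i;
            }
            return pheromones.size() - 1; // guard against rounding errors
        }
    }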
Figure 5.1.: The sequence generation process. Steps 5) and 9) correspond to the ACO Selection Rule and the Pheromone Update Rule, respectively.

Pheromone Update Rule (Step 9). The update of the components' pheromone values corresponds to step 9) of the sequence generation process. The works of Dorigo et al. [DB05, DCG99] provide a detailed overview and descriptions of the various update rules. In this work this step is processed after all trails within a population have been constructed, that is, after each generation. The dotted lines in Figure 5.1 between steps 8), 9) and 1) indicate that step 9), and thus the pheromone update, is not performed after each sequence construction, but after an entire generation of sequences. The components' pheromones are updated with the fitness values of the trails that they appear in. The rule applied in this work makes use of a learning rate α and is listed in Algorithm 4. The algorithm calculates the average fitness score r_i/x_i of a component c_i within the current population. Then the corresponding pheromone p_i is updated as pointed out in line 11. Pheromones of components that do not appear in any trail are not updated. The higher the learning rate α, the more the pheromone values are influenced by the results of the current population. Instead of using all generated sequences in the population for the pheromone update, we only select the k best-rated ones, as proposed by Dorigo et al. [DB05]. The parameter k may be used to adjust the behavior of the optimization algorithm.

Termination Criterion. The current termination criterion is simply a limit on the number of generations.

Algorithm 4: Pheromone update rule with learning rate [Luk09].
     Input: C ← {c_1, c_2, ..., c_n}  /* components */
     Input: p ← (p_1, p_2, ..., p_n)  /* pheromone values */
     Input: T ← {t_1, t_2, ..., t_popsize}  /* current population */
     Input: α  /* learning rate */
     Output: updated p
  1  begin
  2      r ← (r_1, r_2, ..., r_n)  /* total component scores, initially 0 */
  3      x ← (x_1, x_2, ..., x_n)  /* component counts, initially 0 */
  4      for each t_j ∈ T do
  5          for each c_i ∈ C do
  6              if c_i was used in t_j then
  7                  r_i ← r_i + fitness(t_j)
  8                  x_i ← x_i + 1
  9      for each p_i ∈ p do
 10          if x_i > 0 then
 11              p_i ← (1 − α) · p_i + α · r_i/x_i
 12      return p
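For illustration, Algorithm 4 translates almost directly into Java. The sketch below assumes trail membership is kept as a boolean matrix; the representation and names are assumptions, not the framework's actual data structures.

    final class PheromoneUpdate {
        // trails[j][i] == true iff component c_i was used in trail t_j.
        // fitness[j] is the fitness value of trail t_j.
        // Updates `pheromones` in place according to Algorithm 4.
        static void update(double[] pheromones, boolean[][] trails,
                           double[] fitness, double alpha) {
            int n = pheromones.length;
            double[] r = new double[n]; // total component scores
            int[] x = new int[n];       // component counts
            for (int j = 0; j < trails.length; j++)
                for (int i = 0; i < n; i++)
                    if (trails[j][i]) {
                        r[i] += fitness[j];
                        x[i]++;
                    }
            for (int i = 0; i < n; i++)
                if (x[i] > 0) // components absent from all trails keep their value
                    pheromones[i] = (1 - alpha) * pheromones[i] + alpha * r[i] / x[i];
        }
    }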
5.2. Motivation

Many different metaheuristic algorithms have been proposed, among which are Simulated Annealing, Tabu Search, Genetic Algorithms and Evolution Strategies, to name a few. These techniques have been shown to be successful in finding good solutions to hard problems [McM04]. However, each algorithm is well-suited for particular problem classes but may be ineffective for others, which is what the No-Free-Lunch theorem [WM97] states. The application of ACO is now motivated for the particular problem of generating test sequences for GUIs.

Let us define the problem more formally: We want to generate length-n sequences s = (a_1, a_2, ..., a_n) ∈ A^n, where A denotes the set of all actions that are executable on the SUT. In this case the search space is A^n and the s ∈ A^n are the candidate solutions. Some actions are only available in certain states of the SUT; hence, not all permutations of actions (input sequences) are executable. Therefore, we distinguish feasible and infeasible sequences. Let us denote the set of feasible sequences by S_feasible. Then we have S_feasible ⊆ A^n. Our goal is to find a sequence s* with

    s* = arg max_{s ∈ S_feasible} fitness(s)

As we will see later on, this work employs a fitness function which needs to execute a sequence in order to obtain its fitness value; thus the sequence must be feasible. For infeasible sequences we could assign a penalty value, which is what Huang et al. [HCM10] do. However, this leads to the following problem: Let s_1 = (a, b, c, d) ∈ S_feasible and s_2 = (z, b, c, d) ∈ A^n \ S_feasible. The two sequences lie close to each other in the search space, because they are almost identical. But unfortunately we have fitness(s_2) ≪ fitness(s_1), because s_2 is infeasible and is assigned a poor fitness value. So in this case the fitness function does not expose the smoothness criterion mentioned earlier. Of course, it is not always necessary or even possible for a fitness function to expose this criterion, but its absence might complicate the optimization process. This is a common problem in search spaces where the set of solutions is subject to hard constraints [Luk09]. Due to the mentioned problems, it is quite desirable to avoid infeasible sequences and only adopt S_feasible as the search space.

The algorithms mentioned at the beginning of this section usually employ a mutation operator as presented in the introduction. A possible implementation of this operator for input sequences could, for example, substitute one or more actions to obtain a slightly different solution. However, this introduces the following problem: Since we do not possess an exact model of the GUI, we do not know which actions can be substituted for others to obtain a different, yet feasible sequence. For example: If we substituted the action clickMenu("Print") in Figure 1.1 for a different action, then the remainder of the sequence would become infeasible, since the print dialog would never open. In the presence of constraints, the mutation operator is usually destructive and difficult to implement [Luk09]. Ant Colony Optimization avoids these problems because it constructs its solutions step by step. It adopts S_feasible as its search space, which means that it generates valid sequences only.

5.3. Adjusting the Metaheuristic Optimization

There is always a tradeoff between exploration and exploitation. The more explorative the algorithm is, the more it resembles a Random Search algorithm. The more exploitative it is, the more it resembles a Hill-Climbing approach. The presented algorithm offers a set of parameters that can be used to tune the search for good input sequences. Since we do not know what the structure of the search space (i.e. the sequence space) looks like, we do not know the ideal parameter values. Thus we have to experiment with different parameter combinations. The following list gives a short overview of these parameters and the effects they can have.

Default Pheromone Values. It is difficult to predict the impact of this parameter on the optimization process. If the initial pheromone values are high, then the algorithm probably tends to be more explorative, especially at the beginning of the optimization process [DS09]. This is due to the fact that, even if the algorithm finds very good actions and increases their pheromones, other actions will still have quite large values and hence are still likely to be selected. This also means that the algorithm does not converge as fast.
ρ. This parameter affects the pseudo-random proportionate selection rule. The higher its value, the more likely the algorithm is to pick the action with the highest pheromone value. This leads to a rather exploitative and local search strategy [DS09].

α. This is the learning rate, which determines how much the algorithm learns from the solutions generated in the current generation, as described in line 11 of Algorithm 4. The higher its value, the more the algorithm tends to "forget" about earlier generations and instead adjusts the pheromone values according to the current one. Setting α = 1.0 will cause the algorithm to update the pheromones using the fitness values of the current generation only. The lower the value, the more time the algorithm will take to converge, which can make the search more explorative. Setting α = 0 will result in a Random Search algorithm⁶.

k. This parameter determines which sequences are used for the pheromone update. If k > 0 then only the top k sequences of the current population will be used. If k = 0 then all sequences will be used. Higher values might result in a more exploitative behavior [DS09].

⁶ Provided that the initial pheromone values are all equal.

6. The Fitness Function

This chapter presents the Call Tree Size criterion for input sequences and defines and motivates the fitness function used in this work. This function corresponds to step 7) of the sequence generation process, as highlighted in Figure 6.1.

Figure 6.1.: The fitness function corresponds to step 7) of the sequence generation process.

6.1. Definition

This work adopts the Call Tree Size (CTS) metric [MM08]. The goal is to find sequences which generate a large call tree upon execution on the SUT. A call tree is a structure that displays calling relationships among the methods of an executed program. Each node represents a method. A directed edge between two nodes f and g means that the method represented by f called the method represented by g in the context f. For example: Figure 6.2 shows a simple Java program which takes a list of numbers and outputs their mean as well as their sample variance. The calc() method only calculates the mean and variance if more than one argument is given (line 13); otherwise the mean is set to the value of the first parameter and the variance is set to 0 (line 16). Hence, different inputs cause different sets of methods to be called. This is reflected in Figure 6.3, where the two invocations of the program result in two distinct call trees. In the second scenario the mean() method is called two times, namely in the contexts of calc and var. The goal is to generate call trees of large size, more specifically call trees with many leaves. Since the second tree has more leaves than the first, input b) is preferred over input a).

The call trees in Figure 6.3 are only simplified versions of the original ones, because they lack some of the nodes that are generated by the Java library. Depending on the implementations of parseDouble(), println() and pow(), additional nodes could be involved. Aside from these, the original tree would also contain static initializers, class loader methods, Virtual Machine initialization, shutdown and garbage collector methods. Many of these are executed within distinct threads, which causes even a simple program like the one in Figure 6.2 to be multithreaded. So the shown call trees are actually only thread call trees of the main thread. To obtain the full call tree, the trees of the different threads are merged into a single program call tree. Therefore, an artificial root node, which connects all these trees, is introduced. Figure 6.4 depicts the merging process⁷: if two nodes in distinct trees have the same predecessors, they will be merged.

⁷ This figure is only for comprehension. The implementation presented in chapter 8 does not record separate thread call trees, but generates the program call tree all at once.

Figure 6.4.: Merging multiple thread call trees into a single program call tree.
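The following sketch mirrors this merge on a minimal call-tree structure. Like the figure, it is purely for comprehension; the Node type and all names are assumptions.

    import java.util.LinkedHashMap;
    import java.util.Map;

    final class CallTree {
        // A call tree node: children are keyed by method name, so two calls of
        // the same method in the same context map to the same node.
        static final class Node {
            final Map<String, Node> children = new LinkedHashMap<>();

            Node child(String method) {
                return children.computeIfAbsent(method, m -> new Node());
            }
        }

        // Merge `tree` into `target`: nodes with the same predecessors coincide.
        static void mergeInto(Node target, Node tree) {
            for (Map.Entry<String, Node> e : tree.children.entrySet())
                mergeInto(target.child(e.getKey()), e.getValue());
        }

        // Number of leaves, i.e. the CTS value of a non-empty (sub)tree.
        static int leaves(Node n) {
            if (n.children.isEmpty())
                return 1;
            return n.children.values().stream().mapToInt(CallTree::leaves).sum();
        }
    }

Merging each thread call tree into a shared artificial root via mergeInto() yields the program call tree; leaves() then computes its CTS value.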
Figure 6.4 depicts the merging process: If two nodes in distinct trees have the same predecessors, they will be merged. (This figure serves only comprehension; the implementation presented in chapter 8 does not record separate thread call trees, but generates the program call tree all at once.)

Figure 6.4.: Merging multiple thread call trees into a single program call tree.

Of course the discussed example is not a GUI application, but for GUIs the idea is similar: Depending on the type of the actions executed on the SUT, different methods in different contexts are invoked. For example: A sequence that prints out the currently opened document will address different functionality than a sequence that navigates the settings dialog. So distinct sequences will most likely result in distinct program call trees of distinct size. Even the order of the actions within a sequence can have an effect on the resulting call tree. A first attempt to define the fitness function could look as follows:

$$fitness : S_{feasible} \to \mathbb{N}$$
$$fitness(s) := \text{number of leaves of the program call tree generated by } s$$

where $s \in S_{feasible}$ is a feasible length-n-sequence as defined in chapter 5.

In each iteration of the sequence generation process the SUT is started and performs a lot of initialization work until the GUI is fully initialized and ready for use. So before the first action is executed, the call tree has already reached a significant size. In this work we are only interested in the leaves that are generated during the execution of an input sequence. Thus it is necessary to alter the above definition. To do this we determine the value $CTS_{start}$, i.e. the number of leaves right before the execution of the first action. This can be done between steps 2) and 3) of the sequence generation process. After the execution of the final action the number of leaves is counted again in order to obtain $CTS_{end}$. The fitness value of a sequence $s \in S_{feasible}$ is thus

$$fitness(s) := CTS_{end} - CTS_{start}$$
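To make the definition concrete, the following sketch shows how this value could be computed on a call tree, assuming the Node class that chapter 8 introduces in Figure 8.9; the method names here are illustrative, not the actual implementation:

    // Sketch: the fitness of a sequence as the growth in the number of leaves.
    static int countLeaves(Node node) {
        if (node.getChildren().isEmpty())
            return 1;                              // a node without callees is a leaf
        int leaves = 0;
        for (Node child : node.getChildren())
            leaves += countLeaves(child);
        return leaves;
    }

    // fitness(s) = CTS_end - CTS_start: count once before the first action and
    // once after the last action of the sequence, then take the difference.
    static int fitness(Node root, int ctsStart) {
        return countLeaves(root) - ctsStart;
    }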
6.2. Motivation

There are many criteria which can be used to define the quality of test sequences. Common ones are code coverage criteria, like instruction or branch coverage. Other examples are event handler coverage or criteria defined on models of the GUI. For Event Flow Graphs or Finite State Machines one could count the number of covered nodes or edges [MSP01, Str06].

The main reason for adopting the CTS metric is an experiment conducted by McMaster and Memon [MM08]. Their goal was to minimize existing GUI test suites. Since a large suite takes a lot of time to execute, which might not always be available – for example in regression test scenarios – it is desirable to reduce its size. Ideally, the reduced suite should be as effective in revealing faults as the original one. The authors started off with a given test suite for a program. They executed each test sequence to obtain the program call trees, which they merged into a single large suite call tree to obtain the CTS metric for the entire suite. (Instead of using the notion of call trees, the authors counted the number of distinct Maximum Call Stacks (MCS). An MCS is a path through a call tree, starting from the root node and ending at one of its leaves. So the number of MCSs generated by a sequence is equivalent to the number of leaves of its call tree.) Then they went on to remove those test cases which did not cause the tree to shrink significantly. This means that they kept only the sequences which contributed the majority of the leaves. After this process, they calculated the error detection rates of the original and the reduced version and found that the latter still revealed most of the known faults. They tested this approach on four distinct Java applications, three of which had GUIs, and found that this technique led to a measurable size reduction, while maintaining a high fault detection rate [MM08].

McMaster and Memon argue that the effectiveness of the CTS criterion stems from its ability to capture context information. A large part of a GUI's functionality is usually implemented with the help of event handlers. Since GUIs often allow for many different ways to access this functionality, these handlers may be triggered in a variety of different contexts, each time possibly exhibiting a slightly different behaviour. For example: Considering again the second call tree in Figure 6.3, we can see that the method mean() is called in two different contexts. Calling a method in different contexts can lead to different behaviour, due to different arguments or internal state (in the given example, however, mean() always behaves the same way). So a broader call tree, i.e. one with a large number of leaves, generally tends to cover more contexts and hence tests more aspects of the SUT than a call tree with fewer leaves. Of course this is only an assumption, but their experiment shows that CTS can be suitable for test suite reduction [MM08].
In addition to these results, the authors argue that the metric is particularly well suited for GUI tests, for the following reasons:

Libraries. Modern GUI applications usually comprise one or more third-party modules, like for example a GUI framework. The call tree can be collected for the entire application. So instead of only testing first-party source code, the whole system is considered during the test. This is interesting for two reasons: 1) Third-party modules could contain faults, too. 2) Errors might result from a wrong or unintended usage of these modules.

Efficiency. It is possible to obtain a call tree for an application without introducing excessive overhead. This remains valid even in multithreaded environments, which is important, because many modern GUI applications are multithreaded, due to the fact that they need to compute results and react to user inputs at the same time.

Ease of Implementation. Collecting the call tree of an application only requires method entry and exit hooks. Such hooks exist for most compilers or runtime environments to allow for the application of profilers. This means that this technique is not restricted to Java applications. In addition, the source code of the SUT is not required.

Due to these reasons and results, the CTS metric is adopted for this work.

7. Operating the SUT

This chapter addresses the problem of generating and executing actions on the SUT, that is, it elaborates on steps 3), 4) and 6) highlighted in Figure 7.1.

Figure 7.1.: Steps addressed in this chapter.

Figure 7.2 shows a screenshot of the Classification Tree Editor (CTE, http://www.berner-mattner.com/en/berner-mattner-home/products/cte/index-cte-ueberblick.html). The CTE is the SUT for this work and is used to assess the performance of the resulting framework in the experiment chapter. It is a Java application with a GUI based on the Standard Widget Toolkit (SWT, http://www.eclipse.org/swt/). It is a graphical editor designed for building and modifying classification trees and offers functionality to derive test cases from them. Its interface comprises classical controls like buttons, menu items and scrollbars, as well as custom ones to display the classification trees. Although this thesis concentrates on the CTE as the main test application, the framework developed in this work can be applied to other Java SWT applications as well (for example Eclipse, http://www.eclipse.org). The CTE's target platform is Microsoft Windows XP and hence this work focuses on this system, too. However, the functionality presented in this chapter is not limited to a particular platform, programming language or toolkit.

Figure 7.2.: Drawing area and test case table of the CTE.

7.1. Scanning the GUI

Before we can actually issue clicks, type in text, drag scrollbars, etc., we need to determine the current state of the GUI; more precisely, we have to find the visible widgets and their properties. For example: In order to click on a button control, we first need to find the corresponding SWT Button object. After that we can go on to determine its screen coordinates and dimensions, with which we could already issue a click, for example to the center of the button. However, sometimes it is necessary to check additional properties of a control before executing an action. So one could check whether it is enabled, because clicking on a disabled button will most likely not have any effect. Of course complex GUIs consist of many controls, and since we want to consider all possible actions, we need to find all these controls and determine their properties. To capture the state of the entire GUI, this work employs a structure called widget tree.
In a widget tree, each node corresponds to a control element and its properties (which makes it somewhat similar to the DOM tree of an HTML document). Figure 7.3 shows a simple SWT application and the corresponding widget tree. The root element is an object called Display, which is the main object of an SWT application. This object is not actually visible, but it provides access to important system information, like the screen resolution, the current control under the cursor, etc. All other controls of the application are its direct or indirect children. In SWT a window is called a Shell. Figure B.3 in the appendix shows additional SWT widgets and their names. In the example we can see that the widget tree reflects the GUI's hierarchical nature and that each node belongs to one of the widgets. The grey boxes indicate that each node also contains the property values of the control it represents. For example: The button has a caption property and a bounding rectangle, and the text element has a text property, to name only a few. All of these properties can be accessed with the public methods provided by the SWT objects (see http://www.eclipse.org/swt/javadoc.php).

Figure 7.3.: Simple SWT application and the corresponding widget tree (nodes: Display, Shell, Button, Scale, Text, Menu and MenuItems; the Button node, for instance, carries the properties caption: "Button", enabled: true, visible: true, hasFocus: true, rect: [180,100,260,130]).

Since the state of the GUI changes throughout the execution of an input sequence, it is necessary to create a new widget tree after each performed action. For example: Clicking on the "File" menu item of the example program causes a drop-down menu to appear. Figure 7.4 shows how the widget tree changes and now contains nodes for the various menu items within this drop-down menu. In addition, the property values of control elements may change during execution: Dragging the thumb of the slider control will change its bounding rectangle, etc.

Figure 7.4.: Changed widget tree after a click on the "File" menu item.

Of course the program in Figure 7.3 is very simple and merely contains standard widgets offered by the operating system's window manager. For a more complex application like the CTE, the widget tree is usually much larger. For example: The widget tree for the screenshot in Figure 7.2 can be found in the appendix in Figure B.2. At the bottom we can see that it also contains the figures displayed in the drawing area, which are custom widgets (framed green in the screenshot). These figures are not SWT widgets, but are rendered onto an SWT DrawingCanvas object by employing the Draw2d framework (http://www.eclipse.org/gef/draw2d/index.php). It is possible to access each figure and its properties via the DrawingCanvas object, because Draw2d cooperates with SWT. But not all objects that humans perceive as widgets are also programmatically accessible. For example, the test case table under the drawing area (framed red) is a custom coded "control". Since there is no framework which provides easy access to the individual parts of the table, it is not possible to obtain information about it, and consequently it will be hard to include it in the testing process. The implementation presented in this work only recognizes widgets maintained either by SWT or Draw2d. It would, of course, be possible to support additional frameworks, like Swing, the WinAPI, GNOME, KDE or Mac OS X's Cocoa framework, to name a few. However, a widget can only be inspected if it is maintained by a framework that provides programmatic access to it. If this is not the case, it will be difficult to find the widget and determine its properties.
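As a rough illustration of how such a tree can be collected, the following sketch walks the SWT widget hierarchy starting from the Display object. WidgetNode is a hypothetical class, not the actual implementation, and additional handling (menus, items, Draw2d figures) is omitted:

    import org.eclipse.swt.widgets.*;
    import java.util.*;

    // Illustrative widget tree node; SWT widgets may only be accessed from the
    // SWT UI thread, e.g. via Display.syncExec().
    class WidgetNode {
        final Widget widget;
        final WidgetNode parent;
        final Map<String, Object> properties = new HashMap<>();
        final List<WidgetNode> children = new ArrayList<>();
        WidgetNode(Widget widget, WidgetNode parent) {
            this.widget = widget;
            this.parent = parent;
        }
    }

    class WidgetTreeBuilderSketch {
        WidgetNode scan(Display display) {
            WidgetNode root = new WidgetNode(display, null);
            for (Shell shell : display.getShells())
                scanControl(shell, root);
            return root;
        }

        private void scanControl(Control control, WidgetNode parent) {
            WidgetNode node = new WidgetNode(control, parent);
            parent.children.add(node);
            node.properties.put("enabled", control.getEnabled());
            node.properties.put("visible", control.getVisible());
            node.properties.put("rect", control.getBounds());
            // menus, items and Draw2d figures would need additional handling here
            if (control instanceof Composite)
                for (Control child : ((Composite) control).getChildren())
                    scanControl(child, node);
        }
    }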
7.2. Deriving Actions

This section deals with step 4) of the sequence generation process. Whenever a human looks at the graphical interface of the application he is working with, his intuition tells him which actions are reasonable and which are not. He knows that scrollbars need to be dragged, that buttons and menu items need to be clicked and that it is possible to type text into a text box once it has the focus. Figure 7.5 shows examples of actions that could be performed on the CTE. Figure 7.6 depicts another scenario, where a modal dialog blocks input to the underlying windows and the number of "reasonable" actions is drastically reduced. Again, the user employs his intuition when considering actions. Of course he cannot know with 100% certainty what will happen if he clicks a disabled button or a random location within a static label, because he does not know the application's implementation. But his experience with other applications tells him that most likely nothing will happen. This work takes the same heuristic approach when considering the set of alternative actions.

Figure 7.5.: Possible actions that can be performed on the CTE (green circles: left clicks, yellow triangles: right clicks, blue arrows: drag and drop operations, green stars: double clicks). These are not all possible actions, but only a selection, to preserve clarity.

Figure 7.6.: A modal window blocks input to the underlying windows (green circles: left clicks, violet circles: text input).

Table 7.1 lists various SWT widgets, their possible action types and the necessary preconditions. The corresponding widgets can be found in the appendix in Figures B.3 and B.4. The first row lists the general preconditions for all actions: In order to click on a widget, type in text or drag it, the widget itself and all its ancestors in the widget tree need to be visible and enabled.

all widgets (general preconditions for all action types): the widget itself must be visible and all parents in the widget tree enabled; for actions that generate clicks, the widget must not be covered by others; no modal window exists which blocks input to the window containing the control.

Button (includes check and radio buttons):
  1. execute: e.g. center click
  2. context menu: e.g. right click (precondition: a menu is associated with the control)

Text, StyledText:
  3. set focus: activate the text control, so that it can receive keyboard input
  4. type character: each character is a separate action (precondition: has focus)
  5. type capital letter, number or special sign: each character is a separate action (precondition: has focus)
  6. type word: type an entire word; different words correspond to different actions (precondition: has focus)
  7. mark all: mark the entire text (precondition: has focus)
  8. delete: delete the character after the caret or the marked text (precondition: has focus)
  9. move: move the caret up, down, left or right (precondition: has focus)

Shell:
  10. set focus: make the window active
  11. toggle size: maximize or minimize the window (preconditions: has focus; has title area)
  12. drag: drag the window to another position; each position corresponds to a different action, e.g. upper left, upper right, lower left, ... (preconditions: has focus; has title area)
  13. close: close the window (precondition: has focus)

Slider, Scale, Scrollbar:
  14. drag: drag the thumb to a position; distinct positions result in distinct actions, e.g. beginning, center, end

TreeItem:
  15. mark exclusive: mark the item as the currently active item within the Tree
  16. mark nonexclusive: if other items are marked already, then add this one too (e.g. Ctrl + Click)
  17. open context menu: open the context menu (preconditions: has focus; a menu is associated with the Tree)
  18. expand: expand the item (preconditions: has focus; item has children)
  19. execute: execute the item, e.g. double click (precondition: has focus)

ToolItem:
  20. execute: e.g. click
  21. open dropdown menu: certain tool items have a style that includes an additional box, which opens a drop-down menu when clicked (precondition: has drop down style)

Spinner:
  22. increase: increase the value
  23. decrease: decrease the value

TableItem, ListItem:
  24. mark exclusive: mark the item as the currently active item within the Table
  25. mark nonexclusive: if other items are marked already, then add this one too (e.g. Ctrl + Click)
  26. open context menu: open the context menu (preconditions: has focus; a menu is associated with the List / Table)
  27. execute: execute the item, e.g. double click (precondition: has focus)

TabItem:
  28. activate: activate the tab

CTabItem:
  29. activate: activate the tab
  30. toggle size: maximize or minimize the tab
  31. close: close the tab

Combo:
  32. set focus: activate the text area
  33. open dropdown: open the dropdown list
  34. type characters, words, delete, mark etc.: see Text control (preconditions: has focus; style allows modification of the text box)
  35. item up: go one item up
  36. item down: pick the next item

Link:
  37. execute: e.g. click the link

Figure (Draw2d):
  38. mark: make it the active figure
  39. context menu: open the context menu (precondition: the DrawingCanvas which contains the figure is associated with a menu)
  40. execute: e.g. double click
  41. drag: drag the figure to a location within the DrawingCanvas; each position corresponds to a distinct action
  42. delete: delete the figure (precondition: has focus)

MenuItem:
  43. execute: e.g. click (precondition: item is not a separator)

Table 7.1.: Action types.
Size of Search Space versus Test Granularity

At this point, the following question is interesting: Which are the most effective action types in terms of fault-revealing capabilities? Each added action type increases the size of the search space and thus makes the optimization task more difficult. If we search for length-10-sequences, assuming only 5 alternative actions at each step, the search space will already consist of roughly 10 million sequences. Given the fact that sequence evaluation is quite slow, the set of action types to be applied should be selected carefully.
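The figure of roughly 10 million follows directly: with $|A| = 5$ alternative actions at each of the $n = 10$ steps, the number of candidate sequences is

$$|A|^n = 5^{10} = 9\,765\,625 \approx 10^7.$$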
For example: Looking at types 4 and 5, one could question whether it makes sense to generate a distinct action for each letter that can be typed into a text field. Assuming the English alphabet, this would already generate 26 actions for a single text field, and many more if capital letters, numbers and special signs are allowed. On the one hand this gives our tests more granularity – because every possible word could be typed – but on the other hand it drastically increases the search space. Thus it would probably make more sense to pick only a subset of the possible characters or to allow only the input of a finite set of test words.

The drag action types 12, 14 and 41 introduce a similar problem: How many positions should be allowed for drag operations (Figure 7.7)? For example: We could allow a scrollbar to be dragged only to either the first or the last position. This might be sufficient for lists that contain only a small number of items. Yet in large lists, with thousands of items, we would not be able to access the majority of the contents. But if we allow many positions, the search space becomes much larger. Another example: Dragging a window (see Figure 7.6) might not be a fault-sensitive action in itself. But by moving the window we could potentially uncover other widgets and thus become able to perform actions that were not available before.

Figure 7.7.: How many drag positions should be allowed?

However, due to the lack of empirical data, it is quite difficult to determine a reasonable set of action types to employ. This set is most likely quite specific to the employed SUT. For example: In the CTE it is possible to mark the figures in the drawing area. Once they are marked, they can be deleted by pressing the delete key. We can also change their labels by typing in text. This behaviour is specific to the CTE, and other SUTs might provide different functionality. Thus there is no generic set of "reasonable" actions that works perfectly for all SUTs. In order to find a good compromise between search space size and thorough testing, it might be necessary to specify the set of possible action types prior to the optimization run. The approach taken in this work is as follows: All action types described in Table 7.1 are used, except for types 6, 8 and 9. For action types 4 and 5 we allow the characters "x, Y, $, 0, 9". For types 12 and 41 we allow three positions – upper left, upper right, lower center. Finally, for type 14, three positions are allowed: start, center and end.

7.3. Executing Actions

As seen in Table 7.1, actions are quite abstract entities. It is important not to confuse them with the inputs that they generate. For example: Figure 7.3 shows a button with the caption "Button". In this particular situation, the action "execute button 'Button'" and the input "left click x=220 y=115" might coincide. But if the window is moved to another location or the screen's resolution is changed, then the input will target another control. Actions are much more robust to those kinds of changes than inputs. This is important, because having recorded a sequence of actions, we want to be able to replay it, even if the windows and controls are not at exactly the same positions they occupied when the sequence was recorded.

The action table only describes what the actions do, not how this is done. In the approach presented in this work, each executed action generates mouse or keyboard input, or both. For example: Action type 1 is implemented by moving the cursor to the center of the button's bounding rectangle, simulating a left mouse button press and eventually simulating a left mouse button release. It would, however, be possible to implement this action in a different way: If the button has the focus, then one could simply simulate a keystroke of the enter key. Another example: To mark all characters in a text box we simulate "Control + A". But it would also be possible to move the caret to the beginning of the text using the arrow keys, then hold down the Shift key, move the caret to the end and finally release the Shift key. This example shows that a single action might generate a complex series of inputs.
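As a sketch of how action type 1 can be realized with the java.awt.Robot class that section 8.4.2 introduces, consider the following; the surrounding class and the way the bounding rectangle is obtained are illustrative:

    import java.awt.AWTException;
    import java.awt.Rectangle;
    import java.awt.Robot;
    import java.awt.event.InputEvent;

    // Illustrative sketch: implement the "execute" action (type 1) as a left
    // click to the center of the target widget's bounding rectangle.
    class ClickActionSketch {
        private final Robot robot;

        ClickActionSketch() throws AWTException {
            robot = new Robot();
        }

        void leftClick(Rectangle bounds) {
            int x = bounds.x + bounds.width / 2;    // center of the rectangle
            int y = bounds.y + bounds.height / 2;
            robot.mouseMove(x, y);
            robot.mousePress(InputEvent.BUTTON1_MASK);
            robot.mouseRelease(InputEvent.BUTTON1_MASK);
        }
    }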
7.3.1. Simulating Input versus Invoking Event Handlers

As shown above, the actions generate input that is very similar to the input a human tester would generate when using the GUI. This thesis makes the assumption that this is important and necessary, because it is the most realistic way of operating the GUI. However, there is a different approach. The SWTBot scripting framework (http://www.eclipse.org/swtbot/), for example, does not simulate keystrokes or mouse input, but directly invokes the event handlers of the target controls. The idea behind this is as follows: Clicking on a button causes a list of event handlers to be fired inside of the SUT, e.g. MouseMoveEvent(...), MouseDownEvent(), MouseUpEvent(), MouseClickEvent(), ...; the particular order is implementation-specific. So, instead of actually performing a click, the SWTBot framework invokes the event handlers of the control in the order they would be fired when performing a "real" click. This approach can be advantageous in cases where it is not possible to calculate the bounding rectangle of the target. For example: In SWT it is not possible to obtain the rectangle of the upper right close button of a window. So instead of clicking that button, SWTBot would simply invoke the Close() handler of the window (of course one can still simulate "Alt + F4"). Yet this approach has a few important downsides:

Inconsistencies. The invocation of event handlers might corrupt the SUT's state. If the event handlers are not invoked in the exact same order they would be invoked upon "real" input, this might lead to "artificial faults". For example: If one performs a drag and drop operation, then the mouse cursor is moved over many controls, potentially causing many other handlers to be fired until it finally reaches the drop target. It is very difficult to calculate all necessary handlers and their correct invocation order.

Limitations. Each state change in the GUI causes certain event handlers to be fired. But the opposite does not hold. For example: Clicking on the "File" menu item causes the corresponding menu to be opened and fires the Arm() handler. But invoking the Arm() handler does not cause the menu to be opened. In fact this is one of SWTBot's major problems: It is not possible to work with dynamic menus.

Native Languages. Java's reflection mechanism and other features, like Java agents, greatly simplify the access to internal methods such as event handlers and thus make the event handler method attractive. SUTs developed with native languages, such as C/C++, often do not provide similar facilities. This makes it difficult to apply this approach to those SUTs.
7.3.2. Naming Scheme

Since we want to be able to replay generated sequences, and since the search algorithm needs to assign pheromone values to particular actions in order to prioritize good ones, we need a reliable way to identify actions. As already mentioned, actions should not be identified by the raw inputs that they produce. Thus we cannot make use of coordinate values. Let us consider again Figure 7.4 and assume that a click on the "File" item followed by a click on the "Save" item has been performed. One way to describe this sequence could look as follows:

    1. Execute! Display.Shell('SWT Widget Test').Menu.MenuItem('File')
    2. Execute! Display.Shell('SWT Widget Test').Menu.MenuItem('File').Menu.MenuItem('Save')

This is another situation where the widget tree comes in handy. We can see here that it is possible to identify a widget by its access path: Display is the SWT main object and the root of the widget tree, the main window Shell is its child, and so on. Because larger applications usually have several windows, and because there are multiple MenuItems, we have to disambiguate the child nodes. One way of doing this is to augment the descriptor with property values, in this case the caption of the window or the name of the menu item. This works well in those cases where the properties used for disambiguation do not change their values. If, however, the caption of the main form is dynamic, because it contains the current time or the name of the currently opened file, then the actions are not recognized properly. Since the latter is the case for the CTE (see Figure 7.8), this work employs a different naming scheme. According to this scheme the above actions can be described as follows:

    1. Execute! Display[0].Shell[0].Menu[0].MenuItem[0]
    2. Execute! Display[0].Shell[0].Menu[0].MenuItem[0].Menu[0].MenuItem[2]

Figure 7.8.: The window caption property in the CTE cannot be used to disambiguate windows, because it contains the name of the currently open file and thus changes throughout the run.

Figure 7.9.: A widget's access path.

Throughout the tests with the CTE it turned out that the order of creation of the widgets is relatively stable. This means that the order in which the widgets appear in the widget tree is stable, too. So instead of using properties like the caption or the items' names, we can simply employ the child index to disambiguate a widget's children (see Figure 7.9). However, this might not be the case for other applications, so the decision on which naming scheme to use has to be made on a per-case basis.
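Reusing the hypothetical WidgetNode class from the sketch in section 7.1, such an index path could be derived as follows. This is a sketch, not the actual implementation; it assumes that the index counts siblings of the same widget type, which matches the examples above:

    // Derive an index-based access path such as
    // "Display[0].Shell[0].Menu[0].MenuItem[2]" from a widget tree node.
    static String accessPath(WidgetNode node) {
        String name = node.widget.getClass().getSimpleName();
        if (node.parent == null)
            return name + "[0]";                          // the Display root
        int index = 0;                                    // index among siblings
        for (WidgetNode sibling : node.parent.children) { // of the same type
            if (sibling == node) break;
            if (sibling.widget.getClass() == node.widget.getClass()) index++;
        }
        return accessPath(node.parent) + "." + name + "[" + index + "]";
    }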
8. Implementation

This chapter presents the framework which has been developed throughout this work and explains the implementation of the fitness function and of the techniques used to operate the SUT's GUI.

8.1. The Framework

Figure 8.1 provides an overview of the framework's structure. It consists of two main components. The first one is the Starter, which implements steps 1), 5), 8) and 9) of the sequence generation process. It contains the implementation of the ACO algorithm and has the ability to record and replay sequences. The Starter executes and terminates the SUT with the attached Agent, which is the second component. The Agent implements steps 2), 3), 4), 6) and 7) of the sequence generation process. It runs within the SUT's Virtual Machine, performs the bytecode instrumentation to obtain the call tree and thus the fitness values, and scans and operates the GUI. Both components communicate via a TCP/IP connection. Figure 8.2 shows how they collaborate in order to generate sequences: The Starter requests the set of alternative actions from the Agent, selects an action and instructs the Agent to execute it. These steps are repeated until a sequence has been generated. Eventually, the Starter requests its fitness value.

Figure 8.1.: Main components of the framework (the Agent operates the GUI and instruments the SUT to obtain the fitness value; the Starter starts the SUT and the Agent, contains the ACO algorithm and records and replays sequences).

Figure 8.2.: Communication between the Starter and the Agent (get available actions / actions / execute action / ... / evaluate sequence / fitness value).

The Starter is the component that the tester works with. When the framework is started, the Starter's terminal window appears (see Figure 8.3). This window provides information about the course of the optimization process, e.g. the number of the current generation, the rating of the best sequence found and the ratings of the past few sequences. Once the optimization process has started, it is difficult to work with the machine that the framework is running on, because the mouse cursor constantly moves and keyboard input is simulated. Hence, the Starter provides a special key combination which stops the run and returns control to the tester.

Figure 8.3.: The Starter displays information about the optimization process.

Prior to the start of the optimization process, the tester may supply a set of parameters in the form of a file named settings.xml. Figure B.1 in the appendix shows an example of this file. It contains parameter values used by the ACO algorithm, e.g. the number of generations, the population size, the sequence length, the learning rate, etc. In addition, the tester can specify the files which are to be deleted upon the start of the SUT.
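To give an impression of what such a file might contain, the following is a purely hypothetical illustration; the element names and the file path are invented here, the parameter values are those of the ACO run in Table 9.1, and the authoritative format is the one shown in Figure B.1:

    <!-- hypothetical illustration only; see Figure B.1 for the actual format -->
    <settings>
      <generations>80</generations>
      <populationSize>10</populationSize>
      <sequenceLength>10</sequenceLength>
      <learningRate>0.3</learningRate>
      <deleteOnStartup>
        <file>workspace/lastSession.cte</file>
      </deleteOnStartup>
    </settings>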
8.2. Java Agents

This work makes use of Java agents for the implementation of several features; hence, this section briefly introduces the concept. A Java agent is a module that can be attached to a Virtual Machine (VM) at startup or during runtime. Once attached, it has access to all loaded classes of the VM and the ability to modify them at runtime. Figure 8.4 shows how to start a VM with an attached agent via the command line.

    java -javaagent:agent.jar SUT

Figure 8.4.: Starting a Virtual Machine with an attached Java agent.

Agents come in two flavours: either as native implementations using the Java Virtual Machine Tool Interface (JVMTI) or as Java implementations packaged in a jar-file. This work makes use of the latter, because they are easier to implement and platform independent. Figure 8.5 shows a simple agent and its premain()-method. This method is called before the main()-method of the Java program running in the VM. It receives as parameters the program arguments and an object of type Instrumentation. This object can be used to access loaded classes, for example via getAllLoadedClasses(). In addition, it allows the installation of a so-called ClassFileTransformer via addTransformer(). Transformers may change the bytecode of classes during the class loading process. Java agents, both native and Java style, are often used by profilers like the Test and Performance Tools Platform (http://www.eclipse.org/tptp/) or the Java Interactive Profiler (http://jiprof.sourceforge.net/).

    import java.lang.instrument.*;

    public class Agent {
        public static void premain(String agentArgs, Instrumentation inst) {
            for (Class cl : inst.getAllLoadedClasses())
                System.out.println("Name of loaded class: " + cl.getName());
            inst.addTransformer(new MyTransformer());
        }
    }

Figure 8.5.: A simple Java agent, which prints out the currently loaded classes and installs a custom ClassFileTransformer.

The framework's Agent component is a Java style agent. It is necessary that the Agent is attached to the SUT at each start. The CTE provides a file called cte.ini, where it is possible to supply parameters for the Virtual Machine that the CTE is running in. So in order to attach the Agent, we add the line highlighted in Figure 8.6.

    -vmargs
    -server
    -Xms100m
    -Xmx1100m
    -ea
    -Xbootclasspath/p:asm-all-3.3.1.jar
    -javaagent:agent.jar

Figure 8.6.: Java VM parameters in the CTE initialization file cte.ini (the last line attaches the Agent).

8.3. Implementation of the Fitness Function

This section presents the implementation of the fitness function introduced in chapter 6. All of this functionality resides in the framework's Agent component. In addition to their experimental results, McMaster and Memon [MM08] published the tool that they used to record the program call trees, the JavaCCTAgent (http://sourceforge.net/projects/javacctagent/). This tool builds upon the Java Virtual Machine Tool Interface (JVMTI) to intercept method calls and thereby gather the information necessary to build a call tree. The JVMTI is used by native Java agents. It provides callback functions for various events, like method entry / exit, thread start / end and Virtual Machine startup / exit, to name a few. The initial idea was to use the implementation of McMaster and Memon for this work. However, it turned out that this has two drawbacks:

Overhead. When attached to the CTE, the agent significantly slowed down the execution. The CTE often became unresponsive and sometimes even crashed. This is due to the fact that JVMTI agents are implemented using native C code. This code communicates with the Virtual Machine via the Java Native Interface (JNI). Unfortunately, calling native methods from Java and vice versa is a quite expensive operation [KS01, WK00], and since the agent triggers two native function invocations (method entry and exit) for each method call within the VM, this introduces significant overhead. Moreover, since the program constantly crosses the boundary between native and non-native code, the VM is unable to perform certain optimizations, like method inlining [Lia99].

Platform Dependency. Since the JVMTI is not available for all implementations of the Java VM [Ora06], this approach does not work on all platforms.
For these reasons, the framework in this work uses a different solution. This solution injects bytecode into the SUT during runtime and does not make use of native code, which improves the performance and preserves platform independence.

8.3.1. The Concept

Before getting to the actual bytecode instrumentation, the overall idea of the process is explained with the help of comprehensible Java source code. The goal is to instrument the SUT so that each method call can be detected. For example: A straightforward approach to instrumenting the Java program in Figure 6.2 would be to add callback invocations to the beginning and end of each method, as depicted in Figure 8.7. This way each method call is tracked and can be saved as a node in the call tree. Inst.enter() and Inst.leave() are the callback methods. They take as a parameter the identifier of the called method and are responsible for building the actual call tree.

    public class Stat {
        double[] data;
        public static void main(String[] args) {
            Inst.enter(MID_MAIN);
            Stat s = new Stat(args);
            s.calc();
            Inst.leave(MID_MAIN);
        }
        public Stat(String[] args) {
            Inst.enter(MID_STAT);
            data = new double[args.length];
            for (int i = 0; i < args.length; i++)
                data[i] = Double.parseDouble(args[i]);
            Inst.leave(MID_STAT);
        }
        public void calc() {
            Inst.enter(MID_CALC);
            ...
            Inst.leave(MID_CALC);
        }
        ...
    }

Figure 8.7.: Modified version of the program in Figure 6.2. Method calls have been added at the beginning and end of each method to record invocations.

Upon instrumentation, each method within the SUT will be assigned a method id. This id is obtained by combining the name of the class, the name of the method and its signature. For example: The method id for the main method would be

    int MID_MAIN = "static void Stat.main(String[])".hashCode();

This way every method possesses a unique identifier.

This first approach has a flaw: In line 16 of Figure 6.2 the data array is accessed without checking its length. If no parameters are supplied upon program invocation, an exception is raised, which causes the program to crash. This prevents the invocation of Inst.leave(). So exceptions – as well as additional return statements – might perturb the construction process of the call tree. Hence, it is necessary to adjust the instrumentation as depicted in Figure 8.8. The try-finally handler around the entire method body guarantees the proper invocation of both callbacks, no matter what happens inside of the body. These modifications neither affect the control flow of the original code nor intercept any thrown exceptions. The example shows only the instrumentation of the visible source code, but of course the Java library methods pow(), println() and parseDouble() have to be and will be instrumented, too.

    public class Stat {
        ...
        public void calc() {
            try {
                Inst.enter(MID_CALC);
                if (data.length > 1) {
                    System.out.println(mean() + ", " + var());
                } else {
                    System.out.println(data[0] + ", " + 0.0);
                }
            } finally {
                Inst.leave(MID_CALC);
            }
        }
        ...
    }

Figure 8.8.: A finally handler prevents exceptions from perturbing the call tree construction process.

    public class Node {
        public Node(int methodId) { ... }
        public synchronized List<Node> getChildren() { ... }
        public synchronized void addChild(Node node) { ... }
        public Node getParent() { ... }
        public int getMethodId() { ... }
    }

Figure 8.9.: Node class for the call tree.

With this infrastructure in place, Inst.enter() and Inst.leave() can work as follows:

1. Upon start of the SUT, create a root node for the program tree. The nodes of the program tree are objects of the class shown in Figure 8.9.

2. Upon start of each thread t, assign a reference activeNode_t to this thread. This reference will always point to the currently active node / method of the thread within the program call tree. At thread start, activeNode_t is set to point to the root node.

3. On every invocation of Inst.enter(mid), determine the currently active thread t and consider the node pointed to by activeNode_t. Check whether one of its children already has a method id identical to mid. If so, set activeNode_t to point to this child. If not, create a new child with mid as its method id and set activeNode_t to point to this child.

4. Upon execution of Inst.leave(mid), determine the currently active thread t and set activeNode_t to point to its parent within the program call tree.

5. Since there are several threads accessing and modifying the program call tree, make sure that access is synchronized.

The presented approach would introduce a new child for each recursive invocation. Thus step 3 is refined to only generate a new child if the caller is different from the callee. This is necessary, since recursive invocations might cause trees of excessive size. Applied to the CTE, this technique generated call trees of moderate size (in the experiment all call trees consumed less than 200 MB of main memory for length-10 sequences). However, indirect recursions are not covered by the current approach and might be problematic for other SUTs.
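A sketch of how Inst.enter() and Inst.leave() could realize these steps is shown below; it is a simplification, not the actual implementation. Node is the class from Figure 8.9 and is assumed to set the child's parent inside addChild(); its synchronized methods stand for the locking demanded by step 5):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class Inst {
        static final Node root = new Node(0);                // step 1: artificial root

        // step 2: every thread starts at the root of the program call tree
        static final ThreadLocal<Node> activeNode = ThreadLocal.withInitial(() -> root);
        // remembers, per thread, whether the matching enter() descended a level
        static final ThreadLocal<Deque<Boolean>> descended =
                ThreadLocal.withInitial(ArrayDeque::new);

        public static void enter(int mid) {                  // step 3
            Node current = activeNode.get();
            if (current.getMethodId() == mid) {              // direct recursion:
                descended.get().push(false);                 // do not add a child
                return;
            }
            Node next = null;
            for (Node child : current.getChildren())         // reuse an existing child
                if (child.getMethodId() == mid) { next = child; break; }
            if (next == null) {                              // or create a new one
                next = new Node(mid);
                current.addChild(next);
            }
            activeNode.set(next);
            descended.get().push(true);
        }

        public static void leave(int mid) {                  // step 4
            if (descended.get().pop())                       // undo enter()'s descent
                activeNode.set(activeNode.get().getParent());
        }
    }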
8.3.2. Bytecode Instrumentation

So far, only Java source code has been used to illustrate the modifications. In practice, however, the SUT's source code is often unavailable. The solution for Java applications is to access the corresponding bytecode. This section gives an introduction to the Java Virtual Machine (JVM) and the steps involved in the bytecode instrumentation.

The JVM is a stack-based abstract computing machine [LY99]. In contrast to other architectures, it does not make use of registers. Instead, each thread in the JVM has its own stack, which is used for value storage, operations and method invocations. Appendix A provides an overview of the supported instructions. For each instruction it describes the necessary parameters and the state of the stack before and after its execution. Figure 8.10 shows the compiled version of the main method from Figure 6.2. The output was generated by applying the javap disassembler (http://download.oracle.com/javase/6/docs/technotes/tools/windows/javap.html) to the compiled version of the Stat class.

    public static void main(java.lang.String[]);
      Code:
       0:   new            #1; // class Stat
       3:   dup
       4:   aload_0
       5:   invokespecial  #2; // Method "<init>":([Ljava/lang/String;)V
       8:   astore_1
       9:   aload_1
      10:   invokevirtual  #3; // Method calc:()V
      13:   return

Figure 8.10.: Output of the javap tool for the main method of the program in Figure 6.2.

Addresses 0 to 5 correspond to the constructor invocation. The new #1 instruction at address 0 creates an object of class Stat on the Java heap and pushes the corresponding reference onto the current thread's stack. The constant #1 was defined by the Java compiler upon compilation and refers to the class Stat.
The next operation, dup, simply duplicates the topmost stack value, i.e. it pushes a copy of the aforementioned reference onto the stack. This will be the this reference for the constructor call. aload_0 pushes the reference of the args parameter onto the stack, because the following constructor call takes it as a parameter (the constant 0 refers to the local variable with index 0). invokespecial #2 eventually invokes the constructor, which pops the args reference as well as the this reference off the stack. The following instruction, astore_1, assigns the remaining object reference to the local variable s at index 1 and pops it off the stack. aload_1 loads the reference from s and pushes it onto the stack again, so that calc() can be invoked with invokevirtual. The program terminates by returning from the main method.

In order to obtain the modified version of the main method with the finally handler, one needs to add the instructions highlighted in Figure 8.11, which implement the calls to the methods Inst.enter() and Inst.leave() and the try-finally handler. The modifications include an exception table, which instructs the JVM to jump to the finally handler whenever an exception is raised by any of the instructions in the method body. The instruction bipush 123 simply pushes main's (in this case fictional) method id onto the stack as an argument to Inst.enter().

    public static void main(java.lang.String[]);
      Code:
       0:   bipush         123
       2:   invokestatic   #1; // Method Inst.enter:(I)V
       5:   new            #2; // class Stat
       8:   dup
       9:   aload_0
      10:   invokespecial  #3; // Method "<init>":([Ljava/lang/String;)V
      13:   astore_1
      14:   aload_1
      15:   invokevirtual  #4; // Method calc:()V
      18:   bipush         123
      20:   invokestatic   #5; // Method Inst.leave:(I)V
      23:   goto           34
      26:   astore_2
      27:   bipush         123
      29:   invokestatic   #5; // Method Inst.leave:(I)V
      32:   aload_2
      33:   athrow
      34:   return
      Exception table:
       from    to  target  type
          0    18      26  any
         26    27      26  any

Figure 8.11.: Output of the javap tool for the main method of the program shown in Figure 8.7.

Figure 8.11 might suggest that the instrumentation process doubles the size of the bytecode. However, this is only the case for very small methods, like the main() method. For example: The increase in size for the calc() method is only approximately 30%.

8.3.3. The ASM Framework

To facilitate the steps listed above, this work makes use of ASM (http://asm.ow2.org/), a framework dedicated to bytecode instrumentation. Section 8.2 introduced the concept of Java agents and the interface ClassFileTransformer that they offer. Implementations of this interface must provide the following method:

    byte[] transform(ClassLoader loader, String className,
                     Class<?> classBeingRedefined,
                     ProtectionDomain protectionDomain,
                     byte[] classfileBuffer)

A ClassFileTransformer object can be registered for bytecode modification via Instrumentation.addTransformer() (see Figure 8.5). After this has happened, the object's transform() method gets invoked each time a class is loaded into the JVM. Among other arguments, transform() receives the bytecode of the loaded class within classfileBuffer.
It may either return the original bytecode or change the contents of the buffer to return a modified class. Parsing and modifying raw bytecode is a difficult and error-prone task. This is where ASM comes into play. It provides the classes and interfaces ClassReader, ClassVisitor, MethodVisitor and ClassWriter, among others. The concept is similar to that of a SAX parser for XML: A ClassReader object takes the raw bytecode, reads it sequentially and generates events by calling the appropriate methods on a ClassVisitor and its MethodVisitors. For example: The ClassVisitor interface provides the methods visitField() and visitMethod(). By implementing these methods, one can modify, add or remove methods or fields. The MethodVisitor interface provides the methods visitMethodInsn(...) and visitJumpInsn(...), among many others. visitJumpInsn(...) is called whenever the ClassReader encounters a goto, ifeq, ifne, ifnull, etc. instruction; visitMethodInsn(...) is called whenever it encounters an invokevirtual, invokespecial and so forth. Of course the ClassReader provides these methods with the necessary arguments, like the opcode of the instruction, the operands, the method signature, etc. Figure 8.12 shows this process. By delegating (or omitting) calls to a ClassWriter, an arbitrarily modified version of the original class can be generated. For this work the task is relatively straightforward, since the modifications are purely additive.

Figure 8.12.: Events generated during ASM bytecode instrumentation (the ClassReader calls visitMethod() on the ClassVisitor, which in turn receives visitInsn() events on the corresponding MethodVisitors).
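To give an impression of how the instrumentation from section 8.3.1 can be expressed on top of ASM, consider the following sketch. It is written against the current ASM API (the version used in this work was ASM 3.3.1, whose class names differ slightly) and uses the AdviceAdapter convenience class. Note that, unlike the exception table of Figure 8.11, onMethodExit() alone does not catch exceptions thrown by callees, so a complete implementation would additionally install the try-finally handler:

    import org.objectweb.asm.*;
    import org.objectweb.asm.commons.AdviceAdapter;
    import org.objectweb.asm.commons.Method;

    // Sketch: surround every method with Inst.enter(mid) / Inst.leave(mid).
    class CallTreeClassVisitor extends ClassVisitor {
        private String className;

        CallTreeClassVisitor(ClassWriter cw) { super(Opcodes.ASM9, cw); }

        @Override
        public void visit(int version, int access, String name, String signature,
                          String superName, String[] interfaces) {
            className = name;
            super.visit(version, access, name, signature, superName, interfaces);
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc,
                                         String signature, String[] exceptions) {
            MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
            if (mv == null) return null;
            final int mid = (className + "." + name + desc).hashCode();  // method id
            return new AdviceAdapter(Opcodes.ASM9, mv, access, name, desc) {
                @Override protected void onMethodEnter() {
                    push(mid);
                    invokeStatic(Type.getObjectType("Inst"), new Method("enter", "(I)V"));
                }
                @Override protected void onMethodExit(int opcode) {
                    push(mid);
                    invokeStatic(Type.getObjectType("Inst"), new Method("leave", "(I)V"));
                }
            };
        }
    }

Within ClassFileTransformer.transform(), such a visitor would be driven by new ClassReader(classfileBuffer).accept(new CallTreeClassVisitor(writer), 0), and the modified class would be returned via writer.toByteArray().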
Instrumented Classes. Since the implementation of the program call tree generation makes use of certain parts of the Java library, it is not possible to instrument these classes without provoking infinite recursion. Hence, only the classes that are loaded after the employed Java core classes are instrumented. This means that Object, String and certain container classes are not instrumented. The methods of these classes will not be represented in the call tree.

Nondeterministic Behavior. Throughout the tests with the framework and the fitness function, it was observed that identical input sequences for the CTE produced call trees of different sizes, which introduced a certain nondeterministic factor. This might be due to the fact that the environment provided by the operating system is slightly different in each run. For example: The control flow within the SUT might depend on variables such as the system time, the available main memory and external files in the system's temporary files directory, all of which are subject to change. However, the size differences of the call trees were small (usually below 0.5%). McMaster and Memon [MM08] reported similar effects.

8.4. Operating the SUT

This section explains the central ideas behind the implementation of the features presented in chapter 7. All of this functionality resides in the framework's Agent component.

8.4.1. Accessing the SWT Classes

Figure 8.13 shows the two important components which implement the functionality used to operate the SUT. The WidgetTreeBuilder takes care of step 3) of the sequence generation process and the ActionFinder deals with step 4).

Figure 8.13.: The Agent component accesses the SWT classes of the SUT.

In order to be able to build a complete widget tree of the GUI, the WidgetTreeBuilder needs access to the SWT main object called Display. This object provides a method for finding all Shells (getShells()), and the Shells in turn provide a method getChildren() to access their child controls. Having access to the Display object means being able to generate a complete widget tree. Thus it is necessary to link the implementation against the SWT library. More precisely, one needs to link it against the same instance of the library that the SUT is using. Since the WidgetTreeBuilder depends on the library, it cannot work until the SUT has loaded the SWT module into the Virtual Machine. Hence, the linking process needs to be postponed. A straightforward way to accomplish this is to write a custom classloader which takes care of resolving the SWT dependencies. The necessary steps are:

1. Start the SUT with the attached Java agent.
2. Wait for the SWT classes to be loaded and acquire access to the SWT classloader.
3. Load the WidgetTreeBuilder classes using the custom classloader, which links them against the SWT classes by passing load requests on to the SWT classloader.
4. Start the sequence generation process.

Step 2 can be implemented using the Instrumentation object obtained from the agent's premain() method. It provides the method getAllLoadedClasses(), through which the Display class and its Class object can be found. Through this Class object the corresponding SWT classloader can be obtained.
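A condensed sketch of steps 2 and 3 could look as follows; "treebuilder.jar" and the class name are placeholders, and in practice one has to wait (and retry) until the SUT has actually loaded the Display class:

    import java.io.File;
    import java.lang.instrument.Instrumentation;
    import java.net.URL;
    import java.net.URLClassLoader;

    static Class<?> loadTreeBuilder(Instrumentation inst) throws Exception {
        ClassLoader swtLoader = null;
        for (Class<?> cl : inst.getAllLoadedClasses())
            if ("org.eclipse.swt.widgets.Display".equals(cl.getName()))
                swtLoader = cl.getClassLoader();               // step 2
        // a URLClassLoader with swtLoader as its parent delegates SWT load
        // requests to the SUT's classloader, linking against the same instance
        URL jar = new File("treebuilder.jar").toURI().toURL();
        ClassLoader custom = new URLClassLoader(new URL[]{ jar }, swtLoader);
        return custom.loadClass("WidgetTreeBuilder");          // step 3
    }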
8.4.2. Generating Inputs

In order to generate inputs, the framework makes use of the java.awt.Robot class (http://download.oracle.com/javase/1.5.0/docs/api/java/awt/Robot.html). It provides the methods keyPress(int), keyRelease(int), mouseMove(int, int), mousePress(int) and mouseRelease(int), among others. With the help of these five methods it is possible to generate the same set of mouse and keyboard inputs that a human tester can generate.

8.4.3. Replaying Sequences

In order to verify whether the generated sequences execute properly or reveal faults, the tester needs to replay them. The framework's Starter component implements this functionality as follows: The process is similar to the sequence generation process. The SUT is started, the GUI is scanned and the set of alternative actions is determined. However, instead of performing step 5), where the optimization algorithm selects an action, the framework checks whether the action to be replayed is contained within the set of alternative actions. If so, it is executed. These steps are repeated until the end of the sequence is reached and the replay has succeeded. If the action to be replayed is not among the set of alternative actions, then the replay has failed.

Failed replays may occur for the following reason: Throughout the optimization process, sequences are usually generated in a rather fast way, i.e. the time between two subsequently executed actions is small. For example: In the experiment of this work, a delay of 80 ms has been used. On the one hand this delay slows down the optimization process; on the other hand it is necessary, since the SUT needs time to react to the inputs and to complete its internal operations before it changes the state of the GUI. Upon replay, the tester usually wants to see what the sequence does and will use a much higher delay to slow down the execution speed. This can lead to problems: Depending on how long the framework waits after an action has been executed, the resulting set of alternative actions might be different. For example: Figure 7.6 shows the "New File" dialog of the CTE, which pops up after a click on the "File" menu item followed by a click on the "New CTE Diagram" menu item. The dialog is modal, meaning that it blocks input to the underlying main window and its controls. When generating a sequence that opens this dialog, the framework waits 80 ms after the click on "New CTE Diagram". After this time it rescans the GUI, determines the alternative actions and executes the next action. The dialog might need more than 80 ms to pop up. This means that the framework can still execute actions on the main window after the click on "New CTE Diagram". However, if the tester decides to replay the generated sequence at a lower speed, e.g. with a 1 second delay after each action, the dialog has enough time to pop up before the next action is executed. Thus a sequence that used to perform actions on the main window will fail to replay. Hence, certain sequences can only be replayed at a certain speed. If this happens too often, then one might need to adjust the delay value for sequence generation. This can be done in the settings.xml file shown in Figure B.1.
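The replay loop just described can be summarized in the following sketch. All names are illustrative and hypothetical: Action stands for the abstract actions of chapter 7, scanGuiAndDeriveActions() for steps 3) and 4) of the sequence generation process, and startSUT() and execute() for the corresponding framework functionality:

    import java.util.List;
    import java.util.Set;

    boolean replay(List<Action> sequence, long delayMs) throws InterruptedException {
        startSUT();
        for (Action recorded : sequence) {
            Set<Action> alternatives = scanGuiAndDeriveActions();
            if (!alternatives.contains(recorded))
                return false;                 // replay failed: action not available
            execute(recorded);
            Thread.sleep(delayMs);            // give the SUT time to update the GUI
        }
        return true;                          // all actions executed: replay succeeded
    }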
9. Experiment

To get an impression of the framework's performance, a first experiment has been carried out. The intent was to answer the following two questions:

1. Does the quality of the sequences improve over time?
2. How well does the ACO algorithm compare to the Random Search algorithm, i.e. an algorithm that generates sequences completely at random? Is the quality difference between the best sequences statistically significant?

The first question is interesting because the framework uses a metaheuristic algorithm and thus performs stochastic sampling [Luk09]. The Random Search algorithm implements a static uniform distribution over the search space, so the average sequence quality within each generation should stay at a constant level. The ACO algorithm also starts with a uniform distribution, because at the beginning all action pheromones are equal. Throughout the course of the optimization process, the algorithm draws samples from this distribution by generating new sequences. The sequences' fitness values, in turn, are used to adapt the distribution. Hence, over time, the probability density over the regions with good sequences, and thus the average sequence fitness in later generations, should increase. Ideally, the best sequence is generated at the end of the run. If the presented approach and its implementation work, these effects should be observable in the experiment. The answer to the second question will give a first hint on whether it pays to apply metaheuristics to this kind of problem, or whether a simple random strategy might suffice.

9.1. Setup and Results

To answer both questions, the framework has been applied to the CTE. In order to keep the size of the search space at a moderate level, the sequence length has been set to 10. Since Random Search is merely a special case of Ant Colony Optimization, it can be simulated by setting appropriate values for the ACO parameters. Table 9.1 shows the setup for both strategies. The stopping criterion used for both algorithms was a limit on the number of generations.

desc  k    α    ρ    popsize  generations  seqlength  pheromone default
aco   15   0.3  0.5  10       80           10         4000
rnd   all  0.0  0.0  10       80           10         4000

Table 9.1.: Parameters of the runs. α is the learning rate, k refers to the k best sequences selected for pheromone update on each generation and ρ is the pseudo-proportionate selection probability.

Figures 9.1 and 9.2 show the course of the experiment for the RS and the ACO run, respectively. Contrary to the random run, the average sequence fitness in the generations of the ACO run improves over time and eventually reaches its peak. At the beginning the algorithms exhibit similar performance, generating low-quality sequences with occasional outliers.

Figure 9.1.: Course of the RS run. (Plot of the fitness of each of the 800 generated sequences; fitness axis from 0 to 80000.)

Figure 9.2.: Course of the ACO run. (Plot of the fitness of each of the 800 generated sequences; fitness axis from 0 to 80000.)

Table 9.2 shows the results of both runs. ACO outperformed RS with Best_ACO = 70370 compared to Best_RS = 49056. Each run generated 800 sequences in 80 generations, within approximately 104 minutes.

desc  best   duration
ACO   70370  ≈ 104 minutes
RS    49056  ≈ 104 minutes

Table 9.2.: Results of the runs.

As for the second question, it is interesting whether the results in Table 9.2 are coincidental or representative. In order to establish significance, the above experiment has been repeated 51 times and the fitness values of the best sequences have been recorded. For each strategy the arithmetic fitness mean x̄, the estimated standard error SE_x̄ and the corresponding confidence intervals for the significance level α = 1% have been calculated by applying the following equations:

SE_{\bar{x}} = \sqrt{\frac{1}{n} \cdot \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}

CI = \left[\bar{x} - SE_{\bar{x}} \cdot z,\ \bar{x} + SE_{\bar{x}} \cdot z\right], \qquad z = \Phi^{-1}(0.995) \approx 2.58

with Φ⁻¹ being the inverse of the cumulative standard normal distribution function. Table 9.3 shows the maximum sequence fitness of each run. Table 9.4 shows the confidence intervals for both strategies. Since these do not overlap, the difference between the mean fitness values of both algorithms is statistically significant.

#    ACO      RS
1    73466.0  23178.0
2    54943.0  22546.0
3    53332.0  24275.0
4    67738.0  28214.0
5    47562.0  25817.0
6    50969.0  41181.0
7    53477.0  26401.0
8    47937.0  27673.0
9    53457.0  49447.0
10   49032.0  43095.0
11   85646.0  68032.0
12   83301.0  47056.0
13   53298.0  33986.0
14   47793.0  25830.0
15   54286.0  42963.0
16   54210.0  50378.0
17   73484.0  55217.0
18   56105.0  69057.0
19   38093.0  50284.0
20   74442.0  79564.0
21   76068.0  72459.0
22   63244.0  77944.0
23   76008.0  77025.0
24   76064.0  61205.0
26   56256.0  77359.0
27   23745.0  30220.0
28   57301.0  54265.0
29   40078.0  28079.0
30   27973.0  23735.0
31   25705.0  33485.0
32   34255.0  49056.0
33   49351.0  60893.0
34   25197.0  22323.0
35   51784.0  46426.0
36   70370.0  31022.0
37   78471.0  26827.0
38   60233.0  44898.0
39   58845.0  27875.0
40   58090.0  28966.0
41   63610.0  30185.0
42   81020.0  29351.0
43   62057.0  51421.0
44   75708.0  47533.0
45   64592.0  48465.0
46   75282.0  48827.0
47   55594.0  34935.0
48   62586.0  66666.0
49   55702.0  33855.0
50   59320.0  39056.0
51   76083.0  25056.0

Table 9.3.: Fitness values of the best sequences found on each run.

desc  n   max    min    x̄         SE_x̄     CI                     α
ACO   51  85646  38093  63800.63  1619.92  [59621.24, 67980.02]  1%
RS    51  68032  22323  37410.49  1735.73  [32932.31, 41888.68]  1%

Table 9.4.: Confidence intervals for the two strategies.
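These calculations are straightforward to reproduce. The following sketch (not part of the framework) implements the two equations; applied to the 51 recorded best-fitness values of each strategy, it reproduces the intervals of Table 9.4 up to rounding.

    // Mean, estimated standard error and 99% confidence interval of a sample,
    // implementing the equations above with z = Phi^-1(0.995) ~= 2.58.
    static double[] confidenceInterval(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;
        double ssq = 0;
        for (double v : x) ssq += (v - mean) * (v - mean);
        double se = Math.sqrt(ssq / (n - 1) / n);   // standard error of the mean
        double z = 2.58;                            // Phi^-1(0.995)
        return new double[] { mean - z * se, mean + z * se };
    }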
9.2. Threats to Validity

Averaged over 51 runs, ACO significantly outperformed RS. However, the max and min columns of Table 9.4 indicate that RS occasionally finds better sequences. The experiment shown above has been carried out with only a single SUT. In order to show that ACO is the superior algorithm, additional experiments with different SUTs need to be conducted. Furthermore, the ACO algorithm has been tested in earlier experiments with different parameter settings, throughout which a lot of experience with the algorithm has been gained. Hence, the presented parameter values might have been chosen in favor of ACO.

10. Conclusion and Future Work

This chapter reviews the presented approach and proposes topics for future research.

10.1. Conclusion

The goal of this work was to show that it is possible to automatically generate input sequences for applications with a complex GUI. Therefore, a new approach has been proposed, which exploits metaheuristic optimization techniques for the generation of test sequences. The following features set this approach apart from earlier works in this field:

1. Since the technique makes use of neither a model of the GUI nor existing input sequences nor other handcrafted artifacts, it requires no human intervention.
2. A novel fitness function, based on the Call Tree Size criterion [MM08], is applied to guide the optimization process.
3. The source code of the SUT is not required and the instrumentation of the SUT is performed automatically.
4. The GUI of the SUT is operated using a rich set of actions employing mouse and keyboard inputs.

The presented approach has been implemented as a framework which employs the Ant Colony Optimization algorithm to generate sequences with high fitness values. This framework has been applied to the Classification Tree Editor, a Java application based on SWT. In a first experiment it significantly outperformed a random sequence generation strategy. These results are encouraging and indicate the suitability of metaheuristic optimization techniques in the context of GUI testing. A strength of the presented approach is its flexibility: It allows not only for the use of different optimization algorithms and fitness functions, but can also be extended to work on different platforms, with different programming languages and different GUI toolkits.

10.2. Future Work

Generating Test Suites

The approach presented in this work only generates single input sequences. The next step towards a comprehensive future GUI testing framework is to generate entire test suites. A possible extension of the current technique, sketched below, could look as follows: After generating the first sequence, one could strive to find a second sequence which is different from the first one, in the sense that it generates a call tree with new, unseen leaves. Therefore, it would be necessary to adjust the fitness function after each optimization run. The second sequence would then only obtain fitness points if its call tree contains new leaves.
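A minimal sketch of such an adjusted fitness function is given below. CallTree and Leaf are stand-ins for the framework's actual call tree representation, so this illustrates the idea rather than prescribing an implementation.

    import java.util.HashSet;
    import java.util.Set;

    // Sketch: only leaves not seen in previous optimization runs contribute
    // to the fitness; CallTree and Leaf are placeholder types.
    class SuiteFitness {
        private final Set<Leaf> seenLeaves = new HashSet<Leaf>();

        int fitness(CallTree tree) {
            int newLeaves = 0;
            for (Leaf leaf : tree.leaves())
                if (!seenLeaves.contains(leaf))
                    newLeaves++;                      // only new, unseen leaves count
            return newLeaves;
        }

        void finishRun(CallTree bestTree) {
            seenLeaves.addAll(bestTree.leaves());     // adjust the function for the next run
        }
    }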
Fault Sensitivity

This is one of the most important future steps in order to evaluate the benefit that can be drawn from the presented technique. In order to find out whether the generated sequences or suites are actually fault-revealing, the framework needs to be applied to a set of SUTs from different application domains, such as graphical editors, typesetting programs, internet browsers, etc. This requires a database with known faults of these SUTs and information on how to reveal them. Then it is possible to generate test suites and determine the actual number of discovered faults. Once such an infrastructure has been established, different fitness functions can be tested and compared to one another.

Other Fitness Functions

The currently used fitness function is rather simple, since it merely counts the call tree's leaves. But it might also be interesting to consider the diversity of the methods contained within the call tree. For example: If we have a tree with only 10 different methods but a large number of leaves, then currently we would prefer this tree over one with 100 different methods yet fewer leaves. However, the second, lower-rated tree might execute more code and address more aspects of the SUT. Another idea could be to reward the maximal depth of the tree, with the intent of fostering sequences that generate complex and long call chains. Other criteria could be interesting as well, for example the classical code coverage criteria. Yet another idea could be to search for sequences which cause high memory consumption of the SUT. However, the use of all of these criteria remains subjective until their fault-revealing capabilities have been demonstrated in an empirical study as suggested in the previous paragraph.

Understanding and Improving the current Framework

There are several parameters involved in the optimization process. Finding a good setup is a difficult task and the optimal values vary with different SUTs. The parameter set employed in the experiment is quite likely not the ideal one, and the framework could probably perform better with the right values. Hence, it could be interesting to perform a series of experiments with varying setups in order to determine the impact of the different parameters on the efficiency and effectiveness of the optimization process. The ACO algorithm has been applied to a variety of different problems, often with different selection rules and pheromone update strategies [DB05]. It could be worthwhile to experiment with these. The current framework does not make use of a sophisticated termination criterion, but generates a fixed number of sequences. This is not optimal and may have stopped the optimization in the experiment prematurely. A better criterion could improve the effectiveness as well as the efficiency. However, it is generally difficult to define such a criterion, since the best possible fitness value is usually unknown ahead of time.

Other Algorithms

There are various metaheuristic algorithms that have been successfully applied to many different tasks. Thus it is possible that one or more of these are better suited for sequence generation than ACO. One of the challenges of sequence generation is the following: The fitness value of a sequence also depends on the order of the contained actions. ACO does not take this order into account when updating the action pheromones. If it finds a sequence with a high fitness value, then all contained actions obtain a high pheromone value. But when generating sequences this order matters, and certain actions might only be effective in the presence of others, which means that linkage exists among them. ACO disregards linkage among the trail components [Luk09]. Metaheuristics like the Bivariate Marginal Distribution Algorithm [PM99], which captures important pairwise dependencies among components using a dependency graph, might be capable of addressing this problem. This way the likelihood of an action being selected would depend on the previously executed actions.
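As a thought experiment, the following sketch indicates how such pairwise dependencies could be represented: instead of a single pheromone value per action, a table of values conditioned on the predecessor action is maintained. This is an illustration of the idea only and not part of the presented framework; Action is a placeholder type.

    import java.util.HashMap;
    import java.util.Map;

    // Thought experiment: pheromone values conditioned on the previously
    // executed action, so that selection probabilities capture pairwise linkage.
    class BigramPheromones {
        private static final double DEFAULT = 4000.0;   // initial pheromone value
        private final Map<Action, Map<Action, Double>> tau =
                new HashMap<Action, Map<Action, Double>>();

        double get(Action previous, Action next) {
            Map<Action, Double> row = tau.get(previous);
            Double value = (row == null) ? null : row.get(next);
            return (value == null) ? DEFAULT : value;
        }

        // Reinforce the pair (previous, next) that occurred in a good sequence.
        void deposit(Action previous, Action next, double amount) {
            Map<Action, Double> row = tau.get(previous);
            if (row == null) tau.put(previous, row = new HashMap<Action, Double>());
            row.put(next, get(previous, next) + amount);
        }
    }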
Oracle

In the future the presented framework might be capable of generating fault-sensitive test suites, but a human tester would still have to verify the sequences in order to find possible faults. This is a time-consuming task. It could be interesting to probe certain parts of the SUT, for example log files, for erroneous outputs in order to identify "suspicious" sequences automatically. More subtle faults, however, like an incorrect text color, are difficult to find. Therefore it must be possible for the tester to specify conditions which need to be true before and / or after the execution of certain actions. This way one could specify the behaviour of the desired system, and the framework could then test the SUT against these conditions by trying to find sequences which violate them.

Improving the Start Time of the SUT

Currently, the SUT is restarted in each iteration of the sequence generation process, with the intent of bringing it to an initial state. This is the highest cost associated with the optimization process. Starting SUTs like the CTE or Eclipse takes several seconds, even on modern hardware. Speeding up this process means improving the optimization efficiency. One way to accomplish this could be to save the state of the loaded and initialized application and restore it each time a new sequence needs to be generated. Virtual Machines or solutions for single processes, like CryoPID25, could help with this task. Another option is the parallelization of the sequence generation process.

25 http://cryopid.berlios.de/

A. Java Bytecode Instruction Listings

This is a list of the instructions that make up the Java bytecode, an abstract machine language that is ultimately executed by the Java virtual machine. The Java bytecode is generated by language compilers targeting the Java Platform, most notably the Java programming language. [Wik11]

Mnemonic, opcode (in hex), [other bytes], stack [before] → [after], description:

aaload 32 arrayref, index → value loads onto the stack a reference from an array
aastore 53 arrayref, index, value → stores into a reference in an array
aconst_null 01 → null pushes a null reference onto the stack
aload 19 1: index → objectref loads a reference onto the stack from a local variable #index
aload_0 2a → objectref loads a reference onto the stack from local variable 0
aload_1 2b → objectref loads a reference onto the stack from local variable 1
aload_2 2c → objectref loads a reference onto the stack from local variable 2
aload_3 2d → objectref loads a reference onto the stack from local variable 3
anewarray bd 2: indexbyte1, indexbyte2 count → arrayref creates a new array of references of length count and component type identified by the class reference (indexbyte1 << 8 + indexbyte2) in the constant pool
areturn b0 objectref → [empty] returns a reference from a method
arraylength be arrayref → length gets the length of an array
astore 3a 1: index objectref → stores a reference into a local variable #index
astore_0 4b objectref → stores a reference into local variable 0
astore_1 4c objectref → stores a reference into local variable 1
astore_2 4d objectref → stores a reference into local variable 2
astore_3 4e objectref → stores a reference into local variable 3
athrow bf objectref → [empty], objectref throws an error or exception (notice that the rest of the stack is cleared, leaving only a reference to the Throwable)
baload 33 arrayref, index → value loads a byte or Boolean value from an array
bastore 54 arrayref, index, value → stores a byte or Boolean value into an array
bipush 10 1: byte → value pushes a byte onto the stack as an integer value
caload 34 arrayref, index → value loads a char from an array
castore 55 arrayref, index, value → stores a char into an array
checkcast c0 2: indexbyte1, indexbyte2 objectref → objectref checks whether an objectref is of a certain type, the class reference of which is in the constant pool at index (indexbyte1 << 8 + indexbyte2)
d2f 90 value → result converts a double to a float
d2i 8e value → result converts a double to an int
d2l 8f value → result converts a double to a long
dadd 63 value1, value2 → result adds two doubles
daload 31 arrayref, index → value loads a double from an array
dastore 52 arrayref, index, value → stores a double into an array
dcmpg 98 value1, value2 → result compares two doubles
dcmpl 97 value1, value2 → result compares two doubles
dconst_0 0e → 0.0 pushes the constant 0.0 onto the stack
dconst_1 0f → 1.0 pushes the constant 1.0 onto the stack
ddiv 6f value1, value2 → result divides two doubles
dload 18 1: index → value loads a double value from a local variable #index
dload_0 26 → value loads a double from local variable 0
dload_1 27 → value loads a double from local variable 1
dload_2 28 → value loads a double from local variable 2
dload_3 29 → value loads a double from local variable 3
dmul 6b value1, value2 → result multiplies two doubles
dneg 77 value → result negates a double
drem 73 value1, value2 → result gets the remainder from a division between two doubles
dreturn af value → [empty] returns a double from a method
dstore 39 1: index value → stores a double value into a local variable #index
dstore_0 47 value → stores a double into local variable 0
dstore_1 48 value → stores a double into local variable 1
dstore_2 49 value → stores a double into local variable 2
dstore_3 4a value → stores a double into local variable 3
dsub 67 value1, value2 → result subtracts a double from another
dup 59 value → value, value duplicates the value on top of the stack
dup_x1 5a value2, value1 → value1, value2, value1 inserts a copy of the top value into the stack two values from the top
dup_x2 5b value3, value2, value1 → value1, value3, value2, value1 inserts a copy of the top value into the stack two (if value2 is double or long it takes up the entry of value3, too) or three values (if value2 is neither double nor long) from the top
dup2 5c {value2, value1} → {value2, value1}, {value2, value1} duplicate top two stack words (two values, if value1 is neither double nor long; a single value, if value1 is double or long)
dup2_x1 5d value3, {value2, value1} → {value2, value1}, value3, {value2, value1} duplicate two words and insert beneath third word (see explanation above)
dup2_x2 5e {value4, value3}, {value2, value1} → {value2, value1}, {value4, value3}, {value2, value1} duplicate two words and insert beneath fourth word
f2d 8d value → result converts a float to a double
f2i 8b value → result converts a float to an int
f2l 8c value → result converts a float to a long
fadd 62 value1, value2 → result adds two floats
faload 30 arrayref, index → value loads a float from an array
fastore 51 arrayref, index, value → stores a float in an array
fcmpg 96 value1, value2 → result compares two floats
fcmpl 95 value1, value2 → result compares two floats
fconst_0 0b → 0.0f pushes 0.0f on the stack
fconst_1 0c → 1.0f pushes 1.0f on the stack
fconst_2 0d → 2.0f pushes 2.0f on the stack
fdiv 6e value1, value2 → result divides two floats
fload 17 1: index → value loads a float value from a local variable #index
fload_0 22 → value loads a float value from local variable 0
fload_1 23 → value loads a float value from local variable 1
fload_2 24 → value loads a float value from local variable 2
fload_3 25 → value loads a float value from local variable 3
fmul 6a value1, value2 → result multiplies two floats
fneg 76 value → result negates a float
frem 72 value1, value2 → result gets the remainder from a division between two floats
freturn ae value → [empty] returns a float
fstore 38 1: index value → stores a float value into a local variable #index
fstore_0 43 value → stores a float value into local variable 0
fstore_1 44 value → stores a float value into local variable 1
fstore_2 45 value → stores a float value into local variable 2
fstore_3 46 value → stores a float value into local variable 3
fsub 66 value1, value2 → result subtracts two floats
getfield b4 2: index1, index2 objectref → value gets a field value of an object objectref, where the field is identified by field reference in the constant pool index (index1 << 8 + index2)
getstatic b2 2: index1, index2 → value gets a static field value of a class, where the field is identified by field reference in the constant pool index (index1 << 8 + index2)
goto a7 2: branchbyte1, branchbyte2 [no change] goes to another instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
goto_w c8 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 [no change] goes to another instruction at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 + branchbyte2 << 16 + branchbyte3 << 8 + branchbyte4)
i2b 91 value → result converts an int into a byte
i2c 92 value → result converts an int into a character
i2d 87 value → result converts an int into a double
i2f 86 value → result converts an int into a float
i2l 85 value → result converts an int into a long
i2s 93 value → result converts an int into a short
iadd 60 value1, value2 → result adds two ints together
iaload 2e arrayref, index → value loads an int from an array
iand 7e value1, value2 → result performs a bitwise and on two integers
iastore 4f arrayref, index, value → stores an int into an array
iconst_m1 02 → -1 loads the int value -1 onto the stack
iconst_0 03 → 0 loads the int value 0 onto the stack
iconst_1 04 → 1 loads the int value 1 onto the stack
iconst_2 05 → 2 loads the int value 2 onto the stack
iconst_3 06 → 3 loads the int value 3 onto the stack
iconst_4 07 → 4 loads the int value 4 onto the stack
iconst_5 08 → 5 loads the int value 5 onto the stack
idiv 6c value1, value2 → result divides two integers
if_acmpeq a5 2: branchbyte1, branchbyte2 value1, value2 → if references are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_acmpne a6 2: branchbyte1, branchbyte2 value1, value2 → if references are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpeq 9f 2: branchbyte1, branchbyte2 value1, value2 → if ints are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpne a0 2: branchbyte1, branchbyte2 value1, value2 → if ints are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmplt a1 2: branchbyte1, branchbyte2 value1, value2 → if value1 is less than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpge a2 2: branchbyte1, branchbyte2 value1, value2 → if value1 is greater than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpgt a3 2: branchbyte1, branchbyte2 value1, value2 → if value1 is greater than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmple a4 2: branchbyte1, branchbyte2 value1, value2 → if value1 is less than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifeq 99 2: branchbyte1, branchbyte2 value → if value is 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifne 9a 2: branchbyte1, branchbyte2 value → if value is not 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
iflt 9b 2: branchbyte1, branchbyte2 value → if value is less than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifge 9c 2: branchbyte1, branchbyte2 value → if value is greater than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifgt 9d 2: branchbyte1, branchbyte2 value → if value is greater than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifle 9e 2: branchbyte1, branchbyte2 value → if value is less than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifnonnull c7 2: branchbyte1, branchbyte2 value → if value is not null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifnull c6 2: branchbyte1, branchbyte2 value → if value is null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
iinc 84 2: index, const [no change] increment local variable #index by signed byte const
iload 15 1: index → value loads an int value from a local variable #index
iload_0 1a → value loads an int value from local variable 0
iload_1 1b → value loads an int value from local variable 1
iload_2 1c → value loads an int value from local variable 2
iload_3 1d → value loads an int value from local variable 3
imul 68 value1, value2 → result multiply two integers
ineg 74 value → result negate int
instanceof c1 2: indexbyte1, indexbyte2 objectref → result determines if an object objectref is of a given type, identified by class reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokeinterface b9 4: indexbyte1, indexbyte2, count, 0 objectref, [arg1, arg2, ...] → invokes an interface method on object objectref, where the interface method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokespecial b7 2: indexbyte1, indexbyte2 objectref, [arg1, arg2, ...] → invoke instance method on object objectref, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokestatic b8 2: indexbyte1, indexbyte2 [arg1, arg2, ...] → invoke a static method, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokevirtual b6 2: indexbyte1, indexbyte2 objectref, [arg1, arg2, ...] → invoke virtual method on object objectref, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
ior 80 value1, value2 → result bitwise int or
irem 70 value1, value2 → result logical int remainder
ireturn ac value → [empty] returns an integer from a method
ishl 78 value1, value2 → result int shift left
ishr 7a value1, value2 → result int arithmetic shift right
istore 36 1: index value → store int value into variable #index
istore_0 3b value → store int value into variable 0
istore_1 3c value → store int value into variable 1
istore_2 3d value → store int value into variable 2
istore_3 3e value → store int value into variable 3
isub 64 value1, value2 → result int subtract
iushr 7c value1, value2 → result int logical shift right
ixor 82 value1, value2 → result int xor
jsr a8 2: branchbyte1, branchbyte2 → address jump to subroutine at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2) and place the return address on the stack
jsr_w c9 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 → address jump to subroutine at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 + branchbyte2 << 16 + branchbyte3 << 8 + branchbyte4) and place the return address on the stack
l2d 8a value → result converts a long to a double
l2f 89 value → result converts a long to a float
l2i 88 value → result converts a long to an int
ladd 61 value1, value2 → result add two longs
laload 2f arrayref, index → value load a long from an array
land 7f value1, value2 → result bitwise and of two longs
lastore 50 arrayref, index, value → store a long to an array
lcmp 94 value1, value2 → result compares two longs values
lconst_0 09 → 0L pushes the long 0 onto the stack
lconst_1 0a → 1L pushes the long 1 onto the stack
ldc 12 1: index → value pushes a constant #index from a constant pool (String, int or float) onto the stack
ldc_w 13 2: indexbyte1, indexbyte2 → value pushes a constant #index from a constant pool (String, int or float) onto the stack (wide index is constructed as indexbyte1 << 8 + indexbyte2)
ldc2_w 14 2: indexbyte1, indexbyte2 → value pushes a constant #index from a constant pool (double or long) onto the stack (wide index is constructed as indexbyte1 << 8 + indexbyte2)
ldiv 6d value1, value2 → result divide two longs
lload 16 1: index → value load a long value from a local variable #index
lload_0 1e → value load a long value from a local variable 0
lload_1 1f → value load a long value from a local variable 1
lload_2 20 → value load a long value from a local variable 2
lload_3 21 → value load a long value from a local variable 3
lmul 69 value1, value2 → result multiplies two longs
lneg 75 value → result negates a long
lookupswitch ab 4+: <0-3 bytes padding>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, npairs1, npairs2, npairs3, npairs4, match-offset pairs... key → a target address is looked up from a table using a key and execution continues from the instruction at that address
lor 81 value1, value2 → result bitwise or of two longs
lrem 71 value1, value2 → result remainder of division of two longs
lreturn ad value → [empty] returns a long value
lshl 79 value1, value2 → result bitwise shift left of a long value1 by value2 positions
lshr 7b value1, value2 → result bitwise shift right of a long value1 by value2 positions
lstore 37 1: index value → store a long value in a local variable #index
lstore_0 3f value → store a long value in a local variable 0
lstore_1 40 value → store a long value in a local variable 1
lstore_2 41 value → store a long value in a local variable 2
lstore_3 42 value → store a long value in a local variable 3
lsub 65 value1, value2 → result subtract two longs
lushr 7d value1, value2 → result bitwise shift right of a long value1 by value2 positions, unsigned
lxor 83 value1, value2 → result bitwise exclusive or of two longs
monitorenter c2 objectref → enter monitor for object ("grab the lock" - start of synchronized() section)
monitorexit c3 objectref → exit monitor for object ("release the lock" - end of synchronized() section)
multianewarray c5 3: indexbyte1, indexbyte2, dimensions count1, [count2,...] → arrayref create a new array of dimensions dimensions with elements of type identified by class reference in constant pool index (indexbyte1 << 8 + indexbyte2); the sizes of each dimension is identified by count1, [count2, etc.]
new bb 2: indexbyte1, indexbyte2 → objectref creates new object of type identified by class reference in constant pool index (indexbyte1 << 8 + indexbyte2)
newarray bc 1: atype count → arrayref creates new array with count elements of primitive type identified by atype
nop 00 [no change] performs no operation
pop 57 value → discards the top value on the stack
pop2 58 {value2, value1} → discards the top two values on the stack (or one value, if it is a double or long)
putfield b5 2: indexbyte1, indexbyte2 objectref, value → set field to value in an object objectref, where the field is identified by a field reference index in constant pool (indexbyte1 << 8 + indexbyte2)
putstatic b3 2: indexbyte1, indexbyte2 value → set static field to value in a class, where the field is identified by a field reference index in constant pool (indexbyte1 << 8 + indexbyte2)
ret a9 1: index [no change] continue execution from address taken from a local variable #index (the asymmetry with jsr is intentional)
return b1 → [empty] return void from method
saload 35 arrayref, index → value load short from array
sastore 56 arrayref, index, value → store short to array
sipush 11 2: byte1, byte2 → value pushes a short onto the stack
swap 5f value2, value1 → value1, value2 swaps two top words on the stack (note that value1 and value2 must not be double or long)
tableswitch aa 4+: [0-3 bytes padding], defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, lowbyte1, lowbyte2, lowbyte3, lowbyte4, highbyte1, highbyte2, highbyte3, highbyte4, jump offsets... index → continue execution from an address in the table at offset index
wide c4 3/5: opcode, indexbyte1, indexbyte2 or iinc, indexbyte1, indexbyte2, countbyte1, countbyte2 [same as for corresponding instructions] execute opcode, where opcode is either iload, fload, aload, lload, dload, istore, fstore, astore, lstore, dstore, or ret, but assume the index is 16 bit; or execute iinc, where the index is 16 bits and the constant to increment by is a signed 16 bit short
breakpoint ca reserved for breakpoints in Java debuggers; should not appear in any class file
impdep1 fe reserved for implementation-dependent operations within debuggers; should not appear in any class file
impdep2 ff reserved for implementation-dependent operations within debuggers; should not appear in any class file
(no name) cb-fd these values are currently unassigned for opcodes and are reserved for future use
xxxunusedxxx ba this opcode is reserved "for historical reasons"

External links: Sun's Java Virtual Machine Specification, http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

B. Miscellaneous

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<settings>
  <timeout>30000</timeout>
  <seqLength>10</seqLength>
  <generations>80</generations>
  <popsize>10</popsize>
  <topKUpdateTrails>0</topKUpdateTrails>
  <learningRate>0.3</learningRate>
  <pseudoPropSelectionProbability>0.7</pseudoPropSelectionProbability>
  <pheromoneDefault>4000</pheromoneDefault>
  <actionDelay>80</actionDelay>
  <actionDuration>0</actionDuration>
  <outDir>output</outDir>
  <suspDir>suspicious</suspDir>
  <sut>../cte.exe</sut>
  <mainWindowName>CTE XL Professional</mainWindowName>
  <cleanupFiles>
    <file>../workspace</file>
    <file>C:\Dokumente und Einstellungen\Bauersfeld\default.cte</file>
    <file>../configuration/org.eclipse.osgi/.manager</file>
  </cleanupFiles>
  <suspicious>
    <!--<stdout>.</stdout>-->
    <stderr>.</stderr>
  </suspicious>
</settings>

Figure B.1.: settings.xml.
Figure B.2.: Widget Tree for the CTE. (The figure shows the complete widget tree of the CTE's main window, from the Display and its Shell down to the individual Composites, ToolBars, Trees, MenuItems and figure elements.)
Figure B.3.: SWT Widgets [Ecl11]. (Screenshots of, and javadoc/snippet links for, the SWT widgets Browser, Button, Canvas, Combo, Composite, CoolBar, CTabFolder, DateTime, ExpandBar, Group, Label, Link, List and Menu.)

Figure B.4.: SWT Widgets [Ecl11]. (Screenshots of, and javadoc/snippet links for, the SWT widgets ProgressBar, Sash, Scale, ScrolledComposite, Shell, Slider, Spinner, StyledText, TabFolder, Table, Text, ToolBar, Tray and Tree.)

Bibliography

[ADJ+11] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. A framework for automated testing of JavaScript web applications. In Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, pages 571–580, New York, NY, USA, 2011. ACM.

[AOA05] Anneliese A. Andrews, Jeff Offutt, and Roger T. Alexander. Testing web applications by modeling with FSMs. Software and Systems Modeling, 4:326–345, 2005.

[BHMV08] Walter Binder, Jarle Hulaas, Philippe Moret, and Alex Villazón. Platform-independent profiling in a virtual execution environment. Software: Practice and Experience, 2008.

[BPMP08] Alexander E.I. Brownlee, Martin Pelikan, John A.W. McCall, and Andrei Petrovski. An application of a multivariate estimation of distribution algorithm to cancer chemotherapy. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO '08, pages 463–464, New York, NY, USA, 2008. ACM.

[BSS02] André Baresel, Harmen Sthamer, and Michael Schmidt. Fitness function design to improve evolutionary structural testing. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '02, pages 1329–1336, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.

[CGMC03] Myra B. Cohen, Peter B. Gibbons, Warwick B. Mugridge, and Charles J. Colbourn. Constructing test suites for interaction testing. In Proceedings of the 25th International Conference on Software Engineering, ICSE '03, pages 38–48, Washington, DC, USA, 2003. IEEE Computer Society.

[DB05] Marco Dorigo and Christian Blum. Ant colony optimization theory: a survey. Theor. Comput. Sci., 344:243–278, November 2005.

[DCG99] Marco Dorigo, Gianni Di Caro, and Luca M. Gambardella. Ant algorithms for discrete optimization. Artificial Life, 5:137–172, 1999.

[DS09] Marco Dorigo and Thomas Stützle. Ant colony optimization: Overview and recent advances, 2009.

[Ecl11] Eclipse.org. SWT Widget Gallery. http://www.eclipse.org/swt/widgets/, 2011. [Online; accessed 02-July-2011].

[GADP89] S. Goss, S. Aron, J. Deneubourg, and J. Pasteels. Self-organized shortcuts in the Argentine ant. Naturwissenschaften, 76(12):579–581, December 1989.

[GCD09] Brady J. Garvin, Myra B. Cohen, and Matthew B. Dwyer. An improved meta-heuristic search for constrained interaction testing.
In Proceedings of the 2009 1st International Symposium on Search Based Software Engineering, SSBSE '09, pages 13–22, Washington, DC, USA, 2009. IEEE Computer Society.

[HCM10] Si Huang, Myra B. Cohen, and Atif M. Memon. Repairing GUI test suites using a genetic algorithm. In ICST '10: Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, pages 245–254, Washington, DC, USA, 2010. IEEE Computer Society.

[KG96] David J. Kasik and Harry G. George. Toward automatic generation of novice user test scripts. In CHI '96: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 244–251, New York, NY, USA, 1996. ACM.

[KS01] Dawid Kurzyniec and Vaidy Sunderam. Efficient cooperation between Java and native codes – JNI performance benchmark. In The 2001 International Conference on Parallel and Distributed Processing Techniques and Applications, 2001.

[Lia99] Sheng Liang. Java Native Interface: Programmer's Guide and Reference. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1999.

[Luk09] Sean Luke. Essentials of Metaheuristics. Lulu, 2009. http://cs.gmu.edu/~sean/book/metaheuristics/.

[LY99] Tim Lindholm and Frank Yellin. The Java(TM) Virtual Machine Specification (2nd Edition). Prentice Hall PTR, 2nd edition, April 1999.

[MBN03] Atif Memon, Ishan Banerjee, and Adithya Nagarajan. GUI ripping: Reverse engineering of graphical user interfaces for testing. In Proceedings of the 10th Working Conference on Reverse Engineering, WCRE '03, pages 260–, Washington, DC, USA, 2003. IEEE Computer Society.

[McM04] Phil McMinn. Search-based software test data generation: a survey. Softw. Test. Verif. Reliab., 14:105–156, June 2004.

[Mem01] Atif M. Memon. A comprehensive framework for testing graphical user interfaces. Ph.D. thesis, 2001. Advisors: Mary Lou Soffa and Martha Pollack; committee members: Prof. Rajiv Gupta (University of Arizona), Prof. Adele E. Howe (Colorado State University), Prof. Lori Pollock (University of Delaware).

[Mem07] Atif Memon. An event-flow model of GUI-based applications for testing. Softw. Test. Verif. Reliab., 17(3):137–157, 2007.

[MM08] Scott McMaster and Atif Memon. Call-stack coverage for GUI test suite reduction. IEEE Transactions on Software Engineering, 34:99–115, 2008.

[MSP01] Atif M. Memon, Mary Lou Soffa, and Martha E. Pollack. Coverage criteria for GUI testing. In ESEC/FSE-9: Proceedings of the 8th European Software Engineering Conference held jointly with the 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 256–267, New York, NY, USA, 2001. ACM.

[MT11] Alessandro Marchetto and Paolo Tonella. Using search-based algorithms for Ajax event sequence generation during testing. Empirical Softw. Engg., 16:103–140, February 2011.

[Ora06] Oracle. JVM Tool Interface. http://download.oracle.com/javase/6/docs/platform/jvmti/jvmti.html, 2006. [Online; accessed 25-May-2011].

[PM99] Martin Pelikan and Heinz Mühlenbein. Marginal distributions in evolutionary algorithms. In Proceedings of the International Conference on Genetic Algorithms Mendel 1998, pages 90–95, 1999.

[Str06] Jaymie Strecker. An empirical evaluation of test adequacy criteria for event-driven programs, 2006.

[SWB02] Harmen Sthamer, Joachim Wegener, and Andre Baresel. Using evolutionary testing to improve efficiency and quality in software testing.
In Proceedings of the 2nd Asia-Pacific Conference on Software Testing Analysis and Review (AsiaSTAR), pages 22–24, 2002.

[Wap07] Stefan Wappler. Automatic Generation of Object-Oriented Unit Tests Using Genetic Programming. PhD thesis, Institut für Softwaretechnik und Theoretische Informatik, Elektrotechnik und Informatik, Technische Universität Berlin, 19 December 2007.

[Weg01] Joachim Wegener. Evolutionärer Test des Zeitverhaltens von Realzeitsystemen. Shaker Verlag, 2001.

[WGGS96] Joachim Wegener, Klaus Grimm, Matthias Grochtmann, and Harmen Sthamer. Systematic testing of real-time systems. In Proceedings of the 4th European Conference on Software Testing, Analysis and Review (EuroSTAR 1996), 1996.

[Wik11] Wikipedia. Java bytecode instruction listings — Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Java_bytecode_instruction_listings&oldid=429777649, 2011. [Online; accessed 24-May-2011].

[WK00] Steve Wilson and Jeff Kesselman. Java Platform Performance: Strategies and Tactics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000.

[WM97] David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

[WW06] Stefan Wappler and Joachim Wegener. Evolutionary unit testing of object-oriented software using strongly-typed genetic programming. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO '06, pages 1925–1932, New York, NY, USA, 2006. ACM.

[WWW07] Andreas Windisch, Stefan Wappler, and Joachim Wegener. Applying particle swarm optimization to software testing. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO '07, pages 1121–1128, New York, NY, USA, 2007. ACM.

Declarations

Statement of Authorship

I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated.

Berlin, date, signature

Declaration of Consent

I hereby give my consent for this thesis to be made available in the library of the Institut für Informatik of Humboldt-Universität zu Berlin.

Berlin, date, signature