A Metaheuristic Approach to Automatic Test Case Generation for GUI-Based Applications

Diploma Thesis

Sebastian Bauersfeld
bauersfeld (at) informatik.hu-berlin.de

Reviewers: Prof. Dr. Klaus Bothe, Dr. Joachim Wegener
Date of submission: 22 August 2011

Acknowledgements

It is a pleasure to thank the people who helped me to accomplish this thesis. I would like to express my sincere gratitude to Dr. Joachim Wegener, who inspired and encouraged me at the beginning and gave valuable input throughout the course of this work. I would like to thank Dr. Stefan Wappler, who helped me with his experience and incredibly detailed feedback. I wish to thank my parents for their support throughout the last weeks of writing. They made my life much easier during that period of time.

Contents

1. Introduction
   1.1. Abstract
   1.2. Motivation
   1.3. Objectives
   1.4. Outline
2. Introduction to Metaheuristics
   2.1. Overview
   2.2. Ant Colony Optimization
3. Related Work
4. The Approach
   4.1. Overview
   4.2. Description of the individual Steps
   4.3. Applying the Approach
5. Sequence Generation with ACO
   5.1. The Concept
   5.2. Motivation
   5.3. Adjusting the Metaheuristic Optimization
6. The Fitness Function
   6.1. Definition
   6.2. Motivation
7. Operating the SUT
   7.1. Scanning the GUI
   7.2. Deriving Actions
   7.3. Executing Actions
        7.3.1. Simulating Input versus Invoking Event Handlers
        7.3.2. Naming Scheme
8. Implementation
   8.1. The Framework
   8.2. Java Agents
   8.3. Implementation of the Fitness Function
        8.3.1. The Concept
        8.3.2. Bytecode Instrumentation
        8.3.3. The ASM Framework
   8.4. Operating the SUT
        8.4.1. Accessing the SWT Classes
        8.4.2. Generating Inputs
        8.4.3. Replaying Sequences
9. Experiment
   9.1. Setup and Results
   9.2. Threats to Validity
10. Conclusion and Future Work
    10.1. Conclusion
    10.2. Future Work
Appendices
A. Java Bytecode Instructions
B. Miscellaneous
Bibliography
Declarations
    Declaration of Authorship
    Declaration of Consent

1. Introduction

1.1. Abstract

Testing applications with a Graphical User Interface (GUI) is an important, though challenging and time-consuming task. The state of the art in industry is still scripting and capture-and-replay tools, which simplify the recording and execution of input sequences but do not support the tester in finding fault-sensitive test cases. While search-based test case generation strategies, such as Evolutionary Testing, are well-researched for various areas of testing, relatively little work has been done on applying these techniques to an entire GUI of an application. This work presents an approach to finding input sequences, using a metaheuristic algorithm named Ant Colony Optimization.

1.2. Motivation

Software testing is an important and widely used quality assurance technique in industry. Modern software systems comprise various components which interact with each other to accomplish tasks. The correct behaviour of these components is often verified through unit, integration and system tests. Many of today's applications have a special component in the form of a Graphical User Interface (GUI). A GUI is an interface which consists of control elements called widgets, for example buttons, menu items and text boxes. The GUI is often the only part of the software that the user interacts with. It is thus necessary to thoroughly test this interface in order to ensure the quality of the product. This is done by creating test cases in the form of input sequences. An input sequence for a GUI application is a sequence of actions, like a click on a button control or a drag and drop operation. Figure 1.1 shows an input sequence for Microsoft Word which causes the currently opened document to be printed out.

In current industrial practice, a typical testing scenario performed throughout the development process of a GUI application can look as follows: The testers start by designing an initial test suite with several test cases. Often these suites comprise common scenarios, like printing a document, filling in forms and committing the content to the database, and so forth. The test cases are created with the help of scripting or capture-and-replay tools. With a scripting tool the testers have to write a script consisting of explicit actions to be executed on the System Under Test (SUT). A capture-and-replay tool facilitates the creation of such a script by recording the actions that the human tester performs.

    clickMenu("File"), clickMenu("Print"), pressKey(Tab), type("22"),
    pressKey(Tab), type("44"), clickButton("OK")

Figure 1.1.: Input sequence that causes Microsoft Word to print pages 22 to 44 of the current document.

Recorded or scripted sequences may then be replayed, for example to perform daily regression tests.
However, because the interface of the SUT undergoes various changes throughout the development process, many of these scripts will break: they rely on widgets whose name or position has changed, or which have been removed. This means the testers have to constantly repair the scripts in order to maintain the test suite. This is labour-intensive and consequently costly [Mem01]. Considering these difficulties, techniques for automatic test case generation are quite desirable.

One way of dealing with the task of automatically generating test cases is to transform it into an optimization problem. The idea is to define a quality criterion or fitness function and search for test cases which maximize this function. Since the search space of all possible test cases is often large and has a complex structure, one could try to exploit metaheuristic techniques. There has been a lot of research on this idea in a field commonly known as Evolutionary Testing [McM04]. For example: Wegener [Weg01] performs temporal testing with Genetic Algorithms. He tries to find input data with extreme execution times (either high or low). Wappler [Wap07] generates unit tests for classes by employing strongly-typed Genetic Programming to find method call sequences causing high code coverage of the classes under test. Windisch et al. [WWW07] perform structural testing, employing Particle Swarm Optimization to find arguments to functions so that branch coverage gets maximized. Recently, metaheuristic techniques have also been applied to GUI testing [MT11, HCM10], but the research is still quite sparse.

Automatic testing of GUI applications poses several difficult challenges, among which are

CH1 the huge amount of possible sequences. At each state of the SUT there are many alternative actions to choose from, which leads to an exceptionally large search space. In addition, it is computationally expensive to generate and evaluate sequences, since the SUT needs to be started and all the actions in the sequence need to be executed. This requires efficient algorithms which explore the search space in an intelligent manner to find good sequences.

CH2 the lack of well-studied quality criteria. What characterizes a "good" and fault-sensitive test sequence in the context of GUI testing?

CH3 the technical difficulty of generating inputs. In order to click buttons, perform drag and drop operations or input text, one needs to be able to

1. scan the GUI to determine the visible widgets and their properties (e.g. the positions of buttons, menu items etc.),
2. derive a set of reasonable actions at each execution stage (e.g. a visible, enabled button is clickable)
3. and execute, record and replay these actions later on.

1.3. Objectives

The vision of a future framework for GUI testing could look as follows: Given a GUI application and a test oracle – which determines whether a sequence has been properly executed by the application – this framework automatically generates a fault-sensitive test suite and returns the list of detected errors, without human intervention. The development of such a framework is an ambitious task. This work contributes a first step towards accomplishing this task by presenting an approach for the automatic generation of single input sequences for GUI-based applications. To achieve this goal, it addresses the aforementioned challenges by
1. introducing and motivating a metaheuristic algorithm named Ant Colony Optimization, suitable for finding an input sequence with a high fitness value,

2. introducing and motivating a fitness function for input sequences, based on the Call Tree Size metric [MM08],

3. presenting techniques for executing, recording and replaying complex actions on applications' GUIs.

All of the abovementioned objectives are implemented in a framework which is presented and tested in a first experiment. This framework focuses on Java applications based on the Standard Widget Toolkit. The SUT utilized in the experiment is the Classification Tree Editor (CTE) (see Figure 1.3), a graphical editor for classification trees, developed by Berner & Mattner Systemtechnik GmbH.

Figure 1.2 shows an abstract version of the optimization process used by the presented approach. It sports the fitness function and the optimization algorithm working together in order to generate and improve sequences on each iteration. Ideally, it eventually finds a sequence with the optimal fitness value. This process, which will be explained in detail throughout the next chapters, is the main contribution of this work. Contrary to previous approaches, it requires neither a model of the GUI¹ nor existing human input sequences or similar handcrafted artifacts, and is thus completely automatic.

¹ For example in the form of a finite state machine, which provides a list of possible actions that may be executed in each state, etc.

Figure 1.2.: Optimization Process (the optimization algorithm generates and executes sequences on the SUT, the fitness function rates them, and the ratings feed back into learning until an "optimal" sequence is found).

Figure 1.3.: The Classification Tree Editor, which is the SUT for this work.

1.4. Outline

The next chapter explains the concepts behind metaheuristic optimization techniques and introduces the Ant Colony Optimization algorithm. Chapter 3 gives an overview of existing approaches to automatic GUI testing. Chapter 4 presents the approach applied in this work. It explains the sequence generation process and the particular steps involved, which are elaborated in the following three chapters. Chapter 5 discusses how Ant Colony Optimization is used in this process in order to find sequences with high fitness values. Chapter 6 motivates and defines the fitness function, and chapter 7 presents the techniques used to scan and operate the GUI, e.g. how to perform clicks, type text into text fields or perform drag and drop operations. Chapter 8 explains the implementation of the features presented throughout chapters 4 to 7 and introduces the framework which has been developed during the course of this work. Chapter 9 presents the results of an experiment, where the framework is applied to the Classification Tree Editor and compared to a random sequence generation strategy. Chapter 10 reviews the approach and discusses future work.

2. Introduction to Metaheuristics

This chapter gives a short introduction to metaheuristic techniques and introduces the Ant Colony Optimization algorithm.

2.1. Overview

Optimization is the process of finding a solution with the highest value according to a given criterion. For example: Figure 2.1 shows the two-dimensional sinc function with its local and global optima. One could try to find the global maximum (x, y)* = arg max_{(x,y)∈S} sinc(x, y), with S = [−20, 20] × [−20, 20].
The set S is the search space, the tuples (x, y) ∈ S are the candidate solutions or individuals, and the function itself is the objective or fitness function. To solve this problem one could make use of classic optimization algorithms like Gradient Descent or Newton's Method, which employ the gradient ∇f = (∂f/∂x, ∂f/∂y) of a fitness function f to direct their search process. They usually expect a start position s_0 = (x_0, y_0) ∈ S and use ∇f to create new and better candidate solutions in the neighbourhood of s_0. After a number of iterations, and depending on the quality of s_0, they will eventually find a local or global optimum. In order to achieve this they make assumptions about the function, in particular that it is possible to calculate its gradient. Unfortunately, problems exist where, contrary to the abovementioned example, the fitness function is discontinuous, nondifferentiable and lacks a closed-form expression. In these cases one cannot make use of classical search algorithms, but has to resort to different techniques.

Metaheuristics belong to the subfield of stochastic optimization [Luk09] and make few or no assumptions about the problem at hand. They employ a certain degree of randomness to find optimal or near-optimal solutions to hard problems. Algorithm 1 presents the skeleton of a simplistic strategy named Hill-Climbing (HC), which is the metaheuristic equivalent of Gradient Descent. It starts with a given individual and employs a mutation operator which makes small, random modifications to individuals. In case the fitness of the modified individual exceeds that of the original, HC accepts it as the current solution; otherwise it keeps the original. This process is repeated until certain stopping criteria are met. The essential parts of metaheuristic algorithms are now described using the example of the Knapsack Problem.

Figure 2.1.: The sinc function with local and global optima.

Algorithm 1: Hill-Climbing
    Input: start  /* initial candidate solution */
    Output: best individual found (local optimum)
    begin
        current ← start
        repeat
            ind ← mutate(current)
            if fitness(ind) > fitness(current) then
                current ← ind
        until stopping criteria met
        return current

The Knapsack Problem. This is a well-known optimization problem and has been shown to be NP-hard: Given a set I of items, a weight function w : I → ℝ, a value function v : I → ℝ and a limit l ∈ ℝ, find a subset K ⊆ I with Σ_{u∈K} w(u) ≤ l that maximizes Σ_{u∈K} v(u). Intuitively, the problem is about filling a bag with items, so that the value of the bag is maximal and its weight stays below a limit l.

Representation. In order to apply metaheuristics, one first needs to define what the candidate solutions look like. This is an important step, since other parts of the metaheuristic depend upon the structure of the solutions, like for example the fitness function or the search operators. In the above example a candidate solution is a set s = {u_1, u_2, ..., u_n} ∈ P(I) of items. Many metaheuristics work with vector representations, but more complex structures are possible. In Genetic Programming, for example, the candidate solutions are often represented as trees.

Fitness Function. The fitness function determines the quality of a candidate solution and is usually of the form f : S → ℝ, where S is the search space. This function plays a central role in the optimization process, since it guides the algorithm towards the interesting regions of the search space [BSS02].
For the Knapsack Problem one could define f as

    f(s) := Σ_{u∈s} v(u)   if Σ_{u∈s} w(u) ≤ l
    f(s) := p              otherwise

where p ∈ ℝ would be a small value to penalize infeasible solutions, i.e. solutions where the bag's weight exceeds l. As we can see here, the fitness function for this problem does not have a closed mathematical form, and in contrast to, for example, the sinc function in Figure 2.1, it is not possible to calculate a gradient.

Good fitness functions often satisfy the smoothness criterion [Luk09]: solutions that lie close to each other in the search space tend to have similar fitness values. This does not mean that the function needs to be as smooth as the one depicted in the upper left of Figure 2.2, but it should not exhibit an extremely "hilly" character like the one depicted in the lower left. This criterion is not sufficient for a metaheuristic to perform well, since "deceptive" or "needle in a haystack" landscapes are highly smooth, yet can be very challenging for algorithms like HC: they either lead it away from the optimum or do not give enough information to direct the optimization. The search landscape defined by the fitness function usually dictates the applicability of certain classes of metaheuristics. For example: a simple local optimization algorithm like HC would probably perform poorly on a multimodal landscape like the one generated by the sinc function. Unfortunately, it is generally not possible to visualize the entire landscape, for example due to high dimensionality. Hence, a lot of experience is involved in choosing the appropriate algorithm.

Figure 2.2.: Four example search landscapes: Unimodal, Needle in a Haystack, Noisy ("hilly" or "rocky"), and Deceptive [Luk09].

Operators. A metaheuristic usually employs one or more operators. Operators are functions that create, modify or select individuals. To apply HC, one needs to define a mutation operator, which is also used by many other metaheuristics. This operator usually makes small random changes to individuals. Intuitively, it generates similar solutions in the environment of the given one. Algorithm 2 shows a possible implementation of this operator for the Knapsack Problem.

Algorithm 2: Mutation operator for the Knapsack Problem.
    Input: s  /* candidate solution represented as a fixed list */
    Output: slightly modified copy of s
    begin
        s′ ← copy(s)
        index ← random index within [0, length(s) − 1]
        s′[index] ← random item u ∈ I
        return s′

Metaheuristics like Genetic Algorithms often make use of additional operators like crossover, selection and an operator for creating initial candidate solutions.
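To make these definitions concrete, the following minimal Java sketch implements the penalized fitness function and the mutation operator of Algorithm 2. It is purely illustrative and not part of the thesis framework; the Item type, the penalty constant and all other names are assumptions.

    import java.util.Arrays;
    import java.util.Random;

    final class Knapsack {
        record Item(double weight, double value) {}

        static final double PENALTY = -1.0; // small value p for infeasible solutions
        static final Random RND = new Random();

        // Fitness: total value if the weight limit is respected, penalty p otherwise.
        static double fitness(Item[] s, double limit) {
            double weight = Arrays.stream(s).mapToDouble(Item::weight).sum();
            if (weight > limit)
                return PENALTY;
            return Arrays.stream(s).mapToDouble(Item::value).sum();
        }

        // Mutation (Algorithm 2): copy s and replace one randomly chosen
        // position with a random item from the item set I.
        static Item[] mutate(Item[] s, Item[] items) {
            Item[] copy = s.clone();
            copy[RND.nextInt(copy.length)] = items[RND.nextInt(items.length)];
            return copy;
        }
    }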
Termination Criterion. Since one only has a limited amount of computational resources, one needs to define a termination criterion, that is, a function that determines when to stop the optimization process. This can be quite difficult, since often the fitness of the best possible individual is unknown. Examples of termination criteria are:

1. Maximum amount of time reached.
2. Maximum number of generated candidate solutions reached.
3. Maximum number of bad moves reached. A bad move is when the algorithm generates a new individual whose fitness does not exceed that of the current maximal solution. For example: when HC reaches a local optimum, it will not be able to improve any further.

The choice of the termination criterion can have a big impact on the efficiency and effectiveness of the optimization process. If the process terminates prematurely, the resulting candidate solution will not be of high quality. If the process runs too long without making any improvements, it wastes resources.

Exploration versus Exploitation. HC is a local optimization algorithm. This is because it searches for new candidate solutions only in a very small area, namely the environment of the current solution. In addition, it only accepts solutions if they are better than the current one, which means it cannot go "down the hill". Such a strategy is said to be exploitative and tends to get stuck in local optima. The other extreme are explorative algorithms. A representative of this class is Random Search (RS). RS generates candidate solutions completely at random with a uniform distribution over the search space. Both algorithms have their advantages and downsides. For example: RS might be more successful in a "needle in a haystack" environment, whereas HC would perform better in a unimodal environment as depicted in Figure 2.2. But quite often fitness landscapes like the one in Figure 2.1 require a hybrid optimization algorithm that exposes both properties. This is where algorithms like Simulated Annealing (SA) or Genetic Algorithms come into play. SA, for example, is a version of HC that is allowed to go downhill for a certain number of steps². This way it does not get stuck in local optima as much as HC does and is considered to be a global optimization algorithm. There is always a tradeoff between exploration and exploitation in the design process of a metaheuristic. This tradeoff depends on the problem to be solved and is often difficult to figure out [Luk09].

² Depending on the current temperature τ.

2.2. Ant Colony Optimization

This work adopts a metaheuristic technique named Ant Colony Optimization (ACO), which has been shown to be effective in solving hard combinatorial optimization tasks like the Traveling Salesman Problem [DCG99]. The algorithm is inspired by the foraging behaviour of ants. Figure 2.3 shows the double bridge experiment [GADP89], which sports an ant nest and a food source connected by paths of distinct length.

Figure 2.3.: Double bridge experiment with ants and a food source [DCG99].
It can be observed that after a short transitional phase, in which the ants use all paths equally (a), eventually the majority of the ants carries the food along the shortest path (b). It was found that ants deposit a pheromone on the ground while walking. Their walking direction, in turn, is influenced by other ants' pheromone: the higher the pheromone density on a certain path, the more likely it is to be travelled. When the experiment starts, no pheromone is on the ground and consequently the paths are travelled roughly equally. The ants which took the shorter path arrive earlier at the food source. Once they have picked up the food and prepare for return, they have to decide which path to take back to the nest. Since there is already some pheromone on the short path, they tend to favor this one and hence further increase its pheromone density. This eventually makes the majority of the ants follow the shorter path.

The Ant Colony Optimization algorithm uses a similar strategy, though without a direct equivalent for the ants. Algorithm 3 shows the basic approach. The metaheuristic is population-oriented, which means that it works with an entire pool of candidate solutions. The solutions are called trails, and a trail t = (c_1, c_2, ..., c_n) ∈ C^n consists of components c ∈ C from a component set (in the double bridge experiment these components would relate to the edges of the paths). Each component c_i is associated with a pheromone value p_i. In ACO the trails are constructed step by step: the algorithm iteratively selects components from the component set. A Selection Rule determines how this is done. Usually, the components are selected proportionate to their pheromone values.

The overall procedure is as follows: The algorithm first creates a certain amount of trails, i.e. a population. After the population has been generated, each trail is assessed with the help of the fitness function. Then the pheromones of the components are updated. The Pheromone Update Rule determines how this is done. Usually, components that are part of high-rated trails obtain better pheromone values than the ones that appear in low-rated trails. This leads to a higher utilization of those components within subsequent generations.

The optimization process usually starts with equal pheromone values for each component. Thus, at the beginning of the optimization, it produces random trails. Over time, certain components obtain higher pheromone values than others, so that the optimization focuses on a certain area within the search space, hopefully the one containing the best trail. It is important to understand that this does not mean that toward the end only high-rated components are employed. In contrast to algorithms such as Hill-Climbing, ACO is a global metaheuristic, which means that it always samples from the entire search space and may always generate each possible trail. However, the likelihood of generating trails with low-rated components decreases over time. So essentially, the algorithm learns a probability distribution over the search space [DS09], and ideally the area with the best trails has the highest density.
Algorithm 3: Skeleton of the Ant Colony Optimization algorithm.
    Input: C ← {c_1, c_2, ..., c_n}  /* component set */
    Input: p ← (p_1, p_2, ..., p_n)  /* initial pheromone values */
    Input: popsize  /* number of trails in a population */
    Output: best trail found
    begin
        best ← ∅
        repeat
            for i ← 1 to popsize do
                t_i ← generateTrail()  /* select components based on p */
                if fitness(t_i) > fitness(best) then
                    best ← t_i
            update the pheromone p_x of each component c_x, based on the fitness
            values of the trails t_i ∈ T in which the component appears
        until stopping criteria met
        return best

3. Related Work

This chapter presents brief descriptions of contributions related to the subject of sequence generation for GUI applications.

Kasik and George: Toward Automatic Generation of Novice User Test Scripts. Kasik and George [KG96] strive to generate novice user sequences by employing Genetic Algorithms. Their implementation scans the GUI to determine the set of alternative actions, so that it can generate arbitrary feasible input sequences. They reward sequences that stay on the same dialog, based on the observation that novice users learn the behaviour of a GUI's functions through experimentation with different parameters within the same dialog. Their program takes existing sequences as input, into which the tester may insert a deviate command at the beginning, at the end or somewhere in between. The sequence then gets extended with new actions at the command's index. The goal is to make the inserted subsequence look like it was created by a novice user. Their implementation is also able to generate sequences entirely from scratch. However, according to the authors this leads to quite random results which do not resemble novice user sequences. Their implementation offers two possible modes: meander and pullback. Meander mode replays an existing sequence and turns control over to the Genetic Algorithm whenever it encounters a deviate command. It does not return to the remainder of the sequence that follows the deviate command. In pullback mode, the authors give reward for returning to the sequence's tail. In order for the GUI scanning process to work, slight modifications need to be applied to the SUT's source code. The implementation works for applications that employ Motif 1.2 on X11 to display their GUI. The authors do not mention which widget types they support. However, they state that the GUI is operated with keystrokes only, so they seem not to consider mouse operations. Mutation and crossover operators are not explained in the paper, but the user may provide his own implementations for them. Since the type of test applications used is not mentioned, it is hard to tell how well their implementation would perform on real-world subject applications.

Huang et al.: Repairing GUI Test Suites Using a Genetic Algorithm. Huang et al. [HCM10] use Genetic Algorithms to fix broken test suites. Their work consists of two steps: 1. generating a test suite and 2. repairing the suite in case it contains infeasible sequences. They work with an approximate model of the GUI called an Event Flow Graph (EFG). An EFG is a directed graph whose nodes are the actions that a user can perform (e.g. clicks on menu items, etc.). A transition between action x and action y means: y is available after the execution of x. Figure 3.1 shows an EFG for the main menu of a typical GUI application.

Figure 3.1.: An Event Flow Graph of a typical main menu (File → Open, Save; Edit → Cut, Copy, Paste; Help → Contents, About). The nodes correspond to clicks on menu items.

By traversing the edges of this graph one can generate sequences offline. For example: when clicking on the menu entry "Help", a drop-down menu appears which contains the "About" entry, which in turn can be clicked.
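To illustrate how such a model permits offline generation, an EFG like the one in Figure 3.1 can be stored as an adjacency map and traversed at random. This sketch is not taken from [HCM10]; the simplified adjacency-map representation and all names are assumptions.

    import java.util.*;

    final class EfgWalk {
        // A simplified EFG of Figure 3.1: action -> actions available afterwards.
        static final Map<String, List<String>> EFG = Map.of(
                "File", List.of("Open", "Save"),
                "Edit", List.of("Cut", "Copy", "Paste"),
                "Help", List.of("Contents", "About"));

        // Generate a sequence offline by following outgoing edges at random.
        static List<String> randomSequence(String start, int length, Random rnd) {
            List<String> seq = new ArrayList<>(List.of(start));
            String current = start;
            for (int i = 1; i < length; i++) {
                List<String> next = EFG.getOrDefault(current, List.of());
                if (next.isEmpty())
                    break; // no outgoing edges, e.g. after "About"
                current = next.get(rnd.nextInt(next.size()));
                seq.add(current); // e.g. yields [Help, About]
            }
            return seq;
        }
    }

Note that such offline walks only approximate the GUI, which is exactly why infeasible sequences like the one discussed next can arise.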
In the first step they try to find a covering array to sample from the sequence space in order to generate their initial test suite. A covering array CA(N, t, k, v) is an N × k array (N sequences of length k over v symbols) in which all t-tuples over the v symbols are contained within every N × t sub-array. So, instead of trying all permutations of actions (of which there are exponentially many), only the set of sequences that contains all length-t tuples in every position is used as the test suite. The parameter t determines the strength of the sampling process. Their array is constrained by the EFG, meaning that certain combinations of actions are not permitted. Since it is hard to find such a constrained covering array, they employ a special metaheuristic based on Simulated Annealing [GCD09]. This way they get their initial test suite which, due to the fact that the EFG is only an approximation of the GUI, contains infeasible input sequences. For example: in Figure 3.1 we could generate s = (Edit, Paste). However, since in most applications the Paste menu entry is disabled or invisible until a copy operation has occurred, the execution of s is likely to fail.

In step two they identify and discard the infeasible sequences. By doing that they lose coverage with respect to the covering array. Hence, they use a Genetic Algorithm which utilizes the EFG to generate new sequences offline, which will then be executed and rewarded depending on how many of their actions are executable and on how much coverage they restore. Infeasible sequences are penalized with a static value. The authors employ the GUITAR³ framework to execute their sequences. The EFG is generated automatically with the help of a GUI-Ripper [MBN03], but requires human verification, because it might be incomplete. The approach has been tested on small synthetic subject applications, where the set of considered actions has been restricted to clicks on button controls.

³ http://sourceforge.net/projects/guitar/

Andrews et al.: Testing Web Applications by Modeling with FSMs. Andrews et al. [AOA05] test web applications with the help of hierarchical finite state machines (FSMs). They model large web applications by building hierarchies of FSMs for their subsystems, which they annotate with input constraints to reduce the amount of possible inputs. They then derive sequences from the individual FSMs and combine these to form complete test sequences. They created a simple web application that they use as the SUT and generate test suites that satisfy node or edge coverage. The annotated FSM hierarchy has to be modeled by hand prior to using their framework. They provide a rough description of how this might be achieved in an automatic way, but generally leave this problem for future research.

Artzi et al.: A Framework for Automated Testing of JavaScript Web Applications. Artzi et al. [ADJ+11] perform feedback-directed test generation for JavaScript web applications. Their objectives are to find test suites with high code coverage as well as sequences that exhibit programming errors, like invalid HTML or runtime exceptions. They developed a framework called Artemis which is able to trigger sequences of events by calling the appropriate event handlers and supplying them with the necessary arguments. For the generation of the suites, they use prioritization functions to focus on event handlers with low coverage.
Their framework requires access to the SUT's source code, including any server-side components. In their experiments they used small web applications.

Marchetto and Tonella: Using search-based algorithms for Ajax event sequence generation during testing. Marchetto and Tonella [MT11] generate test suites for AJAX web applications using Hill-Climbing and Simulated Annealing. They execute the applications to obtain an approximate model in the form of a finite state machine. The states in this machine are instances of the application's DOM tree (Document Object Model) and the transitions are events (messages from the server, user input). From this FSM they can obtain the set of semantically interacting events. Two events e_1 and e_2 interact semantically if states s_0, s_1, s_2 exist such that swapping the order of the events upon execution brings the system to a different state, i.e. s_0 →(e_1;e_2) s_1 and s_0 →(e_2;e_1) s_2, where s_1 ≠ s_2. Their goal is to generate test suites that consist of maximally diverse event interaction sequences, that is, sequences where each pair of consecutive events is semantically interacting. Hence, they define several fitness functions to describe the diversity of a test suite. They start with suites consisting of short (length-2) event interaction sequences and use Hill-Climbing or Simulated Annealing and their fitness functions to extend these sequences and thus the test suites. For the construction of their FSM they employ execution traces generated by humans, as well as static code analysis. Therefore, they need to instrument the source code of the test applications. Since the resulting FSM is not guaranteed to be complete or correct, it needs additional verification. They perform a case study with two medium-sized AJAX applications with injected faults. For the execution of their generated sequences they use the web testing tool Selenium⁴. In order to provide input for text boxes and the like, they use a database of input values generated from the input traces.

⁴ http://seleniumhq.org/

4. The Approach

This chapter presents the central idea of this thesis, the sequence generation process. It is the starting point for the following chapters, which discuss the individual steps of this process in detail.

4.1. Overview

Figure 4.1 shows the process applied in this work in order to generate test sequences. It works as follows: 1) The SUT is started and 2) instrumented. This step is necessary to obtain the fitness value of the generated sequence later on. 3) In order to be able to generate actions, it is necessary to find the visible widgets and determine their properties. Without this information we would not know where to click, where to type text and so on. So the process gathers the widgets' positions, their size and their state, i.e. whether they are enabled and focused, etc. 4) From this information one can derive a set of "reasonable" actions. For example: If there is a visible and enabled button, placed on a foreground window and not covered by any other control, then this button is clickable. And since the button's coordinates are known, one could perform a click on its center. But of course there may be various other controls which could be clicked, right-clicked or dragged. There might be a text control which is currently focused, so that text may be typed into it, etc. This leads to an entire set of alternative actions which can be performed.
5) At this point the process needs to make a decision about the action to be executed. This is a very important step, because it is desirable to select promising actions which are likely to produce sequences with high fitness values. 6) After an action has been selected, it is executed, that is, a click on a menu item is performed or text is typed or a scrollbar is dragged, etc. Every time this happens, the state of the GUI possibly changes, meaning that new controls appear, other controls disappear or change their position or other attributes. Hence, the process needs to go through steps 3) to 6) again in order to execute the next action. This way it is possible to generate sequences of arbitrary length. Once a certain amount of actions has been executed, the process continues with step 7), where it determines the quality of the generated sequence with the help of the fitness function. 8) Now the SUT is stopped. Up to this point we have successfully generated an input sequence and obtained its fitness value. 9) This information can be exploited to learn about promising and less promising actions. For example: If the generated sequence obtained a high fitness value, then the actions within this sequence might be suitable for the generation of other high-rated sequences. On the other hand, if the sequence obtained a low fitness value, then its actions might be less likely to produce good sequences. This information can be saved to make better decisions in step 5) in the future. Now the whole process is repeated, i.e. steps 1) to 9) are executed again. On each iteration the process learns more about the SUT's actions. Ideally, the resulting sequences get better and better, so that the best sequence will be found.

Figure 4.1.: The sequence generation process: 1) start SUT, 2) instrument SUT, 3) scan GUI, 4) derive actions, 5) select action, 6) execute action (repeating 3–6 until the desired length is reached), 7) rate sequence, 8) stop SUT, 9) learn.

4.2. Description of the individual Steps

1) Preparing and Starting the SUT. In order for the sequence generation process to work properly, it is necessary to ensure that the SUT is always in the same initial state when started. If this is not the case, the execution of identical sequences might lead to different results, which would disturb the optimization process. Therefore, it is necessary to delete all temporary, setup or document files created during the last run. For example: The CTE saves settings files in its ./workspace directory, which contain information about the last edited file or the size and position of editor windows, etc. It will use this data to restore the settings from the last run. Furthermore, it saves newly created files in the user's home directory by default. The default name for new files is default.cte, but if this file already exists – for example because it was created during an earlier run – the name will be changed to something like default1.cte. This will have an effect on the caption property of tabs which contain newly created files, and so forth. Step 1) takes care of deleting all of these files before starting the SUT's executable.

2) Instrumentation. In this step the SUT is instrumented, which is necessary for the fitness function used in this work. This function is based on the dynamic CTS criterion [MM08], which considers internal activities within the SUT. Therefore, the SUT's bytecode needs to be modified.
Since this work targets Java applications, the instrumentation can be accomplished at runtime. It would also be possible to perform this step only once, before the start of the sequence generation process; to do so, all of the SUT's modules would need to be modified. However, Java is a dynamic language and the SUT might create classes at runtime or download them from the internet. Consequently, these classes would not be instrumented with that approach. Moreover, the runtime overhead associated with the dynamic instrumentation technique used in this work turned out to be small. In case the applied fitness function does not require the SUT to be instrumented, this step is not necessary. Once the SUT has been initialized and the main window has popped up, the process continues with step 3).

3) Scan GUI. In this step all visible widgets and their properties need to be determined. The technical feasibility of this step depends on the GUI framework that the SUT is based on. The CTE uses the Standard Widget Toolkit, which provides all necessary methods to access the used widgets. However, custom widgets – e.g. graphics – which are not maintained by any framework might not be accessible and cannot be included in the testing process.
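To give an impression of this step, the following is a minimal sketch of a widget scan for an SWT-based SUT such as the CTE. It is a simplified stand-in for the techniques described in chapters 7 and 8; the WidgetInfo helper type and all names are assumptions, and menus and menu items (which are not Controls in SWT) would need separate handling.

    import org.eclipse.swt.graphics.Rectangle;
    import org.eclipse.swt.widgets.*;
    import java.util.ArrayList;
    import java.util.List;

    final class GuiScanner {
        // Snapshot of one widget's properties (hypothetical helper type).
        record WidgetInfo(Control control, Rectangle screenBounds,
                          boolean enabled, boolean focused) {}

        // Collect all visible controls of all shells, together with their
        // screen coordinates and state. Must run on the SWT UI thread.
        static List<WidgetInfo> scan(Display display) {
            List<WidgetInfo> result = new ArrayList<>();
            for (Shell shell : display.getShells())
                collect(display, shell, result);
            return result;
        }

        private static void collect(Display display, Control c, List<WidgetInfo> out) {
            if (!c.isVisible())
                return;
            // Map the control's bounds from parent-relative to display coordinates.
            Rectangle bounds = display.map(c.getParent(), null, c.getBounds());
            out.add(new WidgetInfo(c, bounds, c.isEnabled(), c.isFocusControl()));
            if (c instanceof Composite composite)
                for (Control child : composite.getChildren())
                    collect(display, child, out);
        }
    }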
4) Derive Actions. The information gathered in the previous step is used to derive the set of alternative actions. Basically, it is always possible to perform clicks anywhere within the GUI of the SUT, or to simulate arbitrary keystrokes. However, it is desirable to form a set of "reasonable" actions. For example: It is probably rather ineffective to click disabled menu or tool items, and a click on the upper right of a button is quite likely equivalent to a center click on this button. The larger the set of alternative actions, the larger the search space (i.e. the amount of possible input sequences) and the harder the optimization problem. Hence, this set should be as small as possible. Ideally, it only contains the actions which expose the faults of the SUT.

5) Select Action. This is where the optimization algorithm comes into play. It considers the information learned from the fitness values of earlier sequences and selects a promising action. Usually, steps 5) and 9) cooperate in order to accomplish this goal. This work makes use of the Ant Colony Optimization algorithm, though other algorithms are conceivable.

6) Execute Action. The selected action is executed and saved as part of the generated sequence. In order to identify, save and replay an action, it must be given a unique name. Steps 3), 4), 5) and 6) are repeated until the desired sequence length is reached.

7) Rate Sequence. This step uses the fitness function to assign a fitness value to the sequence that has been generated.

8) Stop. In this step the SUT is terminated. In case the generated sequence caused the SUT to crash or hang, it may be saved to a special directory for "suspicious" sequences. The implementation presented in this work is also able to parse the standard output and standard error streams of the SUT. In case these streams contain "abnormal" output – where the definition of abnormal needs to be specified by the tester – the sequence is also considered suspicious. These facilities provide a coarse-grained test oracle.

9) Learn. This step is optional and its application depends on the optimization algorithm used during sequence generation. The dotted lines in Figure 4.1 indicate that the step may be performed after each iteration. However, it is also possible to first generate several sequences before performing this step. This is what the proposed ACO algorithm, which will be introduced in the next chapter, does. As stated above, steps 5) and 9) work together to improve the generated sequences over time. Step 9) usually takes the rated sequences and performs a learn operation to make better action selections in the future.

Steps 1) through 9) are repeated until a certain termination criterion is met. At the end of the process the best sequence found will be returned.

4.3. Applying the Approach

The presented approach may be applied to different SUTs, independent of the operating system, the programming language that the SUT was developed with and the GUI framework that it is based on. In order to apply the process to the SUT used in this work, i.e. the Classification Tree Editor, it is necessary to specify certain parts more precisely, namely

1. the optimization algorithm, which corresponds to steps 5) and 9),
2. the fitness function, which corresponds to step 7)⁵
3. and the steps 3), 4) and 6), used to operate the GUI of the SUT.

⁵ and step 2) for its implementation, which will be presented in chapter 8.

The following three chapters will elaborate these steps. The resulting implementation will be presented in chapter 8 and targets Java applications based on the Standard Widget Toolkit, running on Microsoft Windows XP.

5. Sequence Generation with ACO

This chapter explains and motivates the application of the ACO algorithm in the context of the sequence generation process.

5.1. The Concept

When used in the context of sequence generation, a possible application of ACO could look as follows: The ACO component set C corresponds to the set of all actions that can be performed on the SUT. The ACO trails, in turn, correspond to the input sequences of the SUT. The idea is to assign a pheromone value to each action and prefer actions with a high pheromone value during trail construction. Figure 5.1 highlights the steps of the sequence generation process where ACO is applied. Step 5) relates to the Selection Rule mentioned in chapter 2. In step 9) the Pheromone Update Rule is applied. The optimization process starts with equal pheromone values for each action; hence, at the beginning it effectively produces random sequences. Over time, certain actions obtain higher pheromone values than others, so that the optimization focuses on certain areas within the search space, hopefully the ones that contain the best sequences.

Representation. Let A be the set of all actions that are executable on the SUT. The ACO component set is C = A. A trail is a tuple t = (a_1, a_2, ..., a_n) ∈ A^n which corresponds to a length-n input sequence.

Selection Rule (Step 5). There are many selection rules available for the ACO metaheuristic [DB05]. A common strategy is the proportionate random selection rule, where the components c_i are selected at random, but proportionate to their pheromone p_i. Essentially, this strategy samples from a univariate probability distribution over the available actions. Another option is to always pick the action with the highest p_i, which is an extremely exploitative strategy. This work adopts a policy called pseudo-random proportionate selection [DB05], which is a combination of the two abovementioned strategies: with probability ρ it selects the best action available, and with probability 1 − ρ it performs a random proportionate selection.
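A minimal Java sketch of this policy could look as follows. It is purely illustrative, not the framework implementation of chapter 8, and the names are assumptions.

    import java.util.List;
    import java.util.Random;

    final class PseudoRandomProportionateSelection {
        // Select an index into `pheromones` (one value per available action).
        // With probability rho: pick the action with the highest pheromone value.
        // With probability 1 - rho: pick proportionate to the pheromone values.
        static int select(List<Double> pheromones, double rho, Random rnd) {
            if (rnd.nextDouble() < rho) {
                int best = 0;
                for (int i = 1; i < pheromones.size(); i++)
                    if (pheromones.get(i) > pheromones.get(best))
                        best = i;
                return best;
            }
            double total = pheromones.stream().mapToDouble(Double::doubleValue).sum();
            double r = rnd.nextDouble() * total; // roulette wheel selection
            for (int i = 0; i < pheromones.size(); i++) {
                r -= pheromones.get(i);
                if (r <= 0)
                    return i;
            }
            return pheromones.size() - 1; // guard against rounding errors
        }
    }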
Figure 5.1.: The sequence generation process. Steps 5) and 9) correspond to the ACO Selection Rule and the Pheromone Update Rule, respectively.

Pheromone Update Rule (Step 9). The update of the components' pheromone values corresponds to step 9) of the sequence generation process. The works of Dorigo et al. [DB05, DCG99] provide a detailed overview and descriptions of the various update rules. In this work this step is processed after all trails within a population have been constructed, that is, after each generation. The dotted lines in Figure 5.1 between steps 8), 9) and 1) indicate that step 9), and thus the pheromone update, is not performed after each sequence construction, but after an entire generation of sequences. The components' pheromones are updated with the fitness values of the trails that they appear in. The rule applied in this work makes use of a learning rate α and is listed in Algorithm 4. The algorithm calculates the average fitness score r_i/x_i of a component c_i within the current population. Then the corresponding pheromone p_i is updated as pointed out in line 11. Pheromones of components that do not appear in any trail are not updated. The higher the learning rate α, the more the pheromone values are influenced by the results of the current population. Instead of using all generated sequences in the population for the pheromone update, we only select the k best-rated ones, as proposed by Dorigo et al. [DB05]. The parameter k may be used to adjust the behavior of the optimization algorithm.

Termination Criterion. The current termination criterion is simply a limit on the number of generations.

Algorithm 4: Pheromone update rule with learning rate [Luk09].
     Input: C ← {c_1, c_2, ..., c_n}  /* components */
     Input: p ← (p_1, p_2, ..., p_n)  /* pheromone values */
     Input: T ← {t_1, t_2, ..., t_popsize}  /* current population */
     Input: α  /* learning rate */
     Output: updated p
  1  begin
  2      r ← (r_1, r_2, ..., r_n)  /* total component scores, initially 0 */
  3      x ← (x_1, x_2, ..., x_n)  /* component counts, initially 0 */
  4      for each t_j ∈ T do
  5          for each c_i ∈ C do
  6              if c_i was used in t_j then
  7                  r_i ← r_i + fitness(t_j)
  8                  x_i ← x_i + 1
  9      for each p_i ∈ p do
 10          if x_i > 0 then
 11              p_i ← (1 − α) · p_i + α · r_i/x_i
 12      return p
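For illustration, Algorithm 4 translates almost directly into Java. The sketch below assumes trail membership is kept as a boolean matrix; the representation and names are assumptions, not the framework's actual data structures.

    final class PheromoneUpdate {
        // trails[j][i] == true iff component c_i was used in trail t_j.
        // fitness[j] is the fitness value of trail t_j.
        // Updates `pheromones` in place according to Algorithm 4.
        static void update(double[] pheromones, boolean[][] trails,
                           double[] fitness, double alpha) {
            int n = pheromones.length;
            double[] r = new double[n]; // total component scores
            int[] x = new int[n];       // component counts
            for (int j = 0; j < trails.length; j++)
                for (int i = 0; i < n; i++)
                    if (trails[j][i]) {
                        r[i] += fitness[j];
                        x[i]++;
                    }
            for (int i = 0; i < n; i++)
                if (x[i] > 0) // components absent from all trails keep their value
                    pheromones[i] = (1 - alpha) * pheromones[i] + alpha * r[i] / x[i];
        }
    }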
5.2. Motivation

Many different metaheuristic algorithms have been proposed, among which are Simulated Annealing, Tabu Search, Genetic Algorithms and Evolution Strategies, to name a few. These techniques have been shown to be successful in finding good solutions to hard problems [McM04]. However, each algorithm is well-suited for particular problem classes but may be ineffective for others, which is what the No-Free-Lunch theorem [WM97] states. The application of ACO is now motivated for the particular problem of generating test sequences for GUIs.

Let us define the problem more formally: We want to generate length-n sequences s = (a_1, a_2, ..., a_n) ∈ A^n, where A denotes the set of all actions that are executable on the SUT. In this case the search space is A^n and the s ∈ A^n are the candidate solutions. Some actions are only available in certain states of the SUT; hence, not all permutations of actions (input sequences) are executable. Therefore, we distinguish feasible and infeasible sequences. Let us denote the set of feasible sequences by S_feasible. Then we have S_feasible ⊆ A^n. Our goal is to find a sequence s* with

    s* = arg max_{s ∈ S_feasible} fitness(s)

As we will see later on, this work employs a fitness function which needs to execute a sequence in order to obtain its fitness value; thus the sequence must be feasible. For infeasible sequences we could assign a penalty value, which is what Huang et al. [HCM10] do. However, this leads to the following problem: Let s_1 = (a, b, c, d) ∈ S_feasible and s_2 = (z, b, c, d) ∈ A^n \ S_feasible. The two sequences lie close to each other in the search space, because they are almost identical. But unfortunately we have fitness(s_2) ≪ fitness(s_1), because s_2 is infeasible and is assigned a poor fitness value. So in this case the fitness function does not expose the smoothness criterion mentioned earlier. Of course, it is not always necessary or even possible for a fitness function to expose this criterion, but its absence might complicate the optimization process. This is a common problem in search spaces where the set of solutions is subject to hard constraints [Luk09]. Due to the mentioned problems, it is quite desirable to avoid infeasible sequences and only adopt S_feasible as the search space.

The algorithms mentioned at the beginning of this section usually employ a mutation operator as presented in the introduction. A possible implementation of this operator for input sequences could, for example, substitute one or more actions to obtain a slightly different solution. However, this introduces the following problem: Since we do not possess an exact model of the GUI, we do not know which actions can be substituted for others to obtain a different, yet feasible sequence. For example: If we substituted the action clickMenu("Print") in Figure 1.1 for a different action, then the remainder of the sequence would become infeasible, since the print dialog would never open. In the presence of constraints, the mutation operator is usually destructive and difficult to implement [Luk09]. Ant Colony Optimization avoids these problems because it constructs its solutions step by step. It adopts S_feasible as its search space, which means that it generates valid sequences only.

5.3. Adjusting the Metaheuristic Optimization

There is always a tradeoff between exploration and exploitation. The more explorative the algorithm is, the more it resembles a Random Search algorithm. The more exploitative it is, the more it resembles a Hill-Climbing approach. The presented algorithm offers a set of parameters that can be used to tune the search for good input sequences. Since we do not know what the structure of the search space (i.e. the sequence space) looks like, we do not know the ideal parameter values. Thus we have to experiment with different parameter combinations. The following list gives a short overview of these parameters and the effects they can have.

Default Pheromone Values. It is difficult to predict the impact of this parameter on the optimization process. If the initial pheromone values are high, then the algorithm probably tends to be more explorative, especially at the beginning of the optimization process [DS09]. This is due to the fact that, even if the algorithm finds very good actions and increases their pheromones, other actions will still have quite large values and hence are still likely to be selected. This also means that the algorithm does not converge as fast.
ρ. This parameter affects the pseudo-random proportionate selection rule. The higher its value, the more likely the algorithm is to pick the action with the highest pheromone value. This leads to a rather exploitative and local search strategy [DS09].

α. This is the learning rate, which determines how much the algorithm learns from the solutions generated in the current generation, as described in line 11 of Algorithm 4. The higher its value, the more the algorithm tends to "forget" about earlier generations and instead adjusts the pheromone values according to the current one. Setting α = 1.0 will cause the algorithm to update the pheromones using the fitness values of the current generation only. The lower the value, the more time the algorithm will take to converge, which can make the search more explorative. Setting α = 0 will result in a Random Search algorithm⁶.

k. This parameter determines which sequences are used for the pheromone update. If k > 0 then only the top k sequences of the current population will be used. If k = 0 then all sequences will be used. Higher values might result in a more exploitative behavior [DS09].

⁶ Provided that the initial pheromone values are all equal.

6. The Fitness Function

This chapter presents the Call Tree Size criterion for input sequences and defines and motivates the fitness function used in this work. This function corresponds to step 7) of the sequence generation process, as highlighted in Figure 6.1.

Figure 6.1.: The fitness function corresponds to step 7) of the sequence generation process.

6.1. Definition

This work adopts the Call Tree Size (CTS) metric [MM08]. The goal is to find sequences which generate a large call tree upon execution on the SUT. A call tree is a structure that displays calling relationships among the methods of an executed program. Each node represents a method. A directed edge between two nodes f and g means that the method represented by f called the method represented by g in the context f. For example: Figure 6.2 shows a simple Java program which takes a list of numbers and outputs their mean as well as their sample variance. The calc() method only calculates the mean and variance if more than one argument is given (line 13); otherwise the mean is set to the value of the first parameter and the variance is set to 0 (line 16). Hence, different inputs cause different sets of methods to be called. This is reflected in Figure 6.3, where the two invocations of the program result in two distinct call trees. In the second scenario the mean() method is called two times, namely in the contexts of calc and var. The goal is to generate call trees of large size, more specifically call trees with many leaves. Since the second tree has more leaves than the first, input b) is preferred over input a).

The call trees in Figure 6.3 are only simplified versions of the original ones, because they lack some of the nodes that are generated by the Java library. Depending on the implementations of parseDouble(), println() and pow(), additional nodes could be involved. Aside from these, the original tree would also contain static initializers, class loader methods, Virtual Machine initialization, shutdown and garbage collector methods. Many of these are executed within distinct threads, which causes even a simple program like the one in Figure 6.2 to be multithreaded. So the shown call trees are actually only thread call trees of the main thread. To obtain the full call tree, the trees of the different threads are merged into a single program call tree. Therefore, an artificial root node, which connects all these trees, is introduced. Figure 6.4 depicts the merging process⁷: if two nodes in distinct trees have the same predecessors, they will be merged.

⁷ This figure is only for comprehension. The implementation presented in chapter 8 does not record separate thread call trees, but generates the program call tree all at once.

Figure 6.4.: Merging multiple thread call trees into a single program call tree.
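The following sketch mirrors this merge on a minimal call-tree structure. Like the figure, it is purely for comprehension; the Node type and all names are assumptions.

    import java.util.LinkedHashMap;
    import java.util.Map;

    final class CallTree {
        // A call tree node: children are keyed by method name, so two calls of
        // the same method in the same context map to the same node.
        static final class Node {
            final Map<String, Node> children = new LinkedHashMap<>();

            Node child(String method) {
                return children.computeIfAbsent(method, m -> new Node());
            }
        }

        // Merge `tree` into `target`: nodes with the same predecessors coincide.
        static void mergeInto(Node target, Node tree) {
            for (Map.Entry<String, Node> e : tree.children.entrySet())
                mergeInto(target.child(e.getKey()), e.getValue());
        }

        // Number of leaves, i.e. the CTS value of a non-empty (sub)tree.
        static int leaves(Node n) {
            if (n.children.isEmpty())
                return 1;
            return n.children.values().stream().mapToInt(CallTree::leaves).sum();
        }
    }

Merging each thread call tree into a shared artificial root via mergeInto() yields the program call tree; leaves() then computes its CTS value.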
Figure 6.4 depicts the merging process: If two nodes in distinct trees have the same predecessors, they will be merged. (This figure serves only comprehension; the implementation presented in chapter 8 does not record separate thread call trees, but generates the program call tree all at once.)

Figure 6.4.: Merging multiple thread call trees into a single program call tree.

Of course the discussed example is not a GUI application, but for GUIs the idea is similar: Depending on the type of the actions executed on the SUT, different methods in different contexts are invoked. For example: A sequence that prints out the currently opened document will address different functionality than a sequence that navigates the settings dialog. So distinct sequences will most likely result in distinct program call trees of distinct size. Even the order of the actions within a sequence can have an effect on the resulting call tree. A first attempt to define the fitness function could look as follows:

$$fitness : S_{feasible} \to \mathbb{N}$$
$$fitness(s) := \text{number of leaves of the program call tree generated by } s$$

where $s \in S_{feasible}$ is a feasible length-n-sequence as defined in chapter 5.

In each iteration of the sequence generation process the SUT is started and performs a lot of initialization work until the GUI is fully initialized and ready for use. So before the first action is executed, the call tree has already reached a significant size. In this work we are only interested in the leaves that are generated during the execution of an input sequence. Thus it is necessary to alter the above definition. To do this we determine the value $CTS_{start}$, i.e. the number of leaves right before the execution of the first action. This can be done between steps 2) and 3) of the sequence generation process. After the execution of the final action the number of leaves is counted again in order to obtain $CTS_{end}$. The fitness value of a sequence $s \in S_{feasible}$ is thus

$$fitness(s) := CTS_{end} - CTS_{start}$$
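To make the definition concrete, the following sketch shows how this value could be computed on a call tree, assuming the Node class that chapter 8 introduces in Figure 8.9; the method names here are illustrative, not the actual implementation:

    // Sketch: the fitness of a sequence as the growth in the number of leaves.
    static int countLeaves(Node node) {
        if (node.getChildren().isEmpty())
            return 1;                              // a node without callees is a leaf
        int leaves = 0;
        for (Node child : node.getChildren())
            leaves += countLeaves(child);
        return leaves;
    }

    // fitness(s) = CTS_end - CTS_start: count once before the first action and
    // once after the last action of the sequence, then take the difference.
    static int fitness(Node root, int ctsStart) {
        return countLeaves(root) - ctsStart;
    }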
6.2. Motivation

There are many criteria which can be used to define the quality of test sequences. Common ones are code coverage criteria, like instruction or branch coverage. Other examples are event handler coverage or criteria defined on models of the GUI. For Event Flow Graphs or Finite State Machines one could count the number of covered nodes or edges [MSP01, Str06].

The main reason for adopting the CTS metric is an experiment conducted by McMaster and Memon [MM08]. Their goal was to minimize existing GUI test suites. Since a large suite takes a lot of time to execute, which might not always be available – for example in regression test scenarios – it is desirable to reduce its size. Ideally, the reduced suite should be as effective in revealing faults as the original one. The authors started off with a given test suite for a program. They executed each test sequence to obtain the program call trees, which they merged into a single large suite call tree to obtain the CTS metric for the entire suite. (Instead of using the notion of call trees, the authors counted the number of distinct Maximum Call Stacks (MCS). An MCS is a path through a call tree, starting from the root node and ending at one of its leaves. So the number of MCSs generated by a sequence is equivalent to the number of leaves of its call tree.) Then they went on to remove those test cases which did not cause the tree to shrink significantly. This means that they kept only the sequences which contributed the majority of the leaves. After this process, they calculated the error detection rates of the original and the reduced version and found that the latter still revealed most of the known faults. They tested this approach on four distinct Java applications, three of which had GUIs, and found that this technique led to a measurable size reduction, while maintaining a high fault detection rate [MM08].

McMaster and Memon argue that the effectiveness of the CTS criterion stems from its ability to capture context information. A large part of a GUI's functionality is usually implemented with the help of event handlers. Since GUIs often allow for many different ways to access this functionality, these handlers may be triggered in a variety of different contexts, each time possibly exhibiting a slightly different behaviour. For example: Considering again the second call tree in Figure 6.3, we can see that the method mean() is called in two different contexts. Calling a method in different contexts can lead to different behaviour, due to different arguments or internal state (in the given example, however, mean() always behaves the same way). So a broader call tree, i.e. one with a large number of leaves, generally tends to cover more contexts and hence tests more aspects of the SUT than a call tree with fewer leaves. Of course this is only an assumption, but their experiment shows that CTS can be suitable for test suite reduction [MM08].
In addition to these results, the authors argue that the metric is particularly well suited for GUI tests, for the following reasons:

Libraries. Modern GUI applications usually comprise one or more third-party modules, like for example a GUI framework. The call tree can be collected for the entire application. So instead of only testing first-party source code, the whole system is considered during the test. This is interesting for two reasons: 1) Third-party modules could contain faults, too. 2) Errors might result from a wrong or unintended usage of these modules.

Efficiency. It is possible to obtain a call tree for an application without introducing excessive overhead. This remains valid even in multithreaded environments, which is important, because many modern GUI applications are multithreaded, due to the fact that they need to compute results and react to user inputs at the same time.

Ease of Implementation. Collecting the call tree of an application only requires method entry and exit hooks. Such hooks exist for most compilers or runtime environments to allow for the application of profilers. This means that this technique is not restricted to Java applications. In addition, the source code of the SUT is not required.

Due to these reasons and results, the CTS metric is adopted for this work.

7. Operating the SUT

This chapter addresses the problem of generating and executing actions on the SUT, that is, it elaborates on steps 3), 4) and 6) highlighted in Figure 7.1.

Figure 7.1.: Steps addressed in this chapter.

Figure 7.2 shows a screenshot of the Classification Tree Editor (CTE, http://www.berner-mattner.com/en/berner-mattner-home/products/cte/index-cte-ueberblick.html). The CTE is the SUT for this work and is used to assess the performance of the resulting framework in the experiment chapter. It is a Java application with a GUI based on the Standard Widget Toolkit (SWT, http://www.eclipse.org/swt/). It is a graphical editor designed for building and modifying classification trees and offers functionality to derive test cases from them. Its interface comprises classical controls like buttons, menu items and scrollbars, as well as custom ones to display the classification trees. Although this thesis concentrates on the CTE as the main test application, the framework developed in this work can be applied to other Java SWT applications as well (for example Eclipse, http://www.eclipse.org). The CTE's target platform is Microsoft Windows XP and hence this work focuses on this system, too. However, the functionality presented in this chapter is not limited to a particular platform, programming language or toolkit.

Figure 7.2.: Drawing area and test case table of the CTE.

7.1. Scanning the GUI

Before we can actually issue clicks, type in text, drag scrollbars, etc., we need to determine the current state of the GUI; more precisely, we have to find the visible widgets and their properties. For example: In order to click on a button control, we first need to find the corresponding SWT Button object. After that we can go on to determine its screen coordinates and dimensions, with which we could already issue a click, for example to the center of the button. However, sometimes it is necessary to check additional properties of a control before executing an action. So one could check whether it is enabled, because clicking on a disabled button will most likely not have any effect. Of course complex GUIs consist of many controls, and since we want to consider all possible actions, we need to find all these controls and determine their properties. To capture the state of the entire GUI, this work employs a structure called widget tree.
In a widget tree, each node corresponds to a control element and its properties (which makes it somewhat similar to the DOM tree of an HTML document). Figure 7.3 shows a simple SWT application and the corresponding widget tree. The root element is an object called Display, which is the main object of an SWT application. This object is not actually visible, but it provides access to important system information, like the screen resolution, the current control under the cursor, etc. All other controls of the application are its direct or indirect children. In SWT a window is called a Shell. Figure B.3 in the appendix shows additional SWT widgets and their names. In the example we can see that the widget tree reflects the GUI's hierarchical nature and that each node belongs to one of the widgets. The grey boxes indicate that each node also contains the property values of the control it represents. For example: The button has a caption property and a bounding rectangle, and the text element has a text property, to name only a few. All of these properties can be accessed with the public methods provided by the SWT objects (see http://www.eclipse.org/swt/javadoc.php).

Figure 7.3.: Simple SWT application and the corresponding widget tree (nodes: Display, Shell, Button, Scale, Text, Menu and MenuItems; the Button node, for instance, carries the properties caption: "Button", enabled: true, visible: true, hasFocus: true, rect: [180,100,260,130]).

Since the state of the GUI changes throughout the execution of an input sequence, it is necessary to create a new widget tree after each performed action. For example: Clicking on the "File" menu item of the example program causes a drop-down menu to appear. Figure 7.4 shows how the widget tree changes and now contains nodes for the various menu items within this drop-down menu. In addition, the property values of control elements may change during execution: Dragging the thumb of the slider control will change its bounding rectangle, etc.

Figure 7.4.: Changed widget tree after a click on the "File" menu item.

Of course the program in Figure 7.3 is very simple and merely contains standard widgets offered by the operating system's window manager. For a more complex application like the CTE, the widget tree is usually much larger. For example: The widget tree for the screenshot in Figure 7.2 can be found in the appendix in Figure B.2. At the bottom we can see that it also contains the figures displayed in the drawing area, which are custom widgets (framed green in the screenshot). These figures are not SWT widgets, but are rendered onto an SWT DrawingCanvas object by employing the Draw2d framework (http://www.eclipse.org/gef/draw2d/index.php). It is possible to access each figure and its properties via the DrawingCanvas object, because Draw2d cooperates with SWT. But not all objects that humans perceive as widgets are also programmatically accessible. For example, the test case table under the drawing area (framed red) is a custom coded "control". Since there is no framework which provides easy access to the individual parts of the table, it is not possible to obtain information about it, and consequently it will be hard to include it in the testing process. The implementation presented in this work only recognizes widgets maintained either by SWT or Draw2d. It would, of course, be possible to support additional frameworks, like Swing, the WinAPI, GNOME, KDE or Mac OS X's Cocoa framework, to name a few. However, a widget can only be inspected if it is maintained by a framework that provides programmatic access to it. If this is not the case, it will be difficult to find the widget and determine its properties.
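As a rough illustration of how such a tree can be collected, the following sketch walks the SWT widget hierarchy starting from the Display object. WidgetNode is a hypothetical class, not the actual implementation, and additional handling (menus, items, Draw2d figures) is omitted:

    import org.eclipse.swt.widgets.*;
    import java.util.*;

    // Illustrative widget tree node; SWT widgets may only be accessed from the
    // SWT UI thread, e.g. via Display.syncExec().
    class WidgetNode {
        final Widget widget;
        final WidgetNode parent;
        final Map<String, Object> properties = new HashMap<>();
        final List<WidgetNode> children = new ArrayList<>();
        WidgetNode(Widget widget, WidgetNode parent) {
            this.widget = widget;
            this.parent = parent;
        }
    }

    class WidgetTreeBuilderSketch {
        WidgetNode scan(Display display) {
            WidgetNode root = new WidgetNode(display, null);
            for (Shell shell : display.getShells())
                scanControl(shell, root);
            return root;
        }

        private void scanControl(Control control, WidgetNode parent) {
            WidgetNode node = new WidgetNode(control, parent);
            parent.children.add(node);
            node.properties.put("enabled", control.getEnabled());
            node.properties.put("visible", control.getVisible());
            node.properties.put("rect", control.getBounds());
            // menus, items and Draw2d figures would need additional handling here
            if (control instanceof Composite)
                for (Control child : ((Composite) control).getChildren())
                    scanControl(child, node);
        }
    }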
7.2. Deriving Actions

This section deals with step 4) of the sequence generation process. Whenever a human looks at the graphical interface of the application he is working with, his intuition tells him which actions are reasonable and which are not. He knows that scrollbars need to be dragged, that buttons and menu items need to be clicked and that it is possible to type text into a text box once it has the focus. Figure 7.5 shows examples of actions that could be performed on the CTE. Figure 7.6 depicts another scenario, where a modal dialog blocks input to the underlying windows and the number of "reasonable" actions is drastically reduced. Again, the user employs his intuition when considering actions. Of course he cannot know with 100% certainty what will happen if he clicks a disabled button or a random location within a static label, because he does not know the application's implementation. But his experience with other applications tells him that most likely nothing will happen. This work takes the same heuristic approach when considering the set of alternative actions.

Figure 7.5.: Possible actions that can be performed on the CTE (green circles: left clicks, yellow triangles: right clicks, blue arrows: drag and drop operations, green stars: double clicks). These are not all possible actions, but only a selection, to preserve clarity.

Figure 7.6.: A modal window blocks input to the underlying windows (green circles: left clicks, violet circles: text input).

Table 7.1 lists various SWT widgets, their possible action types and the necessary preconditions. The corresponding widgets can be found in the appendix in Figures B.3 and B.4. The first row lists the general preconditions for all actions: In order to click on a widget, type in text or drag it, the widget itself and all its ancestors in the widget tree need to be visible and enabled.

all widgets (general preconditions for all action types): the widget itself must be visible and all parents in the widget tree enabled; for actions that generate clicks, the widget must not be covered by others; no modal window exists which blocks input to the window containing the control.

Button (includes check and radio buttons):
  1. execute: e.g. center click
  2. context menu: e.g. right click (precondition: a menu is associated with the control)

Text, StyledText:
  3. set focus: activate the text control, so that it can receive keyboard input
  4. type character: each character is a separate action (precondition: has focus)
  5. type capital letter, number or special sign: each character is a separate action (precondition: has focus)
  6. type word: type an entire word; different words correspond to different actions (precondition: has focus)
  7. mark all: mark the entire text (precondition: has focus)
  8. delete: delete the character after the caret or the marked text (precondition: has focus)
  9. move: move the caret up, down, left or right (precondition: has focus)

Shell:
  10. set focus: make the window active
  11. toggle size: maximize or minimize the window (preconditions: has focus; has title area)
  12. drag: drag the window to another position; each position corresponds to a different action, e.g. upper left, upper right, lower left, ... (preconditions: has focus; has title area)
  13. close: close the window (precondition: has focus)

Slider, Scale, Scrollbar:
  14. drag: drag the thumb to a position; distinct positions result in distinct actions, e.g. beginning, center, end

TreeItem:
  15. mark exclusive: mark the item as the currently active item within the Tree
  16. mark nonexclusive: if other items are marked already, then add this one too (e.g. Ctrl + Click)
  17. open context menu: open the context menu (preconditions: has focus; a menu is associated with the Tree)
  18. expand: expand the item (preconditions: has focus; item has children)
  19. execute: execute the item, e.g. double click (precondition: has focus)

ToolItem:
  20. execute: e.g. click
  21. open dropdown menu: certain tool items have a style that includes an additional box, which opens a drop-down menu when clicked (precondition: has drop down style)

Spinner:
  22. increase: increase the value
  23. decrease: decrease the value

TableItem, ListItem:
  24. mark exclusive: mark the item as the currently active item within the Table
  25. mark nonexclusive: if other items are marked already, then add this one too (e.g. Ctrl + Click)
  26. open context menu: open the context menu (preconditions: has focus; a menu is associated with the List / Table)
  27. execute: execute the item, e.g. double click (precondition: has focus)

TabItem:
  28. activate: activate the tab

CTabItem:
  29. activate: activate the tab
  30. toggle size: maximize or minimize the tab
  31. close: close the tab

Combo:
  32. set focus: activate the text area
  33. open dropdown: open the dropdown list
  34. type characters, words, delete, mark etc.: see Text control (preconditions: has focus; style allows modification of the text box)
  35. item up: go one item up
  36. item down: pick the next item

Link:
  37. execute: e.g. click the link

Figure (Draw2d):
  38. mark: make it the active figure
  39. context menu: open the context menu (precondition: the DrawingCanvas which contains the figure is associated with a menu)
  40. execute: e.g. double click
  41. drag: drag the figure to a location within the DrawingCanvas; each position corresponds to a distinct action
  42. delete: delete the figure (precondition: has focus)

MenuItem:
  43. execute: e.g. click (precondition: item is not a separator)

Table 7.1.: Action types.
Size of Search Space versus Test Granularity

At this point, the following question is interesting: Which are the most effective action types in terms of fault-revealing capabilities? Each added action type increases the size of the search space and thus makes the optimization task more difficult. If we search for length-10-sequences, assuming only 5 alternative actions at each step, the search space will already consist of roughly 10 million sequences. Given the fact that sequence evaluation is quite slow, the set of action types to be applied should be selected carefully.
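The figure of roughly 10 million follows directly: with $|A| = 5$ alternative actions at each of the $n = 10$ steps, the number of candidate sequences is

$$|A|^n = 5^{10} = 9\,765\,625 \approx 10^7.$$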
For example: Looking at types 4 and 5, one could question whether it makes sense to generate a distinct action for each letter that can be typed into a text field. Assuming the English alphabet, this would already generate 26 actions for a single text field, and many more if capital letters, numbers and special signs are allowed. On the one hand this gives our tests more granularity – because every possible word could be typed – but on the other hand it drastically increases the search space. Thus it would probably make more sense to pick only a subset of the possible characters or to allow only the input of a finite set of test words.

The drag action types 12, 14 and 41 introduce a similar problem: How many positions should be allowed for drag operations (Figure 7.7)? For example: We could allow a scrollbar to be dragged only to either the first or the last position. This might be sufficient for lists that contain only a small number of items. Yet in large lists, with thousands of items, we would not be able to access the majority of the contents. But if we allow many positions, the search space becomes much larger. Another example: Dragging a window (see Figure 7.6) might not be a fault-sensitive action in itself. But by moving the window we could potentially uncover other widgets and thus become able to perform actions that were not available before.

Figure 7.7.: How many drag positions should be allowed?

However, due to the lack of empirical data, it is quite difficult to determine a reasonable set of action types to employ. This set is most likely quite specific to the employed SUT. For example: In the CTE it is possible to mark the figures in the drawing area. Once they are marked, they can be deleted by pressing the delete key. We can also change their labels by typing in text. This behaviour is specific to the CTE, and other SUTs might provide different functionality. Thus there is no generic set of "reasonable" actions that works perfectly for all SUTs. In order to find a good compromise between search space size and thorough testing, it might be necessary to specify the set of possible action types prior to the optimization run. The approach taken in this work is as follows: All action types described in Table 7.1 are used, except for types 6, 8 and 9. For action types 4 and 5 we allow the characters "x, Y, $, 0, 9". For types 12 and 41 we allow three positions – upper left, upper right, lower center. Finally, for type 14, three positions are allowed: start, center and end.

7.3. Executing Actions

As seen in Table 7.1, actions are quite abstract entities. It is important not to confuse them with the inputs that they generate. For example: Figure 7.3 shows a button with the caption "Button". In this particular situation, the action "execute button 'Button'" and the input "left click x=220 y=115" might coincide. But if the window is moved to another location or the screen's resolution is changed, then the input will target another control. Actions are much more robust to those kinds of changes than inputs. This is important, because having recorded a sequence of actions, we want to be able to replay it, even if the windows and controls are not at exactly the same positions they occupied when the sequence was recorded.

The action table only describes what the actions do, not how this is done. In the approach presented in this work, each executed action generates mouse or keyboard input, or both. For example: Action type 1 is implemented by moving the cursor to the center of the button's bounding rectangle, simulating a left mouse button press and eventually simulating a left mouse button release. It would, however, be possible to implement this action in a different way: If the button has the focus, then one could simply simulate a keystroke of the enter key. Another example: To mark all characters in a text box we simulate "Control + A". But it would also be possible to move the caret to the beginning of the text using the arrow keys, then hold down the Shift key, move the caret to the end and finally release the Shift key. This example shows that a single action might generate a complex series of inputs.
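As a sketch of how action type 1 can be realized with the java.awt.Robot class that section 8.4.2 introduces, consider the following; the surrounding class and the way the bounding rectangle is obtained are illustrative:

    import java.awt.AWTException;
    import java.awt.Rectangle;
    import java.awt.Robot;
    import java.awt.event.InputEvent;

    // Illustrative sketch: implement the "execute" action (type 1) as a left
    // click to the center of the target widget's bounding rectangle.
    class ClickActionSketch {
        private final Robot robot;

        ClickActionSketch() throws AWTException {
            robot = new Robot();
        }

        void leftClick(Rectangle bounds) {
            int x = bounds.x + bounds.width / 2;    // center of the rectangle
            int y = bounds.y + bounds.height / 2;
            robot.mouseMove(x, y);
            robot.mousePress(InputEvent.BUTTON1_MASK);
            robot.mouseRelease(InputEvent.BUTTON1_MASK);
        }
    }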
7.3.1. Simulating Input versus Invoking Event Handlers

As shown above, the actions generate input that is very similar to the input a human tester would generate when using the GUI. This thesis makes the assumption that this is important and necessary, because it is the most realistic way of operating the GUI. However, there is a different approach. The SWTBot scripting framework (http://www.eclipse.org/swtbot/), for example, does not simulate keystrokes or mouse input, but directly invokes the event handlers of the target controls. The idea behind this is as follows: Clicking on a button causes a list of event handlers to be fired inside of the SUT, e.g. MouseMoveEvent(...), MouseDownEvent(), MouseUpEvent(), MouseClickEvent(), ...; the particular order is implementation-specific. So, instead of actually performing a click, the SWTBot framework invokes the event handlers of the control in the order they would be fired when performing a "real" click. This approach can be advantageous in cases where it is not possible to calculate the bounding rectangle of the target. For example: In SWT it is not possible to obtain the rectangle of the upper right close button of a window. So instead of clicking that button, SWTBot would simply invoke the Close() handler of the window (of course one can still simulate "Alt + F4"). Yet this approach has a few important downsides:

Inconsistencies. The invocation of event handlers might corrupt the SUT's state. If the event handlers are not invoked in the exact same order they would be invoked upon "real" input, this might lead to "artificial faults". For example: If one performs a drag and drop operation, then the mouse cursor is moved over many controls, potentially causing many other handlers to be fired until it finally reaches the drop target. It is very difficult to calculate all necessary handlers and their correct invocation order.

Limitations. Each state change in the GUI causes certain event handlers to be fired. But the opposite does not hold. For example: Clicking on the "File" menu item causes the corresponding menu to be opened and fires the Arm() handler. But invoking the Arm() handler does not cause the menu to be opened. In fact this is one of SWTBot's major problems: It is not possible to work with dynamic menus.

Native Languages. Java's reflection mechanism and other features, like Java agents, greatly simplify the access to internal methods such as event handlers and thus make the event handler method attractive. SUTs developed with native languages, such as C/C++, often do not provide similar facilities. This makes it difficult to apply this approach to those SUTs.
7.3.2. Naming Scheme

Since we want to be able to replay generated sequences, and since the search algorithm needs to assign pheromone values to particular actions in order to prioritize good ones, we need a reliable way to identify actions. As already mentioned, actions should not be identified by the raw inputs that they produce. Thus we cannot make use of coordinate values. Let us consider again Figure 7.4 and assume that a click on the "File" item followed by a click on the "Save" item has been performed. One way to describe this sequence could look as follows:

    1. Execute! Display.Shell('SWT Widget Test').Menu.MenuItem('File')
    2. Execute! Display.Shell('SWT Widget Test').Menu.MenuItem('File').Menu.MenuItem('Save')

This is another situation where the widget tree comes in handy. We can see here that it is possible to identify a widget by its access path: Display is the SWT main object and the root of the widget tree, the main window Shell is its child, and so on. Because larger applications usually have several windows, and because there are multiple MenuItems, we have to disambiguate the child nodes. One way of doing this is to augment the descriptor with property values, in this case the caption of the window or the name of the menu item. This works well in those cases where the properties used for disambiguation do not change their values. If, however, the caption of the main form is dynamic, because it contains the current time or the name of the currently opened file, then the actions are not recognized properly. Since the latter is the case for the CTE (see Figure 7.8), this work employs a different naming scheme. According to this scheme the above actions can be described as follows:

    1. Execute! Display[0].Shell[0].Menu[0].MenuItem[0]
    2. Execute! Display[0].Shell[0].Menu[0].MenuItem[0].Menu[0].MenuItem[2]

Figure 7.8.: The window caption property in the CTE cannot be used to disambiguate windows, because it contains the name of the currently open file and thus changes throughout the run.

Figure 7.9.: A widget's access path.

Throughout the tests with the CTE it turned out that the order of creation of the widgets is relatively stable. This means that the order in which the widgets appear in the widget tree is stable, too. So instead of using properties like the caption or the items' names, we can simply employ the child index to disambiguate a widget's children (see Figure 7.9). However, this might not be the case for other applications, so the decision on which naming scheme to use has to be made on a per-case basis.
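Reusing the hypothetical WidgetNode class from the sketch in section 7.1, such an index path could be derived as follows. This is a sketch, not the actual implementation; it assumes that the index counts siblings of the same widget type, which matches the examples above:

    // Derive an index-based access path such as
    // "Display[0].Shell[0].Menu[0].MenuItem[2]" from a widget tree node.
    static String accessPath(WidgetNode node) {
        String name = node.widget.getClass().getSimpleName();
        if (node.parent == null)
            return name + "[0]";                          // the Display root
        int index = 0;                                    // index among siblings
        for (WidgetNode sibling : node.parent.children) { // of the same type
            if (sibling == node) break;
            if (sibling.widget.getClass() == node.widget.getClass()) index++;
        }
        return accessPath(node.parent) + "." + name + "[" + index + "]";
    }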
8. Implementation

This chapter presents the framework which has been developed throughout this work and explains the implementation of the fitness function and of the techniques used to operate the SUT's GUI.

8.1. The Framework

Figure 8.1 provides an overview of the framework's structure. It consists of two main components. The first one is the Starter, which implements steps 1), 5), 8) and 9) of the sequence generation process. It contains the implementation of the ACO algorithm and has the ability to record and replay sequences. The Starter executes and terminates the SUT with the attached Agent, which is the second component. The Agent implements steps 2), 3), 4), 6) and 7) of the sequence generation process. It runs within the SUT's Virtual Machine, performs the bytecode instrumentation to obtain the call tree and thus the fitness values, and scans and operates the GUI. Both components communicate via a TCP/IP connection. Figure 8.2 shows how they collaborate in order to generate sequences: The Starter requests the set of alternative actions from the Agent, selects an action and instructs the Agent to execute it. These steps are repeated until a sequence has been generated. Eventually, the Starter requests its fitness value.

Figure 8.1.: Main components of the framework (the Agent operates the GUI and instruments the SUT to obtain the fitness value; the Starter starts the SUT and the Agent, contains the ACO algorithm and records and replays sequences).

Figure 8.2.: Communication between the Starter and the Agent (get available actions / actions / execute action / ... / evaluate sequence / fitness value).

The Starter is the component that the tester works with. When the framework is started, the Starter's terminal window appears (see Figure 8.3). This window provides information about the course of the optimization process, e.g. the number of the current generation, the rating of the best sequence found and the ratings of the past few sequences. Once the optimization process has started, it is difficult to work with the machine that the framework is running on, because the mouse cursor constantly moves and keyboard input is simulated. Hence, the Starter provides a special key combination which stops the run and returns control to the tester.

Figure 8.3.: The Starter displays information about the optimization process.

Prior to the start of the optimization process, the tester may supply a set of parameters in the form of a file named settings.xml. Figure B.1 in the appendix shows an example of this file. It contains parameter values used by the ACO algorithm, e.g. the number of generations, the population size, the sequence length, the learning rate, etc. In addition, the tester can specify the files which are to be deleted upon the start of the SUT.
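To give an impression of what such a file might contain, the following is a purely hypothetical illustration; the element names and the file path are invented here, the parameter values are those of the ACO run in Table 9.1, and the authoritative format is the one shown in Figure B.1:

    <!-- hypothetical illustration only; see Figure B.1 for the actual format -->
    <settings>
      <generations>80</generations>
      <populationSize>10</populationSize>
      <sequenceLength>10</sequenceLength>
      <learningRate>0.3</learningRate>
      <deleteOnStartup>
        <file>workspace/lastSession.cte</file>
      </deleteOnStartup>
    </settings>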
8.2. Java Agents

This work makes use of Java agents for the implementation of several features; hence, this section briefly introduces the concept. A Java agent is a module that can be attached to a Virtual Machine (VM) at startup or during runtime. Once attached, it has access to all loaded classes of the VM and the ability to modify them at runtime. Figure 8.4 shows how to start a VM with an attached agent via the command line.

    java -javaagent:agent.jar SUT

Figure 8.4.: Starting a Virtual Machine with an attached Java agent.

Agents come in two flavours: either as native implementations using the Java Virtual Machine Tool Interface (JVMTI) or as Java implementations packaged in a jar-file. This work makes use of the latter, because they are easier to implement and platform independent. Figure 8.5 shows a simple agent and its premain()-method. This method is called before the main()-method of the Java program running in the VM. It receives as parameters the program arguments and an object of type Instrumentation. This object can be used to access loaded classes, for example via getAllLoadedClasses(). In addition, it allows the installation of a so-called ClassFileTransformer via addTransformer(). Transformers may change the bytecode of classes during the class loading process. Java agents, both native and Java style, are often used by profilers like the Test and Performance Tools Platform (http://www.eclipse.org/tptp/) or the Java Interactive Profiler (http://jiprof.sourceforge.net/).

    import java.lang.instrument.*;

    public class Agent {
        public static void premain(String agentArgs, Instrumentation inst) {
            for (Class cl : inst.getAllLoadedClasses())
                System.out.println("Name of loaded class: " + cl.getName());
            inst.addTransformer(new MyTransformer());
        }
    }

Figure 8.5.: A simple Java agent, which prints out the currently loaded classes and installs a custom ClassFileTransformer.

The framework's Agent component is a Java style agent. It is necessary that the Agent is attached to the SUT at each start. The CTE provides a file called cte.ini, where it is possible to supply parameters for the Virtual Machine that the CTE is running in. So in order to attach the Agent, we add the line highlighted in Figure 8.6.

    -vmargs
    -server
    -Xms100m
    -Xmx1100m
    -ea
    -Xbootclasspath/p:asm-all-3.3.1.jar
    -javaagent:agent.jar

Figure 8.6.: Java VM parameters in the CTE initialization file cte.ini (the last line attaches the Agent).

8.3. Implementation of the Fitness Function

This section presents the implementation of the fitness function introduced in chapter 6. All of this functionality resides in the framework's Agent component. In addition to their experimental results, McMaster and Memon [MM08] published the tool that they used to record the program call trees, the JavaCCTAgent (http://sourceforge.net/projects/javacctagent/). This tool builds upon the Java Virtual Machine Tool Interface (JVMTI) to intercept method calls and thereby gather the information necessary to build a call tree. The JVMTI is used by native Java agents. It provides callback functions for various events, like method entry / exit, thread start / end and Virtual Machine startup / exit, to name a few. The initial idea was to use the implementation of McMaster and Memon for this work. However, it turned out that this has two drawbacks:

Overhead. When attached to the CTE, the agent significantly slowed down the execution. The CTE often became unresponsive and sometimes even crashed. This is due to the fact that JVMTI agents are implemented using native C code. This code communicates with the Virtual Machine via the Java Native Interface (JNI). Unfortunately, calling native methods from Java and vice versa is a quite expensive operation [KS01, WK00], and since the agent triggers two native function invocations (method entry and exit) for each method call within the VM, this introduces significant overhead. Moreover, since the program constantly crosses the boundary between native and non-native code, the VM is unable to perform certain optimizations, like method inlining [Lia99].

Platform Dependency. Since the JVMTI is not available for all implementations of the Java VM [Ora06], this approach does not work on all platforms.
For these reasons, the framework in this work uses a different solution. This solution injects bytecode into the SUT during runtime and does not make use of native code, which improves the performance and preserves platform independence.

8.3.1. The Concept

Before getting to the actual bytecode instrumentation, the overall idea of the process is explained with the help of comprehensible Java source code. The goal is to instrument the SUT so that each method call can be detected. For example: A straightforward approach to instrumenting the Java program in Figure 6.2 would be to add callback invocations to the beginning and end of each method, as depicted in Figure 8.7. This way each method call is tracked and can be saved as a node in the call tree. Inst.enter() and Inst.leave() are the callback methods. They take as a parameter the identifier of the called method and are responsible for building the actual call tree.

    public class Stat {
        double[] data;
        public static void main(String[] args) {
            Inst.enter(MID_MAIN);
            Stat s = new Stat(args);
            s.calc();
            Inst.leave(MID_MAIN);
        }
        public Stat(String[] args) {
            Inst.enter(MID_STAT);
            data = new double[args.length];
            for (int i = 0; i < args.length; i++)
                data[i] = Double.parseDouble(args[i]);
            Inst.leave(MID_STAT);
        }
        public void calc() {
            Inst.enter(MID_CALC);
            ...
            Inst.leave(MID_CALC);
        }
        ...
    }

Figure 8.7.: Modified version of the program in Figure 6.2. Method calls have been added at the beginning and end of each method to record invocations.

Upon instrumentation, each method within the SUT will be assigned a method id. This id is obtained by combining the name of the class, the name of the method and its signature. For example: The method id for the main method would be

    int MID_MAIN = "static void Stat.main(String[])".hashCode();

This way every method possesses a unique identifier.

This first approach has a flaw: In line 16 of Figure 6.2 the data array is accessed without checking its length. If no parameters are supplied upon program invocation, an exception is raised, which causes the program to crash. This prevents the invocation of Inst.leave(). So exceptions – as well as additional return statements – might perturb the construction process of the call tree. Hence, it is necessary to adjust the instrumentation as depicted in Figure 8.8. The try-finally handler around the entire method body guarantees the proper invocation of both callbacks, no matter what happens inside of the body. These modifications neither affect the control flow of the original code nor intercept any thrown exceptions. The example shows only the instrumentation of the visible source code, but of course the Java library methods pow(), println() and parseDouble() have to be and will be instrumented, too.

    public class Stat {
        ...
        public void calc() {
            try {
                Inst.enter(MID_CALC);
                if (data.length > 1) {
                    System.out.println(mean() + ", " + var());
                } else {
                    System.out.println(data[0] + ", " + 0.0);
                }
            } finally {
                Inst.leave(MID_CALC);
            }
        }
        ...
    }

Figure 8.8.: A finally handler prevents exceptions from perturbing the call tree construction process.

    public class Node {
        public Node(int methodId) { ... }
        public synchronized List<Node> getChildren() { ... }
        public synchronized void addChild(Node node) { ... }
        public Node getParent() { ... }
        public int getMethodId() { ... }
    }

Figure 8.9.: Node class for the call tree.

With this infrastructure in place, Inst.enter() and Inst.leave() can work as follows:

1. Upon start of the SUT, create a root node for the program tree. The nodes of the program tree are objects of the class shown in Figure 8.9.

2. Upon start of each thread t, assign a reference activeNode_t to this thread. This reference will always point to the currently active node / method of the thread within the program call tree. At thread start, activeNode_t is set to point to the root node.

3. On every invocation of Inst.enter(mid), determine the currently active thread t and consider the node pointed to by activeNode_t. Check whether one of its children already has a method id identical to mid. If so, set activeNode_t to point to this child. If not, create a new child with mid as its method id and set activeNode_t to point to this child.

4. Upon execution of Inst.leave(mid), determine the currently active thread t and set activeNode_t to point to its parent within the program call tree.

5. Since there are several threads accessing and modifying the program call tree, make sure that access is synchronized.

The presented approach would introduce a new child for each recursive invocation. Thus step 3 is refined to only generate a new child if the caller is different from the callee. This is necessary, since recursive invocations might cause trees of excessive size. Applied to the CTE, this technique generated call trees of moderate size (in the experiment all call trees consumed less than 200 MB of main memory for length-10 sequences). However, indirect recursions are not covered by the current approach and might be problematic for other SUTs.
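A sketch of how Inst.enter() and Inst.leave() could realize these steps is shown below; it is a simplification, not the actual implementation. Node is the class from Figure 8.9 and is assumed to set the child's parent inside addChild(); its synchronized methods stand for the locking demanded by step 5):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class Inst {
        static final Node root = new Node(0);                // step 1: artificial root

        // step 2: every thread starts at the root of the program call tree
        static final ThreadLocal<Node> activeNode = ThreadLocal.withInitial(() -> root);
        // remembers, per thread, whether the matching enter() descended a level
        static final ThreadLocal<Deque<Boolean>> descended =
                ThreadLocal.withInitial(ArrayDeque::new);

        public static void enter(int mid) {                  // step 3
            Node current = activeNode.get();
            if (current.getMethodId() == mid) {              // direct recursion:
                descended.get().push(false);                 // do not add a child
                return;
            }
            Node next = null;
            for (Node child : current.getChildren())         // reuse an existing child
                if (child.getMethodId() == mid) { next = child; break; }
            if (next == null) {                              // or create a new one
                next = new Node(mid);
                current.addChild(next);
            }
            activeNode.set(next);
            descended.get().push(true);
        }

        public static void leave(int mid) {                  // step 4
            if (descended.get().pop())                       // undo enter()'s descent
                activeNode.set(activeNode.get().getParent());
        }
    }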
8.3.2. Bytecode Instrumentation

So far, only Java source code has been used to illustrate the modifications. In practice, however, the SUT's source code is often unavailable. The solution for Java applications is to access the corresponding bytecode. This section gives an introduction to the Java Virtual Machine (JVM) and the steps involved in the bytecode instrumentation.

The JVM is a stack-based abstract computing machine [LY99]. In contrast to other architectures, it does not make use of registers. Instead, each thread in the JVM has its own stack, which is used for value storage, operations and method invocations. Appendix A provides an overview of the supported instructions. For each instruction it describes the necessary parameters and the state of the stack before and after its execution. Figure 8.10 shows the compiled version of the main method from Figure 6.2. The output was generated by applying the javap disassembler (http://download.oracle.com/javase/6/docs/technotes/tools/windows/javap.html) to the compiled version of the Stat class.

    public static void main(java.lang.String[]);
      Code:
       0:   new            #1; // class Stat
       3:   dup
       4:   aload_0
       5:   invokespecial  #2; // Method "<init>":([Ljava/lang/String;)V
       8:   astore_1
       9:   aload_1
      10:   invokevirtual  #3; // Method calc:()V
      13:   return

Figure 8.10.: Output of the javap tool for the main method of the program in Figure 6.2.

Addresses 0 to 5 correspond to the constructor invocation. The new #1 instruction at address 0 creates an object of class Stat on the Java heap and pushes the corresponding reference onto the current thread's stack. The constant #1 was defined by the Java compiler upon compilation and refers to the class Stat.
The next operation, dup, simply duplicates the topmost stack value, i.e. it pushes a copy of the aforementioned reference onto the stack. This will be the this reference for the constructor call. aload_0 pushes the reference of the args parameter onto the stack, because the following constructor call takes it as a parameter (the constant 0 refers to the local variable with index 0). invokespecial #2 eventually invokes the constructor, which pops the args reference as well as the this reference off the stack. The following instruction, astore_1, assigns the remaining object reference to the local variable s at index 1 and pops it off the stack. aload_1 loads the reference from s and pushes it onto the stack again, so that calc() can be invoked with invokevirtual. The program terminates by returning from the main method.

In order to obtain the modified version of the main method with the finally handler, one needs to add the instructions highlighted in Figure 8.11, which implement the calls to the methods Inst.enter() and Inst.leave() and the try-finally handler. The modifications include an exception table, which instructs the JVM to jump to the finally handler whenever an exception is raised by any of the instructions in the method body. The instruction bipush 123 simply pushes main's (in this case fictional) method id onto the stack as an argument to Inst.enter().

    public static void main(java.lang.String[]);
      Code:
       0:   bipush         123
       2:   invokestatic   #1; // Method Inst.enter:(I)V
       5:   new            #2; // class Stat
       8:   dup
       9:   aload_0
      10:   invokespecial  #3; // Method "<init>":([Ljava/lang/String;)V
      13:   astore_1
      14:   aload_1
      15:   invokevirtual  #4; // Method calc:()V
      18:   bipush         123
      20:   invokestatic   #5; // Method Inst.leave:(I)V
      23:   goto           34
      26:   astore_2
      27:   bipush         123
      29:   invokestatic   #5; // Method Inst.leave:(I)V
      32:   aload_2
      33:   athrow
      34:   return
      Exception table:
       from    to  target  type
          0    18      26  any
         26    27      26  any

Figure 8.11.: Output of the javap tool for the main method of the program shown in Figure 8.7.

Figure 8.11 might suggest that the instrumentation process doubles the size of the bytecode. However, this is only the case for very small methods, like the main() method. For example: The increase in size for the calc() method is only approximately 30%.

8.3.3. The ASM Framework

To facilitate the steps listed above, this work makes use of ASM (http://asm.ow2.org/), a framework dedicated to bytecode instrumentation. Section 8.2 introduced the concept of Java agents and the interface ClassFileTransformer that they offer. Implementations of this interface must provide the following method:

    byte[] transform(ClassLoader loader, String className,
                     Class<?> classBeingRedefined,
                     ProtectionDomain protectionDomain,
                     byte[] classfileBuffer)

A ClassFileTransformer object can be registered for bytecode modification via Instrumentation.addTransformer() (see Figure 8.5). After this has happened, the object's transform() method gets invoked each time a class is loaded into the JVM. Among other arguments, transform() receives the bytecode of the loaded class within classfileBuffer.
It may either return the original bytecode or change the contents of the buffer to return a modified class. Parsing and modifying raw bytecode is a difficult and error-prone task. This is where ASM comes into play. It provides the classes and interfaces ClassReader, ClassVisitor, MethodVisitor and ClassWriter, among others. The concept is similar to that of a SAX parser for XML: A ClassReader object takes the raw bytecode, reads it sequentially and generates events by calling the appropriate methods on a ClassVisitor and its MethodVisitors. For example: The ClassVisitor interface provides the methods visitField() and visitMethod(). By implementing these methods, one can modify, add or remove methods or fields. The MethodVisitor interface provides the methods visitMethodInsn(...) and visitJumpInsn(...), among many others. visitJumpInsn(...) is called whenever the ClassReader encounters a goto, ifeq, ifne, ifnull, etc. instruction; visitMethodInsn(...) is called whenever it encounters an invokevirtual, invokespecial and so forth. Of course the ClassReader provides these methods with the necessary arguments, like the opcode of the instruction, the operands, the method signature, etc. Figure 8.12 shows this process. By delegating (or omitting) calls to a ClassWriter, an arbitrarily modified version of the original class can be generated. For this work the task is relatively straightforward, since the modifications are purely additive.

Figure 8.12.: Events generated during ASM bytecode instrumentation (the ClassReader calls visitMethod() on the ClassVisitor, which in turn receives visitInsn() events on the corresponding MethodVisitors).
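To give an impression of how the instrumentation from section 8.3.1 can be expressed on top of ASM, consider the following sketch. It is written against the current ASM API (the version used in this work was ASM 3.3.1, whose class names differ slightly) and uses the AdviceAdapter convenience class. Note that, unlike the exception table of Figure 8.11, onMethodExit() alone does not catch exceptions thrown by callees, so a complete implementation would additionally install the try-finally handler:

    import org.objectweb.asm.*;
    import org.objectweb.asm.commons.AdviceAdapter;
    import org.objectweb.asm.commons.Method;

    // Sketch: surround every method with Inst.enter(mid) / Inst.leave(mid).
    class CallTreeClassVisitor extends ClassVisitor {
        private String className;

        CallTreeClassVisitor(ClassWriter cw) { super(Opcodes.ASM9, cw); }

        @Override
        public void visit(int version, int access, String name, String signature,
                          String superName, String[] interfaces) {
            className = name;
            super.visit(version, access, name, signature, superName, interfaces);
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc,
                                         String signature, String[] exceptions) {
            MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
            if (mv == null) return null;
            final int mid = (className + "." + name + desc).hashCode();  // method id
            return new AdviceAdapter(Opcodes.ASM9, mv, access, name, desc) {
                @Override protected void onMethodEnter() {
                    push(mid);
                    invokeStatic(Type.getObjectType("Inst"), new Method("enter", "(I)V"));
                }
                @Override protected void onMethodExit(int opcode) {
                    push(mid);
                    invokeStatic(Type.getObjectType("Inst"), new Method("leave", "(I)V"));
                }
            };
        }
    }

Within ClassFileTransformer.transform(), such a visitor would be driven by new ClassReader(classfileBuffer).accept(new CallTreeClassVisitor(writer), 0), and the modified class would be returned via writer.toByteArray().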
Instrumented Classes. Since the implementation of the program call tree generation makes use of certain parts of the Java library, it is not possible to instrument these classes without provoking infinite recursion. Hence, only the classes that are loaded after the employed Java core classes are instrumented. This means that Object, String and certain container classes are not instrumented. The methods of these classes will not be represented in the call tree.

Nondeterministic Behavior. Throughout the tests with the framework and the fitness function, it was observed that identical input sequences for the CTE produced call trees of different sizes, which introduced a certain nondeterministic factor. This might be due to the fact that the environment provided by the operating system is slightly different in each run. For example: The control flow within the SUT might depend on variables such as the system time, the available main memory and external files in the system's temporary files directory, all of which are subject to change. However, the size differences of the call trees were small (usually below 0.5%). McMaster and Memon [MM08] reported similar effects.

8.4. Operating the SUT

This section explains the central ideas behind the implementation of the features presented in chapter 7. All of this functionality resides in the framework's Agent component.

8.4.1. Accessing the SWT Classes

Figure 8.13 shows the two important components which implement the functionality used to operate the SUT. The WidgetTreeBuilder takes care of step 3) of the sequence generation process and the ActionFinder deals with step 4).

Figure 8.13.: The Agent component accesses the SWT classes of the SUT.

In order to be able to build a complete widget tree of the GUI, the WidgetTreeBuilder needs access to the SWT main object called Display. This object provides a method for finding all Shells (getShells()), and the Shells in turn provide a method getChildren() to access their child controls. Having access to the Display object means being able to generate a complete widget tree. Thus it is necessary to link the implementation against the SWT library. More precisely, one needs to link it against the same instance of the library that the SUT is using. Since the WidgetTreeBuilder depends on the library, it cannot work until the SUT has loaded the SWT module into the Virtual Machine. Hence, the linking process needs to be postponed. A straightforward way to accomplish this is to write a custom classloader which takes care of resolving the SWT dependencies. The necessary steps are:

1. Start the SUT with the attached Java agent.
2. Wait for the SWT classes to be loaded and acquire access to the SWT classloader.
3. Load the WidgetTreeBuilder classes using the custom classloader, which links them against the SWT classes by passing load requests on to the SWT classloader.
4. Start the sequence generation process.

Step 2 can be implemented using the Instrumentation object obtained from the agent's premain() method. It provides the method getAllLoadedClasses(), through which the Display class and its Class object can be found. Through this Class object the corresponding SWT classloader can be obtained.
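A condensed sketch of steps 2 and 3 could look as follows; "treebuilder.jar" and the class name are placeholders, and in practice one has to wait (and retry) until the SUT has actually loaded the Display class:

    import java.io.File;
    import java.lang.instrument.Instrumentation;
    import java.net.URL;
    import java.net.URLClassLoader;

    static Class<?> loadTreeBuilder(Instrumentation inst) throws Exception {
        ClassLoader swtLoader = null;
        for (Class<?> cl : inst.getAllLoadedClasses())
            if ("org.eclipse.swt.widgets.Display".equals(cl.getName()))
                swtLoader = cl.getClassLoader();               // step 2
        // a URLClassLoader with swtLoader as its parent delegates SWT load
        // requests to the SUT's classloader, linking against the same instance
        URL jar = new File("treebuilder.jar").toURI().toURL();
        ClassLoader custom = new URLClassLoader(new URL[]{ jar }, swtLoader);
        return custom.loadClass("WidgetTreeBuilder");          // step 3
    }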
8.4.2. Generating Inputs

In order to generate inputs, the framework makes use of the java.awt.Robot class (http://download.oracle.com/javase/1.5.0/docs/api/java/awt/Robot.html). It provides the methods keyPress(int), keyRelease(int), mouseMove(int, int), mousePress(int) and mouseRelease(int), among others. With the help of these five methods it is possible to generate the same set of mouse and keyboard inputs that a human tester can generate.

8.4.3. Replaying Sequences

In order to verify whether the generated sequences execute properly or reveal faults, the tester needs to replay them. The framework's Starter component implements this functionality as follows: The process is similar to the sequence generation process. The SUT is started, the GUI is scanned and the set of alternative actions is determined. However, instead of performing step 5), where the optimization algorithm selects an action, the framework checks whether the action to be replayed is contained within the set of alternative actions. If so, it is executed. These steps are repeated until the end of the sequence is reached and the replay has succeeded. If the action to be replayed is not among the set of alternative actions, then the replay has failed.

Failed replays may occur for the following reason: Throughout the optimization process, sequences are usually generated in a rather fast way, i.e. the time between two subsequently executed actions is small. For example: In the experiment of this work, a delay of 80 ms has been used. On the one hand this delay slows down the optimization process; on the other hand it is necessary, since the SUT needs time to react to the inputs and to complete its internal operations before it changes the state of the GUI. Upon replay, the tester usually wants to see what the sequence does and will use a much higher delay to slow down the execution speed. This can lead to problems: Depending on how long the framework waits after an action has been executed, the resulting set of alternative actions might be different. For example: Figure 7.6 shows the "New File" dialog of the CTE, which pops up after a click on the "File" menu item followed by a click on the "New CTE Diagram" menu item. The dialog is modal, meaning that it blocks input to the underlying main window and its controls. When generating a sequence that opens this dialog, the framework waits 80 ms after the click on "New CTE Diagram". After this time it rescans the GUI, determines the alternative actions and executes the next action. The dialog might need more than 80 ms to pop up. This means that the framework can still execute actions on the main window after the click on "New CTE Diagram". However, if the tester decides to replay the generated sequence at a lower speed, e.g. with a 1 second delay after each action, the dialog has enough time to pop up before the next action is executed. Thus a sequence that used to perform actions on the main window will fail to replay. Hence, certain sequences can only be replayed at a certain speed. If this happens too often, then one might need to adjust the delay value for sequence generation. This can be done in the settings.xml file shown in Figure B.1.
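The replay loop just described can be summarized in the following sketch. All names are illustrative and hypothetical: Action stands for the abstract actions of chapter 7, scanGuiAndDeriveActions() for steps 3) and 4) of the sequence generation process, and startSUT() and execute() for the corresponding framework functionality:

    import java.util.List;
    import java.util.Set;

    boolean replay(List<Action> sequence, long delayMs) throws InterruptedException {
        startSUT();
        for (Action recorded : sequence) {
            Set<Action> alternatives = scanGuiAndDeriveActions();
            if (!alternatives.contains(recorded))
                return false;                 // replay failed: action not available
            execute(recorded);
            Thread.sleep(delayMs);            // give the SUT time to update the GUI
        }
        return true;                          // all actions executed: replay succeeded
    }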
9. Experiment

To get an impression of the framework's performance, a first experiment has been carried out. The intent was to answer the following two questions:

1. Does the quality of the sequences improve over time?
2. How well does the ACO algorithm compare to the Random Search algorithm, i.e. an algorithm that generates sequences completely at random? Is the quality difference between the best sequences statistically significant?

The first question is interesting because the framework uses a metaheuristic algorithm and thus performs stochastic sampling [Luk09]. The Random Search algorithm implements a static uniform distribution over the search space, so the average sequence quality within each generation should stay at a constant level. The ACO algorithm also starts with a uniform distribution, because at the beginning all action pheromones are equal. Throughout the course of the optimization process, the algorithm draws samples from this distribution by generating new sequences. The sequences' fitness values, in turn, are used to adapt the distribution. Hence, over time, the probability density over the regions with good sequences, and thus the average sequence fitness in later generations, should increase. Ideally, the best sequence is generated at the end of the run. If the presented approach and its implementation work, these effects should be observable in the experiment. The answer to the second question will give a first hint on whether it pays to apply metaheuristics to this kind of problem, or whether a simple random strategy might suffice.

9.1. Setup and Results

To answer both questions, the framework has been applied to the CTE. In order to keep the size of the search space at a moderate level, the sequence length has been set to 10. Since Random Search is merely a special case of Ant Colony Optimization, it can be simulated by setting appropriate values for the ACO parameters. Table 9.1 shows the setup for both strategies. The stopping criterion used for both algorithms was a limit on the number of generations.

desc  k    α    ρ    popsize  generations  seqlength  pheromone default
aco   15   0.3  0.5  10       80           10         4000
rnd   all  0.0  0.0  10       80           10         4000

Table 9.1.: Parameters of the runs. α is the learning rate, k refers to the k best sequences selected for pheromone update on each generation and ρ is the pseudo-proportionate selection probability.

Figures 9.1 and 9.2 show the course of the experiment for the RS and the ACO run, respectively. Contrary to the random run, the average sequence fitness in the generations of the ACO run improves over time and eventually reaches its peak. At the beginning the algorithms exhibit similar performance, generating low-quality sequences with occasional outliers.

Figure 9.1.: Course of the RS run. (Plot of the fitness of each of the 800 generated sequences; fitness axis from 0 to 80000.)

Figure 9.2.: Course of the ACO run. (Plot of the fitness of each of the 800 generated sequences; fitness axis from 0 to 80000.)

Table 9.2 shows the results of both runs. ACO outperformed RS with Best_ACO = 70370 compared to Best_RS = 49056. Each run generated 800 sequences in 80 generations, within approximately 104 minutes.

desc  best   duration
ACO   70370  ≈ 104 minutes
RS    49056  ≈ 104 minutes

Table 9.2.: Results of the runs.

As for the second question, it is interesting whether the results in Table 9.2 are coincidental or representative. In order to establish significance, the above experiment has been repeated 51 times and the fitness values of the best sequences have been recorded. For each strategy the arithmetic fitness mean x̄, the estimated standard error SE_x̄ and the corresponding confidence intervals for the significance level α = 1% have been calculated by applying the following equations:

SE_{\bar{x}} = \sqrt{\frac{1}{n} \cdot \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}

CI = \left[\bar{x} - SE_{\bar{x}} \cdot z,\ \bar{x} + SE_{\bar{x}} \cdot z\right], \qquad z = \Phi^{-1}(0.995) \approx 2.58

with Φ⁻¹ being the inverse of the cumulative standard normal distribution function. Table 9.3 shows the maximum sequence fitness of each run. Table 9.4 shows the confidence intervals for both strategies. Since these do not overlap, the difference between the mean fitness values of both algorithms is statistically significant.

#    ACO      RS
1    73466.0  23178.0
2    54943.0  22546.0
3    53332.0  24275.0
4    67738.0  28214.0
5    47562.0  25817.0
6    50969.0  41181.0
7    53477.0  26401.0
8    47937.0  27673.0
9    53457.0  49447.0
10   49032.0  43095.0
11   85646.0  68032.0
12   83301.0  47056.0
13   53298.0  33986.0
14   47793.0  25830.0
15   54286.0  42963.0
16   54210.0  50378.0
17   73484.0  55217.0
18   56105.0  69057.0
19   38093.0  50284.0
20   74442.0  79564.0
21   76068.0  72459.0
22   63244.0  77944.0
23   76008.0  77025.0
24   76064.0  61205.0
26   56256.0  77359.0
27   23745.0  30220.0
28   57301.0  54265.0
29   40078.0  28079.0
30   27973.0  23735.0
31   25705.0  33485.0
32   34255.0  49056.0
33   49351.0  60893.0
34   25197.0  22323.0
35   51784.0  46426.0
36   70370.0  31022.0
37   78471.0  26827.0
38   60233.0  44898.0
39   58845.0  27875.0
40   58090.0  28966.0
41   63610.0  30185.0
42   81020.0  29351.0
43   62057.0  51421.0
44   75708.0  47533.0
45   64592.0  48465.0
46   75282.0  48827.0
47   55594.0  34935.0
48   62586.0  66666.0
49   55702.0  33855.0
50   59320.0  39056.0
51   76083.0  25056.0

Table 9.3.: Fitness values of the best sequences found on each run.

desc  n   max    min    x̄         SE_x̄     CI                     α
ACO   51  85646  38093  63800.63  1619.92  [59621.24, 67980.02]  1%
RS    51  68032  22323  37410.49  1735.73  [32932.31, 41888.68]  1%

Table 9.4.: Confidence intervals for the two strategies.
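These calculations are straightforward to reproduce. The following sketch (not part of the framework) implements the two equations; applied to the 51 recorded best-fitness values of each strategy, it reproduces the intervals of Table 9.4 up to rounding.

    // Mean, estimated standard error and 99% confidence interval of a sample,
    // implementing the equations above with z = Phi^-1(0.995) ~= 2.58.
    static double[] confidenceInterval(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;
        double ssq = 0;
        for (double v : x) ssq += (v - mean) * (v - mean);
        double se = Math.sqrt(ssq / (n - 1) / n);   // standard error of the mean
        double z = 2.58;                            // Phi^-1(0.995)
        return new double[] { mean - z * se, mean + z * se };
    }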
9.2. Threats to Validity

Averaged over 51 runs, ACO significantly outperformed RS. However, the max and min columns of Table 9.4 indicate that RS occasionally finds better sequences. The experiment shown above has been carried out with only a single SUT. In order to show that ACO is the superior algorithm, additional experiments with different SUTs need to be conducted. Furthermore, the ACO algorithm has been tested in earlier experiments with different parameter settings, throughout which a lot of experience with the algorithm has been gained. Hence, the presented parameter values might have been chosen in favor of ACO.

10. Conclusion and Future Work

This chapter reviews the presented approach and proposes topics for future research.

10.1. Conclusion

The goal of this work was to show that it is possible to automatically generate input sequences for applications with a complex GUI. Therefore, a new approach has been proposed, which exploits metaheuristic optimization techniques for the generation of test sequences. The following features set this approach apart from earlier works in this field:

1. Since the technique makes use of neither a model of the GUI nor existing input sequences nor other handcrafted artifacts, it requires no human intervention.
2. A novel fitness function, based on the Call Tree Size criterion [MM08], is applied to guide the optimization process.
3. The source code of the SUT is not required and the instrumentation of the SUT is performed automatically.
4. The GUI of the SUT is operated using a rich set of actions employing mouse and keyboard inputs.

The presented approach has been implemented as a framework which employs the Ant Colony Optimization algorithm to generate sequences with high fitness values. This framework has been applied to the Classification Tree Editor, a Java application based on SWT. In a first experiment it significantly outperformed a random sequence generation strategy. These results are encouraging and indicate the suitability of metaheuristic optimization techniques in the context of GUI testing. A strength of the presented approach is its flexibility: It allows not only for the use of different optimization algorithms and fitness functions, but can also be extended to work on different platforms, with different programming languages and different GUI toolkits.

10.2. Future Work

Generating Test Suites

The approach presented in this work only generates single input sequences. The next step towards a comprehensive future GUI testing framework is to generate entire test suites. A possible extension of the current technique, sketched below, could look as follows: After generating the first sequence, one could strive to find a second sequence which is different from the first one, in the sense that it generates a call tree with new, unseen leaves. Therefore, it would be necessary to adjust the fitness function after each optimization run. The second sequence would then only obtain fitness points if its call tree contains new leaves.
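A minimal sketch of such an adjusted fitness function is given below. CallTree and Leaf are stand-ins for the framework's actual call tree representation, so this illustrates the idea rather than prescribing an implementation.

    import java.util.HashSet;
    import java.util.Set;

    // Sketch: only leaves not seen in previous optimization runs contribute
    // to the fitness; CallTree and Leaf are placeholder types.
    class SuiteFitness {
        private final Set<Leaf> seenLeaves = new HashSet<Leaf>();

        int fitness(CallTree tree) {
            int newLeaves = 0;
            for (Leaf leaf : tree.leaves())
                if (!seenLeaves.contains(leaf))
                    newLeaves++;                      // only new, unseen leaves count
            return newLeaves;
        }

        void finishRun(CallTree bestTree) {
            seenLeaves.addAll(bestTree.leaves());     // adjust the function for the next run
        }
    }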
Fault Sensitivity

This is one of the most important future steps in order to evaluate the benefit that can be drawn from the presented technique. In order to find out whether the generated sequences or suites are actually fault-revealing, the framework needs to be applied to a set of SUTs from different application domains, such as graphical editors, typesetting programs, internet browsers, etc. This requires a database with known faults of these SUTs and information on how to reveal them. Then it is possible to generate test suites and determine the actual number of discovered faults. Once such an infrastructure has been established, different fitness functions can be tested and compared to one another.

Other Fitness Functions

The currently used fitness function is rather simple, since it merely counts the call tree's leaves. But it might also be interesting to consider the diversity of the methods contained within the call tree. For example: If we have a tree with only 10 different methods but a large number of leaves, then currently we would prefer this tree over one with 100 different methods yet fewer leaves. However, the second, lower-rated tree might execute more code and address more aspects of the SUT. Another idea could be to reward the maximal depth of the tree, with the intent of fostering sequences that generate complex and long call chains. Other criteria could be interesting as well, for example the classical code coverage criteria. Yet another idea could be to search for sequences which cause high memory consumption of the SUT. However, the use of all of these criteria remains subjective until their fault-revealing capabilities have been demonstrated in an empirical study as suggested in the previous paragraph.

Understanding and Improving the current Framework

There are several parameters involved in the optimization process. Finding a good setup is a difficult task and the optimal values vary with different SUTs. The parameter set employed in the experiment is quite likely not the ideal one, and the framework could probably perform better with the right values. Hence, it could be interesting to perform a series of experiments with varying setups in order to determine the impact of the different parameters on the efficiency and effectiveness of the optimization process. The ACO algorithm has been applied to a variety of different problems, often with different selection rules and pheromone update strategies [DB05]. It could be worthwhile to experiment with these. The current framework does not make use of a sophisticated termination criterion, but generates a fixed number of sequences. This is not optimal and may have stopped the optimization in the experiment prematurely. A better criterion could improve the effectiveness as well as the efficiency. However, it is generally difficult to define such a criterion, since the best possible fitness value is usually unknown ahead of time.

Other Algorithms

There are various metaheuristic algorithms that have been successfully applied to many different tasks. Thus it is possible that one or more of these are better suited for sequence generation than ACO. One of the challenges of sequence generation is the following: The fitness value of a sequence also depends on the order of the contained actions. ACO does not take this order into account when updating the action pheromones. If it finds a sequence with a high fitness value, then all contained actions obtain a high pheromone value. But when generating sequences this order matters, and certain actions might only be effective in the presence of others, which means that linkage exists among them. ACO disregards linkage among the trail components [Luk09]. Metaheuristics like the Bivariate Marginal Distribution Algorithm [PM99], which captures important pairwise dependencies among components using a dependency graph, might be capable of addressing this problem. This way the likelihood of an action being selected would depend on the previously executed actions.
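As a thought experiment, the following sketch indicates how such pairwise dependencies could be represented: instead of a single pheromone value per action, a table of values conditioned on the predecessor action is maintained. This is an illustration of the idea only and not part of the presented framework; Action is a placeholder type.

    import java.util.HashMap;
    import java.util.Map;

    // Thought experiment: pheromone values conditioned on the previously
    // executed action, so that selection probabilities capture pairwise linkage.
    class BigramPheromones {
        private static final double DEFAULT = 4000.0;   // initial pheromone value
        private final Map<Action, Map<Action, Double>> tau =
                new HashMap<Action, Map<Action, Double>>();

        double get(Action previous, Action next) {
            Map<Action, Double> row = tau.get(previous);
            Double value = (row == null) ? null : row.get(next);
            return (value == null) ? DEFAULT : value;
        }

        // Reinforce the pair (previous, next) that occurred in a good sequence.
        void deposit(Action previous, Action next, double amount) {
            Map<Action, Double> row = tau.get(previous);
            if (row == null) tau.put(previous, row = new HashMap<Action, Double>());
            row.put(next, get(previous, next) + amount);
        }
    }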
Oracle

In the future the presented framework might be capable of generating fault-sensitive test suites, but a human tester would still have to verify the sequences in order to find possible faults. This is a time-consuming task. It could be interesting to probe certain parts of the SUT, for example log files, for erroneous outputs in order to identify "suspicious" sequences automatically. More subtle faults, however, like an incorrect text color, are difficult to find. Therefore it must be possible for the tester to specify conditions which need to be true before and / or after the execution of certain actions. This way one could specify the behaviour of the desired system, and the framework could then test the SUT against these conditions by trying to find sequences which violate them.

Improving the Start Time of the SUT

Currently, the SUT is restarted in each iteration of the sequence generation process, with the intent of bringing it to an initial state. This is the highest cost associated with the optimization process. Starting SUTs like the CTE or Eclipse takes several seconds, even on modern hardware. Speeding up this process means improving the optimization efficiency. One way to accomplish this could be to save the state of the loaded and initialized application and restore it each time a new sequence needs to be generated. Virtual Machines or solutions for single processes, like CryoPID25, could help with this task. Another option is the parallelization of the sequence generation process.

25 http://cryopid.berlios.de/

A. Java Bytecode Instruction Listings

This is a list of the instructions that make up the Java bytecode, an abstract machine language that is ultimately executed by the Java virtual machine. The Java bytecode is generated by language compilers targeting the Java Platform, most notably the Java programming language. [Wik11]

Mnemonic, opcode (in hex), [other bytes], stack [before] → [after], description:

aaload 32 arrayref, index → value loads onto the stack a reference from an array
aastore 53 arrayref, index, value → stores into a reference in an array
aconst_null 01 → null pushes a null reference onto the stack
aload 19 1: index → objectref loads a reference onto the stack from a local variable #index
aload_0 2a → objectref loads a reference onto the stack from local variable 0
aload_1 2b → objectref loads a reference onto the stack from local variable 1
aload_2 2c → objectref loads a reference onto the stack from local variable 2
aload_3 2d → objectref loads a reference onto the stack from local variable 3
anewarray bd 2: indexbyte1, indexbyte2 count → arrayref creates a new array of references of length count and component type identified by the class reference (indexbyte1 << 8 + indexbyte2) in the constant pool
areturn b0 objectref → [empty] returns a reference from a method
arraylength be arrayref → length gets the length of an array
astore 3a 1: index objectref → stores a reference into a local variable #index
astore_0 4b objectref → stores a reference into local variable 0
astore_1 4c objectref → stores a reference into local variable 1
astore_2 4d objectref → stores a reference into local variable 2
astore_3 4e objectref → stores a reference into local variable 3
athrow bf objectref → [empty], objectref throws an error or exception (notice that the rest of the stack is cleared, leaving only a reference to the Throwable)
baload 33 arrayref, index → value loads a byte or Boolean value from an array
bastore 54 arrayref, index, value → stores a byte or Boolean value into an array
bipush 10 1: byte → value pushes a byte onto the stack as an integer value
caload 34 arrayref, index → value loads a char from an array
castore 55 arrayref, index, value → stores a char into an array
checkcast c0 2: indexbyte1, indexbyte2 objectref → objectref checks whether an objectref is of a certain type, the class reference of which is in the constant pool at index (indexbyte1 << 8 + indexbyte2)
d2f 90 value → result converts a double to a float
d2i 8e value → result converts a double to an int
d2l 8f value → result converts a double to a long
dadd 63 value1, value2 → result adds two doubles
daload 31 arrayref, index → value loads a double from an array
dastore 52 arrayref, index, value → stores a double into an array
dcmpg 98 value1, value2 → result compares two doubles
dcmpl 97 value1, value2 → result compares two doubles
dconst_0 0e → 0.0 pushes the constant 0.0 onto the stack
dconst_1 0f → 1.0 pushes the constant 1.0 onto the stack
ddiv 6f value1, value2 → result divides two doubles
dload 18 1: index → value loads a double value from a local variable #index
dload_0 26 → value loads a double from local variable 0
dload_1 27 → value loads a double from local variable 1
dload_2 28 → value loads a double from local variable 2
dload_3 29 → value loads a double from local variable 3
dmul 6b value1, value2 → result multiplies two doubles
dneg 77 value → result negates a double
drem 73 value1, value2 → result gets the remainder from a division between two doubles
dreturn af value → [empty] returns a double from a method
dstore 39 1: index value → stores a double value into a local variable #index
dstore_0 47 value → stores a double into local variable 0
dstore_1 48 value → stores a double into local variable 1
dstore_2 49 value → stores a double into local variable 2
dstore_3 4a value → stores a double into local variable 3
dsub 67 value1, value2 → result subtracts a double from another
dup 59 value → value, value duplicates the value on top of the stack
dup_x1 5a value2, value1 → value1, value2, value1 inserts a copy of the top value into the stack two values from the top
dup_x2 5b value3, value2, value1 → value1, value3, value2, value1 inserts a copy of the top value into the stack two (if value2 is double or long it takes up the entry of value3, too) or three values (if value2 is neither double nor long) from the top
dup2 5c {value2, value1} → {value2, value1}, {value2, value1} duplicate top two stack words (two values, if value1 is neither double nor long; a single value, if value1 is double or long)
dup2_x1 5d value3, {value2, value1} → {value2, value1}, value3, {value2, value1} duplicate two words and insert beneath third word (see explanation above)
dup2_x2 5e {value4, value3}, {value2, value1} → {value2, value1}, {value4, value3}, {value2, value1} duplicate two words and insert beneath fourth word
f2d 8d value → result converts a float to a double
f2i 8b value → result converts a float to an int
f2l 8c value → result converts a float to a long
fadd 62 value1, value2 → result adds two floats
faload 30 arrayref, index → value loads a float from an array
fastore 51 arrayref, index, value → stores a float in an array
fcmpg 96 value1, value2 → result compares two floats
fcmpl 95 value1, value2 → result compares two floats
fconst_0 0b → 0.0f pushes 0.0f on the stack
fconst_1 0c → 1.0f pushes 1.0f on the stack
fconst_2 0d → 2.0f pushes 2.0f on the stack
fdiv 6e value1, value2 → result divides two floats
fload 17 1: index → value loads a float value from a local variable #index
fload_0 22 → value loads a float value from local variable 0
fload_1 23 → value loads a float value from local variable 1
fload_2 24 → value loads a float value from local variable 2
fload_3 25 → value loads a float value from local variable 3
fmul 6a value1, value2 → result multiplies two floats
fneg 76 value → result negates a float
frem 72 value1, value2 → result gets the remainder from a division between two floats
freturn ae value → [empty] returns a float
fstore 38 1: index value → stores a float value into a local variable #index
fstore_0 43 value → stores a float value into local variable 0
fstore_1 44 value → stores a float value into local variable 1
fstore_2 45 value → stores a float value into local variable 2
fstore_3 46 value → stores a float value into local variable 3
fsub 66 value1, value2 → result subtracts two floats
getfield b4 2: index1, index2 objectref → value gets a field value of an object objectref, where the field is identified by field reference in the constant pool index (index1 << 8 + index2)
getstatic b2 2: index1, index2 → value gets a static field value of a class, where the field is identified by field reference in the constant pool index (index1 << 8 + index2)
goto a7 2: branchbyte1, branchbyte2 [no change] goes to another instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
goto_w c8 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 [no change] goes to another instruction at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 + branchbyte2 << 16 + branchbyte3 << 8 + branchbyte4)
i2b 91 value → result converts an int into a byte
i2c 92 value → result converts an int into a character
i2d 87 value → result converts an int into a double
i2f 86 value → result converts an int into a float
i2l 85 value → result converts an int into a long
i2s 93 value → result converts an int into a short
iadd 60 value1, value2 → result adds two ints together
iaload 2e arrayref, index → value loads an int from an array
iand 7e value1, value2 → result performs a bitwise and on two integers
iastore 4f arrayref, index, value → stores an int into an array
iconst_m1 02 → -1 loads the int value -1 onto the stack
iconst_0 03 → 0 loads the int value 0 onto the stack
iconst_1 04 → 1 loads the int value 1 onto the stack
iconst_2 05 → 2 loads the int value 2 onto the stack
iconst_3 06 → 3 loads the int value 3 onto the stack
iconst_4 07 → 4 loads the int value 4 onto the stack
iconst_5 08 → 5 loads the int value 5 onto the stack
idiv 6c value1, value2 → result divides two integers
if_acmpeq a5 2: branchbyte1, branchbyte2 value1, value2 → if references are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_acmpne a6 2: branchbyte1, branchbyte2 value1, value2 → if references are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpeq 9f 2: branchbyte1, branchbyte2 value1, value2 → if ints are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpne a0 2: branchbyte1, branchbyte2 value1, value2 → if ints are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmplt a1 2: branchbyte1, branchbyte2 value1, value2 → if value1 is less than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpge a2 2: branchbyte1, branchbyte2 value1, value2 → if value1 is greater than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmpgt a3 2: branchbyte1, branchbyte2 value1, value2 → if value1 is greater than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
if_icmple a4 2: branchbyte1, branchbyte2 value1, value2 → if value1 is less than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifeq 99 2: branchbyte1, branchbyte2 value → if value is 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifne 9a 2: branchbyte1, branchbyte2 value → if value is not 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
iflt 9b 2: branchbyte1, branchbyte2 value → if value is less than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifge 9c 2: branchbyte1, branchbyte2 value → if value is greater than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifgt 9d 2: branchbyte1, branchbyte2 value → if value is greater than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifle 9e 2: branchbyte1, branchbyte2 value → if value is less than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifnonnull c7 2: branchbyte1, branchbyte2 value → if value is not null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
ifnull c6 2: branchbyte1, branchbyte2 value → if value is null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2)
iinc 84 2: index, const [no change] increment local variable #index by signed byte const
iload 15 1: index → value loads an int value from a local variable #index
iload_0 1a → value loads an int value from local variable 0
iload_1 1b → value loads an int value from local variable 1
iload_2 1c → value loads an int value from local variable 2
iload_3 1d → value loads an int value from local variable 3
imul 68 value1, value2 → result multiply two integers
ineg 74 value → result negate int
instanceof c1 2: indexbyte1, indexbyte2 objectref → result determines if an object objectref is of a given type, identified by class reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokeinterface b9 4: indexbyte1, indexbyte2, count, 0 objectref, [arg1, arg2, ...] → invokes an interface method on object objectref, where the interface method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokespecial b7 2: indexbyte1, indexbyte2 objectref, [arg1, arg2, ...] → invoke instance method on object objectref, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokestatic b8 2: indexbyte1, indexbyte2 [arg1, arg2, ...] → invoke a static method, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
invokevirtual b6 2: indexbyte1, indexbyte2 objectref, [arg1, arg2, ...] → invoke virtual method on object objectref, where the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
ior 80 value1, value2 → result bitwise int or
irem 70 value1, value2 → result logical int remainder
ireturn ac value → [empty] returns an integer from a method
ishl 78 value1, value2 → result int shift left
ishr 7a value1, value2 → result int arithmetic shift right
istore 36 1: index value → store int value into variable #index
istore_0 3b value → store int value into variable 0
istore_1 3c value → store int value into variable 1
istore_2 3d value → store int value into variable 2
istore_3 3e value → store int value into variable 3
isub 64 value1, value2 → result int subtract
iushr 7c value1, value2 → result int logical shift right
ixor 82 value1, value2 → result int xor
jsr a8 2: branchbyte1, branchbyte2 → address jump to subroutine at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 + branchbyte2) and place the return address on the stack
jsr_w c9 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 → address jump to subroutine at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 + branchbyte2 << 16 + branchbyte3 << 8 + branchbyte4) and place the return address on the stack
l2d 8a value → result converts a long to a double
l2f 89 value → result converts a long to a float
l2i 88 value → result converts a long to an int
ladd 61 value1, value2 → result add two longs
laload 2f arrayref, index → value load a long from an array
land 7f value1, value2 → result bitwise and of two longs
lastore 50 arrayref, index, value → store a long to an array
lcmp 94 value1, value2 → result compares two longs values
lconst_0 09 → 0L pushes the long 0 onto the stack
lconst_1 0a → 1L pushes the long 1 onto the stack
ldc 12 1: index → value pushes a constant #index from a constant pool (String, int or float) onto the stack
ldc_w 13 2: indexbyte1, indexbyte2 → value pushes a constant #index from a constant pool (String, int or float) onto the stack (wide index is constructed as indexbyte1 << 8 + indexbyte2)
ldc2_w 14 2: indexbyte1, indexbyte2 → value pushes a constant #index from a constant pool (double or long) onto the stack (wide index is constructed as indexbyte1 << 8 + indexbyte2)
ldiv 6d value1, value2 → result divide two longs
lload 16 1: index → value load a long value from a local variable #index
lload_0 1e → value load a long value from a local variable 0
lload_1 1f → value load a long value from a local variable 1
lload_2 20 → value load a long value from a local variable 2
lload_3 21 → value load a long value from a local variable 3
lmul 69 value1, value2 → result multiplies two longs
lneg 75 value → result negates a long
lookupswitch ab 4+: <0-3 bytes padding>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, npairs1, npairs2, npairs3, npairs4, match-offset pairs... key → a target address is looked up from a table using a key and execution continues from the instruction at that address
lor 81 value1, value2 → result bitwise or of two longs
lrem 71 value1, value2 → result remainder of division of two longs
lreturn ad value → [empty] returns a long value
lshl 79 value1, value2 → result bitwise shift left of a long value1 by value2 positions
lshr 7b value1, value2 → result bitwise shift right of a long value1 by value2 positions
lstore 37 1: index value → store a long value in a local variable #index
lstore_0 3f value → store a long value in a local variable 0
lstore_1 40 value → store a long value in a local variable 1
lstore_2 41 value → store a long value in a local variable 2
lstore_3 42 value → store a long value in a local variable 3
lsub 65 value1, value2 → result subtract two longs
lushr 7d value1, value2 → result bitwise shift right of a long value1 by value2 positions, unsigned
lxor 83 value1, value2 → result bitwise exclusive or of two longs
monitorenter c2 objectref → enter monitor for object ("grab the lock" - start of synchronized() section)
monitorexit c3 objectref → exit monitor for object ("release the lock" - end of synchronized() section)
multianewarray c5 3: indexbyte1, indexbyte2, dimensions count1, [count2,...] → arrayref create a new array of dimensions dimensions with elements of type identified by class reference in constant pool index (indexbyte1 << 8 + indexbyte2); the sizes of each dimension is identified by count1, [count2, etc.]
new bb 2: indexbyte1, indexbyte2 → objectref creates new object of type identified by class reference in constant pool index (indexbyte1 << 8 + indexbyte2)
newarray bc 1: atype count → arrayref creates new array with count elements of primitive type identified by atype
nop 00 [no change] performs no operation
pop 57 value → discards the top value on the stack
pop2 58 {value2, value1} → discards the top two values on the stack (or one value, if it is a double or long)
putfield b5 2: indexbyte1, indexbyte2 objectref, value → set field to value in an object objectref, where the field is identified by a field reference index in constant pool (indexbyte1 << 8 + indexbyte2)
putstatic b3 2: indexbyte1, indexbyte2 value → set static field to value in a class, where the field is identified by a field reference index in constant pool (indexbyte1 << 8 + indexbyte2)
ret a9 1: index [no change] continue execution from address taken from a local variable #index (the asymmetry with jsr is intentional)
return b1 → [empty] return void from method
saload 35 arrayref, index → value load short from array
sastore 56 arrayref, index, value → store short to array
sipush 11 2: byte1, byte2 → value pushes a short onto the stack
swap 5f value2, value1 → value1, value2 swaps two top words on the stack (note that value1 and value2 must not be double or long)
tableswitch aa 4+: [0-3 bytes padding], defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, lowbyte1, lowbyte2, lowbyte3, lowbyte4, highbyte1, highbyte2, highbyte3, highbyte4, jump offsets... index → continue execution from an address in the table at offset index
wide c4 3/5: opcode, indexbyte1, indexbyte2 or iinc, indexbyte1, indexbyte2, countbyte1, countbyte2 [same as for corresponding instructions] execute opcode, where opcode is either iload, fload, aload, lload, dload, istore, fstore, astore, lstore, dstore, or ret, but assume the index is 16 bit; or execute iinc, where the index is 16 bits and the constant to increment by is a signed 16 bit short
breakpoint ca reserved for breakpoints in Java debuggers; should not appear in any class file
impdep1 fe reserved for implementation-dependent operations within debuggers; should not appear in any class file
impdep2 ff reserved for implementation-dependent operations within debuggers; should not appear in any class file
(no name) cb-fd these values are currently unassigned for opcodes and are reserved for future use
xxxunusedxxx ba this opcode is reserved "for historical reasons"

External links: Sun's Java Virtual Machine Specification, http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

B. Miscellaneous

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<settings>
  <timeout>30000</timeout>
  <seqLength>10</seqLength>
  <generations>80</generations>
  <popsize>10</popsize>
  <topKUpdateTrails>0</topKUpdateTrails>
  <learningRate>0.3</learningRate>
  <pseudoPropSelectionProbability>0.7</pseudoPropSelectionProbability>
  <pheromoneDefault>4000</pheromoneDefault>
  <actionDelay>80</actionDelay>
  <actionDuration>0</actionDuration>
  <outDir>output</outDir>
  <suspDir>suspicious</suspDir>
  <sut>../cte.exe</sut>
  <mainWindowName>CTE XL Professional</mainWindowName>
  <cleanupFiles>
    <file>../workspace</file>
    <file>C:\Dokumente und Einstellungen\Bauersfeld\default.cte</file>
    <file>../configuration/org.eclipse.osgi/.manager</file>
  </cleanupFiles>
  <suspicious>
    <!--<stdout>.</stdout>-->
    <stderr>.</stderr>
  </suspicious>
</settings>

Figure B.1.: settings.xml.
Figure B.2.: Widget Tree for the CTE. (The figure shows the complete widget tree of the CTE's main window, from the Display and its Shell down to the individual Composites, ToolBars, Trees, MenuItems and figure elements.)
Figure B.3.: SWT Widgets [Ecl11]. (Screenshots of, and javadoc/snippet links for, the SWT widgets Browser, Button, Canvas, Combo, Composite, CoolBar, CTabFolder, DateTime, ExpandBar, Group, Label, Link, List and Menu.)

Figure B.4.: SWT Widgets [Ecl11]. (Screenshots of, and javadoc/snippet links for, the SWT widgets ProgressBar, Sash, Scale, ScrolledComposite, Shell, Slider, Spinner, StyledText, TabFolder, Table, Text, ToolBar, Tray and Tree.)

Bibliography

[ADJ+11] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. A framework for automated testing of JavaScript web applications. In Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, pages 571–580, New York, NY, USA, 2011. ACM.

[AOA05] Anneliese A. Andrews, Jeff Offutt, and Roger T. Alexander. Testing web applications by modeling with FSMs. Software and Systems Modeling, 4:326–345, 2005.

[BHMV08] Walter Binder, Jarle Hulaas, Philippe Moret, and Alex Villazón. Platform-independent profiling in a virtual execution environment. Software: Practice and Experience, 2008.

[BPMP08] Alexander E.I. Brownlee, Martin Pelikan, John A.W. McCall, and Andrei Petrovski. An application of a multivariate estimation of distribution algorithm to cancer chemotherapy. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO '08, pages 463–464, New York, NY, USA, 2008. ACM.

[BSS02] André Baresel, Harmen Sthamer, and Michael Schmidt. Fitness function design to improve evolutionary structural testing. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '02, pages 1329–1336, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.

[CGMC03] Myra B. Cohen, Peter B. Gibbons, Warwick B. Mugridge, and Charles J. Colbourn. Constructing test suites for interaction testing. In Proceedings of the 25th International Conference on Software Engineering, ICSE '03, pages 38–48, Washington, DC, USA, 2003. IEEE Computer Society.

[DB05] Marco Dorigo and Christian Blum. Ant colony optimization theory: a survey. Theor. Comput. Sci., 344:243–278, November 2005.

[DCG99] Marco Dorigo, Gianni Di Caro, and Luca M. Gambardella. Ant algorithms for discrete optimization. Artificial Life, 5:137–172, 1999.

[DS09] Marco Dorigo and Thomas Stützle. Ant colony optimization: Overview and recent advances, 2009.

[Ecl11] Eclipse.org. SWT Widget Gallery. http://www.eclipse.org/swt/widgets/, 2011. [Online; accessed 02-July-2011].

[GADP89] S. Goss, S. Aron, J. Deneubourg, and J. Pasteels. Self-organized shortcuts in the Argentine ant. Naturwissenschaften, 76(12):579–581, December 1989.

[GCD09] Brady J. Garvin, Myra B. Cohen, and Matthew B. Dwyer. An improved meta-heuristic search for constrained interaction testing.
In Proceedings of the 2009 1st International Symposium on Search Based Software Engineering, SSBSE '09, pages 13–22, Washington, DC, USA, 2009. IEEE Computer Society.

[HCM10] Si Huang, Myra B. Cohen, and Atif M. Memon. Repairing GUI test suites using a genetic algorithm. In ICST '10: Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, pages 245–254, Washington, DC, USA, 2010. IEEE Computer Society.

[KG96] David J. Kasik and Harry G. George. Toward automatic generation of novice user test scripts. In CHI '96: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 244–251, New York, NY, USA, 1996. ACM.

[KS01] Dawid Kurzyniec and Vaidy Sunderam. Efficient cooperation between Java and native codes – JNI performance benchmark. In The 2001 International Conference on Parallel and Distributed Processing Techniques and Applications, 2001.

[Lia99] Sheng Liang. Java Native Interface: Programmer's Guide and Reference. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1999.

[Luk09] Sean Luke. Essentials of Metaheuristics. Lulu, 2009. http://cs.gmu.edu/~sean/book/metaheuristics/.

[LY99] Tim Lindholm and Frank Yellin. The Java(TM) Virtual Machine Specification (2nd Edition). Prentice Hall PTR, 2nd edition, April 1999.

[MBN03] Atif Memon, Ishan Banerjee, and Adithya Nagarajan. GUI ripping: Reverse engineering of graphical user interfaces for testing. In Proceedings of the 10th Working Conference on Reverse Engineering, WCRE '03, pages 260–, Washington, DC, USA, 2003. IEEE Computer Society.

[McM04] Phil McMinn. Search-based software test data generation: a survey. Softw. Test. Verif. Reliab., 14:105–156, June 2004.

[Mem01] Atif M. Memon. A comprehensive framework for testing graphical user interfaces. Ph.D. thesis, 2001. Advisors: Mary Lou Soffa and Martha Pollack; committee members: Prof. Rajiv Gupta (University of Arizona), Prof. Adele E. Howe (Colorado State University), Prof. Lori Pollock (University of Delaware).

[Mem07] Atif Memon. An event-flow model of GUI-based applications for testing. Softw. Test. Verif. Reliab., 17(3):137–157, 2007.

[MM08] Scott McMaster and Atif Memon. Call-stack coverage for GUI test suite reduction. IEEE Transactions on Software Engineering, 34:99–115, 2008.

[MSP01] Atif M. Memon, Mary Lou Soffa, and Martha E. Pollack. Coverage criteria for GUI testing. In ESEC/FSE-9: Proceedings of the 8th European Software Engineering Conference held jointly with the 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 256–267, New York, NY, USA, 2001. ACM.

[MT11] Alessandro Marchetto and Paolo Tonella. Using search-based algorithms for Ajax event sequence generation during testing. Empirical Softw. Engg., 16:103–140, February 2011.

[Ora06] Oracle. JVM Tool Interface. http://download.oracle.com/javase/6/docs/platform/jvmti/jvmti.html, 2006. [Online; accessed 25-May-2011].

[PM99] Martin Pelikan and Heinz Mühlenbein. Marginal distributions in evolutionary algorithms. In Proceedings of the International Conference on Genetic Algorithms Mendel 1998, pages 90–95, 1999.

[Str06] Jaymie Strecker. An empirical evaluation of test adequacy criteria for event-driven programs, 2006.

[SWB02] Harmen Sthamer, Joachim Wegener, and Andre Baresel. Using evolutionary testing to improve efficiency and quality in software testing.
In Proceedings of the 2nd Asia-Pacific Conference on Software Testing Analysis and Review (AsiaSTAR), pages 22–24, 2002.

[Wap07] Stefan Wappler. Automatic Generation of Object-Oriented Unit Tests Using Genetic Programming. PhD thesis, Institut für Softwaretechnik und Theoretische Informatik, Elektrotechnik und Informatik, Technische Universität Berlin, 19 December 2007.

[Weg01] Joachim Wegener. Evolutionärer Test des Zeitverhaltens von Realzeitsystemen. Shaker Verlag, 2001.

[WGGS96] Joachim Wegener, Klaus Grimm, Matthias Grochtmann, and Harmen Sthamer. Systematic testing of real-time systems. In Proceedings of the 4th European Conference on Software Testing, Analysis and Review (EuroSTAR 1996), 1996.

[Wik11] Wikipedia. Java bytecode instruction listings — Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Java_bytecode_instruction_listings&oldid=429777649, 2011. [Online; accessed 24-May-2011].

[WK00] Steve Wilson and Jeff Kesselman. Java Platform Performance: Strategies and Tactics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000.

[WM97] David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

[WW06] Stefan Wappler and Joachim Wegener. Evolutionary unit testing of object-oriented software using strongly-typed genetic programming. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO '06, pages 1925–1932, New York, NY, USA, 2006. ACM.

[WWW07] Andreas Windisch, Stefan Wappler, and Joachim Wegener. Applying particle swarm optimization to software testing. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO '07, pages 1121–1128, New York, NY, USA, 2007. ACM.

Declarations

Statement of Authorship

I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated.

Berlin, date, signature

Declaration of Consent

I hereby give my consent for this thesis to be made available in the library of the Institut für Informatik of Humboldt-Universität zu Berlin.

Berlin, date, signature