Analyzing the Change-Proneness of APIs and web APIs
DISSERTATION for the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben, chairman of the Board for Doctorates, to be defended in public on Wednesday, 7 January 2015 at 12:30 by Daniele ROMANO, Master of Science in Computer Science, University of Sannio, born in Benevento, Italy.

This dissertation has been approved by the promotors:
Prof. dr. A. van Deursen
Prof. dr. M. Pinzger

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. Dr. A. van Deursen, Delft University of Technology, The Netherlands, promotor
Prof. Dr. M. Pinzger, University of Klagenfurt, Austria, promotor
Prof. Dr. G. Antoniol, École Polytechnique de Montréal, Canada
Dr. Alexander Serebrenik, Eindhoven University of Technology, The Netherlands
Dr. Cesare Pautasso, University of Lugano, Switzerland
Prof. Dr. Ir. D.M. van Solingen, Delft University of Technology, The Netherlands
Prof. Dr. Frances Brazier, Delft University of Technology, The Netherlands

This work was carried out as part of the Re-Engineering Service-Oriented Systems (ReSOS) project. This project was partially funded by the NWO-Jacquard program and supported by Software Improvement Group and KPMG.

Copyright © 2014 by Daniele Romano

Cover image by Craig S. Kaplan, University of Waterloo.

"Pain is inevitable. Suffering is optional." Haruki Murakami

Acknowledgments

On the 10th of October 2010 I had my job interview at the Software Engineering Research Group (SERG) of Delft University of Technology. It was an unbelievably sunny and warm day and I immediately fell in love with the research performed at the SERG group, the people, the city of Delft, and the Dutch sun. After 4 amazing years I can say that the only thing I was wrong about was thinking "Dutch weather is not that bad". When I started my research in November 2010 all other expectations became reality. I am really happy to have spent the last 4 years in such a competitive research group with amazing people.

I wish to thank all those who have supported me on this journey, starting from Martin Pinzger and Arie van Deursen who gave me the opportunity to pursue this PhD.

First of all, I would like to thank my supervisor Martin Pinzger, whose guidance went far beyond my expectations. He always gave me his honest and professional guidance in performing scientific research. I am very thankful for his enthusiastic and human approach that made him not only a good supervisor but also a great friend. Thanks a lot Martin! All the time we spent together discussing research or simply enjoying leisure time has been important for my professional and private life and now it is part of me. I will never forget it. Also, I will never forget the only time when you were not able to guide me (on the top of an Austrian hill). That was funny!

Furthermore, I would like to thank all my colleagues who have always provided me with valuable feedback that has been important to improve the quality of my research. Especially, I would like to thank Andy Zaidman for his unending willingness to help me as well as anyone in the group. Thanks Andy! You are on my list of the best people I have met in my entire life.

Finally, I would like to thank all my friends and my family who always distracted me from my dedication to my research activities.
This has been really important even though I have not always been able to disconnect my mind from my research. Especially, I want to thank my family who accepted my willingness to move abroad and all its consequences. Claudio, Grazia, Maria Elena, Guido, nonna Elena, zio Tonino, I love you all a lot! I am sure that one day I will regret having spent part of my life abroad to pursue my professional goals instead of spending it with you. Thanks a lot for accepting it. We are a great family and the geographical distance will never change anything.

Delft, November 2014
Daniele Romano

Contents

Acknowledgments
1 Introduction
  1.1 Services
  1.2 Approach and Research Questions
  1.3 Research Method
  1.4 Related Work
  1.5 Origin of Chapters
2 Change-Prone Java Interfaces
  2.1 Interface Metrics
  2.2 The Approach
  2.3 Empirical Study
  2.4 Discussion
  2.5 Related Work
  2.6 Conclusions and Future Work
3 Change-Prone Java APIs
  3.1 Data Collection
  3.2 Empirical Study
  3.3 Threats to Validity
  3.4 Related Work
  3.5 Conclusion and Future Work
4 Fine-Grained WSDL Changes
  4.1 Related Work
  4.2 WSDLDiff
  4.3 Study
  4.4 Conclusion & Future Work
5 Dependencies among Web APIs
  5.1 Applications
  5.2 Related Work
  5.3 Study Context
  5.4 Approach
  5.5 Implementation
  5.6 Experiments
  5.7 Conclusion & Future Work
6 Change-Prone Web APIs
  6.1 Background
  6.2 Research Questions and Approach
  6.3 Online Survey
  6.4 Quantitative Analysis
  6.5 Discussion
  6.6 Related Work
  6.7 Concluding remarks
7 Refactoring Fat APIs
  7.1 Problem Statement and Solution
  7.2 Genetic Algorithm
  7.3 Random and Local Search
  7.4 Study
  7.5 Threats to Validity
  7.6 Related Work
  7.7 Conclusions and Future Work
8 Refactoring Chatty web APIs
  8.1 Problem Statement and Solution
  8.2 The Genetic Algorithm
  8.3 Study
  8.4 Related Work
  8.5 Conclusion & Future Work
9 Conclusion
  9.1 Contributions
  9.2 The Research Questions Revisited
  9.3 Recommendations for Future Work
  9.4 Concluding Remarks
Bibliography
Summary
Samenvatting
Curriculum Vitae

1. Introduction

Several years of research on software maintenance have produced numerous approaches to identify and predict change-prone components in a software system. Among others, source code metrics and heuristics to detect antipatterns and code smells have been widely validated as indicators of changes. However, these indicators have been mainly proposed and validated for object-oriented systems. There is still the need to define and validate indicators of changes for systems implemented in other programming paradigms, such as the service-oriented one.

In recent years there has been a tendency to adopt Service-Oriented Architectures (SOAs) [Josuttis, 2007] in companies and government organizations for two main reasons. First, SOAs allow companies to organize and use distributed capabilities (i.e., services) that may be under the control of different organizations or different departments within the same organization [Brown and Hamilton, 2006]. Second, organizations benefit from the loose coupling between clients and services.

However, clients and services are still coupled, and changes in the services can negatively impact their clients and entire systems. The dependencies removed in SOAs are the dependencies between clients and the underlying technologies used to implement services. Clients and services are still coupled through function coupling and data structure coupling [Daigneau, 2011]. In fact, clients depend 1) on the functionalities implemented by services (i.e., function coupling) and 2) on the data structures that a service's instance receives and returns (i.e., data structure coupling). Both are specified in the service's interface, which we refer to as web API throughout this thesis. For this reason web APIs are considered contracts between clients and service providers and they should remain as stable as possible [Daigneau, 2011; Murer et al., 2010]. However, like any other software component, services evolve to satisfy changing or new functional and non-functional requirements.
In this PhD research we investigate quality indicators that can highlight change-prone web APIs. Web APIs can be split into two main categories: SOAP/WSDL (WS-*) APIs and REST APIs [Pautasso et al., 2008]. In this dissertation we focus on SOAP/WSDL APIs [Alonso et al., 2010]. First, we investigate which indicators can highlight change-prone APIs. Changes in the implementation logic can cause changes in the web APIs, especially when legacy APIs are made available through web APIs. Then, we analyze indicators that highlight change-prone web APIs. Finally, based on design practices that can cause changes in APIs and web APIs, we propose techniques to automatically refactor them.

In this introductory chapter, we first present services, their history, and the importance of designing and implementing stable web APIs (Section 1.1). In Section 1.2 we present the research approach, the research questions, and the contributions of this PhD thesis. In Section 1.3 we show the research method used to answer our research questions. Section 1.4 discusses the related work. Finally, we present the outline of this thesis and the peer-reviewed publications on which the chapters of this thesis are based (Section 1.5).

1.1 Services

The term service has been introduced to refer to software functions that carry out business tasks. Business tasks include tasks such as providing access to files and databases, performing functions like authentication or logging, bridging technological gaps, etc. Services can be implemented using many technologies that range from the older CORBA and DCOM to the newer REST and SOAP/WSDL technologies.

Services have become popular and are widely used to ease the integration of heterogeneous systems. In fact, the main goal of services is to share business tasks across systems that run on different hardware platforms (e.g., Linux, Windows, Mac OS, Android, iOS) and are implemented through different software frameworks and programming languages (e.g., Java, .NET, Objective-C).

1.1.1 Software Integration with Web Services

The benefits of using services instead of other software components to ease integration are well discussed in the book Service Design Patterns by Daigneau [Daigneau, 2011].

Objects have been the first components used for integrating business tasks across different software systems [Daigneau, 2011]. An object (e.g., a Java class) can encapsulate business functions or data and it can be reused in different software systems. To reuse an object, developers instantiate it and access its business tasks by invoking its methods. The main problem of objects is that it is challenging to reuse them in software systems implemented with different programming languages.

To overcome this problem, component technologies have been proposed. Components are deployable binary software units that can be easily integrated into software systems implemented in different programming languages. The business tasks encapsulated into them are accessible through binary interfaces that describe their methods, attributes, and events, as shown in Figure 1.1. Unlike with objects, developers do not have access to the internals of the components but only to their interfaces. The interfaces, however, are described through platform-specific languages (e.g., the Microsoft Interface Definition Language).

Figure 1.1: Components are reused through platform-specific interfaces. Taken from [Daigneau, 2011].
While reusing components within systems implemented in different programming languages is easy, developers are now constrained to reuse components on specific platforms (e.g., Microsoft computing platforms).

To address this problem, objects have been deployed on servers, allowing clients to access their business tasks by invoking their methods remotely (Figure 1.2). Distributed objects can be reused by different software systems independently of the platforms on which clients and objects are deployed. The most popular technologies to invoke distributed objects are CORBA, DCOM, Java Remote Method Invocation (RMI), and .NET Remoting. As shown in Figure 1.2, a client invokes the remote object through a proxy. The proxy forwards the invocation to a stub that is deployed on the distributed object's server. Then, the stub is responsible for invoking the distributed object.

Figure 1.2: Distributed objects invoked over a network by their clients. Taken from [Daigneau, 2011].

This design pattern has its drawbacks as well. First, the implementation is not easy for developers. The serialization and deserialization of the messages exchanged is not standardized. As a consequence, the design pattern works well if both client and server use the same technologies to create the channel. Otherwise, technical problems can arise frequently. Other problems are due to the fact that servers maintain state between client calls, which can be extremely expensive. Maintaining state requires implementing proper techniques to perform load balancing effectively, and it can degrade server memory utilization as the number of clients increases.

Web services have been conceived to solve the aforementioned problems of local objects, components, and distributed objects. They provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks, and are based on "stateless" interactions in the sense that the meaning of a message does not depend on the state of the conversation [W3C, 2004]. The W3C has defined a web service as a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

To achieve high portability between different platforms, the W3C defined a Web Services Architecture Stack based on XML languages that offer standard, flexible and inherently extensible data formats. Among these languages, SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language) are the core languages to invoke a web service and to describe its interface. SOAP is a protocol that specifies the data structures of the messages exchanged with the web services and auxiliary data structures to represent other information, such as header information or error information that occurred while processing the message. WSDL describes the web services' interface in terms of 1) operations exposed in web services, 2) addresses or connection endpoints to web services, 3) protocols to bind web services, and 4) operations and messages to invoke web services. Note that WSDL interfaces can be mapped to any implementation language, platform, object model, or messaging system.
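As a minimal illustration of this mapping, the sketch below shows a hypothetical Java service interface annotated with the standard JAX-WS annotations; a SOAP stack can derive from it a WSDL description in which each annotated Java method becomes a WSDL operation and its parameters and return value become the input and output messages. The service and operation names are invented for this example and do not appear in the thesis.

import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;

// Hypothetical service endpoint interface: a JAX-WS stack can generate a WSDL
// document from it, with one wsdl:operation per annotated Java method.
@WebService(name = "OrderService", targetNamespace = "http://example.org/orders")
public interface OrderService {

    // Becomes a WSDL operation whose input message carries the two parameters
    // and whose output message carries the returned order identifier.
    @WebMethod
    String placeOrder(@WebParam(name = "customerId") String customerId,
                      @WebParam(name = "productCode") String productCode);

    // Becomes a second WSDL operation with its own input and output messages.
    @WebMethod
    String orderStatus(@WebParam(name = "orderId") String orderId);
}

The binding protocol and the endpoint address, i.e., how and where these operations can be invoked, are contributed by the deployment configuration rather than by the interface itself, which is what allows the same WSDL contract to be served by different implementations.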
As a consequence, a WSDL interface is a contract between web service providers and their clients that hides the implementation behind the web services.

The architectural style of SOAP/WSDL web services is also known as the RPC (Remote Procedure Call) API style, highlighting the fact that clients invoke procedures over a network. However, the W3C defined another architectural style for web services, called Resource API. According to this style, web services expose resources (e.g., text files, media files) and not actions as in the RPC API style. Clients can access and manipulate these resources through representations (e.g., XHTML, XML, JSON). When a client receives a resource representation from a web service, it receives the current state of the resource. If it sends a representation of the resource to a web service, it possibly alters the state of the resource. For this reason this architectural style is also known as Representational State Transfer, or REST APIs [Fielding, 2000]. Resource APIs use HTTP as application protocol. Specifically, PUT is used to create or update resources, GET is used to retrieve a resource representation, and DELETE removes a resource. For a detailed comparison between SOAP/WSDL web services and REST APIs we refer the reader to the work by [Pautasso et al., 2008].

1.1.2 Change-Proneness of Web APIs

Using web services allows software engineers to reduce the coupling between distributed components and, hence, eases the integration among such components. Web services eliminate the dependencies between the clients and the underlying technologies used by a web service. Eliminating dependencies on technologies reduces the coupling, but it does not completely decouple clients and web services. There are still four different levels of coupling, namely the function coupling, the data structure coupling, the temporal coupling, and the URI coupling [Daigneau, 2011]. For more details on coupling in service-oriented systems we refer the reader to the work by [Pautasso and Wilde, 2009].

First of all, clients invoke web services to execute a business task (i.e., RPC API) or to retrieve, update, create, or delete a resource (i.e., Resource API). Clients depend indirectly on the business logic implemented by web services. This coupling is called function coupling. Second, the clients depend on the data structures used to invoke a web service and to receive the results of the invocations. These data structures are defined in the API of web services, which we refer to as web API throughout this book. This dependency is also known as data structure coupling. Third, clients and web services are coupled through temporal coupling. This level of coupling indicates that the web service should be operational when a client invokes it. Finally, clients are coupled to the web services' URIs (i.e., URI coupling).

As a consequence, clients depend on the implementations, the web APIs, the reliability, and the URIs of web services. Changes to these four factors are problematic for clients and they can break them.

In this PhD thesis we investigate the change-proneness of APIs and web APIs, focusing on SOAP/WSDL APIs [Alonso et al., 2010]. We decided to focus on web APIs, and hence on data structure coupling, because they are considered contracts between clients and web services specifying how they should interact. One of the key factors for deploying successful web APIs is assuring an adequate level of stability. Changes in a web API might break the clients' systems, forcing the clients to continuously adapt their systems to new versions of the web API. For this reason, assessing the stability of web APIs is key to reduce the likelihood of continuous updates.
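To make the notion of a breaking change concrete, consider the following sketch with invented names (it is not taken from the thesis): between the two versions of the interface an operation is renamed and a parameter type is changed, so a client generated against the first contract no longer compiles and the messages it produces no longer match the new contract.

// Version 1 of a hypothetical web API, as a client sees it after code generation.
interface CustomerServiceV1 {
    String getCustomer(String customerId);
}

// Version 2 of the same web API: the operation has been renamed and the
// parameter type has changed. Both are breaking changes in the sense of
// Daigneau [2011], because existing clients must be updated.
interface CustomerServiceV2 {
    String findCustomer(long customerId);
}

// A client written against version 1 no longer works with version 2:
// the operation it invokes does not exist anymore and the argument type differs.
class ReportingClient {
    String fetchCustomer(CustomerServiceV1 service) {
        return service.getCustomer("C-42"); // must be rewritten for version 2
    }
}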
1.1.3 Performance of Web APIs

During the frequent discussions with industry, practitioners kept repeating that performance issues are one of the causes that lead web APIs to be changed. Web APIs are invoked over a network and, hence, the latency can be significantly higher than when calling a similar API deployed on the same machine as the client. When a client invokes a method of a web API, the request must be serialized into a stream of bytes, transmitted over a network, deserialized on the server side, and dispatched to the web service. The same steps must be executed when the web service returns the results back to the client. As a consequence, designers should take care to design a web API that can execute a use case with the lowest number of messages exchanged between clients and web services. To reduce the latency, designers should usually prefer web APIs that exchange a few chunky messages instead of many smaller messages [Daigneau, 2011]. In this way they can avoid chatty conversations that increase the latency.

To better understand this problem, consider the redesign of web APIs adopted at Netflix [Jacobson, 2012] and shown in Figure 1.3. At the beginning of its history, Netflix had adopted a one-size-fits-all (OSFA) REST API approach to provide its services to its clients. This approach is shown in Figure 1.3a. According to this approach there is a unique REST API invoked by all the different clients. To satisfy the requirements of all clients, this API requires a large number of interactions with clients, which must invoke the API multiple times to execute a use case, as shown by the colored arrows in Figure 1.3a. Among other issues, this approach degrades the performance since network calls are expensive.

Figure 1.3: Web APIs redesign at Netflix [Jacobson, 2012]. (a) One-size-fits-all (OSFA) REST API approach at Netflix: each client has to invoke the single REST API multiple times. (b) Each client invokes its specifically designed REST API once, reducing the chattiness and improving the latency.

To overcome this problem, engineers at Netflix have adopted a new approach (shown in Figure 1.3b), reducing the latency in some cases by several seconds. In this new approach, the clients make a single chunky request (black arrow in Figure 1.3b) to their specific endpoint, designed to handle the requests of a specific client. As a consequence, each different client has its own web API with which it interacts with a single request per use case. These ad hoc web APIs communicate locally with a fine-grained Java API. The functionality of this API is similar to the original REST API. However, its fine-grained methods are invoked locally, while the clients perform only a single remote request. Thanks to this new approach, engineers at Netflix have reduced the chattiness of their web API and considerably improved the latency.

This story shows the relevance of designing web APIs with an adequate granularity. Inadequate granularity can cause performance issues that force web APIs to be changed.
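The difference between the two interaction styles can be sketched as follows; the interfaces and types are hypothetical and only illustrate the idea, they are not Netflix's actual APIs.

// Chatty style: one fine-grained remote call per piece of data.
// Rendering a single screen costs several network round trips.
interface ChattyMovieApi {
    String movieTitle(long movieId);
    double movieRating(long movieId);
    java.util.List<Long> similarMovies(long movieId);
    String memberName(long memberId);
}

// Chunky style: one coarse-grained remote call per use case.
// The same screen costs a single network round trip.
interface ChunkyMovieApi {
    HomeScreen homeScreen(long memberId);
}

// Coarse-grained response assembled on the server side, close to the data.
class HomeScreen {
    String memberName;
    java.util.List<String> recommendedTitles;
    java.util.List<Double> ratings;
}

In the chunky variant the fine-grained calls still exist, but they are made locally on the server side, which is the role played by the fine-grained Java API in the Netflix redesign described above.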
1.2 Approach and Research Questions

The work presented in this PhD thesis is part of the work performed within the ReSOS (Re-Engineering Service-Oriented Systems) project. ReSOS began in November 2010 and is aimed at improving the quality of service-oriented systems. However, the term quality is generic and includes many different quality attributes (e.g., reliability, efficiency, security, maintainability). Based on the literature (e.g., [Daigneau, 2011]) and on the frequent discussions with our industrial partners (KPMG, http://www.kpmg.com/, and SIG, http://www.sig.eu/en/) and collaborators, we found that the stability of web APIs is crucial for designing and maintaining service-oriented systems. As discussed in the previous section, web services became a popular means to integrate software systems that may belong to different organizations. As a consequence, web APIs are considered contracts [Daigneau, 2011; Murer et al., 2010] for integrating systems and they should stay as stable as possible.

Based on these discussions with our industrial partners and collaborators, we set up a research approach consisting of the following research tracks:

• Track 1: Analysis of change-prone APIs
• Track 2: Analysis of change-prone web APIs
• Track 3: Refactoring of change-prone web APIs

1.2.1 Track 1: Change-Prone APIs

Each web service is implemented by an implementation logic that is hidden from its clients behind its web API. Changes to the implementation logic can be propagated and affect the web API. Among all the software units composing the implementation logic, APIs are likely to be mapped directly into web APIs. This scenario happens especially when a legacy API is made available through a web service. For this reason, in the first track we analyze the change-proneness of APIs, where we refer to an API as the set of public methods declared in a software unit.

To perform this study we use existing techniques to mine software repositories and to extract changes performed in the APIs. We then analyze whether there is a correlation between the amount of changes an API undergoes and the values of source code metrics and/or the presence of antipatterns. The outcome of this track will consist of a set of quality indicators (e.g., heuristics and software metrics) that can highlight change-prone APIs and assist software engineers in designing stable APIs. In our context, these indicators are particularly useful to check the stability of APIs when they are mapped directly to web APIs. To investigate change-prone APIs, we first focus on the following research question:

Research Question 1: Which software metrics do indicate change-prone APIs?

We investigate this research question in Chapter 2 by empirically investigating the correlation between source code metrics and the number of fine-grained source code changes performed in the interfaces of ten Java open-source systems. Moreover, we use the metrics to train prediction models used to predict change-prone Java interfaces.

In Chapter 3 we answer our second research question:

Research Question 2: What is the impact of antipatterns on the change-proneness of APIs?

Previous studies showed that classes with antipatterns change more frequently than classes without antipatterns.
In Chapter 3 we answer this research question by extending these studies and taking into account fine-grained source code changes extracted from 16 Java open-source systems. In particular we investigate: (1) whether classes with antipatterns are more change-prone than classes without; (2) whether the type of antipattern impacts the change-proneness of Java classes; and (3) whether certain types of changes are performed more frequently in classes affected by a certain antipattern. Performing this analysis we retrieve the set of antipatterns that are most strongly correlated with changes performed in APIs.

1.2.2 Track 2: Change-Prone Web APIs

The second track consists of investigating the change-proneness of web APIs through the analysis of their evolution. This analysis can help us in identifying bad design practices that can increase the probability that a web API will be changed in the future. To perform this study we detect and extract changes performed in web APIs. This task is performed by a tool that compares two subsequent versions of a web API and extracts changes taking into account the syntax of the web API specification. In this way we can extract the type of a change performed in the interface, as classified in [Leitner et al., 2008]. Knowing the type of a change is particularly useful for two reasons. First, we can see which element is affected by the change and how it changes. Second, we can classify the changes depending on the impact they can have on the clients. In fact, changes can be divided into breaking changes and non-breaking changes depending on whether web service client developers need to update their code or not [Daigneau, 2011].

Once we are able to extract and classify changes, we investigate heuristics and software metrics that can be used as indicators of change-prone web APIs. Similar to Track 1, we then investigate the correlation between them and the changes performed in the web API. To perform such an analysis, we first need a tool to extract fine-grained changes among different versions of a web API. Then, such an analysis might require a tool to track the dependencies among web APIs. As already described in Section 1.1.2, even though services are loosely coupled, they are still coupled through function and data structure coupling. Coupling can be a good quality indicator in service-oriented systems, as has already been shown for systems implemented in other programming paradigms. We expect that a service with a higher incoming and outgoing coupling can show a higher response time. However, measuring coupling in service-oriented systems is more challenging than for systems implemented in other paradigms. This is mainly due to the dynamic and distributed nature of service-oriented systems.

Besides coupling, we analyze other attributes that can affect change-proneness, such as cohesion. We argue that a web API should be cohesive to prevent changes in the future. A web API with low cohesion can hamper the comprehension of the web API, resulting in lower reusability. Moreover, a web API with different responsibilities can be a bottleneck that can affect response time because of the different clients invoking it.
The outcome of this study consists of a set of heuristics and metrics that can assist software engineers in designing web APIs that are less change-prone. To perform this track, first we implement a tool to extract fine-grained changes between different versions of web APIs and we answer the following research question: Research Question 3: How can we extract fine-grained changes among subsequent versions of web APIs? We answer this research question in Chapter 4 by proposing a tool called WSDLDiff able to extract fine-grained changes from subsequent versions of a web API defined in WSDL. In contrast to existing approaches, WSDLDiff takes into account the syntax of WSDL and extracts the WSDL elements affected by changes and the types of changes. We show a first study aimed at analyzing the evolution of web APIs using the fine-grained changes extracted from the subsequent versions of four real world WSDL APIs. Based on the results of this study web service subscribers can highlight the most frequent types of changes affecting a WSDL API. This information is relevant to assess the risk associated to the usage of web services and to subscribe to the most stable ones. As second step in Track 2 we propose a portable approach to infer the dynamic dependencies among web services at run time answering the following research question: Research Question 4: How can we mine the full chain of dynamic dependencies among web services? We answer this research question in Chapter 5 by proposing an approach able to extract dynamic dependencies among web services. The approach is based on vector clocks, originally conceived and used to order events in distributed environments. We use vector clocks to order web service executions and to infer causal dependencies among web services. We show the feasibility of the approach by implementing it into the Apache CXF framework3 and instrumenting SOAP messages. Moreover, we show two experiments to investigate the impact of the approach on the response time. 3 http://cxf.apache.org 12 Chapter 1. Introduction Finally, we conclude Track 2 and investigate the change-proneness of web APIs answering the following research question: Research Question 5: What are the scenarios in which developers change web APIs with low internal and external cohesion? We address this research question in Chapter 6. We present a qualitative and quantitative study of the change-proneness of web APIs with low external and internal cohesion. The internal cohesion measures the cohesion of the operations (also referred as methods) declared in a web API. The external cohesion measures the extent to which the operations of a web API are used by external consumers (also called clients). First, we report on an online survey to investigate the maintenance scenarios that cause changes to web APIs. Then, we define an internal cohesion metric and analyze its correlation with the changes performed in ten well known WSDL APIs. The goal of the study is to provide several insights into the interface, method, and data-type changeproneness of web APIs with low internal and external cohesion. The choice of focusing on internal and external cohesion, instead of other attributes, is based on our previous and related work and discussed in Chapter 6. 1.2.3 Track 3: Refactoring Web APIs Track 1 and Track 2 give useful insights into the change-proneness of APIs and web APIs. Based on the findings of these tracks in Track 3 we investigate techniques to assist software engineers in refactoring change-prone web APIs. 
Finally, we conclude Track 2 and investigate the change-proneness of web APIs answering the following research question:

Research Question 5: What are the scenarios in which developers change web APIs with low internal and external cohesion?

We address this research question in Chapter 6. We present a qualitative and quantitative study of the change-proneness of web APIs with low external and internal cohesion. The internal cohesion measures the cohesion of the operations (also referred to as methods) declared in a web API. The external cohesion measures the extent to which the operations of a web API are used by external consumers (also called clients). First, we report on an online survey to investigate the maintenance scenarios that cause changes to web APIs. Then, we define an internal cohesion metric and analyze its correlation with the changes performed in ten well-known WSDL APIs. The goal of the study is to provide several insights into the interface, method, and data-type change-proneness of web APIs with low internal and external cohesion. The choice of focusing on internal and external cohesion, instead of other attributes, is based on our previous and related work and is discussed in Chapter 6.

1.2.3 Track 3: Refactoring Web APIs

Track 1 and Track 2 give useful insights into the change-proneness of APIs and web APIs. Based on the findings of these tracks, in Track 3 we investigate techniques to assist software engineers in refactoring change-prone web APIs. Among all change-prone indicators found in Track 1 and Track 2, we focus on external cohesion and we define techniques to refactor web APIs with low external cohesion. We focus on this attribute because it highlights both change-prone APIs (Track 1) and change-prone web APIs (Track 2). Web APIs, and in general APIs, with low external cohesion can be refactored through the Interface Segregation Principle [Martin, 2002]. As a consequence, as a first step in this track, we use search based software engineering techniques to refactor APIs with low external cohesion, answering the following research question:

Research Question 6: Which search based techniques can be used to apply the Interface Segregation Principle?

We answer this research question in Chapter 7. We formulate the problem of applying the Interface Segregation Principle as a multi-objective clustering problem and we propose a genetic algorithm to solve it. We evaluate the capability of the proposed genetic algorithm with 42,318 public Java APIs whose clients' usage has been mined from the Maven repository. The capability of the genetic algorithm is then compared with the capability of other search based approaches (i.e., random and simulated annealing approaches).

The last part of this track consists of refactoring fine-grained web APIs (i.e., chatty APIs). As discussed in Section 1.1.3, fine-grained APIs can be changed over time to improve the performance and to reduce the number of remote invocations. In this part we answer the following research question:

Research Question 7: Which search based techniques can transform a fine-grained API into multiple coarse-grained APIs reducing the total number of remote invocations?

In Chapter 8 we answer this research question by proposing a genetic algorithm that mines the clients' usage of web service operations and suggests Façade web services whose granularity reflects the usage of each different type of client. These Façade web services can be deployed on top of the original web service and they become contracts for the different types of clients, satisfying the Consumer-Driven Contracts pattern [Daigneau, 2011]. According to this pattern, the granularity of a web API, in terms of exposed operations, should reflect the clients' usage.

1.2.4 Contributions

The contributions of this PhD research can be summarized as follows:

• A set of validated quality indicators, comprising metrics and heuristics, to highlight change-prone APIs;
• A set of validated quality indicators, comprising metrics, heuristics, techniques, and tools to highlight change-prone web APIs;
• A tool to mine fine-grained changes between different versions of a web API;
• An approach to infer the dynamic dependencies among web services at run time;
• An approach to refactor web APIs, and in general APIs, with low external cohesion applying the Interface Segregation Principle;
• An approach to refactor fine-grained web APIs into coarse-grained web APIs with a lower number of required remote invocations.

1.3 Research Method

Our research has been done in close collaboration with our industrial partners and collaborators, following an industry-as-laboratory approach [Potts, 1993]. The involvement of the industry in our research is crucial to address challenges faced by practitioners and to develop techniques and tools capable of assisting them in solving real world problems.
Frequent discussions allowed us to focus on their main problems and agree on sustainable solutions. As a consequence, all the problems addressed in this thesis have arisen from these discussions with the industrial parties. This step has been particularly useful to define the aforementioned research questions and the directions of our research (i.e., the change-proneness of APIs and web APIs).

To answer our research questions we used different research methods. Research questions 1, 2, and 5 are aimed at validating indicators that highlight change-prone APIs and web APIs. They have been mainly answered by performing quantitative studies based on mining software repositories techniques [Kagdi et al., 2007] and using statistics [Sheskin, 2007] and machine learning techniques [Witten and Frank, 2005]. We performed these studies analyzing open source systems from different domains. The reason behind the choice of these systems is two-fold. First, industrial parties are reluctant to release their systems' repositories and to allow public discussions about them. Second, using open source systems allows other researchers to compare our findings with theirs and also to verify and extend our work. Whenever the available data was not enough to draw statistical conclusions (i.e., Research Question 5), we followed a mixed-methods approach [Creswell and Clark, 2010], which is a combination of quantitative and qualitative methods. In this case the results of statistical tests are complemented with an online survey [Floyd J. Fowler, 2009].

The remaining research questions are aimed at validating approaches to analyze service-oriented systems (i.e., research questions 3 and 4) and to refactor APIs and web APIs (i.e., research questions 6 and 7). Whenever the available data was not enough to validate these approaches (i.e., research questions 4 and 7), we used synthetic data and performed controlled experiments [Wohlin et al., 2000]. The approaches used to refactor APIs and web APIs have been implemented and evaluated with state-of-the-art search-based techniques [Harman et al., 2012].

1.4 Related Work

In this section we present an overview of related work, while the main chapters of this PhD thesis provide more details.

Many studies (e.g., [Perepletchikov et al., 2010; Moha et al., 2012; Rotem-Gal-Oz, 2012; Král and Zemlicka, 2007]) propose quality indicators for service-oriented systems. However, these indicators have been poorly validated, mainly because of the lack of availability of such systems.

Perepletchikov et al. [2010, 2006] defined a set of cohesion and coupling metrics for service-oriented systems. They analyzed cohesion in the context of web services and proposed four different types of cohesion metrics for measuring analyzability [Perepletchikov et al., 2010]. Furthermore, they proposed three different coupling measures for web services and they showed their impact on maintainability [Perepletchikov et al., 2006].

The most recent work on web services antipatterns has been proposed by Moha et al. [2012]. They proposed an approach to specify and detect an extensive set of antipatterns that encompass concepts like granularity, cohesion and duplication. Their tool is capable of detecting the most popular web services antipatterns defined in the literature. Besides these antipatterns, they specified three more antipatterns, namely: bottleneck service, service chain and data service.
Bottleneck service is a web service used by many web services and affected by a high incoming and outgoing coupling that can affect response time. Service chain appears when a business task is achieved by a long chain of consecutive web service invocations. Data service is a web service that performs simple information retrieval or data access operations, which can affect the cohesion.

Rotem-Gal-Oz [2012] defined the knot antipattern as a set of web services with low cohesion which are tightly coupled. This antipattern can cause low usability and high response time. The sand pile, defined by Král and Zemlicka [2007], appears when many fine-grained web services share common data that may be available through a web service affected by the data service antipattern. Cherbakov et al. [2006] proposed the duplicate service antipattern, which affects services sharing similar methods and can cause maintainability issues.

Dudney et al. [2002] defined a set of antipatterns for J2EE applications. Among these we investigate the multi service, tiny service and chatty service antipatterns. The multi service is a service that provides different business operations that have low cohesion and can affect availability and response time. Tiny services are small web services with few methods that are used together. This antipattern can affect the reusability of such services. Finally, the chatty service antipattern affects services that communicate with each other exchanging small amounts of data. This antipattern can affect the response time.

All the aforementioned studies suggest and detect antipatterns for designing web APIs, but they do not investigate the effects of these antipatterns on change-proneness and do not suggest techniques to refactor web APIs.

1.5 Origin of Chapters

The chapters of this thesis have been published before as peer-reviewed publications or are under review. As a consequence they are self-contained and, hence, they might contain some redundancy in the background, motivation, and implication sections. The author of this thesis is the main contributor of all chapters and all publications have been co-authored by Martin Pinzger. The following list provides an overview of these publications:

• Chapter 2 was published in the 27th International Conference on Software Maintenance (ICSM 2011) [Romano and Pinzger, 2011a].
• Chapter 3 was published in the 19th Working Conference on Reverse Engineering (WCRE 2012) [Romano et al., 2012].
• Chapter 4 was published in the 19th International Conference on Web Services (ICWS 2012) [Romano and Pinzger, 2012].
• Chapter 5 was published in the 4th International Conference on Service Oriented Computing and Application (SOCA 2011) [Romano et al., 2011].
• Chapter 6 is currently under review and published as a technical report [Romano et al., 2013].
• Chapter 7 was published in the 30th International Conference on Software Maintenance and Evolution (ICSME 2014) [Romano et al., 2014].
• Chapter 8 was published in the 10th World Congress on Services (Services 2014) [Romano and Pinzger, 2014].

2. Change-Prone Java Interfaces

Recent empirical studies have investigated the use of source code metrics to predict the change- and defect-proneness of source code files and classes. While results showed strong correlations and good predictive power of these metrics, they do not distinguish between interface, abstract or concrete classes.
In particular, interfaces declare contracts that are meant to remain stable during the evolution of a software system, while the implementation in concrete classes is more likely to change. This chapter aims at investigating to which extent the existing source code metrics can be used for predicting change-prone Java interfaces. We empirically investigate the correlation between metrics and the number of fine-grained source code changes in interfaces of ten Java open-source systems. Then, we evaluate the metrics to calculate models for predicting change-prone Java interfaces. Our results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes. This metric also improves the performance of prediction models to classify Java interfaces into change-prone and not change-prone. (This chapter was published in the 27th International Conference on Software Maintenance (ICSM 2011) [Romano and Pinzger, 2011a].)

2.1 Interface Metrics
2.2 The Approach
2.3 Empirical Study
2.4 Discussion
2.5 Related Work
2.6 Conclusions and Future Work

Software systems are continuously subjected to changes. Those changes are necessary to add new features, to adapt to a new environment, to fix bugs, or to refactor the source code. However, the maintenance of software systems is also risky and costly. Several approaches have been developed to optimize the maintenance activities and reduce the costs. They range from automated reverse engineering techniques to ease program comprehension to prediction models that can help identify the change- and defect-prone parts in the source code. Developers should focus on understanding these change- and defect-prone parts in order to take appropriate counter measures to minimize the number of future changes [Girba et al., 2004].

Many of these prediction models have been developed using source code metrics, such as by Briand et al. [2002], Subramanyam and Krishnan [2003], and Menzies et al. [2007]. While those prediction models showed good performance, they work on file and class level. None of them takes the kind of class into account, whether it is a concrete class, abstract class, or interface that is change- or defect-prone. We believe that changes in interfaces can have a stronger impact than changes in concrete and abstract classes, and should therefore be treated separately. Interfaces are meant to represent contracts among modules and logic units in a software system. For this reason, they are supposed to be more stable to avoid contract violations and to reduce the effort to maintain a software system.

In this chapter, we focus on Java interfaces and investigate the predictive power of various source code metrics to classify Java interfaces into change-prone and not change-prone. Concerning the source code metrics, we take into account (1) the set of metrics defined by Chidamber and Kemerer [1994]; (2) a set of metrics to measure the complexity and the usage of interfaces; and (3) two metrics to measure the external cohesion of Java interfaces.
The number of fine-grained source code changes (#SCC), as introduced by Fluri et al. [2007], is used to distinguish between change-prone and not change-prone interfaces.

We selected the Chidamber and Kemerer (C&K) metrics suite because it is widely used and has been validated by several approaches, such as [Rombach, 1987], [Li and Henry, 1993], and [Basili et al., 1996]. The two external cohesion metrics are Interface Usage Cohesion (IUC) and a clustering metric. These metrics are meant as heuristics to indicate violations of the Interface Segregation Principle (ISP) as described by Martin [2002]. We believe that the violation of the ISP can impact the maintenance of interfaces and the software system as a whole. The complexity and usage metrics for interfaces have been added to provide a broader set of interface metrics for our study.

To investigate our claim, we perform an empirical study with the source code and versioning data of ten Java open source systems, namely eight plugin projects from the Eclipse platform, Hibernate2, and Hibernate3. In the study, we address the following two research hypotheses:

• H1: IUC has a stronger correlation with the #SCC of interfaces than the C&K metrics
• H2: IUC can improve the performance of prediction models to classify Java interfaces into change- and not change-prone

The results show that most of the C&K metrics perform well for predicting change-prone concrete and abstract classes but are limited in predicting change-prone Java interfaces, therefore confirming our claim that interfaces need to be treated separately. The IUC metric exhibits the strongest correlation with #SCC of Java interfaces and proves to be an adequate metric to compute prediction models for classifying Java interfaces.

The remainder of this chapter is organized as follows. Section 2.1 discusses the C&K metrics and their effectiveness when used for measuring the size and complexity of interfaces. We furthermore introduce the IUC metric and several other interface complexity and usage metrics. Section 2.2 describes the approach used to measure the metrics and to mine the fine-grained source code changes from versioning repositories. The empirical study and results are presented in Section 2.3. Section 2.4 discusses the results and threats to validity. Related work is presented in Section 2.5. We draw our conclusions and outline directions for future work in Section 2.6.

2.1 Interface Metrics

In this section, we present the set of source code metrics used in our empirical study. We furthermore discuss their applicability to measure the size, complexity, and cohesion of Java interfaces. We then present the IUC metric and motivate its application to predict change-prone interfaces. At the end of the section, we list additional metrics to measure the complexity and the usage of interfaces. Those metrics are meant to provide further validation of the predictive power of the IUC metric.

2.1.1 Object-Oriented Metrics & Interfaces

Among the existing product metrics [Henderson-Sellers, 1996], we focus on the object-oriented metrics introduced by Chidamber and Kemerer [1994]. They have been widely used as quality indicators of object-oriented software systems. These metrics are:
• Coupling Between Objects (CBO)
• Lack of Cohesion Of Methods (LCOM)
• Number Of Children (NOC)
• Depth of Inheritance Tree (DIT)
• Response For Classes (RFC)
• Weighted Methods per Class (WMC)

We selected the C&K metrics mainly because prior work demonstrated their usefulness for building models for change prediction, e.g., [Li and Henry, 1993], [Zhou and Leung, 2007], as well as defect prediction, e.g., [Basili et al., 1996]. In the following, we briefly describe each metric and discuss its application to interfaces.

Coupling Between Objects (CBO)

The CBO metric represents the number of data types a class is coupled with. More specifically, it counts the unique number of reference types that occur through method calls, method parameters, return types, exceptions, and field accesses. If applied to interfaces, this metric is limited to method parameters, return types and exceptions, leaving out method calls and field accesses.

Lack of Cohesion Of Methods (LCOM)

The LCOM metric counts the number of pairwise methods without any shared instance variable, minus the number of pairwise methods that share at least one instance variable. More precisely, the LCOM metric revised in [Henderson-Sellers et al., 1996] is defined as:

$LCOM = \frac{\frac{1}{a}\sum_{j=1}^{a}\mu(A_j) - m}{1 - m}$

where $a$ represents the number of attributes of a class, $m$ the number of methods, and $\mu(A_j)$ the number of methods which access each attribute $A_j$ of a class. Perfect cohesion is defined as all methods accessing all variables, in which case the value of LCOM is 0. In contrast, if all methods do not share any instance variable, the value of LCOM is 1.

The LCOM metric is not applicable to interfaces since interfaces do not contain logic and, consequently, attribute accesses. For instance, the commercial metric tool Understand (http://www.scitools.com/) outputs either 0 or 1 as value of LCOM for an interface. The value 1 denotes that the interface also contains the definition of constant attributes, otherwise the value for LCOM is 0. This limits the use of LCOM for computing prediction models.

Weighted Methods per Class (WMC)

WMC is the sum of the cyclomatic complexities of all methods declared by a class. Formally, the metric is defined as:

$WMC = \sum_{i=1}^{n} c_i$

where $c_i$ is the cyclomatic complexity of the $i$-th method of a class. In the case of Understand, this metric corresponds to the Number Of Methods (NOM), since the complexity of each method declared in an interface is 1. In the case of the Metrics tool (http://metrics.sourceforge.net/), this metric is always 0 for interfaces. This limits the predictive power of this metric for predicting change-prone interfaces.

Number Of Children (NOC)

The NOC metric counts the number of directly derived classes of a class or interface. Even though this metric is sound for interfaces, we argue that its application for predicting change-prone interfaces is limited. The main reason is that interfaces inherit only the type definition (i.e., sub-typing), while abstract classes and concrete classes also inherit the business logic.

Depth of Inheritance Tree (DIT)

The DIT metric denotes the length of the longest path from a sub-class to its base class in an inheritance structure. The idea behind the usage of DIT as a change-proneness indicator is that classes contained in a deep inheritance structure are more likely to change (e.g., changes in a super-class cause changes in its sub-classes). Similar to NOC, we believe that this metric is more useful for abstract and concrete classes than for interfaces.

Response For Classes (RFC)

The RFC metric counts the number of local methods (including inherited methods) of a class. This metric remains valid for interfaces, but it is close to the WMC metric since the only added information is the count of the inherited methods.
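To make the discussion above more tangible, consider the following invented Java interface; the metric values in the comments are only a rough reading of the definitions above and are not taken from the chapter.

import java.util.List;

// Invented example interface used only to illustrate the metric definitions above.
public interface AccountRepository {

    // Parameters, return types, and thrown exceptions contribute to CBO
    // (here, roughly: Account, String, List, and IllegalStateException).
    Account findById(String accountId) throws IllegalStateException;

    List<Account> findAll();

    void store(Account account);
}

// Plain data holder, only needed to make the example self-contained.
class Account {
}

// Rough metric reading for AccountRepository under the definitions above:
// - WMC/NOM: 3 declared methods, each counted with cyclomatic complexity 1.
// - LCOM: degenerate, since an interface declares no logic and no attribute accesses.
// - NOC and DIT: determined solely by the type hierarchy (implementing classes and
//   extended interfaces), not by any inherited business logic.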
In summary, while most of the C&K metrics are adequate metrics for abstract and concrete classes, they are not as powerful for interfaces. Moreover, these metrics fall short in expressing the cohesion of interfaces; therefore we introduce the two external cohesion metrics as presented in the following section.

2.1.2 External Cohesion Metrics of Interfaces

Developers should not design fat interfaces, that is, interfaces whose clients invoke different methods. This problem has been formalized in the Interface Segregation Principle (ISP) described by Martin [2002]. The ISP states that fat interfaces need to be split into smaller interfaces according to the clients of an interface. Any client should only know about the set of methods provided by an interface that are used by the client. In the literature, the lack of conformance to the ISP is mainly associated with a higher risk for clients to change when an interface is changed. To the best of our knowledge there exists no empirical evidence that underlines this association.

In order to measure the violation of the ISP, we use two cohesion metrics: the external cohesion metric for services called Service Interface Usage Cohesion (SIUC), taken from Perepletchikov et al. [Perepletchikov et al., 2007, 2010], and a clustering metric. In the following, we refer to the SIUC metric as Interface Usage Cohesion (IUC) because we apply it in the context of object-oriented systems. The metric is defined as:

$IUC(i) = \frac{\sum_{j=1}^{n} \frac{used\_methods(j,i)}{num\_methods(i)}}{n}$

where $j$ denotes a client of the interface $i$; $used\_methods(j,i)$ is the function which computes the number of methods defined in $i$ and used by the client $j$; $num\_methods(i)$ returns the total number of methods defined in $i$; and $n$ denotes the number of clients of the interface $i$.

The external cohesion defined by Perepletchikov et al., and hence the IUC metric, states that there is a strong external cohesion if every client uses all methods of an interface. We argue that interfaces with strong external cohesion (the value of IUC is close to one) are less likely to change. On the other hand, when there is a high lack of external cohesion (the value of IUC is close to zero), the interface is more likely to change due to the larger number of clients.

Consider the example in Figure 2.1a that shows an interface for providing bank services. The service is used by two different clients, namely the Professional Client and the Student Client. The two clients share only one interface method, namely the method accountBalance(). Since this method is shared by two different clients, it is more likely to change to satisfy the requirements of the different clients. The design of the BankServices interface does not conform to the ISP. The value of IUC for this interface is $\frac{3/4 + 2/4}{2} = 5/8$.

Figure 2.1: An example of lack of external cohesion. (a) Different clients share a method. (b) Different clients do not share any methods.
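A hypothetical Java rendering of the interface in Figure 2.1a could look as follows (the method names are chosen to match the example; the exact signatures are not given in the chapter), with the IUC value worked out in the comments.

// Fat interface of Figure 2.1a: the two client types use different subsets of it.
interface BankServices {
    double accountBalance(String accountId);       // used by both clients
    void requestLoan(String accountId);            // professional client only
    void openBusinessAccount(String owner);        // professional client only
    void requestStudentDiscount(String accountId); // student client only
}

// The professional client uses 3 of the 4 methods: 3/4.
class ProfessionalClient {
    void run(BankServices bank) {
        bank.accountBalance("P-1");
        bank.requestLoan("P-1");
        bank.openBusinessAccount("ACME");
    }
}

// The student client uses 2 of the 4 methods: 2/4.
class StudentClient {
    void run(BankServices bank) {
        bank.accountBalance("S-1");
        bank.requestStudentDiscount("S-1");
    }
}

// IUC(BankServices) = (3/4 + 2/4) / 2 = 5/8, as computed in the text.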
Another heuristic to measure the external cohesion is the ClusterClients(i) metric. This metric counts the number of clients of an interface i that do not share any method with another client. Higher values for this metric indicate lower cohesion. For the interface in Figure 2.1a the value of ClusterClients is 0, and for the interface in Figure 2.1b the value is 2. We use this metric to investigate whether the contribution of the shared methods, as computed by the IUC metric, is relevant to predict change-prone interfaces.

2.1.3 Complexity and Usage Metrics for Interfaces

In addition to the object-oriented metrics, we validate the IUC metric against several other metrics defined to measure the complexity and usage of an interface. The complexity metrics are:

• NOM(i): counts the number of methods declared in the interface i;
• Arguments(i): counts the total number of arguments of the declared methods in the interface i;
• APP(i): measures the mean size of method declarations of an interface i and is equal to Arguments(i) divided by NOM(i), as defined by Boxall and Araban [2004].

The usage metrics are:

• Clients(i): counts the number of distinct classes that invoke the interface i;
• Invocations(i): counts the number of static invocations of the methods declared in the interface i;
• Implementing_Classes(i): counts the number of direct classes that implement the interface i.

2.2 The Approach

In this section, we illustrate the approach used to extract the fine-grained source code changes, to measure the metrics, and to perform the experiments aimed at addressing our research hypotheses. Figure 2.2 shows an overview of our approach that consists of three stages: (A) in the first stage we check out the source code of the projects from their versioning repositories and we measure the source code metrics; (B) we then compute the number of SCC from the versioning data for each class and interface; (C) finally we use the metrics and the number of SCC to perform our experiments with the PASW Statistics (http://www.spss.com/software/statistics/) and RapidMiner (http://rapid-i.com/content/view/181/196/) toolkits.

2.2.1 Source Code Metrics Computation

The first step of the process consists of checking out the source code of each project from the versioning repositories. The source code of each project is then parsed with the Evolizer Famix Importer, belonging to the Evolizer tool set (http://www.evolizer.org/). The parser extracts a FAMIX model that represents the source code entities and their relationships [Tichelaar et al., 2000]. Figure 2.3 shows the core of the FAMIX meta model. The model represents inheritance relationships among classes, the methods belonging to a class, the attributes accessed by a method, and the invocations among methods. For more details we refer the reader to [Tichelaar et al., 2000].

After obtaining the FAMIX model, the next step consists of measuring the source code metrics of classes and interfaces. We use the Understand tool to measure the C&K metrics. We decided to use the Understand tool because, in our view, it provides the most precise measurement of these metrics for interfaces. We use the FAMIX model to measure the external cohesion, complexity, and usage metrics of interfaces. For example, to measure the Invocations(i) metric we count the number of invocation objects in the FAMIX model that point to a method of the interface i.
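The following Python fragment sketches this kind of model query under the assumption that the extracted invocations are available as simple (client class, target interface, invoked method) tuples; it illustrates the counting only and is not the Evolizer/FAMIX API.

    # Hypothetical flat representation of FAMIX invocation objects.
    invocations = [
        ("ReportGenerator", "BankServices", "accountBalance"),
        ("ReportGenerator", "BankServices", "transfer"),
        ("StudentPortal",   "BankServices", "accountBalance"),
    ]

    def invocations_of(interface, invs):
        return sum(1 for client, target, method in invs if target == interface)

    def clients_of(interface, invs):
        return len({client for client, target, method in invs if target == interface})

    print(invocations_of("BankServices", invocations))  # 3
    print(clients_of("BankServices", invocations))      # 2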
2.2.2 SCC Extraction

The first step of the SCC extraction stage consists of retrieving the versioning data from the repositories (e.g., CVS, SVN, or GIT), for which we use the Evolizer Version Control Connector [Gall et al., 2009]. The versioning repositories provide log entries that contain information about revisions of files that belong to the system under analysis. For each log entry, it extracts the revision number, the revision timestamp, the name of the developer who checked in the revision, the commit message, the total number of lines modified (LM), and the source code.

In the second step, we use ChangeDistiller [Gall et al., 2009] to extract the fine-grained source code changes (SCC) from the various source code revisions of each file. ChangeDistiller implements a tree differencing algorithm that compares the Abstract Syntax Trees (ASTs) of all directly subsequent revisions of a file. Each change represents a tree edit operation that is required to transform one version of the AST into the other. In this way we can track fine-grained source code changes down to the statement level. Based on this information we count the number of fine-grained source code changes (#SCC) for each class and interface over the selected observation period.
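To illustrate the output of this stage, the sketch below aggregates per-class change counts from a hypothetical list of (class, change type) records; the record layout and the change-type names are assumptions made for the example and do not reflect the ChangeDistiller output format.

    from collections import Counter

    # Hypothetical fine-grained changes extracted between two revisions.
    changes = [
        ("org.example.BankServices", "PARAMETER_INSERT"),
        ("org.example.BankServices", "RETURN_TYPE_CHANGE"),
        ("org.example.AccountImpl",  "STATEMENT_INSERT"),
    ]

    # #SCC per class/interface over the observation period.
    scc = Counter(cls for cls, change_type in changes)
    print(scc["org.example.BankServices"])  # 2
    print(scc["org.example.AccountImpl"])   # 1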
2.2.3 Correlation and Prediction Analysis

We use the collection of metric values and the #SCC of each class and interface as input to our experiments. First, we use the PASW Statistics tool to perform a correlation analysis between the source code metrics and the #SCC. Then, we use the RapidMiner tool to analyze the predictive power of the source code metrics to discriminate between change- and not change-prone interfaces. We perform a series of classification experiments with different machine learning algorithms, namely: Support Vector Machine, Naive Bayes Network, and Neural Nets. The next section details the empirical study.

2.3 Empirical Study

The goal of this empirical study is to evaluate the possibility of using the IUC metric for predicting change-prone interfaces and to highlight the limited predictive power of the C&K metrics. The perspective is that of a researcher, interested in investigating whether the traditional object-oriented metrics are useful to predict change-prone interfaces. The results of our study are also interesting for quality engineers who want to monitor the quality of their software systems using an external cohesion metric for interfaces. The context of this study consists of ten open-source systems, widely used in both the academic and the industrial community. These systems are eight plug-ins from the Eclipse platform (http://www.eclipse.org/) and the Hibernate2 and Hibernate3 systems (http://www.hibernate.org/). Eclipse is a popular open source system that has been studied extensively by the research community (e.g., [Businge et al., 2010], [Businge et al., 2013], [Businge, 2013], [Bernstein et al., 2007], [Nagappan et al., 2010], [Zimmermann et al., 2007], and [Zimmermann et al., 2009]). Hibernate is an object-relational mapping (ORM) library for the Java language.

Table 2.1 shows an overview of the dataset used in our empirical study. #Files is the number of unique Java files, #Interfaces is the number of unique Java interfaces, #Rev is the total number of Java file revisions, and #SCC is the number of fine-grained source code changes performed within the given time period (Time).

Table 2.1: Dataset used in the empirical study

Project                 #Files  #Interfaces  #Rev   #SCC   Time [M,Y]
Hibernate3              970     165 (17%)    30774  34960  Jun04-Mar11
Hibernate2              494     69 (14%)     13584  22960  Jan03-Mar11
eclipse.debug.core      188     97 (52%)     8295   11670  May01-Mar11
eclipse.debug.ui        793     129 (16%)    41860  55259  May01-Mar11
eclipse.jface           381     105 (28%)    22136  27041  Sep02-Mar11
eclipse.jdt.debug       469     140 (30%)    11711  33895  Jun01-Mar11
eclipse.team.core       172     44 (26%)     3726   4551   Nov01-Mar11
eclipse.team.cvs.core   189     25 (13%)     12343  23311  Nov01-Mar11
eclipse.team.ui         293     45 (15%)     20183  32267  Nov01-Mar11
eclipse.update.core     274     71 (26%)     7425   25617  Oct01-Mar11

In this study, we address the following two research hypotheses:

• H1: IUC has a stronger correlation with the #SCC of interfaces than the C&K metrics
• H2: IUC can improve the performance of prediction models to classify Java interfaces into change- and not change-prone

We first perform an initial analysis of the extracted information, in terms of the number of changes and in terms of metric values. Figure 2.4 shows the box plots of the #SCC of Java classes and interfaces mined from the versioning repositories of each project. The results show that on average the number of changes involving Java classes is at least one order of magnitude higher than the number involving Java interfaces. This result is not surprising since interfaces can be considered contracts among modules and, in general, among logic units of a system.

Figure 2.5 shows the values of the C&K metrics for classes and interfaces over all ten projects. The values of the CBO metric are in general lower for interfaces, since it counts only the number of reference types in the parameters, return types, and thrown exceptions of the method signatures. The values of the RFC metric are higher for classes than for interfaces. Also the values of the DIT metric are in general higher for classes than for interfaces. Analyzing the LCOM we can notice that Java classes have a low median LCOM and hence a high cohesion. On the other hand, interpreting the LCOM of interfaces we can state that most of them do not expose any attributes in their body. In fact, the Understand tool registers an LCOM of 0 when there are no attribute declarations, and 1 if there are some.
The values of WMC confirm the assumptions made in Section 2.1.1 about the loss of meaning of this metric when applied to interfaces. In fact, the values of WMC correspond exactly to the values of NOM (Number of Methods). As expected, we registered higher values of NOC for interfaces than for classes. This is due to the number of implementing classes that are counted as children by Understand.

2.3.1 Correlation between metrics and #SCC

The next step in our study aims at investigating the correlation between the metrics and the #SCC mined from the versioning repositories. We used the Spearman rank correlation analysis to identify highly-correlated metrics. Spearman compares the ordered ranks of the variables to measure a monotonic relationship. In contrast to the Pearson correlation, the Spearman correlation does not make assumptions about the distribution, the variances, and the type of the relationship [S. Weardon and Chilko, 2004]. A Spearman value of +1 or -1 indicates a high positive or high negative correlation, whereas 0 indicates that the variables under analysis do not correlate at all. Values greater than +0.5 and lower than -0.5 are considered to be substantial; values greater than +0.7 and lower than -0.7 are considered to be strong correlations.

To test the hypothesis H1, we performed two correlation analyses: (1) we analyze the correlation among the C&K metrics and the #SCC of Java classes and Java interfaces. An insignificant correlation of the C&K metrics for interfaces is a precondition for any further analysis of the interface complexity and usage metrics. (2) We explore the extent to which the interface cohesion, complexity, and usage metrics correlate with #SCC.
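For illustration, such a rank correlation between a metric and #SCC can be computed with SciPy as sketched below; the values are made up, and the study itself used PASW Statistics rather than this code.

    from scipy.stats import spearmanr

    # Hypothetical per-interface values of one metric and of #SCC.
    iuc_values = [0.95, 0.80, 0.40, 0.30, 0.70, 0.20, 0.60, 0.10]
    scc_counts = [1,    2,    14,   20,   7,    25,   3,    30]

    rho, p_value = spearmanr(iuc_values, scc_counts)
    print(f"rho = {rho:.3f}, p = {p_value:.4f}")  # a strong negative correlation for this toy data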
Table 2.2 lists the results of the correlation analysis between the C&K metrics and #SCC for classes and interfaces in each project. The heading Xc indicates the correlation of the metric X with the #SCC of classes, and Xi the correlation with the #SCC of interfaces.

Table 2.2: Spearman rank correlation between the C&K metrics and the #SCC computed for Java classes and Java interfaces (** marks significant correlations at α=0.01, * marks significant correlations at α=0.05, values in bold mark a significant correlation)

Project                 CBOc      CBOi      NOCc      NOCi      RFCc      RFCi
Hibernate3              0.590**   0.535**   0.109**   0.029     0.338**   0.592**
Hibernate2              0.352**   0.373**   0.134**   0.065     0.273**   0.325**
eclipse.debug.core      0.560**   0.484**   -0.025    0.105     0.431**   0.486**
eclipse.debug.ui        0.566**   0.216*    0.087*    0.033     0.291**   0.152
eclipse.jface           0.570**   0.239*    0.257**   0.012     0.516**   0.174**
eclipse.jdt.debug       0.502**   0.512**   0.154**   0.256**   0.132     0.349**
eclipse.team.core       0.453**   0.367*    0.180*    0.102     0.435**   0.497**
eclipse.team.cvs.core   0.655**   0.688**   0.347**   -0.013    0.407**   0.738**
eclipse.team.ui         0.532**   0.301*    0.152**   -0.003    0.382**   0.299*
eclipse.update.core     0.649**   0.499**   0.026     -0.007    0.364**   0.381**
Median                  0.563     0.428     0.143     0.031     0.373     0.365

Project                 DITc      DITi      LCOMc     LCOMi     WMCc      WMCi
Hibernate3              -0.098**  0.058     0.367**   0.103     0.617**   0.657**
Hibernate2              0.156**   -0.010    0.269**   0.006     0.455**   0.522**
eclipse.debug.core      0.065     0.232*    0.564     0.337     0.600**   0.597**
eclipse.debug.ui        0.473**   0.324**   0.626**   0.214*    -0.048    0.131
eclipse.jface           0.173**   0.103     0.563**   0.320**   0.754**   0.137
eclipse.jdt.debug       0.089     -0.049    0.237**   0.238**   0.668**   0.489**
eclipse.team.core       0.060     0.243     0.335**   0.400     0.561**   0.451**
eclipse.team.cvs.core   0.145     0.618**   0.477**   0.610**   0.753**   0.744**
eclipse.team.ui         0.039     -0.103*   0.493**   0.395**   0.595**   0.299*
eclipse.update.core     0.007     0.146     0.326**   0.482**   0.735**   0.729**
Median                  0.023     0.124     0.422     0.328     0.608     0.505

The first important result is that only the metrics CBOc and WMCc have a substantial correlation with the #SCC of Java classes, since their median correlation is greater than 0.5. In five projects out of ten WMCc exhibits a substantial correlation, and in three cases the correlation is strong. Similarly, the CBOc metric shows a substantial correlation in eight cases but no strong correlations. The other metrics do not show a significant correlation with the #SCC. The median correlation values of the C&K metrics applied to interfaces are significantly lower. Among the six metrics, WMCi exhibits the strongest correlation with #SCC. It shows three substantial and two strong correlations. CBOi shows a substantial correlation for three projects.

We applied the same correlation analysis to the interface complexity and usage metrics defined in Section 2.1.3. We report the results in Table 2.3.

Table 2.3: Spearman rank correlation between the interface complexity and usage metrics and #SCC (** marks significant correlations at α=0.01, * marks significant correlations at α=0.05, values in bold mark a significant correlation)

Project                 IUCi       Clientsi   Invocationsi   ClustersClientsi
Hibernate3              -0.601**   0.433**    0.544**        0.302**
Hibernate2              -0.373**   0.104      0.165          0.016
eclipse.debug.core      -0.682**   0.327**    0.317**        0.273**
eclipse.debug.ui        -0.508**   0.498**    0.497**        0.418**
eclipse.jface           -0.363**   0.099      0.205*         0.106**
eclipse.jdt.debug       -0.605**   0.471      0.495**        0.474**
eclipse.team.core       -0.475**   0.278      0.261          0.328*
eclipse.team.cvs.core   -0.819**   0.608**    0.557**        0.369
eclipse.team.ui         -0.618**   0.270      0.290          0.056
eclipse.update.core     -0.656**   0.656**    0.677**        0.606**
Median                  -0.605     0.327      0.317          0.328

Project                 ImplementingClassesi   Argumentsi   APPi      NOMi
Hibernate3              0.021                  0.668**      0.450**   0.657**
Hibernate2              0.054                  0.531**      0.288**   0.522**
eclipse.debug.core      0.070                  0.298**      0.125     0.597**
eclipse.debug.ui        0.139                  0.128        -0.022    0.131
eclipse.jface           0.063                  0.207*       0.110     0.137
eclipse.jdt.debug       0.223                  0.474**      0.361**   0.489**
eclipse.team.core       0.102                  0.241        0.138     0.451**
eclipse.team.cvs.core   -0.037                 0.614**      0.383     0.744**
eclipse.team.ui         -0.003                 0.144        -0.107*   0.299*
eclipse.update.core     -0.095                 0.433**      0.278     0.729**
Median                  0.063                  0.365        0.208     0.505

IUCi is the only metric that exposes a substantial correlation with the #SCC of interfaces. This metric shows a median correlation value of -0.605, having a substantial correlation in six projects and a strong correlation in one project. The negative correlation is due to the nature of the metric and means that the IUCi value is inversely proportional to the #SCC. More precisely, the stronger the external cohesion (values of IUCi close to one), the less frequently an interface changes. Concerning the other metrics, NOMi shows the strongest correlation with the #SCC. This result is not surprising since the more methods are declared in an interface, the more likely the interface is to change. Surprisingly, neither the number of clients nor the number of invocations results in a substantial correlation with the #SCC. The Argumentsi metric correlates only in three projects out of ten, while APPi shows a correlation only for one project. The ClustersClientsi metric shows a substantial correlation only in one project.
Therefore we conclude that the contribution of the number of methods shared among different clients is relevant for the correlation analysis. The weakest correlation is exhibited by the ImplementingClassesi metric.

Based on this result we can accept H1. Among the selected metrics, the IUCi metric exhibits the strongest correlation with the #SCC of interfaces. This result confirms our belief that the violation of the Interface Segregation Principle can impact the robustness of interfaces.

2.3.2 Prediction analysis

To test the research hypothesis H2, we analyzed whether the IUC metric can improve prediction models to classify interfaces into change-prone and not change-prone. We performed a series of classification experiments with three different machine learning algorithms. Prior work [Lessmann et al., 2008] showed that some machine learning techniques perform better than others, even though the authors state that performance differences among classifiers are marginal and not necessarily significant. For that reason we used the following classifiers: Support Vector Machine (LibSVM), Naive Bayes Network (NBayes), and Neural Nets (NN), as provided by the RapidMiner toolkit.

For each project, we binned the interfaces into change-prone and not change-prone using the median of the #SCC per project:

interface = change-prone, if #SCC > median; not change-prone, otherwise

First, we trained the machine learning algorithms using the following object-oriented metrics: CBO, RFC, LCOM, WMC. We selected these metrics because they showed the strongest correlation with the #SCC. We refer to this set of metrics as OO. Next, the training is performed using the OO metrics plus the IUC metric. We refer to this set of metrics as IUC.

In order to evaluate the classification models, we use the area under the curve statistic (AUC). In addition, we report the precision (P) and recall (R) of each model. AUC represents the probability that, when randomly choosing a change-prone and a not change-prone interface, the trained model assigns a higher score to the change-prone interface [Green and Swets, 1966]. We trained the models using 10-fold cross-validation and we considered models with an AUC value greater than 0.7 to have adequate classification performance [Lessmann et al., 2008].

Table 2.4 reports the results obtained with the NBayes learner. The results show that the median AUC is higher when we include the IUC metric. Moreover, for each project we obtained an adequate performance (AUC > 0.7) with the IUC set. Only for two projects (JDT Debug and Team UI) out of ten did we register a better performance for the OO metrics. Using the LibSVM (see Table 2.5) and the NN (see Table 2.6) classifiers we obtained similar results. With LibSVM, in eight projects the IUC metric set outperformed the OO metric set. Using NN, in seven projects out of ten the IUC metric set outperformed the OO metric set. The median values of the Precision and Recall show similar results for most of the projects. In several projects, however, the Precision and Recall are affected by the lack of information about interfaces (i.e., a high percentage of interfaces did not change during the observed time period). For instance, in the eclipse.jface project the number of interfaces that did not change is 81% (85 out of 105). The result is that the prediction model computed with the NN learner showed a Precision and Recall of 0.
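As an illustrative counterpart to this setup (a sketch only: scikit-learn stands in for the RapidMiner learners, and the feature matrix is randomly generated rather than taken from the studied projects), the median-based binning and the cross-validated AUC can be expressed as follows:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    rng = np.random.default_rng(42)
    X = rng.random((60, 5))                 # e.g., CBO, RFC, LCOM, WMC, IUC per interface
    scc = rng.permutation(60)               # hypothetical #SCC per interface
    y = (scc > np.median(scc)).astype(int)  # 1 = change-prone, 0 = not change-prone

    for name, clf in [("NBayes", GaussianNB()), ("SVM", SVC(probability=True))]:
        auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
        print(name, round(auc, 3))          # around 0.5 here, since the features are random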
Table 2.4: AUC, Precision and Recall using Naive Bayes Network (NBayes) with OO and IUC to classify interfaces into change-prone and not change-prone. Bold values highlight the best AUC value per project.

Project                 AUC_OO  P_OO   R_OO   AUC_IUC  P_IUC  R_IUC
eclipse.team.cvs.core   0.55    90     75     0.75     92.6   83.33
eclipse.debug.core      0.75    93     38     0.79     94.1   55.23
eclipse.debug.ui        0.66    63.81  40.33  0.72     69     41
hibernate2              0.745   78.62  32.02  0.807    84.22  85.33
hibernate3              0.835   88.61  57.92  0.862    82.8   56.31
eclipse.jdt.debug       0.79    69.67  47.67  0.738    77.71  45.38
eclipse.jface           0.639   50     28.33  0.734    53.85  48.33
eclipse.team.core       0.708   68.75  48.13  0.792    58.33  43.33
eclipse.team.ui         0.88    85     70     0.8      78.95  75
eclipse.update.core     0.782   67.49  46.5   0.811    81.19  61.67
Median                  0.747   74.14  47.08  0.791    80.07  55.77

To investigate whether the differences between the AUC values of the OO and IUC metric sets are significant, we performed the Related-Samples Wilcoxon Signed-Ranks Test. The results of the test show a significant difference at α=0.05 for the median AUC obtained with the Support Vector Machine (LibSVM). The difference between the medians obtained with NBayes and NN was not significant. Based on these results we can partially accept the hypothesis H2. The additional information provided by the IUC metric can improve the median performance of the prediction models by up to 9.2%. The Wilcoxon test confirmed this improvement for the LibSVM learner, however not for the NBayes and NN learners. This result highlights the need to analyze a wider dataset in order to provide a more precise validation.
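For illustration, the paired test can be reproduced with SciPy on the per-project AUC values of Table 2.5 (LibSVM); this sketch is not the statistics procedure used in the thesis, but it applies the same test to the same pairs.

    from scipy.stats import wilcoxon

    # AUC per project with the OO metric set and with OO + IUC (Table 2.5, LibSVM).
    auc_oo  = [0.692, 0.806, 0.71, 0.735, 0.64, 0.741, 0.607, 0.617, 0.74, 0.794]
    auc_iuc = [0.811, 0.828, 0.742, 0.708, 0.856, 0.82, 0.778, 0.608, 0.883, 0.817]

    stat, p = wilcoxon(auc_oo, auc_iuc)
    print(f"W = {stat}, p = {p:.3f}")  # p < 0.05: the improvement is significant for this learner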
2.3.3 Summary of Results

The results of our empirical study can be summarized as follows:

The IUC metric shows a stronger correlation with the #SCC of interfaces than the C&K metrics. With a median Spearman rank correlation of -0.605, the IUC shows a stronger correlation with the #SCC of Java interfaces than the C&K metrics. Only the WMC metric shows a substantial correlation in five projects out of ten, with a median value of 0.505; hence we accepted H1.

The IUC metric improves the performance of prediction models to classify change- and not change-prone interfaces. The models trained with the Support Vector Machine (LibSVM) and NBayes using the IUC metric set outperformed the models computed with the OO metric set in eight out of ten projects. Using the NN learner, the models of seven projects showed better performance with the IUC metric set. This improvement in performance is significant for the models trained with the Support Vector Machine (LibSVM), however not for the other two learners. Therefore, we partially accepted H2.

Table 2.5: AUC, Precision and Recall using Support Vector Machine (LibSVM) with OO and IUC to classify interfaces into change-prone and not change-prone. Bold values highlight the best AUC value per project.

Project                 AUC_OO  P_OO   R_OO   AUC_IUC  P_IUC  R_IUC
eclipse.team.cvs.core   0.692   55.61  54.2   0.811    90.91  83.33
eclipse.debug.core      0.806   82.61  46     0.828    89.47  52.5
eclipse.debug.ui        0.71    75     21.33  0.742    80.83  26.8
hibernate2              0.735   70     40     0.708    66.76  45
hibernate3              0.64    52     33.45  0.856    82.4   73.36
eclipse.jdt.debug       0.741   67.17  56.24  0.82     68.56  58.33
eclipse.jface           0.607   66.67  45     0.778    72     62
eclipse.team.core       0.617   66.67  45     0.608    58.33  45
eclipse.team.ui         0.74    73.33  70     0.883    83.33  75
eclipse.update.core     0.794   86.67  56.83  0.817    81     64.17
Median                  0.722   68.58  45.5   0.814    80.91  60.16

Table 2.6: AUC, Precision and Recall using Neural Nets (NN) with OO and IUC to classify interfaces into change-prone and not change-prone. Bold values highlight the best AUC value per project.

Project                 AUC_OO  P_OO   R_OO   AUC_IUC  P_IUC  R_IUC
eclipse.team.cvs.core   0.8     71.43  71.43  0.8      87.5   100
eclipse.debug.core      0.85    80     80     0.875    91.67  70
eclipse.debug.ui        0.748   79.33  44.67  0.766    78.05  58.5
hibernate2              0.702   53.85  50     0.747    50     45
hibernate3              0.874   83.17  69.52  0.843    78.49  69.05
eclipse.jdt.debug       0.77    73.39  63.24  0.762    80.5   58.05
eclipse.jface           0.553   0      0      0.542    0      0
eclipse.team.core       0.725   53.33  50     0.85     61.11  63.33
eclipse.team.ui         0.65    83.33  75     0.75     78.95  75
eclipse.update.core     0.675   70     58.33  0.744    78.33  56.67
Median                  0.736   72.41  60.78  0.764    78.41  60.69

2.4 Discussion

This section discusses the implications of our results and the threats to validity.

2.4.1 Implications of Results

The implications of the results of our study are interesting for researchers, quality engineers and, in general, for developers and software architects. The results of our study can be used by researchers interested in investigating software systems through the analysis of source code metrics. Studies based on source code metrics should take into account the nature of the entities that are measured. This can help to obtain more accurate results. Quality engineers should consider the possibility of enlarging their metric suite. In particular, the set of metrics should include specific metrics for measuring the cohesion of interfaces, such as the IUC metric. The C&K metrics are limited in measuring this cohesion of interfaces. Finally, developers and software architects should use the IUC metric to measure the conformance to the ISP. Our results showed that low IUC values, indicating a violation of the ISP, can increase the effort needed to maintain software systems.

2.4.2 Threats to Validity

We consider the following threats to validity: construct, internal, conclusion, external, and reliability validity.

Threats to construct validity concern the relationship between theory and observation. In our study, this threat can be due to the fact that we measured the metrics on the last version of the source code. Previous studies in the literature also used metrics collected from a single release (e.g., [Mauczka A., 2009], [Alshayeb and Li, 2003]). We mitigated this threat by collecting the metrics from the last release, since this release reflects the history of a system. Nevertheless, we believe that further validation with metrics measured over time (i.e., from different releases) is desirable.

Threats to internal validity concern factors that may affect an independent variable. In our study, the independent variables (values of the metrics and #SCC) are computed using deterministic algorithms (provided by the Understand and Evolizer tools) that always deliver the same results.

Threats to conclusion validity concern the relationship between the treatment and the outcome. Wherever possible, we used proper statistical tests to support our conclusions for the two research questions.
We used the Spearman correlation, which does not make any assumption about the underlying data distribution, to test H1. To address H2 we selected a set of three machine learning techniques. Further techniques can be applied to build predictive models, even though previous work [Lessmann et al., 2008] states that performance differences among classifiers are not significant.

Threats to external validity concern the generalization of our findings. In our study, this threat can be due to the fact that eight out of ten projects stem from the Eclipse platform. Therefore, the generalizability of our findings and conclusions should be verified for other projects. Nevertheless, we considered systems of different sizes and different roles in the Eclipse platform. Eclipse has been widely used by the scientific community and we can compare our findings with previous work. Moreover, we added two projects from Hibernate. As a matter of fact, any result from empirical work is in general threatened by the bias of its datasets [Menzies et al., 2007].

Threats to reliability validity concern the possibility of replicating our study and obtaining consistent results. The analyzed systems are open source systems and hence publicly available; the tools used to implement our approach (Evolizer and ChangeDistiller) are available from the reported web sites.

2.5 Related Work

In this section, we discuss previous work related to the usage of change prediction models to guide and understand the maintenance of software systems.

Rombach was among the first researchers to investigate the impact of software structure on maintainability aspects [Rombach, 1987], [Rombach, 1990]. He focused on comprehensibility, locality, modifiability, and reusability in a distributed system environment, highlighting the impact of the interconnectivity between components.

In the literature, several approaches used source code metrics to predict change-prone classes. Khoshgoftaar and Szabo [1994] presented an approach to predict maintenance measured as lines changed. They trained a regression model and a neural network using size and complexity metrics. Li and Henry used the C&K metrics to predict maintenance in terms of lines changed [Li and Henry, 1993]. The results show that these metrics can significantly improve a prediction model compared to traditional metrics. In 2009, Mauczka et al. measured the relationship of code changes with source-level software metrics [Mauczka A., 2009]. This work focuses on evaluating the C&K metrics suite against failure data. Zhou et al. [2009] used three size metrics to examine the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness. A further validation of the object-oriented metrics was provided by Alshayeb and Li [2003]. This work highlights the capability of those metrics in two different iterative processes. The results show that the object-oriented metrics are effective in predicting design efforts and source lines modified in the short-cycled agile process. On the other hand, they are ineffective in predicting the same aspects in the long-cycled framework process.

Object-oriented metrics were not only successfully applied for maintenance but also for defect prediction. Basili et al. [1996] empirically investigated the suite of object-oriented design metrics as predictors of fault-prone classes.
Subramanyam and Krishnan [2003] validated the C&K metrics suite in determining software defects. Their findings show that the effects of those metrics on defects vary across the data sets from two different programming languages, C++ and Java.

Besides the correlation between metrics and change-proneness, other design practices have been investigated in correlation with the number of changes. Khomh et al. [2009] investigated the impact of classes with code smells on change-proneness. They showed that classes with code smells are more change-prone than classes without, and that specific smells are more correlated than others. Penta et al. [2008] developed an exploratory study to analyze the change-proneness of design patterns and the kinds of changes occurring to classes involved in design patterns.

A complementary branch of change prediction is the detection of change couplings. Shirabad et al. [2003] used a decision tree to identify files that are change coupled. Zimmermann et al. [2004] developed the ROSE tool that suggests change-coupled source code entities to developers. They are able to detect coupled entities on a fine-grained level. Robbes et al. [2008] used fine-grained source changes to detect several kinds of distinct logical couplings between files. Canfora et al. [2010] use multivariate time series analysis and forecasting to determine whether a change that occurred on a software artifact was consequentially related to changes on other artifacts.

Our work is complementary to the existing work since (1) we explore limitations of the C&K metrics in predicting change-prone Java interfaces; and (2) we investigate the impact of the ISP violation, as measured by the IUC metric, on the change-proneness of interfaces.

2.6 Conclusions and Future Work

Interfaces declare contracts that are meant to remain stable during the evolution of a software system, while the implementation in concrete classes is more likely to change. This leads to a different evolutionary behavior of interfaces compared to concrete classes. In this chapter, we empirically investigated this behavior with the C&K metrics that are widely used to evaluate the quality of the implementation of classes and interfaces. The results of our study with eight Eclipse plug-in projects and two Hibernate projects showed that:

• The IUC metric shows a stronger correlation with #SCC than the C&K metrics when applied to interfaces (we accepted H1)
• The IUC metric can improve the performance of prediction models in classifying Java interfaces into change-prone and not change-prone (we partially accepted H2)

Our findings provide a starting point for studying the quality of interfaces and the impact of design violations, such as the ISP, on the maintenance of software systems. In particular, the acceptance of the hypothesis H1 implies that engineers should measure the quality of interfaces with specific interface cohesion metrics. Software designers and architects should follow the interface design principles, in particular the ISP. Furthermore, researchers should consider distinguishing between classes and interfaces when investigating models to estimate and predict change-prone interfaces.

In future work, we plan to evaluate the IUC metric with more open source and also commercial software systems. Furthermore, we plan to analyze the performance of our models taking into account releases (i.e., train the model with a previous release to predict the change-prone interfaces of the next release).
Another direction of future work is to apply our models to other types of systems, such as Component Based Systems (CBS) and Service Oriented Systems (SOS), in which interfaces play a fundamental role.

Figure 2.2: Overview of the data extraction and measurement process: (A) source code metrics computation, (B) SCC extraction, (C) correlation and prediction analysis.

Figure 2.3: Core of the FAMIX meta model [Tichelaar et al., 2000]: classes, subclasses and superclasses connected by Inheritance relationships, with Attribute and Method entities linked through BelongsToClass, Access, and Invocation relationships.

Figure 2.4: Box plots of the #SCC of interfaces and classes per project.

Figure 2.5: Box plots of the C&K metric values for classes and interfaces measured over all selected projects.

3. Change-Prone Java APIs

Antipatterns are poor solutions to design and implementation problems which are claimed to make object-oriented systems hard to maintain. Recent studies showed that classes with antipatterns change more frequently than classes without antipatterns. In this chapter, we detail these analyses by taking into account fine-grained source code changes (SCC) extracted from 16 Java open source systems. In particular we investigate: (1) whether classes with antipatterns are more change-prone (in terms of SCC) than classes without; (2) whether the type of antipattern impacts the change-proneness of Java classes; and (3) whether certain types of changes are performed more frequently in classes affected by a certain antipattern. Our results show that: (1) the number of SCC performed in classes affected by antipatterns is statistically greater than the number of SCC performed in classes with no antipattern; (2) classes participating in the three antipatterns ComplexClass, SpaghettiCode, and SwissArmyKnife are more change-prone than classes affected by other antipatterns; and (3) certain types of changes are more likely to be performed in classes affected by certain antipatterns; for example, API changes are likely to be performed in classes affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns. (This chapter was published in the 19th Working Conference on Reverse Engineering (WCRE 2012) [Romano et al., 2012].)

3.1 Data Collection
3.2 Empirical Study
3.3 Threats to Validity
3.4 Related Work
3.5 Conclusion and Future Work

Over the past two decades, maintenance costs have grown to more than 50% and up to 90% of the overall costs of software systems [Erlikh, 2000]. To help reduce the cost of maintenance, researchers have proposed several approaches to ease program comprehension and to identify change- and bug-prone parts of the source code of software systems.
These approaches include source code metrics (e.g., [Mauczka A., 2009]) and heuristics to assess the design of a software system (e.g., [Posnett et al., 2011; Khomh et al., 2012; Thummalapenta et al., 2010]).

Recently, Khomh et al. analyzed the impact of antipatterns on the change-proneness of software units [Khomh et al., 2012]. Antipatterns [Brown et al., 1998] are "poor" solutions to design and implementation problems, in contrast to design patterns [Gamma et al., 1995], which are "good" solutions to recurring design problems. Antipatterns are typically introduced in software systems by developers lacking adequate knowledge or experience in solving a particular problem, or by misapplying design patterns. Coplien and Harrison [2005] described an antipattern as "something that looks like a good idea, but which back-fires badly when applied". Previous studies, such as ours [Khomh et al., 2012], support this description by showing that software units, i.e., classes, affected by antipatterns are more likely to undergo changes than other units.

Existing literature proposes many different antipatterns, such as the 40 antipatterns described by Brown et al. [1998]. Furthermore, antipatterns occur in large numbers and affect large portions of some software systems. For instance, we found that more than 45% of the classes in the systems studied in [Khomh et al., 2012] contained at least one antipattern. Because of the diversity and the large number of antipatterns, support is needed, for instance for software engineers, to identify the risky classes affected by antipatterns that lead to errors and increase development and maintenance costs. For this, we need to obtain a deeper understanding of the change-proneness of different antipatterns and the types of changes occurring in classes affected by them. Providing this deeper understanding is the main objective of this chapter.

In this chapter we investigate the extent to which antipatterns can be used as indicators of changes in Java classes. The goal of this study is to investigate which antipatterns are more likely to lead to changes and which types of changes are likely to appear in classes affected by certain antipatterns. In contrast to existing studies (i.e., [Khomh et al., 2009, 2012]), the approach of our study is based on the analysis of fine-grained source code changes (SCC) mined from version control systems [Fluri et al., 2007; Gall et al., 2009]. This approach allows us to analyze the types of changes performed in classes affected by a particular antipattern, which was not possible with previous approaches. Moreover, we take into account the significance of the change types [Fluri and Gall, 2006] and we filter out irrelevant change types (e.g., changes to comments and copyrights), which account for more than 10% of all changes in our dataset.

Using the data of fine-grained source code changes and antipatterns, we aim at providing answers to the following three research questions:

• RQ1: Are Java classes affected by antipatterns more change-prone than Java classes not affected by any antipattern? This research question is aimed at replicating the previous study [Khomh et al., 2012] with fine-grained source code changes (SCC).

• RQ2: Are Java classes affected by certain types of antipatterns more change-prone than Java classes affected by other antipatterns – i.e., does the type of antipattern impact change-proneness?
The results from this research question can assist software engineers in identifying the risky classes affected by antipatterns.

• RQ3: Are particular types of changes more likely to be performed in Java classes affected by certain types of antipatterns? The results of this question will assist software engineers in prioritizing antipatterns that need to be resolved to prevent certain types of changes in a system, for example changes in the method declarations of a class exposing a public API.

To answer our research questions, we perform an empirical study with data extracted from 16 Java open-source software systems. Our main outcomes are:

• The number of SCC performed in classes affected by antipatterns is statistically greater than the number of SCC performed in other classes.

• Classes affected by ComplexClass, SpaghettiCode, and SwissArmyKnife are more change-prone than classes affected by other antipatterns.

• Changes in APIs are more likely to appear in classes affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns; methods are more likely to be added/deleted in classes affected by ComplexClass and SpaghettiCode; changes in executable statements are likely in AntiSingleton, ComplexClass, SpaghettiCode, and SwissArmyKnife; and changes in conditional statements and else-parts are more likely in classes affected by SpaghettiCode.

These findings suggest that software engineers should consider detecting and resolving instances of certain antipatterns to prevent certain types of changes. For instance, they should resolve instances of ComplexClass, SpaghettiCode, and SwissArmyKnife to prevent frequent changes in the APIs.

The remainder of this chapter is organized as follows. Section 3.1 describes the approach used to mine fine-grained source code changes and to detect Java classes participating in antipatterns. The study design and our findings are presented in Section 3.2. Section 3.3 discusses threats to the validity of the results of our study. Section 3.4 presents related work. We draw our conclusions and outline directions for future work in Section 3.5.

3.1 Data Collection

In this section, we describe the approach used to gather the data needed to perform our study. The data consist of the fine-grained source code changes (SCC) performed in each Java class along the history of the systems under analysis, and the type and number of antipatterns in which a class participates during its evolution. Figure 3.1 shows an overview of our approach consisting of 4 steps. In the following we describe each step in detail.

3.1.1 Importing Versioning Data

The first step concerns retrieving the versioning data for the Java classes from the version control systems (e.g., CVS, SVN or GIT). To perform this step we use the Evolizer Version Control Connector (EVCC) [Gall et al., 2009], belonging to the Evolizer tool set (http://www.evolizer.org/). For each class EVCC fetches and parses the log entries from the versioning repository. Per log entry, EVCC extracts the revision number, the revision timestamp, the name of the developer who checked in the revision, the commit message, the total number of lines modified, and the source code. This information, plus the source code of each Java class revision, is stored in the Evolizer repository.

3.1.2 Fine-Grained Source Code Changes Extraction

In the second step, ChangeDistiller [Fluri et al., 2007] is used to extract the fine-grained source code changes (SCC) between subsequent versions of a Java class.
ChangeDistiller first parses the source code of the two subsequent versions of a Java class and creates the corresponding Abstract Syntax Trees (ASTs). Second, the two ASTs are compared using a tree differencing algorithm that outputs the differences in the form of the tree-edit operations add, delete, update, and move. Next, each edit operation for a given node in the AST is annotated with the semantic information of the source code entity it represents and is classified as a specific change type based on a taxonomy of code changes [Fluri and Gall, 2006]. For instance, the insertion of a node representing an else-part in the AST is classified as an else-part insert change type. The result is a list of change types between two subsequent versions of each Java class, which is stored in the Evolizer repository.

Figure 3.1: Overview of the approach to extract fine-grained source code changes and antipatterns for Java classes: (1) versioning data importer, (2) fine-grained source code changes extractor, (3) antipatterns detector, (4) data preparation.

3.1.3 Antipatterns Detection

The third step of our approach is detecting the antipatterns that occur in Java classes. This is achieved by DECOR (Defect dEtection for CORrection) [Moha et al., 2008a,b, 2010]. DECOR provides a domain-specific language to describe antipatterns through a set of rules (e.g., lexical, structural, internal, etc.) and an algorithm to detect antipatterns in Java classes.

We use the predefined specifications of antipatterns and run DECOR on the different source code releases of the systems under analysis. Among the antipatterns detectable with DECOR we select the following twelve antipatterns:

• AntiSingleton: A class that provides mutable class variables, which consequently could be used as global variables.

• Blob: A class that is too large and not cohesive enough, that monopolises most of the processing, takes most of the decisions, and is associated to data classes.

• ClassDataShouldBePrivate (CDSBP): A class that exposes its fields, thus violating the principle of encapsulation.

• ComplexClass (ComplexC): A class that has (at least) one large and complex method, in terms of cyclomatic complexity and LOCs.

• LazyClass (LazyC): A class that has few fields and methods (with little complexity).

• LongMethod (LongM): A class that has a method that is overly long, in terms of LOCs.

• LongParameterList (LPL): A class that has (at least) one method with a too long list of parameters with respect to the average number of parameters per method in the system.

• MessageChain (MsgC): A class that uses a long chain of method invocations to realise (at least) one of its functionalities.

• RefusedParentBequest (RPB): A class that redefines inherited methods using empty bodies, thus breaking polymorphism.

• SpaghettiCode (Spaghetti): A class declaring long methods with no parameters and using global variables. These methods interact too much using complex decision algorithms. This class does not exploit, and prevents the use of, polymorphism and inheritance.

• SpeculativeGenerality (SG): A class that is defined as abstract but that has very few children, which do not make use of its methods.
• SwissArmyKnife (Swiss): A class whose methods can be divided into disjoint sets, thus providing many different unrelated functionalities.

Per release, we obtain a list of detected antipatterns for each Java class. We chose this subset of antipatterns because (1) they are well described by Brown et al. [1998], (2) they appear frequently in the different releases of the systems under analysis, and (3) they are representative of design and implementation problems with data, complexity, size, and the features provided by Java classes. Moreover, they allow us to compare our findings with those of a previous study [Khomh et al., 2012].

3.1.4 Data Preparation

In this step, the fine-grained source code changes are grouped and linked with the antipatterns. ChangeDistiller currently supports more than 40 types of source code changes that cover the majority of modifications to entities of object-oriented programming languages [Fluri and Gall, 2006]. We group these change types into five categories. Grouping them facilitates the analysis of the contingency between different types of changes and the interpretation of the results. The different categories are shown in Table 3.1 together with a short description of each category.

Table 3.1: Categories of source code changes [Giger et al., 2011].

Category  Description
API       Changes that involve the declaration of classes (e.g., class renaming and class API changes) and the signature of methods (e.g., modifier changes, method renaming, return type changes, changes of the parameter list).
oState    Changes that affect object states of classes (e.g., field addition and deletion).
func      Changes that affect the functionality of a class (e.g., method addition and deletion).
stmt      Changes that modify executable statements (e.g., statement insertion and deletion).
cond      Changes that alter condition expressions in control structures and the modification of else-parts.

Per Java class revision we count the number of changes for each category. Per Java class we compute the sum for each change type category over the Java class revisions between two subsequent releases k and k+1. Finally, for each Java class we add the number of antipatterns detected in the Java class at release k. We did not normalize the number of changes in classes by the number of lines of code, because we wanted our results to be comparable to previous studies. Furthermore, one of the previous studies [Khomh et al., 2012] has shown that size alone is not the dominating factor affecting the change-proneness of classes with antipatterns. The resulting list contains for each release k a list of Java classes with the number of detected instances of the twelve antipatterns at release k plus the number of fine-grained changes per change type category that occurred between the two subsequent releases k and k+1. The analyses performed on these data will be described in the next section.
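A minimal sketch of this grouping step follows; the change-type names are illustrative and cover only a small subset of the ChangeDistiller taxonomy, and the mapping shown here is an assumption made for the example.

    from collections import Counter, defaultdict

    CATEGORY = {
        "CLASS_RENAMING": "API", "RETURN_TYPE_CHANGE": "API", "PARAMETER_INSERT": "API",
        "ATTRIBUTE_INSERT": "oState", "ATTRIBUTE_DELETE": "oState",
        "METHOD_INSERT": "func", "METHOD_DELETE": "func",
        "STATEMENT_INSERT": "stmt", "STATEMENT_DELETE": "stmt",
        "CONDITION_EXPRESSION_CHANGE": "cond", "ELSE_PART_INSERT": "cond",
    }

    # Hypothetical fine-grained changes between releases k and k+1.
    changes = [("Foo", "METHOD_INSERT"), ("Foo", "STATEMENT_INSERT"),
               ("Foo", "STATEMENT_INSERT"), ("Bar", "ELSE_PART_INSERT")]

    per_class = defaultdict(Counter)
    for cls, change_type in changes:
        if change_type in CATEGORY:           # comment/copyright changes are left out
            per_class[cls][CATEGORY[change_type]] += 1

    print(dict(per_class))  # {'Foo': Counter({'stmt': 2, 'func': 1}), 'Bar': Counter({'cond': 1})}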
3.2 Empirical Study

The goal of this empirical study is to investigate the association between antipatterns and the change-proneness of Java classes. We performed the empirical study with 16 open-source systems from different domains, implemented in Java and widely used in the academic and industrial communities. Table 3.2 shows an overview of the dataset. #Files denotes the number of Java files in the last release, #Releases denotes the number of releases analyzed, #SCC denotes the number of fine-grained source code changes in the given time period (Time), and #SCC' denotes the number of fine-grained source code changes without counting changes performed in comments and copyrights. In total, changes due to comment and copyright modifications account for approximately 11% of all changes (i.e., 64021 out of 585614). This high percentage highlights the necessity of filtering out changes related to comments and copyrights in order to avoid biasing the results.

Table 3.2: Dataset used in our empirical study.

System                  #Files  #Releases  #SCC    #SCC'   Time [M,Y]
argo                    1716    9          97767   79414   Oct02-Mar09
hibernate2              494     10         26099   23638   Jan03-Mar11
hibernate3              970     20         37271   34440   Jun04-Mar11
eclipse.debug.core      188     12         7600    6555    May01-Mar11
eclipse.debug.ui        793     22         40551   37306   May01-Mar11
eclipse.jface           381     17         14072   11789   Sep02-Mar11
eclipse.jdt.debug       469     16         14983   13647   Jun01-Mar11
eclipse.team.core       172     6          2318    1790    Nov01-Mar11
eclipse.team.cvs.core   189     11         13070   11544   Nov01-Mar11
eclipse.team.ui         293     13         9787    8948    Nov01-Mar11
jabref                  1996    30         41665   37983   Dec03-Oct11
mylyn                   1288    17         67050   63601   Dec06-Jun09
rhino                   184     8          14795   13693   May99-Aug07
rapidminer              2061    4          9899    9277    Oct09-Aug10
vuze                    3265    29         119138  113570  Dec06-Apr10
xerces                  710     20         69549   54398   Dec00-Dec12

Table 3.3 shows the number of antipatterns detected by DECOR in the first and last releases of the analyzed systems. Basically, all systems contain instances of most of the 12 antipatterns. In particular, rapidminer and vuze contain the largest numbers of antipatterns, which is not surprising since they are also the largest systems in our sample set. According to our numbers, the antipatterns LongMethod (LongM), MessageChain (MsgC), and RefusedParentBequest (RPB) occur most frequently, while SpaghettiCode (Spaghetti), SpeculativeGenerality (SG), and SwissArmyKnife (Swiss) occur less frequently. Overall, the frequency of antipatterns and changes allows us to investigate the three research questions stated at the beginning of this chapter. The raw data used to perform our analysis are available on our web site (http://swerl.tudelft.nl/twiki/pub/DanieleRomano/WebHome/WCRE12rawData.zip). In the following, we state the hypotheses, explain the analysis methods, and report on the results for each research question.

3.2.1 Investigation of RQ1

The goal of RQ1 is to analyze the change-proneness of Java classes affected by antipatterns, compared to the change-proneness of classes not affected by antipatterns. We address RQ1 by testing the following two null hypotheses:

• H1a: The proportion of classes changed at least once between two releases is not different between classes that are affected by antipatterns and classes not affected by antipatterns.

• H1b: The distribution of SCC performed in classes between two releases is not different for classes affected by antipatterns and classes not affected by antipatterns.

Analysis Method

For investigating H1a we classify the Java classes of each system and release k as change-prone if there was at least one change between the two subsequent releases (k and k+1). Otherwise they are classified as not change-prone. This binary variable (we refer to it as change-proneness(k,k+1)) denotes the dependent variable. As independent variable we also use a binary variable that denotes whether a Java class is affected by at least one antipattern in a given release k. We refer to this variable as antipatterns(k).
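For illustration, the construction of these two binary variables and of the resulting 2x2 contingency table can be sketched as follows; the per-class records are hypothetical.

    # Hypothetical per-class data for one release k:
    # (class, number of antipatterns at k, #SCC between k and k+1)
    classes = [("A", 2, 14), ("B", 0, 0), ("C", 1, 0), ("D", 0, 3), ("E", 3, 7)]

    def contingency_table(classes):
        table = [[0, 0],   # row 0: antipatterns(k) = True  -> [changed, not changed]
                 [0, 0]]   # row 1: antipatterns(k) = False -> [changed, not changed]
        for _, num_antipatterns, scc in classes:
            row = 0 if num_antipatterns > 0 else 1
            col = 0 if scc > 0 else 1
            table[row][col] += 1
        return table

    print(contingency_table(classes))  # [[2, 1], [1, 1]]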
Table 3.3: Number of antipatterns detected with DECOR in the first and last releases of the analyzed systems (each cell reports first release - last release).

System                  #Antisingleton  #Blob    #CDBSP   #ComplexC  #LazyC  #LongM
argo                    352-3           26-169   136-51   56-195     16-53   172-354
hibernate2              113-104         34-37    33-17    30-37      5-3     56-72
hibernate3              176-232         52-75    31-50    58-8       9-12    121-194
eclipse.debug.core      1-22            7-14     0-12     1-8        0-9     5-22
eclipse.debug.ui        18-146          13-70    0-70     11-50      0-22    30-176
eclipse.jface           8-25            7-22     6-32     5-13       6-22    22-60
eclipse.jdt.debug       17-44           26-27    1-74     30-33      8-42    68-78
eclipse.team.core       1-12            2-7      1-10     1-5        0-4     8-33
eclipse.team.cvs.core   9-64            1-21     2-6      1-21       0-0     17-79
eclipse.team.ui         9-64            1-21     2-6      1-21       0-0     17-79
jabref                  12-139          10-136   8-400    9-144      1-126   21-365
mylyn                   4-70            43-101   61-174   43-83      2-16    132-300
rhino                   16-18           5-11     4-18     9-19       4-9     11-33
rapidminer              11-19           130-161  145-203  152-156    10-15   450-568
vuze                    179-145         199-282  189-270  138-193    29-215  381-473
xerces                  10-22           8-59     14-134   13-44      6-21    29-96

System                  #LPL     #MsgC    #RPB      #Spaghetti  #SG    #Swiss
argo                    195-334  130-197  65-513    22-1        9-34   3-4
hibernate2              34-19    51-101   93-97     15-4        2-1    0-0
hibernate3              48-74    157-236  123-202   9-12        3-8    3-9
eclipse.debug.core      0-18     3-6      0-11      0-1         1-1    0-2
eclipse.debug.ui        25-41    6-53     6-73      3-8         2-24   0-7
eclipse.jface           19-45    22-34    5-14      0-2         7-21   0-2
eclipse.jdt.debug       37-40    78-80    80-82     3-3         1-2    1-1
eclipse.team.core       0-26     1-15     0-7       0-1         3-10   0-0
eclipse.team.cvs.core   1-51     4-45     0-13      0-1         2-10   0-0
eclipse.team.ui         1-51     4-45     0-13      0-1         2-10   0-0
jabref                  2-169    2-332    2-295     1-16        0-17   0-1
mylyn                   43-66    98-135   34-165    2-0         12-35  1-1
rhino                   9-8      15-51    3-7       0-0         0-2    0-1
rapidminer              214-270  583-674  781-1068  1-1         12-28  3-1
vuze                    217-295  514-773  476-637   22-16       21-27  35-70
xerces                  16-130   19-99    3-37      2-1         5-4    10-11

Next, we use the Fisher's exact test [Sheskin, 2007] to test for each release k whether there is an association between antipatterns(k) and change-proneness(k,k+1) of classes. We then use the odds ratio (OR) [Sheskin, 2007] to measure the probability that a Java class will be changed between two releases (k and k+1) if it is affected by at least one antipattern in release k. The OR is defined as

OR = \frac{p/(1-p)}{q/(1-q)}

and it measures the ratio of the odds p of an event occurring in one group (i.e., the experimental group) to the odds q of it occurring in another group (i.e., the control group). In this case, the event is a change in a Java class, the experimental group is the set of classes affected by at least one antipattern, and the control group is the set of classes not affected by any antipattern. An OR equal to 1 indicates that a change can appear with the same probability in both groups. ORs greater than 1 indicate that the change is more likely to appear in a class affected by at least one antipattern. ORs less than 1 indicate that classes not affected by antipatterns are more likely to be changed.
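A small sketch of this per-release test follows, with SciPy in place of the statistics package used in the thesis; the 2x2 counts are invented for illustration.

    from scipy.stats import fisher_exact

    #                       changed  not changed
    table = [[120, 30],   # classes with at least one antipattern
             [ 80, 90]]   # classes without antipatterns

    odds_ratio, p_value = fisher_exact(table)
    print(f"OR = {odds_ratio:.2f}, p = {p_value:.4f}")  # OR = 4.50: changes are more likely
                                                        # in classes with antipatterns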
Concerning H1b we use the Mann-Whitney test to analyze for each release k whether there is a significant difference in the distributions of #SCC(k,k+1) performed in Java classes affected by antipatterns and in Java classes not affected by any antipattern. We apply the Cliff’s Delta d effect size [Grissom and Kim, 2005] to measure the magnitude of the difference. Cliff’s Delta estimates the probability that a value selected from one group is greater than a value selected from the other group. It ranges from +1, if all selected values from one group are higher than the selected values in the other group, to -1, if the reverse is true; 0 expresses two overlapping distributions. The effect size is considered negligible for d < 0.147, small for 0.147 ≤ d < 0.33, medium for 0.33 ≤ d < 0.47, and large for d ≥ 0.47 [Grissom and Kim, 2005]. We chose the Mann-Whitney test and Cliff’s Delta effect size because the values of the SCC per class are non-normally distributed. Furthermore, the different levels (small, medium, and large) facilitate the interpretation of the results. The Cliff’s Delta effect size has been computed with the orddom package4 available for the R environment.5
4 http://cran.r-project.org/web/packages/orddom/index.html
5 http://www.r-project.org/
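The following sketch restates the definition of Cliff’s Delta for two samples of per-class SCC counts. It is a plain illustration of the formula d = (#{x > y} - #{x < y}) / (n*m), not the orddom implementation used in the study.

```java
// Minimal sketch of Cliff's Delta over all pairs drawn from the two groups.
public final class CliffsDelta {

    public static double compute(int[] sccGroupA, int[] sccGroupB) {
        long greater = 0;
        long smaller = 0;
        for (int x : sccGroupA) {
            for (int y : sccGroupB) {
                if (x > y) greater++;
                else if (x < y) smaller++;
            }
        }
        long pairs = (long) sccGroupA.length * sccGroupB.length;
        // d close to +1: values in group A dominate; close to 0: overlapping distributions
        return (greater - smaller) / (double) pairs;
    }
}
```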
Results

The odds ratios computed to test H1a are summarized in Table 3.4, which shows for each system the total number of releases (#Releases) and the number of releases with a p-value for the Fisher’s exact test smaller than 0.01 and an odds ratio greater than 1 (OR>1). The results show that, except for three systems (eclipse.team.cvs.core, jabref and rhino), in most of the analyzed releases Java classes affected by at least one antipattern are more change-prone than other classes. In total, for 190 out of 244 releases (≈82%), classes affected by at least one antipattern are more change-prone. These results allow us to reject H1a and accept the alternative hypothesis that Java classes affected by antipatterns are more likely to be changed than classes not affected by them.

Table 3.4: Total number of releases (#Releases) and number of releases for which Fisher’s exact test and OR show a significant association between change-proneness and antipatterns in Java classes.

System | #Releases | Fisher p-value<0.01 & OR>1
argo | 9 | 9
hibernate2 | 10 | 10
hibernate3 | 20 | 19
eclipse.debug.core | 12 | 8
eclipse.debug.ui | 22 | 20
eclipse.jface | 17 | 16
eclipse.jdt.debug | 16 | 16
eclipse.team.core | 6 | 4
eclipse.team.cvs.core | 11 | 5
eclipse.team.ui | 13 | 9
jabref | 30 | 3
mylyn | 17 | 17
rhino | 8 | 2
rapidminer | 4 | 4
vuze | 29 | 29
xerces | 20 | 19
Total | 244 | 190

Table 3.5 shows the p-values of the Mann-Whitney tests and the values of the Cliff’s Delta d effect size for testing H1b. Only in 18 releases (≈7%) is there no significant difference (Mann-Whitney p-value ≥ 0.01) between the distributions of SCC performed in classes affected by antipatterns and in other classes. In the other 226 releases (≈93%) the difference is significant (Mann-Whitney p-value < 0.01). Concerning the effect size, we found that this difference is small (0.147≤d<0.33) in 102 releases (≈42%), medium (0.33≤d<0.47) in 26 releases (≈11%), large (d≥0.47) in 9 releases (≈4%), and negligible (d<0.147) in 89 releases (≈36%). Based on these results we reject H1b and accept the alternative hypothesis that in most cases Java classes with antipatterns undergo more changes during the next release than classes that are free of antipatterns.

Table 3.5: p-values of the Mann-Whitney (M-W) tests and Cliff’s Delta d showing the magnitude of the difference between the distribution of SCC in classes affected and not affected by antipatterns. The four effect-size columns count releases with M-W p-value<0.01; the last column counts releases with M-W p-value≥0.01.

System | #Releases | d≥0.47 | 0.33≤d<0.47 | 0.147≤d<0.33 | d<0.147 | M-W p≥0.01
argo | 9 | 0 | 1 | 6 | 2 | 0
hibernate2 | 10 | 0 | 1 | 6 | 3 | 0
hibernate3 | 20 | 0 | 3 | 7 | 10 | 0
eclipse.debug.core | 12 | 4 | 2 | 4 | 1 | 1
eclipse.debug.ui | 22 | 0 | 0 | 14 | 8 | 0
eclipse.jface | 17 | 0 | 0 | 12 | 4 | 1
eclipse.jdt.debug | 16 | 0 | 1 | 8 | 5 | 2
eclipse.team.core | 6 | 0 | 1 | 3 | 0 | 2
eclipse.team.cvs.core | 11 | 1 | 3 | 4 | 3 | 0
eclipse.team.ui | 13 | 1 | 4 | 3 | 1 | 4
jabref | 30 | 0 | 3 | 11 | 16 | 0
mylyn | 17 | 0 | 2 | 9 | 6 | 0
rhino | 8 | 2 | 0 | 0 | 0 | 6
rapidminer | 4 | 0 | 0 | 0 | 4 | 0
vuze | 29 | 0 | 2 | 7 | 20 | 0
xerces | 20 | 1 | 3 | 8 | 6 | 2
Total | 244 | 9 | 26 | 102 | 89 | 18

Based on these findings we can answer RQ1: Java classes affected by antipatterns are more change-prone than other classes. The results confirm the findings of the previous study [Khomh et al., 2012], this time taking into account the type of changes and filtering out non source code changes such as changes to indentations and comments.

3.2.2 Investigation of RQ2

The goal of RQ2 is to test whether certain antipatterns lead to more changes in Java classes than other antipatterns. The basic idea is to assist software engineers in identifying the most change-prone classes affected by antipatterns; these classes should be resolved first. We address RQ2 by testing the following null hypothesis:

• H2: The distribution of SCC is not different for classes affected by different antipatterns.

Analysis Method

As dependent variable we use the number of SCC performed in a class between two releases, #SCC(k,k+1). As independent variable we use a binary variable for each antipattern that denotes whether a class is affected by that particular antipattern. To test H2 we use the Mann-Whitney test and Cliff’s Delta d effect size over all releases of a system. We selected all releases per system since some releases had too few data points (e.g., there have been only 6 SCC between releases 1.6R3 and 1.6R4 of Rhino). The orddom package used to compute Cliff’s Delta d is not optimized for very big data sets. Therefore, in cases of systems with more than 5000 data points (i.e., more than 5000 classes experiencing changes over the revision history), we randomly sampled 5000 data points 30 times and computed the average of the obtained Cliff’s Delta values. This sampling allows us to compute Cliff’s Delta values for each system with a confidence level of 99% and a confidence interval of 0.004, which is a very precise estimate.
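The repeated random sampling described above can be sketched as follows. The sketch assumes the CliffsDelta helper from the earlier sketch, samples with replacement for simplicity (the exact sampling procedure of the study may differ), and uses the sample size (5000) and number of repetitions (30) mentioned in the text.

```java
// Minimal sketch of estimating Cliff's Delta on large systems via repeated sampling.
import java.util.Random;

public final class SampledCliffsDelta {

    public static double estimate(int[] groupA, int[] groupB, int sampleSize, int repetitions) {
        Random random = new Random(42); // fixed seed only to make the sketch reproducible
        double sum = 0;
        for (int i = 0; i < repetitions; i++) {
            sum += CliffsDelta.compute(sample(groupA, sampleSize, random),
                                       sample(groupB, sampleSize, random));
        }
        return sum / repetitions; // average of the sampled Cliff's Delta values
    }

    private static int[] sample(int[] values, int size, Random random) {
        int n = Math.min(size, values.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = values[random.nextInt(values.length)];
        }
        return out;
    }
}
```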
Table 3.6: Cliff’s Delta d effect sizes of cases for which Mann-Whitney shows a significant difference (p-value<0.01), or NA otherwise. Values marked with * denote the largest difference per system. For systems with more than 5000 data points we applied random sampling.

System | #AS | #Blob | #CDBSP | #ComplexC | #LazyC | #LongM
argo | 0.311 | 0.098 | 0.331 | 0.226 | -0.012 | 0.192
hibernate2 | 0.143 | 0.112 | 0.193 | 0.500 | NA | 0.149
hibernate3 | 0.171 | 0.086 | 0.064 | 0.386 | -0.110 | -0.172
eclipse.debug.core | 0.553 | 0.352 | 0.419 | 0.889* | NA | 0.544
eclipse.debug.ui | 0.169 | 0.299 | 0.150 | 0.454 | 0.147 | 0.231
eclipse.jface | 0.461* | NA | NA | 0.411 | NA | 0.266
eclipse.jdt.debug | 0.277 | 0.182 | 0.078 | 0.485 | 0.103 | 0.250
eclipse.team.core | 0.422 | 0.433 | NA | 0.581* | NA | 0.33
eclipse.team.cvs.core | 0.026 | 0.374 | 0.085 | 0.723* | NA | 0.331
eclipse.team.ui | 0.290 | 0.293 | 0.212 | 0.395 | NA | 0.265
jabref | 0.089 | 0.001 | 0.019 | 0.094 | NA | 0.072
mylyn | -0.020 | 0.150 | 0.177 | 0.388* | NA | 0.192
rhino | 0.276 | NA | 0.393 | 0.119 | NA | 0.067
rapidminer | 0.051 | 0.060 | -0.001 | 0.141 | NA | 0.051
vuze | 0.151 | 0.076 | 0.079 | 0.211 | NA | 0.121
xerces | 0.302 | 0.104 | 0.044 | 0.541 | NA | 0.269
Median | 0.223 | 0.131 | 0.117 | 0.403 | 0.045 | 0.211

System | #LPL | #MsgC | #RPB | #Spaghetti | #SG | #Swiss
argo | 0.148 | 0.248 | 0.035 | 0.354 | 0.030 | 0.528*
hibernate2 | 0.347 | 0.250 | -0.032 | 0.262 | NA | 0.654*
hibernate3 | 0.169 | 0.170 | 0.016 | 0.191 | NA | 0.662*
eclipse.debug.core | 0.691 | 0.289 | 0.435 | NA | 0.298 | 0.650
eclipse.debug.ui | 0.169 | 0.227 | NA | 0.377 | 0.009 | 0.514*
eclipse.jface | NA | 0.385 | NA | NA | NA | NA
eclipse.jdt.debug | 0.295 | 0.137 | 0.051 | 0.361 | NA | 0.919*
eclipse.team.core | 0.107 | 0.315 | NA | NA | 0.373 | NA
eclipse.team.cvs.core | 0.172 | 0.329 | NA | NA | NA | NA
eclipse.team.ui | 0.163 | 0.187 | NA | 0.642* | 0.183 | NA
jabref | 0.044 | 0.042 | -0.006 | 0.356 | NA | 0.966*
mylyn | 0.232 | 0.228 | 0.063 | NA | NA | NA
rhino | 0.025 | 0.100 | NA | 0.928* | NA | NA
rapidminer | 0.080 | 0.051 | -0.002 | NA | NA | 0.600*
vuze | 0.106 | 0.140 | -0.021 | 0.308* | 0.028 | 0.213
xerces | 0.327 | 0.122 | 0.036 | 0.153 | 0.307 | 0.565*
Median | 0.169 | 0.207 | 0.025 | 0.355 | 0.183 | 0.625

Results

Table 3.6 shows the values of the Cliff’s Delta d effect size for which the p-value of the Mann-Whitney test is significant (p-value<0.01). NA denotes a Mann-Whitney p-value greater than 0.01, for which Cliff’s Delta is not computed. The results of the Mann-Whitney tests show that, except for LazyClass and SpeculativeGenerality (SG), the distributions of SCC performed in classes affected by a specific antipattern are different from the distributions of SCC performed in classes not affected by that antipattern. According to the median values for Cliff’s Delta shown in the last row of Table 3.6, this difference is large for SwissArmyKnife (Swiss), medium for 2 antipatterns (0.33≤d<0.47), small for 5 antipatterns (0.147≤d<0.33), and negligible for 4 antipatterns. Note that for classes affected by LazyClass and SG the Mann-Whitney test was significant in only 4 and 7 systems, respectively. Looking at the marked values we can see that classes affected by the ComplexClass (ComplexC), SpaghettiCode (Spaghetti) and SwissArmyKnife (Swiss) antipatterns are more change-prone than classes affected by any other antipattern. More specifically, in 8 systems out of 16 the Cliff’s Delta effect size is highest for classes affected by SwissArmyKnife. In 4 systems the Cliff’s Delta effect size is highest for classes affected by ComplexClass. In the other 3 systems the highest effect size is for classes affected by SpaghettiCode. Only in one system, namely eclipse.jface, does the Antisingleton antipattern show the highest value for Cliff’s Delta. Based on these results we reject H2 and conclude that, among all classes, the classes affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns are the most change-prone. These results detail the findings in [Khomh et al., 2012] by highlighting three antipatterns that are more change-prone than the other antipatterns.
Moreover, the new findings allow us to advise software engineers to focus on detecting instances of these three change-prone antipatterns and fixing them first.

3.2.3 Investigation of RQ3

To address RQ3, we analyze the relationship between different antipatterns and different types of changes. The goal is to further assist software engineers by verifying whether a particular type of change is more likely to be performed in classes affected by a specific antipattern. This knowledge can help engineers to avoid or fix certain antipatterns leading to changes that impact large parts of the rest of a software system, such as changes in the method declarations of a class that exposes a public API. We answer RQ3 by testing the following null hypothesis:

• H3: The distributions of different types of SCC performed in classes affected by different antipatterns are not different.

Analysis Method

To test H3 we categorize the changes mined with ChangeDistiller into five different categories as listed in Table 3.1. As dependent variables we use the change type categories representing the number of SCC that fall in each category. As for H2, the independent variables are the set of binary variables that denote whether a class is affected by a specific antipattern or not. We test the difference in the distributions of SCC per category using the Mann-Whitney test and compute the magnitude of the difference with the Cliff’s Delta d effect size. In order to have enough data about each change type category we use the data from all systems as input for this analysis. Similar to H2, we use the random sampling approach for computing Cliff’s Delta and report the mean effect size of the 30 random samples.

Results

Table 3.7 lists the results of this analysis. The marked values denote differences that are at least small according to Cliff’s Delta. They show that changes in the class and method declarations (API) are more likely to appear in classes affected by the ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns. Changes in the functionalities (func) are likely in classes affected by the ComplexClass and SpaghettiCode antipatterns. Changes in the execution statements (stmt) are likely to appear in classes affected by the Antisingleton, ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns. Finally, changes in the condition expressions and else-parts (cond) are more frequent in classes affected by the SpaghettiCode antipattern. Based on these results we reject H3 and conclude that classes affected by different antipatterns undergo different types of changes.

3.2.4 Manual Inspection

To further highlight the relationship between antipatterns and change-proneness we manually inspected several classes affected by antipatterns that have been resolved. For these classes we analyzed the number of changes before and after the removal of the antipatterns. The analysis clearly shows that when classes are affected by an antipattern they undergo a considerably higher number of changes. Consider, for instance, the class org.apache.xerces.StandardParserConfiguration from the Xerces system. This class was affected by the ComplexClass antipattern until release 2.0.2. Before release 2.0.2, the class underwent on average 64.5 changes per release. The average number of changes decreased to 5.2 after the antipattern was removed. Furthermore, the average number of API changes decreased from 2 to 0.07.
Table 3.7: Cliff’s Delta d effect sizes of cases for which Mann-Whitney shows a significant difference (p-value<0.01), or NA otherwise. Values marked with * denote an effect size that is at least small (d > 0.147).

Antipattern | API | oState | func | stmt | cond
#Antisingleton | 0.131 | 0.080 | 0.084 | 0.157* | 0.080
#Blob | 0.077 | 0.048 | 0.057 | 0.077 | 0.035
#CDBSP | 0.038 | 0.031 | 0.019 | 0.051 | 0.028
#ComplexC | 0.213* | 0.144 | 0.153* | 0.252* | 0.138
#LazyC | -0.043 | NA | -0.040 | NA | -0.020
#LongM | 0.073 | 0.042 | 0.053 | 0.140 | 0.059
#LPL | 0.095 | 0.060 | 0.076 | 0.146 | 0.081
#MsgC | 0.075 | 0.045 | 0.054 | 0.120 | 0.058
#RPB | 0.001 | -0.001 | -0.002 | 0.100 | 0.001
#Spaghetti | 0.207* | 0.126 | 0.149* | 0.308* | 0.178*
#SG | 0.029 | -0.001 | NA | 0.007 | 0.100
#Swiss | 0.150* | 0.109 | 0.142 | 0.245* | 0.136

As another example, consider the views.memory.AddMemoryBlockAction class from the eclipse.debug.ui system. This class was affected by the SpaghettiCode antipattern until release 3.2. The average number of changes decreased from 79.83 to 1.5 after release 3.2. Moreover, the average number of cond changes decreased from 2.67 to 0.1.

3.2.5 Implications of Results

In summary, we see two main implications of our results that concern software engineers and researchers. Concerning researchers, our results provide a deeper insight into the effects of antipatterns on the change-proneness of Java classes. First, we confirmed the results from [Khomh et al., 2012], this time taking into account the type of changes (see RQ1). Second, we identified three antipatterns, namely ComplexClass, SpaghettiCode and SwissArmyKnife, that lead to change-prone classes (see RQ2). Third, and most importantly, we showed that certain antipatterns lead to certain types of changes (see RQ3). This helps to focus our research on a sub-set of antipatterns, namely the most change-prone ones.

Regarding software engineers, the results of our study have several implications. In particular, the results for RQ2 and RQ3 show that software engineers should focus on detecting and resolving the three antipatterns ComplexClass, SpaghettiCode and SwissArmyKnife. Classes affected by these antipatterns turned out to be the most change-prone ones; therefore, resolving instances of these antipatterns helps to prevent changes in their APIs. API changes in particular should be prevented because they can have a significant impact on the implementation of the other parts of a software system. For instance, consider the scenario in which APIs are made available through web services. The responsible software engineers want to assure the robustness of these classes to minimize the possibility of breaking the clients of the web services. Based on the results of our study they can use DECOR to detect instances of the ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns in the set of API classes. These are the antipatterns they should resolve first in order to reduce the probability that APIs are changed and, hence, that clients are broken.

3.3 Threats to Validity

This section discusses the threats to validity that can affect the results of our empirical study.

Threats to construct validity concern the relationship between theory and observation. In our study, this threat can be due to the fact that we considered SCC performed between two subsequent releases. However, the effects of antipatterns can manifest themselves after the next immediate release, whenever the class affected by antipatterns needs to be changed.
We mitigated this threat by testing all the hypotheses taking into account all the SCC performed after a release for which we obtained similar results. Threats to internal validity concern factors that may affect an independent variable. In our study, both the independent and dependent variables are computed using deterministic algorithms (implemented in ChangeDistiller and DECOR) delivering always the same results. Threats to conclusion validity concern the relationship between the treatment and the outcome. To mitigate these threats our conclusions have been supported by proper statistical tests, in particular by non-parametric tests that do not require any assumption on the underlying data distribution. Threats to external validity concern the generalization of our findings. Every result obtained through empirical studies is threatened by the bias of their datasets [Menzies et al., 2007]. To mitigate these threats we tested our hypotheses over 16 open-source systems of different size and from different domains. Threats to reliability validity concern the possibility of replicating our study and obtaining consistent results. We mitigated these threats by providing all the details necessary to replicate our empirical study. The systems under analysis are open-source and the source code repositories are publicly available. 3.4. Related Work 59 Moreover, we published on-line the raw data to allow other researches to replicate our study and to test other hypotheses on our dataset. 3.4 Related Work In this section, we discuss the related literature on antipatterns in relation to software evolution. Code Smells/Antipatterns Detection Techniques. The first book on antipatterns in object-oriented development was written in Webster [1995]. The book made several contributions on conceptual, political, coding, and qualityassurance problems. Fowler [1999] defined 22 code smells, suggesting where developers should apply refactorings. Mantyla [2003] and Wake [2003] proposed classifications for code smells. Brown et al. [1998] described 40 antipatterns, including the Blob, the Spaghetti Code, and the MessageChain. These books provide in-depth views on heuristics, code smells, and antipatterns, and are the basis of all approaches to detect (semi-)automatically code smells and antipatterns, such as DECOR [Moha et al., 2010] used in this study. Several approaches to specify and detect code smells and antipatterns exist in the literature. They range from manual approaches, based on inspection techniques [Travassos et al., 1999], to metric-based heuristics [Marinescu, 2004; Munro, 2005; Oliveto et al., 2010], using rules and thresholds on various metrics or Bayesian belief networks [Khomh et al., 2011]. Some approaches for complex software analysis use visualization [Dhambri et al., 2008; Simon et al., 2001]. Although visualization is sometimes considered as an interesting compromise between fully automatic detection techniques, which are efficient but loose track of the context, and manual inspections, which are slow and subjective [Langelier et al., 2005], visualization requires human expertise and is thus time-consuming. Sometimes, visualization techniques are used to present the results of automatic detection approaches [Lanza and Marinescu, 2006; van Emden and Moonen, 2002]. This previous work significantly contributed to the specification and detection of antipatterns. The approach used in this study, DECOR, builds on this previous work. Code Smells/Antipatterns and Software Evolution. 
Deligiannis et al. [Ignatios et al., 2003, 2004] proposed the first quantitative study of the relation between antipatterns and software quality. They performed a controlled experiment with 20 students on two software systems to understand the impact of Blobs on the understandability and maintainability of software systems. The results of their study suggested that Blob classes considerably affect the evolution of design structures, in particular the use of inheritance. Bois et al. [2006] showed that the decomposition of Blob classes into a number of collaborating classes using refactorings can improve comprehension. Abbes et al. [2011] conducted three experiments, with 24 subjects each, to investigate whether the occurrence of antipatterns affects the understandability of systems by developers during comprehension and maintenance tasks. They concluded that although the occurrence of one antipattern does not significantly decrease developers’ performance, a combination of two antipatterns significantly impedes developers’ performance during comprehension and maintenance tasks. Li and Shatnawi [2007] investigated the relationship between the probability of a class to be faulty and some antipatterns based on three versions of Eclipse and showed that classes with the antipatterns Blob, Shotgun Surgery, and Long Method have a higher probability to be faulty than other classes. Olbrich et al. [2009] analyzed the historical data of Lucene and Xerces over several years and concluded that classes with the antipatterns Blob and Shotgun Surgery have a higher change frequency than other classes, with Blob classes featuring more changes. However, they did not investigate the kinds of changes performed on the antipatterns. Using Azureus and Eclipse, we investigated the impact of code smells on the change-proneness of classes and showed that, in general, the likelihood for classes with code smells to change is very high [Khomh et al., 2009]. In [Khomh et al., 2012] we also investigated the relation between the presence of antipatterns and the change- and fault-proneness of classes. We found that classes participating in antipatterns are significantly more likely to be subject to changes and to be involved in fault-fixing changes than other classes. Furthermore, we also investigated the kind of changes, namely structural and non-structural changes, experienced by classes with antipatterns. Structural changes are changes that alter a class interface, while non-structural changes are changes to method bodies. We found that in general structural changes are more likely to occur in classes participating in antipatterns. The main difference with this work is that we detailed the changes into 40 types of source code changes classified into 5 change type categories. This detailed information about changes allowed us to analyze which antipatterns lead to which types of source code changes. Also, this work was performed with more systems, namely 16, compared to previous work which was done with only 4 systems.

3.5 Conclusion and Future Work

Antipatterns have been defined to denote poor solutions to design and implementation problems. Previous studies have shown that classes affected by antipatterns are more change-prone than other classes. In this chapter we provide a deeper insight into which antipatterns lead to which types of changes in Java classes.
We analyzed the change-proneness of these classes taking into account 40 types of fine-grained source code changes (SCC) extracted from the version control repositories of 16 Java open-source systems. Our results show that: • Classes affected by antipatterns change more frequently along the evolution of a system, confirming previous findings (see RQ1). • Classes affected by the ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns are more likely to be changed than classes affected by other antipatterns (see RQ2). • Certain antipatterns lead to certain types of source code changes, such as API changes are more likely to appear in classes affected by the ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns (see RQ3). Our results have several implications on software engineers and researchers. Regarding researchers our results suggest to focus our efforts on understanding a subset of antipatterns that lead to change-prone classes or changes with a high impact on the other parts of a software system. Concerning software engineers, our results provide strong evidence to use antipatterns detection tools, such as DECOR, to detect and resolve ComplexClass, SpaghettiCode and SwissArmyKnife antipatterns. Resolving them shows to be beneficial in terms of preventing source code changes, such as API changes, that impact other parts of a system. In future work, we plan to perform a more extended qualitative analysis of antipatterns. We also plan to enlarge our data set and analyze industrial software systems. Another direction of future work is to analyze the types of changes performed when antipatterns are introduced and when they are resolved. These analysis are needed to further estimate the development and maintenance costs caused by antipatterns. . 4 Fine-Grained WSDL Changes In the service-oriented paradigm web service interfaces are considered contracts between web service consumers and providers. However, these interfaces are continuously evolving over time to satisfy changes in the requirements and to fix bugs. Changes in a web service interface typically affect the systems of its consumers. Therefore, it is essential for consumers to recognize which types of changes occur in a web service interface in order to analyze the impact on his/her systems. In this chapter we propose a tool called WSDLDiff to extract fine-grained changes from subsequent versions of a web service interface defined in WSDL. In contrast to existing approaches, WSDLDiff takes into account the syntax of WSDL and extracts the WSDL elements affected by changes and the types of changes. With WSDLDiff we performed a study aimed at analyzing the evolution of web services using the fine-grained changes extracted from the subsequent versions of four real world WSDL interfaces. The results of our study show that the analysis of the fine-grained changes helps web service consumers to highlight the most frequent types of changes affecting a WSDL interface. This information can be relevant for web service consumers who want to assess the risk associated to the usage of web services and to subscribe to the most stable ones.1 4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2 WSDLDiff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.3 Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.4 Conclusion & Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 
78

Over the last decades, the evolution of software systems has been studied in order to analyze and enhance the software development and maintenance processes. Among other applications, the information mined from the evolution of software systems has been applied to investigate the causes of changes in software components [Khomh et al., 2009; Penta et al., 2008].
1 This chapter was published in the 19th International Conference on Web Services (ICWS 2012) [Romano and Pinzger, 2012].

Software engineering researchers have developed several tools to extract information about changes from software artifacts [Fluri et al., 2007; Tsantalis et al., 2011; Xing and Stroulia, 2005a] and to analyze their evolution. In service-oriented systems, understanding and coping with changes is even more critical and challenging because of the distributed and dynamic nature of services [Papazoglou, 2008]. In fact, service providers do not necessarily know the service consumers and how changes to a service can impact the existing service clients. For this reason service interfaces are considered contracts between providers and consumers and they should be as stable as possible [Erl, 2007]. On the other hand, services are continuously evolving to satisfy changes in the requirements and to fix bugs. Recognizing the types of changes is fundamental for understanding how a service interface evolves over time. This can help service consumers to quantify the risk associated with the usage of a particular service and to compare the evolution of different services with similar features. Moreover, detailed information about changes allows software engineering researchers to analyze the causes of changes in a service interface.

In order to analyze the evolution of WSDL2 interfaces, Fokaefs et al. [2011] propose a tool called VTracker. This tool is based on the Zhang-Shasha tree edit distance [Zhang and Shasha, 1989], comparing WSDL interfaces as XML3 documents. However, VTracker does not take into account the syntax of WSDL interfaces. As a consequence, their approach outputs only the percentage of added, changed and removed XML elements. We argue that this information is inadequate to analyze the evolution of WSDL interfaces without manually checking the types of changes and the WSDL elements affected by changes. Moreover, their approach of transforming a WSDL interface into a simplified representation can lead to the detection of multiple changes when there has been only one change.

In this chapter we propose a tool called WSDLDiff that compares subsequent versions of WSDL interfaces to automatically extract the changes. In contrast to VTracker, WSDLDiff takes into account the syntax of WSDL and of XSD,4 used to define data types in a WSDL interface. In particular, WSDLDiff extracts the types of the elements affected by changes (e.g., Operation, Message, XSDType) and the types of changes (e.g., removal, addition, move, attribute value update). We refer to these changes as fine-grained changes. The fine-grained changes extraction process of WSDLDiff is based on the UMLDiff algorithm [Xing and Stroulia, 2005a] and has been implemented on top of the Eclipse Modeling Framework (EMF).5 With WSDLDiff we performed a study aimed at analyzing the evolution of web services using the fine-grained changes extracted from subsequent versions of four real world WSDL interfaces.
2 http://www.w3.org/TR/wsdl
3 http://www.w3.org/XML/
4 http://www.w3.org/XML/Schema
We address the following two research questions: • RQ1: What is the percentage of added, changed and removed elements of a WSDL interface? • RQ2: Which types of changes are made to the elements of a WSDL interface? The study shows that different WSDL interfaces are affected by different types of changes highlighting how they are maintained with different strategies. While in one case mainly Operations were added continuously, in the other three cases the data type specifications were the most affected by changes. Moreover, we found that in all four WSDL interfaces under analysis there is a type of change that is predominant. From this information web service consumers can be aware of the frequent types of changes when subscribing to a web service and they can compare the evolution of web services that provide similar features in order to subscribe to the most stable web service. The remainder of this chapter is organized as follows. In Section 4.1 we report the related work and we discuss the main differences with our work. Section 4.2 describes the WSDLDiff tool and the process to extract fine-grained changes implemented into it. The study and results are presented in Section 4.3. We draw our conclusions and outline directions for future work in Section 4.4. 4.1 Related Work Fokaefs et al. [2011] analyzed the evolution of web services using a tool called VTracker. This tool is based on the Zhang-Shasha’s tree edit distance algorithm [Zhang and Shasha, 1989], which calculates the minimum edit distance between two trees. In this study the WSDL interfaces are compared as XML files. Specifically the authors created an intermediate XML representation to reduce 5 http://www.eclipse.org/modeling/emf/ 66 Chapter 4. Fine-Grained WSDL Changes the verbosity of the WSDL specification. In this simplified XML representation, among other transformations, the authors trace the references between messages parameters (Parts) and data types (XSDTypes) and they replace the references with the data types themselves. The output of their analysis consists of the percentage of added, changed and removed elements among the XML models of two WSDL interfaces. There are two main differences between our work and the approach proposed by Fokaefs et al. First, we compute the changes between WSDL models taking into account the syntax of WSDL and XSD and, hence, extracting the type of the elements affected by changes (e.g., Operation, Message, XSDType) and the types of changes (e.g., removal, addition, move, attribute value update). For example, WSDLDiff extracts differences in the order of the elements only if it is relevant, such as changes in the order of Parts defined in a Message. Our approach is aware of irrelevant order changes, such as changes in the order of XSDTypes defined in the WSDL types definition. This allows us to analyze the evolution of a WSDL interface only looking at the changes without manually inspecting the XML coarse-grained changes. Second, WSDLDiff does not replace the references to data types with the data types themselves. This transformation can lead to the detection of a change in a data type multiple times while there has been only one change. Wang and Capretz [2009] proposed an impact analysis model based on service dependency. The authors analyze the service dependencies graph model, service dependencies and the relation matrix. Based on this information they infer the impact of the service evolution. However, they do not propose any technique to analyze the evolution of web services. 
Aversano et al. [2005] proposed an approach to understand how relationships between sets of services change across service evolution. Their approach is based on formal concept analysis. They used the concept lattice to highlight hierarchy relationships and to identify commonalities and differences between services. While the work proposed by Aversano et al. consists of extracting relationships among services, our work focuses on the evolution of single web services using fine-grained changes. As future work the two approaches can be integrated to correlate different types of changes with the different relationships.

In the literature several approaches have been proposed to measure the similarity of web services (e.g., [Liu et al., 2010] [Plebani and Pernici, 2009]). However, these approaches compute the similarity amongst WSDL interfaces to assist the search and classification of web services and not to analyze their evolution.

Concerning model differencing techniques, the approach proposed by Xing et al. [Xing and Stroulia, 2005a] [Xing and Stroulia, 2005b] is most relevant for our work. In fact, their algorithm to infer differences among UML6 diagrams has been implemented by EMF Compare,7 which we used to implement our tool WSDLDiff. The authors proposed the UMLDiff algorithm for detecting structural changes between the designs of subsequent versions of object oriented systems, represented through UML diagrams. This algorithm has later been adapted in EMF Compare to compare models conforming to any arbitrary metamodel and not only UML models [Brun and Pierantonio, 2008].

Several approaches have been proposed to classify changes in service interfaces. For instance, Feng et al. [2011] and Treiber et al. [2008] have proposed approaches to classify the changes of web services taking into account their impact on different stakeholders. These classifications can easily be integrated in our tool to classify the different fine-grained changes extracted along the evolution of a web service.

As can be deduced from the overview of related work, there currently does not exist any tool for extracting fine-grained changes amongst web services. In this chapter, we present such a tool based on the UMLDiff algorithm [Xing and Stroulia, 2005a].

4.2 WSDLDiff

In this section, we illustrate the WSDLDiff tool used to extract the fine-grained changes between two versions of a WSDL interface. Since the tool is based on the Eclipse Modeling Framework, we first present an overview of this framework and then we describe the fine-grained changes extraction process implemented by WSDLDiff. A first prototype of WSDLDiff is available on our web site.8

4.2.1 Eclipse Modeling Framework

The Eclipse Modeling Framework (EMF) is a modeling framework that lets developers build tools and other applications based on a structured data model. This framework provides tools to produce a set of Java classes from a model specification and a set of adapter classes that enable viewing and editing of the models. The models are described by meta models called Ecore. As part of the EMF project, there is the EMF Compare plug-in. It provides comparison and merge facilities for any kind of EMF Models through a framework that is easy to use and to extend to compare instances of EMF Models.
6 http://www.uml.org/
7 http://www.eclipse.org/emf/compare/
8 http://swerl.tudelft.nl/twiki/pub/DanieleRomano/WebHome/WSDLDiff.zip
The Eclipse community already provides an Ecore meta model for WSDL interfaces, including a meta model for XSD, and tools to parse them into EMF Models. We use these features to parse and extract changes between WSDL interfaces as described in the following.

4.2.2 Fine-Grained Changes Extraction Process

Figure 4.1 shows the process implemented by WSDLDiff to extract fine-grained changes between two versions of a WSDL interface.

[Figure 4.1: The process implemented by WSDLDiff to extract fine-grained changes between two versions of a WSDL interface. The two WSDL versions are parsed (org.eclipse.wst.wsdl, org.eclipse.xsd) into WSDL Model1 and WSDL Model2 (Stage A), transformed by the XSD Transformer into WSDL Model1’ and WSDL Model2’ (Stage B), matched by the Matching Engine (org.eclipse.compare.match) into a Match Model (Stage C), and compared by the Differencing Engine (org.eclipse.compare.diff) into a Diff Model (Stage D).]

The process consists of four stages:

• Stage A: in the first stage we parse the WSDL interfaces using the APIs provided by the org.eclipse.wst.wsdl and org.eclipse.xsd projects. The output of this stage consists of the two EMF Models (WSDL Model1 and WSDL Model2) corresponding to the two WSDL interfaces taken as input (WSDL Version1 and WSDL Version2).

• Stage B: in this stage we transform the EMF Models corresponding to the XSD (contained by the WSDL models) in order to improve the accuracy of the fine-grained changes extraction process, as will be shown in Subsection 4.2.4. The output of this stage consists of the transformed models (WSDL Model1’ and WSDL Model2’).

• Stage C: in the third stage we use the Matching Engine provided by the EMF Compare framework to detect the nodes that match in the two models.

• Stage D: the Match Model produced by the Matching Engine is then used to detect the differences between the two WSDL models under analysis. This task is accomplished by the Differencing Engine, also provided by EMF Compare. The output of this stage is a tree of structural changes that reports the differences between the two WSDL models. The differences are reported in terms of additions, removals, moves and modifications of each element specified in the WSDL and in the XSD.

In the next subsection we first illustrate the strategies behind EMF Compare, describing the matching (Stage C) and differencing (Stage D) stages, and then we describe the XSD transformation (Stage B).

4.2.3 Eclipse EMF Compare

The comparison facility provided by EMF Compare is based on the work developed by Xing and Stroulia [2005a]. This work has been adapted to compare generic EMF Models instead of UML models as initially developed by Xing. The comparison consists of two phases: (1) the matching phase (Stage C in our approach) and (2) the differencing phase (Stage D in our approach). The matching phase is performed by computing a set of similarity metrics. These metrics are computed for two nodes while traversing the two models under analysis with a top-down approach. In the generic Matching Engine, provided in org.eclipse.compare.match and used in our approach, the set of metrics consists of four similarity metrics:

• type similarity: to compute the match of the types of two nodes;

• name similarity: to compute the similarity between the values of the attribute name of two nodes;

• value similarity: to compute the similarity between the values of other attributes declared in the nodes;

• relations similarity: to compute the similarity of two nodes based on the relationships they have with other nodes (e.g., children and parents in the model).

Once the matching phase has been completed, it produces a matching model consisting of all the entities that are matched in the two models. The matching model is then used in the differencing phase to extract all the differences between the two models. Specifically, the matching model is browsed by a Differencing Engine that computes the tree edit operations. These operations represent the minimum set of operations to transform one model into another model. They are classified into added, changed, removed and moved operations. For more details about the matching and differencing phases implemented by EMF Compare we refer the reader to [Brun and Pierantonio, 2008].
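To give an idea of what such a similarity metric can look like, the following sketch computes a name similarity as a normalized Levenshtein distance between the name attributes of two nodes. It is only an illustration of the kind of score a matching engine assigns to candidate node pairs, not the exact metric implemented by EMF Compare’s Matching Engine.

```java
// Illustrative name-similarity measure (normalized Levenshtein distance), not EMF Compare's.
public final class NameSimilarity {

    /** Returns a value in [0,1]: 1 for identical names, 0 for completely different ones. */
    public static double similarity(String left, String right) {
        int distance = levenshtein(left, right);
        int longest = Math.max(left.length(), right.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance / longest;
    }

    private static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }
}
```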
4.2.4 XSD Transformation

In an initial manual validation of EMF Compare on WSDL models we found that in a particular case the set of differences produced did not correspond to the minimum set of tree edit operations. The problem was due to the EMF Model used to represent the XSDs. For this reason we decided to add the XSD Transformer. To better understand the problem behind the original EMF Model and the solution adopted, consider the example shown in Figure 4.2.

[Figure 4.2: An example that shows the XSD transformation performed by the XSD Transformer in Stage B of the fine-grained changes extraction process. (a) Definition of an XSD element book whose sequence declares the elements author and title; (b) the original EMF Model, in which each XSDParticle is the parent of the XSDElement or XSDModelGroup it is associated with; (c) the transformed EMF Model, in which this parent-child relationship is inverted.]

Figure 4.2a shows an XSDElement book that consists of an XSDModelGroup (the element sequence) that contains two XSDElements (the elements author and title). Figure 4.2b shows the original EMF Model parsed by the WSDL Parser (Stage A in Figure 4.1). The EMF Model contains the nodes XSDParticle. These nodes are necessary to represent the attributes minOccurs, maxOccurs and ref for each XSDElement declared in an XSDModelGroup and for the XSDModelGroup itself. The XSDParticles in the original model are parents of the elements to which they are associated. This structure can lead to mistakes when the order of XSDElements within an XSDModelGroup changes. In this case, when the Matching Engine traverses the models, it can detect a match between XSDParticles that are associated to different XSDElements (e.g., a match between the XSDParticle of the element author and the XSDParticle of the element title). This match is likely because the values of the attributes minOccurs, maxOccurs and ref are set to their default values. When this match occurs the Matching Engine keeps traversing the model and it detects a mismatch when it traverses the children of the previously matched XSDParticles (e.g., a mismatch between the elements author and title).
As consequence, even if there are no differences among the models the Differencing Engine can produce the added XSDelement title, the added XSDelement author, the removed XSDelement title and the removed XSDelement author as changes. To overcome this problem, we decided to transform the EMF Model inverting the parent-child relationship in presence of XSDParticles as shown in Figure 4.2c. In the transformed models, the Matching Engine traverses the XSDParticles only when a match is detected between the XSDElements to which they are associated. Besides this problem, in one case, WSDLDiff reported the removed Part and added Part changes instead of the changed Part change when a Part was renamed. However for this study the two set of changes are equivalent. For 72 Chapter 4. Fine-Grained WSDL Changes this reason we have not considered it as a problem. Clearly, as part of our future work we plan to validate the fine-grained changes extraction process with a benchmark. 4.3 Study The goal of this study is to analyze the evolution of web services through the analysis of fine-grained changes extracted from subsequent versions of a WSDL interface. The perspective is that of web services consumers interested in extracting the types of changes that appear along the evolution of a web service. They can analyze the most frequent changes in a WSDL interface estimating the risk related to the usage of a specific element. The context of this study consists of all the publicly available WSDL versions of four real world web services, namely: • Amazon EC2: Amazon Elastic Compute Cloud is a web service that provides resizable compute capacity in the cloud. In this study we have analyzed 22 versions. • FedEx Rate Service: the Rate Service provides the shipping rate quote for a specific service combination depending on the origin and destination information supplied in the request. We analyzed 10 different versions. • FedEx Ship Service: the Ship Service provides functionalities for managing package shipments and their options. 7 versions out of 10 have been analyzed in this study. • FedEx Package Movement Information Service: the Package Movement Information Service provides operations to check service availability, route and postal codes between origin and destination. We analyzed 3 versions out of 4. For the sake of simplicity we refer to this service as FedEx Pkg. We chose these web services because they were previously used by Fokaefs et al. [2011]. The other web services analyzed by Fokaefs et al. [2011] (PayPal SOAP API9 and Bing Search10 ) have not been considered because the previous versions of the WSDL interfaces are not publicly available. For the same reasons not every version of the web services has been considered in our analysis. 9 https://www.paypalobjects.com/enUS/ebook/PPAPIReference/ architecture.html 10 http://www.bing.com/developers 4.3. Study 73 In Table 4.5 at the end of the chapter we report the size of the WSDL interfaces in terms of number of Operations, number of Parts, number of XSDElements and number of XSDTypes declared in each version. The size of the WSDL interfaces has been measured using the API provided by the org.eclipse.wst.wsdl and org.eclipse.xsd Eclipse Plug-in projects. The results reported in Table 4.5 show that the web services under analysis evolve differently. The number of Operations declared in the AmazonEC2 service is continuously growing and only in four versions does not change (version 5, 7, 22 and 23). 
The number of Operations declared in the other web services is more stable. Specifically, the FedEx Pkg service declares always 2 Operations. The FedEx Rate service declares 1 Operation in 9 versions out of 10 and 2 Operations in 1 version (version 3). Concerning the FedEx Ship service we can notice an increase in the number of Operations from version 1 to version 5. Then, the number of Operations decreases to 7 and it remains stable until the current version (version 10). To better understand the evolution of web services we used the WSDLDiff tool to extract the fine-grained changes from subsequent versions of the WSDL interfaces under analysis. In the next subsections we first show the types of changes extracted in this study and then we present the results of the study answering our research questions. Table 4.1: Number of added, changed and removed WSDL and XSD elements for each WSDL interface under analysis WSDL AmazonEC2 AmazonEC2 AmazonEC2 FedEx Rate FedEx Rate FedEx Rate FedEx Ship FedEx Ship FedEx Ship FedEx Pkg FedEx Pkg FedEx Pkg Type WSDL XSD Total WSDL XSD Total WSDL XSD Total WSDL XSD Total #Added 358 623 981 (≈80%) 3 236 239 (≈39%) 28 182 210 (≈38%) 0 0 0 (0%) #Changed 34 166 200 (≈16%) 1 295 296 (≈49%) 4 298 302 (≈55%) 0 6 6 (100%) #Deleted 46 5 51 (≈4%) 3 73 76 (≈12%) 8 28 36 (≈6%) 0 0 0 (0%) 74 Chapter 4. Fine-Grained WSDL Changes 4.3.1 Fine-Grained Changes The output of WSDLDiff consists of the set of edit operations. These operations are associated with the elements declared in the WSDL and XSD specifications. Among all the elements the following WSDL elements have been detected as affected by changes: BindingOperation, Operation, Message and Part. The XSD elements detected as affected by changes are: XSDType, XSDElement, XSDAttributeGroup and XSDAnnotation. These elements were affected by the following fine-grained changes: • XSD Element changes: consist of added XSDElements (XSDElementA), removed XSDElements (XSDElementR) and moved XSDElements (XSDElementM) within a declaration of an XSDType or an XSDElement. • Attribute changes: changes due to the update of an attribute value. Specifically we detected changes to the values of the attributes name (NameUpdate), minOccurs (MinOccursUpdate), maxOccurs (MaxOccursUpdate) and fixed (FixedUpdate). • Reference Changes: consists of changes to a referenced value (RefUpdate). • Enumeration Changes: changes of elements declared within an XSDEnumeration element. We detected added enumeration values (EnumerationA) and removed enumeration values (EnumerationR). For the sake of simplicity we have presented only the changes detected in our study. However WSDLDiff is able to detect changes to every element declared in the WSDL and XSD specifications. 4.3.2 Research Question 1 (RQ1) The first research question (RQ1) is: What is the percentage of added, changed and removed elements of a WSDL interface? To answer RQ1, for each type of element declared in the WSDL and XSD specifications, we counted the number of times they have been added, changed, or removed between every pair of subsequent versions of the WSDL interfaces under analysis. We present the results in three different tables. In Table 4.2 we report the number of added, changed and deleted WSDL elements while the added, changed and removed XSD elements are shown in Table 4.3. Table 4.1 summarizes the results showing the total number and the percentage 4.3. Study 75 of added, changed and deleted WSDL and XSD elements for each web service. 
The raw data with the changes extracted for each pair of subsequent versions is available on our web site.11 In Table 4.2 we omitted the number of added, changed and removed BindingOperations because they are identical to the number of added, changed and removed Operations. Moreover, the added and removed Parts do not include the Parts that were added and removed due to the additions and deletions of Messages. This choice allows us to highlight the changes in the Parts of existing Messages. Table 4.2: Number of added Operations (OperationA), changed Operations (OperationC), deleted Operations (OperationD), added Messages (MessageA), changed Messages (MessageC), deleted Messages (MessageD), added Parts (PartA), changed Parts (PartC) and deleted Parts (PartD) for each WSDL interface. Change Type AmazonEC2 FedEx Rate FedEx Ship FedEx Pkg OperationA 113 1 10 0 OperationC 0 1 0 0 OperationD 9 1 4 0 MessageA 218 2 16 0 MessageC 2 0 2 0 MessageD 10 2 2 0 PartA 27 0 2 0 PartC 34 0 0 0 PartD 27 0 2 0 Total 440 7 38 0 The results show that in all the web services the total number of deleted elements is a small percentage of the total number of changes (see Table 4.1). In particular, the percentage of deleted elements is approximately 4% for AmazonEC2, 12% for FedEx Rate and 6% for FedEx Ship. This result demonstrates that web service providers do not tend to delete existing elements. Concerning the number of added elements, the FedEx Rate and Ship services show approximately the same percentage (39% and 38%) while the AmazonEC2 service shows a percentage of approximately 80%. These percentages need to be interpreted taking into account the added, changed and removed WSDL and XSD elements. In fact, while the AmazonEC2 evolves con11 http://swerl.tudelft.nl/twiki/pub/DanieleRomano/WebHome/ ICWS12RQ1.pdf 76 Chapter 4. Fine-Grained WSDL Changes Table 4.3: Number of added XSDTypes (XSDTypeA), changed XSDTypes (XSDTypeC), deleted XSDTypes (XSDTypeD), added XSDElements (XSDElementA), changed XSDElements (XSDElementC), deleted XSDElements (XSDElementD), added XSDAttributeGroup (XSDAttributeGroupA) and changed XSDAttributeGroup (XSDAttributeGroupC) for each WSDL interface. Change Type AmazonEC2 FedEx Rate FedEx Ship FedEx Pkg XSDTypeA 409 234 157 0 XSDTypeC 160 295 280 6 XSDTypeD 2 71 28 0 XSDElementA 208 2 25 0 XSDElementC 1 0 18 0 XSDElementD 0 2 0 0 XSDAttributeGroupA 6 0 0 0 XSDAttributeGroupC 5 0 0 0 Total 791 604 508 6 tinuously adding 113 new Operations (see Table 4.2), the FedEx services are more stable with 1 new Operation added in FedEx Rate and 10 new Operations added in FedEx Ship. However, despite the few number of new Operations added in the FedEx services the number of added, changed and removed XSDTypes is high like in the AmazonEC2 service. This result lets us assume that the elements added in the FedEx services modify old functionalities and, hence, they are more likely to break the clients. Instead the AmazonEC2 is continuously evolving providing new Operations. This assumption is confirmed by the percentage of changed elements, that is lower in AmazonEC2 (about 16%) than in FedEx Rate and Ship (about 49% and 55%). Based on these results we can answer RQ1 stating that in all four web services the percentage of removed elements is a small percentage compared to the total number of added, changed and removed elements. Concerning the added elements the AmazonEC2 showed the highest percentage (≈80%) due to the high number of new WSDL elements added along its evolution. 
Instead the FedEx Rate and Ship services showed lower percentages (respectively about 39% and 38%). The percentage of changed elements is higher in the FedEx Rate and Ship services (respectively about 49% and 55%) compared to the approximately 16% of changed elements in AmazonEC2. Answering RQ1 we decided to omit the analysis of the FedEx Pkg service because the low number of changes and versions do not allow us to make any 4.3. Study 77 assumption. 4.3.3 Research Question 2 (RQ2) The second research question (RQ2) is: Which types of changes are made to the elements of a WSDL interface? In order to address RQ2 we focused on the changes applied to XSDTypes. In fact, among all the elements changed (802), 742 elements (approximately 92%) are XSDTypes (see Table 4.2 and 4.3). For each XSDType we extracted the fine-grained changes and we report the results in Table 4.4. We omitted to report the number of XSDAnnotation changes because they are not relevant for our study. The raw data with the changes extracted for each pair of subsequent versions is available on our web site.12 Table 4.4: Number of added XSDElements (XSDElementA), deleted XSDElements (XSDElementR), moved XSDElements (XSDElementM), updated attributes (NameUpdate, MinOccursUpdate, MaxOccursUpdate and FixedUpdate), updated references (RefUpdate), added enumeration values (EnumerationA) and removed enumeration values (EnumerationR) in the XSDTypes for each WSDL interface. Change Type AmazonEC2 FedEx Rate FedEx Ship FedEx Pkg XSDElementA 198 113 136 1 XSDElementD 11 47 49 3 XSDElementM 1 55 51 0 NameUpdate 11 20 8 0 MinOccursUpdate 17 33 39 0 MaxOccursUpdate 0 9 6 0 FixedValue 0 11 12 2 RefUpdate 9 80 273 0 EnumerationA 0 1141 926 2 EnumerationD 0 702 528 3 Total 247 2211 2028 11 The results show that the most frequent change along the evolution of the AmazonEC2 is the XSDElementA. In fact, it accounts for around 80% (198 changes out of 247) of the total changes. Concerning the FedEx Rate and FedEx 12 http://swerl.tudelft.nl/twiki/pub/DanieleRomano/WebHome/ ICWS12RQ2.pdf 78 Chapter 4. Fine-Grained WSDL Changes Ship services, the EnumerationA changes are the most frequent, accounting for approximately 51% (1141 changes out of 2211) and for 45% (926 changes out of 2028) of all changes. Adding the EnumerationD changes, we obtain approximately 83% (1843 changes out of 2211) and 71% (1454 changes out of 2028) of changes occurring in the enumeration elements. The results show that in 3 web services out of 4 there is a type of change that is predominant. Based on this result web services consumers can become aware of the most frequent types of changes affecting a WSDL interface. Like for RQ1, the small number of changes in the FedEx Pkg does not allow any valid conclusion. 4.3.4 Summary and implications of the results The changes collected in this study highlight how different WSDL interfaces evolve differently. This study with the WSDLDiff tool can help services consumers to analyze which elements are frequently added, changed and removed and which types of changes are performed more frequently. For example, a developer who wants to integrate a FedEx service into his/her application can learn that the specification of data types changes most frequently while Operations change only rarely (RQ1). In particular, the enumeration values are the most unstable elements (RQ2). Instead, an AmazonEC2 consumer can be aware that new Operations are continuously added (RQ1) and that data types are continuously modified adding new elements (RQ2). 
4.4 Conclusion & Future Work

In this chapter we proposed a tool called WSDLDiff to extract fine-grained changes between two WSDL interfaces. With WSDLDiff we performed a study aimed at understanding the evolution of web services by looking at the changes detected by our tool. The results of our study showed that the fine-grained changes are a useful means to understand how a particular web service evolves over time. This information is relevant for web service consumers who want 1) to analyze the most frequent changes affecting a WSDL interface and 2) to compare the evolution of different web services with similar features. From this information they can estimate the risk associated with the usage of a web service. The study presented in this chapter is the first study on the evolution of web services and we believe that our tool provides an essential starting point.

As future work, first we plan to investigate metrics that can be used as indicators of changes in WSDL elements. For instance, in our work shown in Chapter 2, we found an interesting correlation between the number of changes in Java interfaces and the external cohesion metric defined for services by Perepletchikov et al. [2010]. With our tool to extract fine-grained changes we performed a similar study with WSDL interfaces, which will be shown in Chapter 6. Next, we plan to classify the changes retrievable with WSDLDiff, integrating and possibly extending the works proposed by Feng et al. [2011] and Treiber et al. [2008]. Finally, we plan to investigate the co-evolution of the different web services composing a service oriented system. With WSDLDiff we can highlight web services that evolve together and, hence, violate the loose coupling property. This analysis can help us to investigate the causes of web service co-evolution and techniques to keep their evolution independent.

Table 4.5: Number of Operations, Parts, XSDElements and XSDTypes declared in each version of the WSDL interfaces under analysis

WSDL         Ver.   Operations   Parts   XSDElements   XSDTypes
AmazonEC2    2      14           28      28            60
AmazonEC2    3      17           34      34            75
AmazonEC2    4      19           38      38            81
AmazonEC2    5      19           38      38            81
AmazonEC2    6      20           40      40            87
AmazonEC2    7      20           40      40            85
AmazonEC2    8      26           52      52            111
AmazonEC2    9      34           68      68            137
AmazonEC2    10     37           74      74            151
AmazonEC2    11     38           76      76            157
AmazonEC2    12     41           82      82            171
AmazonEC2    13     43           86      86            179
AmazonEC2    14     65           130     130           259
AmazonEC2    15     68           136     136           272
AmazonEC2    16     74           148     148           296
AmazonEC2    17     81           162     162           326
AmazonEC2    18     87           174     174           350
AmazonEC2    19     91           182     182           366
AmazonEC2    20     95           190     190           390
AmazonEC2    21     118          236     236           464
AmazonEC2    22     118          236     236           465
AmazonEC2    23     118          236     236           467
FedEx Rate   1      1            2       2             72
FedEx Rate   2      1            2       2             80
FedEx Rate   3      2            4       4             88
FedEx Rate   4      1            2       2             124
FedEx Rate   5      1            2       2             129
FedEx Rate   6      1            2       2             178
FedEx Rate   7      1            2       2             202
FedEx Rate   8      1            2       2             223
FedEx Rate   9      1            2       2             228
FedEx Rate   10     1            2       2             235
FedEx Ship   2      1            2       2             124
FedEx Ship   5      9            16      16            178
FedEx Ship   6      9            16      16            177
FedEx Ship   7      7            12      12            199
FedEx Ship   8      7            12      12            221
FedEx Ship   9      7            12      12            246
FedEx Ship   10     7            12      12            254
FedEx Pkg    2      2            4       4             20
FedEx Pkg    3      2            4       4             20
FedEx Pkg    4      2            4       4             20

5. Dependencies among Web APIs

Service Oriented Architecture (SOA) enables organizations to react to requirement changes in an agile manner and to foster the reuse of existing services. However, the dynamic nature of service oriented systems and their agility pose the challenge of properly understanding such systems.
In particular, understanding the dependencies among services is a non trivial task, especially if service oriented systems are distributed over several hosts belonging to different departments of an organization. In this chapter, we propose an approach to extract dynamic dependencies among web services. The approach is based on the vector clocks, originally conceived and used to order events in a distributed environment. We use the vector clocks to order service executions and to infer causal dependencies among services. We show the feasibility of the approach by implementing it into the Apache CXF framework and instrumenting the SOAP messages. We designed and executed two experiments to investigate the impact of the approach on the response time. The results show a slight increase that is deemed to be low in typical industrial service oriented systems.1 5.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5.3 Study Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.4 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 5.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.7 Conclusion & Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 98 IT organizations need to be agile to react to changes in the market. As a consequence they started to develop their software systems as Software as a 1 This chapter was published in the 4th International Conference on Service Oriented Computing and Application (SOCA 2011) [Romano et al., 2011]. 81 82 Chapter 5. Dependencies among Web APIs Service (SaaS), overcoming the poor inclination of monolithically architected systems towards agility. Hence, the adoption of Service Oriented Architectures (SOAs) has become popular. In addition, SOA-based application development also aims at reducing development costs through service reuse. On the other hand, mining dependencies between services in a SOA is relevant to understand the entire system and its evolution over time. The distributed and dynamic nature of those architectures makes this task particularly challenging. In order to get an accurate picture of the dependencies within a SOA system a dynamic analysis is required. Using static analyses simply fails to cover important features of a SOA architecture, for example the ability to perform dynamic binding. To the best of our knowledge, existing technologies used to deploy a service oriented system do not provide tool to accurately detect the entire chain of dependencies among services. For instance, open source Enterprise Service Bus systems (e.g., MuleESB2 and ServiceMix3 ) are limited to detect only direct dependencies (i.e., invocation between pair of services). Such monitoring facilities are widely implemented through the wire tap and the message store patterns described by Hohpe and Woolf [2003]. Other tools, such as HP OpenView SOA Manager4 , allow the exploration of the dependencies, but they must explicitly be specified by the user [Basu et al., 2008]. 
In this chapter, we propose (1) an adaptation of our approach based on vector clocks [Romano and Pinzger, 2011b] to extract dynamic dependencies among web services deployed in an enterprise; (2) a non-intrusive, easy-toimplement and portable implementation and (3) an analysis of the impact of our approach on the performance. Vector clocks have originally been conceived and used to order events in a distributed environment [Mattern, 1989; Fidge, 1988]. We bring this technique to the domain of service oriented systems by attaching the vector clocks to SOAP messages and use them to order service executions and to infer causal dependencies. The approach has been implemented into the Apache CXF5 framework taking advantage of the Pipes and Filters pattern [Hohpe and Woolf, 2003]. Since this pattern is widely used in the most popular web service frameworks and Enterprise Service Buses, the approach can be implemented on other SOA 2 http://www.mulesoft.org/ http://servicemix.apache.org/ 4 http://h20229.www2.hp.com/products/soa/ 5 http://cxf.apache.org/ 3 5.1. Applications 83 platforms (e.g., Apache Axis26 and Mule ESB) in a similar manner. To analyze the impact of the approach on the performance of a system we investigate how the approach affects the response time of services. The results show a slight increase due to the increasing message size and the instrumented Apache CXF framework. To determine the impact on real systems a repository of 41 industrial systems is examined. Given the amount of services typically deployed within these industrial systems we do not expect a significant increase of the response time when using our approach. This chapter is structured as follows. In Section 5.1 we present the main applications of the proposed approach. In Section 5.2 we report the related work. In Section 5.3 we describe the context in which we plan to apply our study. In Section 5.4 we describe our approach to extract dynamic dependencies among web services. In Section 5.5 we propose an implementation of our approach. In Section 5.6 we report the first experiments and the obtained results. Finally, we conclude the chapter and present the future work in Section 5.7. 5.1 Applications In this section we discuss the main applications of our approach that we plan to perform in future work. 5.1.1 Quality attributes measurement Our approach can be used to build up dynamic dependency graphs. These graphs are commonly weighted, where the weights indicate the number of times a particular service is invoked or a particular execution path is traversed. The information contained in these graphs can help software engineers to measure important quality attributes (e.g., analyzability and changeability) for measuring maintainability of the system under analysis. For instance Perepletchikov et al. defined several cohesion and coupling metrics to estimate the maintainability and analyzability of service oriented systems [Perepletchikov et al., 2006, 2007, 2010]. In our work shown in Chapter 2, we found an interesting correlation between the number of changes in Java interfaces and the external cohesion metric defined by Perepletchikov et al. With our approach to extract dynamic dependencies among services we plan to perform similar studies to validate and improve those metrics by analyzing service oriented systems. More in general, our dynamic dependency analysis is a starting point to study the interactions among services in indus6 http://axis.apache.org/ 84 Chapter 5. 
Dependencies among Web APIs trial service oriented systems and to define anti-patterns that can affect the quality attributes required by a SOA. 5.1.2 Change Impact Analysis Besides the measurement of quality attributes our approach can be used to perform Change Impact Analyses (IA) on service oriented systems. Bohner et al. [Shawn A. Bohner, 1996; Bohner, 2002] defined the IA as the identification of potential consequences of a change, or the assessment of what needs to be modified to accomplish a change. They defined two techniques to perform IA, namely Traceability and Dependency. Wang and Capretz [2009] defined an IA approach for service oriented systems based on a service dependency graph. Our approach fits in with their work by adding a dynamic dependency graph. 5.2 Related Work The most recent work on mining dynamic dependencies in service oriented systems has been developed by Basu et al. [2008]. Basu et al. infer the causal dependencies through three dependencies identification algorithms, respectively based on the analysis of 1) occurrence frequency of logged message pairs, 2) distribution of service execution time and 3) histogram of execution time differences. This approach does not require the instrumentation of the system infrastructure. However, it is based on probabilities and there is still the need for properly setting the parameters of their algorithms to reach a good accuracy. Briand et al. [2006] proposed a methodology and an instrumentation infrastructure aimed at reverse engineering of UML sequence diagrams from dynamic analysis of distributed Java systems. Their approach is based on a complete instrumentation of the systems under analysis which in turn requires a complete knowledge of the system. Hrischuk and Woodside [2002] provided a series of requirements to reverse engineer scenarios from traces in a distributed system. However, besides the requirements, this work does not provide any approach to extract dependencies in a service oriented system. As can be deduced from the overview of related work there currently does not exist any accurate approach for inferring the dependencies amongst services. In this chapter, we present such an approach based on the concept of vector clocks. 5.3. Study Context 85 Figure 5.1: A sample enterprise with web services deployed in two departments 5.3 Study Context In this section we describe the context in which we plan to apply our study. The perspective is that of a quality engineer who wants to extract the dynamic dependencies among services within the boundaries of an enterprise. We refer to dependencies as message dependencies, according to which two services are dependent if they exchange messages. We furthemore refer to web services as services which are compliant to the following XML-standards: • WSDL7 (Web Services Description Language) which describes the service interfaces. • SOAP8 (Simple Object Access Protocol) widely adopted as a simple, robust and extensible XML-based protocol for the exchange of messages among web services. 7 8 http://www.w3.org/TR/wsdl http://www.w3.org/TR/soap/ 86 Chapter 5. Dependencies among Web APIs Finally, we assume that the enterprise provides a UDDI9 (Universal Description, Discovery, and Integration) registry to allow for the publication of services and the search for services that meet particular requirements. Our sample enterprise is composed of several departments (a sample enterprise with two departments is shown in Figure 5.1). 
Each department exposes some functionality as web services that can be invoked by web services deployed in other departments. Services deployed within the boundaries of the enterprise are called internal services. Services deployed outside the boundaries of the enterprise are called external services. We assume that hosts within the departments publish web services through an application server (e.g., JBoss AS10 or Apache Tomcat11 ) and web service engines (e.g., Apache Axis2 or Apache CXF). 5.4 Approach Our approach to extract dynamic dependencies among web services is based on the concept of vector clocks. In this section, we first provide a background on vector clocks after which we present our approach to order service executions and to infer dynamic dependencies among web services. 5.4.1 Vector Clocks Ordering events in a distributed system, such as a service oriented system, is challenging since the physical clock of different hosts may not be perfectly synchronized. The logical clocks were introduced to deal with this problem. The first algorithm relying on logical clocks was proposed by Lamport [1978]. This algorithm is used to provide a partial ordering of events, where the term partial reflects the fact that not every pair of events needs to be related. Lamport formulated the happens-before relation as a binary relation over a set of events which is reflexive, antisymmetric and transitive. Lamport’s work is a starting point for the more advanced vector clocks defined by Fidge and Mattern in 1988 [Fidge, 1988; Mattern, 1989]. Like the logical clocks, they have been widely used for generating a partial ordering of events in a distributed system. Given a system composed by N processes, a vector clock is defined as a vector of N logical clocks, where the ith clock is associated to the ith process. Initially all the clocks are set to zero. Every time a process sends a message, it increments its own logical clock, and it attaches 9 http://uddi.xml.org/ http://www.jboss.org/jbossas/ 11 http://tomcat.apache.org/ 10 5.4. Approach 87 the vector clock to the message. When a process receives a message, first it increments its own logical clock and then it updates the entire vector clock. The updating is achieved by setting the value of each logical clock in the vector to the maximum of the current value and the values contained by the vector received with the message. 5.4.2 Inferring dependencies among web services We conceive a vector clock (VC) as a vector/array of pairs (s,n), where s is the service id and n is number of times the service s is invoked. When an instance of the service s receives an execution request the vector clock is updated according to the following rules: • if the request does not contain a vector clock (e.g., a request from outside the system), the vector clock is created, and the pair (s,1) is added to it; • if the request contains a vector clock and a pair with service id s is already contained in the vector clock, the value of n is incremented by one; if not, the pair (s,1) is added to the vector. Once the vector clock is updated, its value is associated to the execution of service s and we label it VC(s). The vector clock is then stored in a database. 
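A minimal sketch of this receive-side bookkeeping is shown below, assuming a simple in-memory map of (s, n) pairs; the database storage and the SOAP transport are omitted, and all names are illustrative rather than taken from our actual implementation.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of a vector clock represented as a map from service id to invocation count n.
public class VectorClock {

    private final Map<String, Integer> pairs = new LinkedHashMap<>();

    // Applies the update rules when service 'serviceId' receives a request carrying
    // 'received' (null if the request comes from outside the system).
    public static VectorClock onReceive(String serviceId, VectorClock received) {
        VectorClock updated = new VectorClock();
        if (received != null) {
            updated.pairs.putAll(received.pairs);
        }
        // Add (s,1) if s is not yet present, otherwise increment its n by one.
        updated.pairs.merge(serviceId, 1, Integer::sum);
        return updated;
    }

    // Value of n for a given service id; a missing pair counts as 0.
    public int valueOf(String serviceId) {
        return pairs.getOrDefault(serviceId, 0);
    }

    // Service ids occurring in this clock.
    public Set<String> serviceIds() {
        return pairs.keySet();
    }

    @Override
    public String toString() {
        return pairs.toString();
    }
}

For instance, onReceive("GUI", onReceive("OA", null)) produces the clock [(OA,1),(GUI,1)] used in the working example below.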
Whenever an instance of the service s sends an execution request to another service x, the following actions are performed:

• if the service x is an internal service, the vector clock is attached to the outgoing message;

• if the service x is an external service, the pair (x,1) is added to the vector clock and the vector clock is stored in the database but not attached to the outgoing message.

From the set of vector clocks stored in the database, we can infer the causal order of the service executions. Given the vector clocks associated with the executions of service i and service j, VC(i) and VC(j), we can state that the execution of service i causes the execution of service j if VC(i) < VC(j), according to the following relation:

VC(i) < VC(j) ⟺ (∀x: VC(i)_x ≤ VC(j)_x) ∧ (∃x': VC(i)_{x'} < VC(j)_{x'})    (5.1)

where VC(i)_x denotes the value for n in the pair (x,n) of the vector clock VC(i). In other words, the execution of a service i causes the execution of a service j if and only if all the pairs contained in the vector VC(i) have values for n that are less than or equal to the corresponding values for n in VC(j), and at least one value for n is smaller. If all the corresponding pairs of the two vector clocks VC(i) and VC(j) contain the same values for n except one corresponding pair whose values for n differ exactly by 1, we state that there is a direct dependency (i.e., a direct call) between service i and service j. If a pair with id s is missing in a vector, its value for n is considered to be 0. Finally, to infer the dynamic dependencies among services, it is necessary to apply the binary relation in (5.1) to each pair of vector clocks whose values are stored in the database.

Figure 5.2: Example of a service oriented system to open a bank account

5.4.3 Working Example

Consider the example system from Figure 5.2, composed of six services inside the enterprise boundary, one external service and one client which triggers the execution. The system provides the services to open an account in a banking system. In this example, the client interested in creating an account needs to invoke the service OpenAccount. This service invokes the services GetUserInfo, Deposit and RequestCreditCard. These services invoke the service WriteDB to access a database. WriteDB first writes in a database and then, if its invocation has been triggered by RequestCreditCard, invokes NotifyUser which performs actions to notify the user. The external service TaxAuthority is invoked by GetUserInfo to inquire about fiscal information of the user.

The execution flow resulting from the invocation of the service OpenAccount is shown as a UML sequence diagram in Figure 5.3. The arrows in the diagram are labeled with the vector clocks associated with the execution of the invoked service. Vector clocks with superscripts mark vector clocks associated with different instances of the same service. When the OpenAccount (OA) service is invoked, there is no vector clock attached to the message, since the invocation request comes from outside (i.e., Client). Hence, a new vector clock (VC(OA)) is created with the single pair (OA,1) and it is stored in the database. Then the execution of the service OpenAccount triggers the execution of the service GetUserInfo (GUI). When this service is invoked, a new pair (GUI,1) is added to the vector clock, obtaining the new clock VC(GUI)=[(OA,1),(GUI,1)] that is stored in the database.
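Relation (5.1) and the direct-dependency check can be sketched as two methods on the hypothetical VectorClock class introduced in Section 5.4.2; only the comparison logic is illustrated, and the methods are assumed to live inside that class (Set and HashSet come from java.util).

// Additional methods for the VectorClock class sketched earlier.

// Happens-before test implementing relation (5.1): true if every component of
// this clock is <= the corresponding component of 'other' and at least one
// component is strictly smaller.
public boolean happensBefore(VectorClock other) {
    Set<String> ids = new HashSet<>(serviceIds());
    ids.addAll(other.serviceIds());
    boolean strictlySmaller = false;
    for (String id : ids) {
        int a = valueOf(id);           // missing pairs count as 0
        int b = other.valueOf(id);
        if (a > b) {
            return false;              // some component is larger: not ordered
        }
        if (a < b) {
            strictlySmaller = true;
        }
    }
    return strictlySmaller;
}

// Direct dependency (direct call): the two clocks differ in exactly one
// component, and that component differs by exactly 1.
public boolean directlyCauses(VectorClock other) {
    if (!happensBefore(other)) {
        return false;
    }
    Set<String> ids = new HashSet<>(serviceIds());
    ids.addAll(other.serviceIds());
    int differing = 0;
    for (String id : ids) {
        int diff = other.valueOf(id) - valueOf(id);
        if (diff == 1) {
            differing++;
        } else if (diff != 0) {
            return false;
        }
    }
    return differing == 1;
}

With this reading, VC(OA).happensBefore(VC(GUI)) and VC(OA).directlyCauses(VC(GUI)) both hold for the clocks in this example, while the clocks of OA and the WriteDB executions are related only indirectly.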
When the service GetUserInfo (GUI) invokes the external service TaxAuthority (TA), the vector clock is set to VC(TA)=[(OA,1),(GUI,1),(TA,1)] and is stored in the database. In this way we can infer dependencies to external services. Since TaxAuthority (TA) is an external service and we do not have control over external services, the vector clock is not attached to this message.

Consider the execution of the service WriteDB (WDB), and assume we want to infer all the services that depend on it. Since we have multiple invocations of the service WriteDB in the execution flow, the dependent services are all the services x whose vector clocks VC(x) satisfy the following boolean expression:

VC(x) < VC(WDB)' ∨ VC(x) < VC(WDB)'' ∨ VC(x) < VC(WDB)'''

These services are OpenAccount, GetUserInfo, Deposit and RequestCreditCard (see Figure 5.3).

Figure 5.3: Sequence diagram for opening a bank account. The arrows in the diagram are labeled with the vector clocks associated with the execution of the invoked service: VC(OA)=[(OA,1)], VC(GUI)=[(OA,1),(GUI,1)], VC(TA)=[(OA,1),(GUI,1),(TA,1)], VC(D)=[(OA,1),(D,1)], VC(RCC)=[(OA,1),(RCC,1)], VC(WDB)'=[(OA,1),(GUI,1),(WDB,1)], VC(WDB)''=[(OA,1),(D,1),(WDB,1)], VC(WDB)'''=[(OA,1),(RCC,1),(WDB,1)], and VC(NU)=[(OA,1),(RCC,1),(WDB,1),(NU,1)].

If we want to infer all the services that WriteDB depends on, we look for all the services x whose vector clocks VC(x) satisfy the following boolean expression:

VC(x) > VC(WDB)' ∨ VC(x) > VC(WDB)'' ∨ VC(x) > VC(WDB)'''

The sole service which WriteDB depends on is NotifyUser.

Consider the execution of the service OpenAccount (OA), and assume we want to infer the services that OpenAccount depends on directly. Those services are GetUserInfo (GUI), Deposit (D) and RequestCreditCard (RCC). Their vector clocks (VC(GUI), VC(D) and VC(RCC)) contain only one pair (respectively (GUI,1), (D,1) and (RCC,1)) with a value for n that is larger by exactly 1 than the corresponding value in the vector clock VC(OA). Among the services OA and WDB there are no direct dependencies because the vector clocks corresponding to the executions of WDB contain two pairs with different values for n.

The values for n in the example in Figure 5.3 are all equal to 1. However, they are needed to detect the presence of cycles along the execution flows. Assume that the NotifyUser service invokes the WriteDB service, introducing a cycle. In this case the vector clock associated with the second invocation of the service WriteDB is VC(WDB)=[(OA,1),(RCC,1),(WDB,2),(NU,1)].

5.5 Implementation

The implementation of the proposed approach should be non-intrusive, easy to implement and portable to different SOA platforms. Only if these properties hold can we be sure that the approach can be adopted in an industrial setting. In this section we propose an implementation that meets these requirements. The implementation requires three steps. First, the messages need to be instrumented to attach the vector clock data structure. Next, we need a technique to capture the incoming messages in order to retrieve the vector clock, update it and store its value in the database. Finally, the outgoing messages have to be captured to attach the updated vector clock to them.
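The remainder of this section details these steps in Apache CXF. As a preview, the sketch below shows the rough shape the two interceptors could take; the class names match those introduced later in this section, while the chosen phases, the header (de)serialization and the persistence call are simplifying assumptions rather than the exact implementation used in our experiments.

import org.apache.cxf.binding.soap.SoapMessage;
import org.apache.cxf.phase.AbstractPhaseInterceptor;
import org.apache.cxf.phase.Phase;

// Sketch: creates/updates the vector clock for every incoming request.
public class VectorClockInInterceptor extends AbstractPhaseInterceptor<SoapMessage> {

    public VectorClockInInterceptor() {
        super(Phase.PRE_INVOKE); // assumed phase: must run before the service logic
    }

    @Override
    public void handleMessage(SoapMessage message) {
        VectorClock received = readClockFromHeader(message);            // null if no header present
        VectorClock updated = VectorClock.onReceive(serviceId(message), received);
        store(updated);                                                  // persist VC(s) for later analysis
        message.getExchange().put(VectorClock.class, updated);           // keep it available for outgoing calls (simplified)
    }

    // Header parsing, service identification and persistence are omitted in this sketch.
    private VectorClock readClockFromHeader(SoapMessage message) { return null; }
    private String serviceId(SoapMessage message) { return "unknown"; }
    private void store(VectorClock clock) { }
}

// Sketch: attaches the current vector clock to outgoing messages to internal services.
// Invocations to external services (update and store instead of attach) are omitted.
class VectorClockOutInterceptor extends AbstractPhaseInterceptor<SoapMessage> {

    public VectorClockOutInterceptor() {
        super(Phase.PRE_PROTOCOL); // assumed phase for adding the SOAP header
    }

    @Override
    public void handleMessage(SoapMessage message) {
        VectorClock clock = message.getExchange().get(VectorClock.class);
        if (clock != null) {
            writeClockToHeader(message, clock); // serialize into the vc:VectorClock header
        }
    }

    private void writeClockToHeader(SoapMessage message, VectorClock clock) { }
}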
To instrument the messages we use the SOAP header element. This element is meant to contain additional information (e.g., authentication information) not directly related to the particular message. For example, after attaching the vector clock to the message sent from the service GetUserInfo to the service WriteDB (see Figure 5.3), the message contains the following header:

<soap:Envelope>
  <soap:Header>
    <vc:VectorClock>
      <vc:pair>
        <vc:s>OpenAccount</vc:s>
        <vc:n>1</vc:n>
      </vc:pair>
      <vc:pair>
        <vc:s>GetUserInfo</vc:s>
        <vc:n>1</vc:n>
      </vc:pair>
    </vc:VectorClock>
  </soap:Header>
  ...
</soap:Envelope>

Concerning the interception of the incoming and outgoing messages, we adopted a technique that relies on the Pipes and Filters [Hohpe and Woolf, 2003] architectural pattern. The Pipes and Filters pattern allows a larger processing task to be divided into a sequence of smaller, independent processing steps, called Filters, that are connected by channels, called Pipes. This pattern is widely adopted to process incoming and outgoing messages in web service engines and frameworks such as Apache Axis2 and Apache CXF. Those frameworks use Filters to implement the message processing tasks (e.g., message marshaling and unmarshaling) and they allow developers to easily extend the chains of Filters to further process messages. Since this pattern is widely used, even by Enterprise Service Bus platforms (e.g., MuleESB), we decided to use it to implement the logic needed to retrieve, update, store and forward the vector clocks. Instrumenting the services would be an alternative implementation approach. However, instrumentation is risky since it modifies the implementation and can introduce bugs.

To implement our approach we use the Apache CXF service framework. In Apache CXF the filters are called interceptors. Figure 5.4 shows the chains of interceptors between an Apache CXF Deployed Service and an Apache CXF Developed Consumer. When the consumer invokes a remote service, the Apache CXF runtime creates an outbound chain (Out Chain) to process the request. If the invocation starts a two-way message exchange, the runtime creates an inbound chain to process the response (omitted in Figure 5.4). When a service receives a request from a consumer, a similar process takes place. The Apache CXF runtime creates an inbound interceptor chain (In Chain) to process the request. If the request is part of a two-way message exchange, the runtime also creates an outbound interceptor chain (omitted in Figure 5.4).

Figure 5.4: The chains containing our vector clock interceptors between an Apache CXF Deployed Service and an Apache CXF Developed Consumer

In this implementation we add two interceptors. We add the VectorClockInInterceptor to the In Chain to update/create the vector clock value and store it in the database. To the Out Chain we add the VectorClockOutInterceptor to attach the vector clock to the outgoing message, or to update and store the vector clock in the case of invocations to external services. Those interceptors can be added dynamically to the chain of interceptors. This feature allows us to use our approach without re-deploying the system under analysis.

5.6 Experiments

To investigate the impact of our approach on the service response time we designed and executed two experiments.
The response time of a system can increase because the approach introduces two variables. First, we introduce two new filters in the Pipes and Filters pattern and the Apache CXF runtime is loaded with additional message processing tasks. Second, we introduce a new header element in the SOAP messages to attach the vector clock, which increases the size of the messages passed between services. We performed two experiments in which we measure the impact of the instrumented Apache CXF framework (Experiment 1) and the impact of the increasing size of the messages (Experiment 2) on the response time.

To perform our experiments the Apache CXF framework 2.4.1 is instrumented as described in the previous section. Tomcat 7.0.19 is used as application server and Hibernate 3 as Java persistence framework. On the hardware side, two platforms are connected through a 100 Mbit/s Ethernet connection:

• Platform 1: MacBook Pro 6.2, processor 2.66 GHz Intel Core i7, memory 4 GB DDR3, Mac OS 10.6.5.

• Platform 2: MacBook Pro 7.1, processor 2.4 GHz Intel Core 2 Duo, memory 4 GB DDR3, Mac OS 10.6.4.

Each platform uses a MySQL 5.1.53 (Community Edition) database to store the vector clock values for subsequent dependency extraction. Execution times are measured using the Java method System.currentTimeMillis(), which returns the current time in milliseconds (ms).

5.6.1 Experiment 1

In the first experiment we investigate the impact of the instrumented version of the Apache CXF framework on the response time. We implemented the example shown in Figure 5.2, deploying the services within the boundary on Platform 1 and the external service on Platform 2. We deployed the services within the system on one platform to achieve more accurate timing and eliminate the network overhead, which is not relevant for this experiment. Moreover, the implementation of each service contains only the logic needed to invoke other services. We measured the response time of the service OpenAccount in three different scenarios:

• NoClock: we executed the system without our vector clock approach.

• Clock: we executed the system with our vector clock approach.

• ClockNoDB: we executed the system with our vector clock approach without storing the vector clock values in the database.

Figure 5.5: Box plots of the response time in milliseconds obtained for Experiment 1 (scenarios Clock, ClockNoDB and NoClock).

For each scenario we executed the system 1000 times to minimize the influence of the operating system activities. Figure 5.5 shows the box plots of the response time measured for the three different scenarios, while the following table shows median and average values in milliseconds.

Scenario     Median (ms)   Average (ms)
NoClock      116.6         108
ClockNoDB    249.4         226
Clock        286.4         275

The results show that on average the difference in response time between the scenarios with and without vector clocks is 167 ms. The overhead due to the storage in the database using Hibernate 3 is on average 49 ms. The difference measured is relevant, but it is relative to a system which involves the execution of 7 services without any business logic. The impact of our approach can be lower in real systems since the increase in milliseconds introduced by the instrumented Apache CXF framework is expected to be a small percentage of the total response time when additional logic is also executed.
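The per-scenario measurement procedure can be sketched as follows; invokeOpenAccount() stands in for the actual SOAP invocation and is purely illustrative.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ResponseTimeBenchmark {

    // Placeholder for the actual SOAP call to the OpenAccount service.
    static void invokeOpenAccount() { /* illustrative stub */ }

    public static void main(String[] args) {
        int runs = 1000; // repeated to minimize the influence of operating system activities
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            invokeOpenAccount();
            samples.add(System.currentTimeMillis() - start);
        }
        Collections.sort(samples);
        double median = (samples.get(runs / 2 - 1) + samples.get(runs / 2)) / 2.0;
        double average = samples.stream().mapToLong(Long::longValue).average().orElse(0);
        System.out.printf("median=%.1f ms, average=%.1f ms%n", median, average);
    }
}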
5.6.2 Experiment 2

In the second experiment we investigate the impact of the increasing message size on the response time. We implemented the system shown in Figure 5.6. The system is composed of 12 web services that we labeled from 1 to 12. Each web service Service_i invokes Service_i+1, except the last service Service_12. The invocations among services are synchronous. To take the network overhead into account we deployed Service_i on Platform 1 if i is odd and on Platform 2 if i is even. Similarly to Experiment 1, the services' implementations do not contain any business logic except the logic needed to invoke the next service.

Figure 5.6: System deployed to perform Experiment 2

We measure the response time of the service Service_1 while increasing the vector clock size from 1 to 2000 pairs. The vector clock is added to the message sent to Service_1, which forwards the message to Service_2 until the last service of the execution flow is reached. For each vector clock size, this scenario is executed 1000 times to minimize the influence of the operating system activities. The vector clocks are not stored in the database in order to achieve more accurate time measures. Figure 5.7 shows the median and average of the measured response times for each vector clock size. As shown by the plot, the increasing size of the messages has a relevant impact on the response time. Basically, the more unique services are invoked along the execution flow, the higher the response time.

Figure 5.7: Average and median response time in milliseconds when increasing the vector clock size for Experiment 2

5.6.3 Summary of the results

Our experiments measured the impact of the approach on the response time. This impact is mainly due to the increasing size of the SOAP messages. The instrumentation of the CXF framework should be a minor issue for real systems. In order to validate that the increase in message size is not problematic in practice, we counted the number of services and operations in a set of industrial systems which use web services. These industrial systems have been previously analyzed by the Software Improvement Group12 and cover a wide range of domains. The following tables report the frequencies of the number of services and the number of operations within these systems:

12 http://www.sig.eu

#Services   #Systems
1-10        31
11-100      6
101-201     4

#Operations   #Systems
1-10          13
11-100        17
101-500       9
> 501         2

According to these results, applying our approach to extract dependencies in the biggest system (composed of 201 services) in our repository would lead to an increase of the response time of 140 ms in the worst case. This difference is significant for a system without any business logic, but we believe it is only a small percentage of the response time in real systems. In our future work we plan to investigate the impact of our approach on a subset of those systems.

5.7 Conclusion & Future Work

In this chapter, we presented a novel approach to extract dynamic dependencies among services using the concept of vector clocks. They allow the reconstruction of an accurate dynamic dependency graph from the execution of a service oriented system. We implemented our approach in the Apache CXF framework using the Pipes and Filters pattern. This pattern makes our approach portable to a wide range of SOA platforms, such as Mule ESB and Apache Axis2.
The information retrievable with our approach is of great interest for both researchers and developers of service-oriented systems. Amongst others, the dependencies can be used to study service usage patterns and anti-patterns. In addition, the information can be used to identify the potential consequences of a change or a failure in a service, also known in literature as change and failure impact analysis. As future work, we plan to apply our approach to extract dependencies in both open-source and industrial systems. The extracted graphs allows us to measure important quality attributes of the systems under analysis, such as changeability, maintainability and analyzability. Moreover, we plan to further investigate the impact of our approach on the response time of industrial systems. If the impact is significant, we plan to improve our approach to minimize the introduced overhead. . 6 Change-Prone Web APIs Several metrics have been proposed in literature to highlight change-prone software components in order to ease their maintainability. However, to the best of our knowledge, no such studies exist for web APIs (i.e., APIs exposed and accessible via networks) whose popularity has grown considerably over the last years. Web APIs are considered contracts between providers and consumers and stability is a key quality attribute of them. We present a qualitative and quantitative study of the change-proneness of web APIs with low external and internal cohesion. First, we report on an online survey to investigate the maintenance scenarios that cause changes to web APIs. Then, we define an internal cohesion metric and analyze its correlation with the changes performed in ten well known WSDL APIs. Our results provide several insights into the interface, method, and datatype change-proneness of web APIs with low internal and external cohesion. The results assist both providers and consumers in assessing the stability of web APIs, and provide directions for future research.1 6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6.2 Research Questions and Approach . . . . . . . . . . . . . . . . . . . . . 103 6.3 Online Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.4 Quantitative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Over the last years software systems have grown significantly from isolated software systems to distributed software systems (e.g., service-oriented systems) [Neukom, 2004; Murer et al., 2010]. These systems consist of interconnected, distributed components that are implemented and deployed by 1 This chapter has been published as technical report [Romano et al., 2013]. 99 100 Chapter 6. Change-Prone Web APIs different organizations or different departments within the same organization. In these systems the distributed component’s API, referred to as web API throughout this chapter, is considered a contract between the component’s provider and its consumers [Erl, 2007]. One of the key factors for deploying successful web APIs, and in general APIs, is assuring an adequate level of stability [Erl, 2007; Daigneau, 2011; Vásquez et al., 2013]. 
Changes in a web API might break the consumers’ systems forcing them to continuously adapt to new versions of the web API [Papazoglou, 2008; Daigneau, 2011]. For this reason, assessing the stability of web APIs is key to reduce the likelihood of continuous updates. To reduce the effort and the costs to maintain software systems several approaches have been defined to identify change-prone software components [Girba et al., 2004; Khomh et al., 2009; Posnett et al., 2011; Penta et al., 2008]. Based on these studies software engineers can use quality indicators (e.g., software metrics, heuristics to measure antipatterns) that can estimate the components’ change frequency and can assist them in taking appropriate countermeasures. However, to the best of our knowledge, none of these studies investigates change indicators for web APIs. We believe that this is mainly due to the lack of publicly available web APIs with long histories which makes performing such studies challenging for reasearchers. Change-proneness indicators would bring relevant benefits for both providers and consumers. On the one hand, consumers can estimate the change-proneness of suitable web APIs available on the market and subscribe to the most stable one. On the other hand, providers want to publish stable web APIs to reduce the maintenance effort and to attract more consumers and, consequently, increase their profits. Among all the structural properties of web APIs (e.g., complexity and size), we believe that the cohesion can affect their change-proneness. Our intuition is based on our work shown in Chapter 2 and existing work [Perepletchikov et al., 2007, 2010]. In Chapter 2, we showed that, among the existing source code metrics, the external cohesion has the strongest correlation with the number of changes performed in Java interfaces. Moreover, Perepletchikov et al. [Perepletchikov et al., 2007, 2010] showed that the cohesion can affect the understandability and, consequently, the maintainability of web APIs. In this chapter, to assist both providers and consumers, we use a mixed method approach [Creswell and Clark, 2010] to analyze the impact of the internal and external cohesion on the change-proneness of web APIs. Internal cohesion measures the cohesion of the operations (also referred to as methods) of a web API. External cohesion measures the extent to which the oper- 6.1. Background 101 ations of a web API are used together by external consumers (also referred to as clients). In the first part of our study, we use an online survey to investigate 1) the interface and method level change-proneness of web APIs with low external cohesion and 2) the interface and data-type level change-proneness of web APIs with low internal cohesion. The results show the likelihood with which maintenance scenarios can cause changes in web APIs affected by low internal and external cohesion. The second part of our study consists of a quantitative analysis of the change-proneness of web APIs with low internal cohesion. We first introduce the Data Type Cohesion (DTC) metric to overcome the problem of the existing internal cohesion metrics. Based on frequent discussions with industrial partners and colleagues, we believe that the existing metrics should be improved because they do not take into account the cohesion among data types. We then analyze the change-proneness of ten public WSDL2 (Web Service Description Language) APIs investigating the correlation between our DTC metric and the number of changes performed in the WSDL APIs. 
The results show that the values for the DTC metric are correlated with the number of changes the WSDL APIs undergo to. The contributions of this chapter are: • insights into the likelihood of maintenance scenarios to cause changes in web APIs with low internal and external cohesion. • a new internal cohesion metric that takes into account the cohesion among data types to highlight change-prone WSDL APIs. • guidelines for researchers to investigate the method level of web APIs with low external cohesion and the data-type level of web APIs with low internal cohesion. 6.1 Background The concept of software cohesion has been widely investigated in the context of different programming paradigms [Briand et al., 1998; Counsell et al., 2006; Perepletchikov et al., 2007; Kumar et al., 2011; Zhao and Xu, 2004]. In this chapter we adhere to the classification defined by Perepletchikov et al. [Perepletchikov et al., 2007, 2010] who investigated the cohesion of web APIs. According to their classification there are 8 different levels of cohesion 2 http://www.w3.org/TR/wsdl 102 Chapter 6. Change-Prone Web APIs involving web APIs: coincidental, logical, temporal, communicational, external, implementation, sequential, and conceptual. In this chapter we focus on the external and communicational cohesion to which we refer as internal cohesion. The internal cohesion measures the cohesion of the operations (also referred as methods throughout the chapter) declared in a web API. Similar to the method cohesion (i.e., LCOM) defined by Chidamber et al. [Chidamber and Kemerer, 1991, 1994], the internal cohesion expresses the extent to which the operations belong together counting their common parameters. The external cohesion measures the extent to which the operations of a web API are used by external consumers (also called clients). In the next subsections, first, we present existing metrics proposed in literature to measure the external and internal cohesion. Then, we present the existing antipatterns in web APIs that result from low internal and external cohesion. 6.1.1 Cohesion Metrics To compute the external cohesion of a web API, Perepletchikov et al. [Perepletchikov et al., 2007, 2010] proposed the SIUC (Service Interface Usage Cohesion) metric. This metric computes the sum of operations invoked by each client normalized by the number of clients and operations declared in the web API. To the best of our knowledge there are no further studies proposing other metrics for measuring the external cohesion. Existing studies propose different metrics to measure the internal cohesion of web APIs. Perepletchikov et al. [Perepletchikov et al., 2007, 2010] proposed the SIDC (Service Interface Data Cohesion) metric, Sindhgatta et al. [Sindhgatta et al., 2009] proposed the LCOS (Lack of COhesion in Service) and the SFCI (Service Functional Cohesion Index). Even though their formulas differ, these metrics have in common that they only measure the degree to which operations use common messages without considering the cohesion of messages. For this reason, in this paper, we refer to these existing metrics as message-level metrics. 6.1.2 Antipatterns In literature different antipatterns for web APIs, and more in general for APIs, have been proposed [Moha et al., 2012; Rotem-Gal-Oz, 2012; Král and Zemlicka, 2007; Cherbakov et al., 2006; Dudney et al., 2002; Martin, 2002]. 
Among the proposed antipatterns two antipatterns in web APIs are symptoms of low internal and external cohesion: the Multiservice [Dudney et al., 2002] and the Fat [Martin, 2002] antipatterns. The Multiservice antipattern was originally conceived by Dudney et al. 6.2. Research Questions and Approach 103 <<webAPI>> CommerceAPI placeOrder() reserveInventory() generateInvoice() acceptPayment() validateCredit() getOrderStatus() cancelOrder() getPaymentStatus() Figure 6.1: Example of a Multiservice web API (symptom of low internal cohesion). It exposes operations related to five different business entities (i.e., Order, Inventory, Invoice, Payment, and Credit). [Dudney et al., 2002] and it is also known as God Object in literature [Moha et al., 2012]. A Multiservice web API exposes many operations that are related to different business entities. The CommerceAPI shown in Figure 6.1 is an example of a Multiservice API. This API exposes operations related to five different business entities: Order, Inventory, Invoice, Payment, and Credit. Such a web API ends up to be low internally cohesive because of the different entities encapsulated by it. As a consequence, many clients can invoke simultaneously its operations causing performance bottlenecks [Moha et al., 2012]. The Fat antipattern was proposed by Martin in [Martin, 2002]. This antipattern occurs in web APIs and other types of APIs, such as Java interfaces. A Fat web API is an API with disjoint sets of operations that are invoked by different clients and, hence, they show low external cohesion. The BankAPI shown in Figure 6.2 is an example of a Fat API. The Student and Professional clients invoke two disjoint sets of operations. Martin proposed the Interface Segregation Principle (ISP) to refactor such APIs. The ISP states that Fat APIs need to be split into smaller APIs according to its clients’ usage. Each smaller API should be specific to a client and each client should only know about the set of operations it is interested in. 6.2 Research Questions and Approach The change-proneness of web APIs is relevant to design and maintain large distributed information systems [Murer, 2011; Murer et al., 2010]. To better 104 Chapter 6. Change-Prone Web APIs <<webAPI>> BankAPI accountBalanceForStudent() requestLoanForStudent() requestInsuranceForStudent() requestInsuranceForPro() requestLoanForPro() accountBalanceForPro() Student Client <<uses>> Professional Client Figure 6.2: Example of a Fat web API (symptom of low external cohesion). The Student and Professional clients invoke disjoint set of operations. understand the importance of assuring stable web APIs consider the scenario shown in Figure 6.3. PaymentAPI1 provides 3 changes per month invokes PaymentAPI2 P Provider1 provides 1 change per month BankClient PaymentAPI3 9 changes per month Provider2 provides Provider3 Figure 6.3: Scenario in which the web API consumer BankClient subscribes to the most stable API PaymentAPI2 among three available web APIs. In this scenario, a web API consumer (i.e., BankClient) wants to use a web API to receive payments from its customers. On the market there are three different providers (i.e., Provider1, Provider2, and Provider3) each providing a payment API (i.e., PaymentAPI1, PaymentAPI2, and PaymentAPI3) that adhere to BankClient’s business and functional requirements. BankClient is interested in a stable web API to reduce the need to adapt its system(s). Therefore, he decides to monitor the evolution of the three web APIs for a certain time. 
After this time he can use the most stable API (i.e., PaymentAPI2) with the lowest 6.2. Research Questions and Approach 105 change frequency (i.e., 1 change per month). In a real world scenario, where time-to-market is important for gaining competitive advantage, BankClient typically does not have the time to monitor the stability of different web APIs. Moreover, the number of past changes might not be available and they might not be a good indicator for future changes. For instance, an API might have been refactored to improve its change-proneness. Furthermore, from the perspective of providers, they are interested in providing stable web APIs to increase the likelihood with which clients subscribe to their APIs and, consequently, increasing their profits. In this chapter, we investigate the relationship between internal and external cohesion and the change-proneness of web APIs. The results can assist web APIs consumers and providers in estimating the stability of web APIs. In the following, we motivate and state our research questions, as well as, outline our research approach. 6.2.1 External Cohesion and Change-Proneness Concerning web APIs with low external cohesion we want to investigate which scenarios are more likely to cause future changes in Fat web APIs. Moreover, we want to analyze the change-proneness of methods exposed in such web APIs. APIs with low external cohesion can have two different types of methods. Shared methods are methods invoked by all different clients. In Figure 6.4 requestInsurance() is a shared method since both the Student and Professional clients invoke it. Non-shared methods are methods invoked only by a specific client (e.g., the requestLoanForStudent() method in Figure 6.4). We believe that these two classes of methods can be changed for different reasons and knowing these reasons can give further insights into the change-proneness of web APIs with low external cohesion. To assist providers in evaluating the change-proneness of their web APIs with low external cohesion, we answer the following research question: • RQ1: What are the scenarios in which developers change web APIs with low external cohesion? In which cases do they change the shared and non-shared methods? We investigate the change-proneness on two different levels: interface level (i.e., change-proneness of a web API as a whole) and method level (i.e., change-proneness of the methods exposed by a web API). The results from this research question assist only providers because consumers typically do not have access to the information needed to measure the external cohesion (i.e., how other consumers invoke the API). 106 Chapter 6. Change-Prone Web APIs <<webAPI>> BankAPI accountBalanceForStudent() requestLoanForStudent() requestInsurance() requestLoanForPro() accountBalanceForPro() Student Client <<uses>> Professional Client Figure 6.4: Web API with low external cohesion where only the method requestInsurance() is shared by the two different clients Student Client and Professional Client. 6.2.2 Internal Cohesion and Change-Proneness Similar to external cohesion we investigate which scenarios are more likely to cause changes in Multiservice web APIs (i.e., web APIs with low internal cohesion). Furthermore, we analyze the change-proneness of the data types declared within a web API. 
This allows us to highlight the differences between the change-proneness of shared data types (i.e., data types referenced multiple times within a web API) and non-shared data types (i.e., data types referenced only once). To evaluate the change-proneness of web APIs with low internal cohesion, we answer the following research question:

• RQ2: What are the scenarios in which developers change web APIs with low internal cohesion? In which cases do they change the shared and non-shared data types?

We investigate the change-proneness on two different levels: interface level (i.e., change-proneness of a web API as a whole) and data-type level (i.e., change-proneness of the data types declared in a web API). Differently from RQ1, the results from RQ2 assist both providers and consumers. Both have access to the web API to measure the internal cohesion.

6.2.3 Internal Cohesion Metrics as Change Indicators

To make the results from RQ2 actionable in an industrial environment [Bouwers et al., 2013] a metric should be used to measure the internal cohesion. However, as shown in Section 6.1, the existing metrics are message-level metrics that do not consider the usage of data types to compose messages. To understand this drawback consider the two examples in Figure 6.5.

Figure 6.5: Example that shows the drawback of the existing message-level internal cohesion metrics SIDC, LCOS, and SFCI. (a) Two operations operation1 and operation2 that use the same message message1. (b) Two operations operation1 and operation2 that use different messages message1 and message2 which indirectly reference the same data types type2 and type3.

The web API shown in Figure 6.5a exposes two operations operation1 and operation2 that use the same message message1. The message-level metrics are capable of detecting the cohesion of this web API, but they fail when measuring the cohesion of the web API shown in Figure 6.5b. This API has two operations operation1 and operation2 that use different messages, namely message1 and message2. In this case the message-level metrics result in a low value of cohesion. However, message1 and message2 reference the same data types type2 and type3. We argue that the web API is cohesive because both type1 and type4 (referenced by message1 and message2, respectively) are complex data types composed of type2 and type3.

To overcome this problem Bansiya et al. [Bansiya and Davis, 2002] defined the CAMC (Cohesion Among Methods of Class) metric that measures the cohesion of object oriented classes. In this chapter we adapt the CAMC metric for web APIs, proposing the Data Type Cohesion (DTC) metric. For a web API s, DTC is computed as follows:

DTC(s) = ( ∑_{x,y ∈ Op(s)} Co(x, y) ) / |Op(s)|    (6.1)

where Op(s) represents the set of operations exposed in s. Co(x, y) is the cohesion between two operations x and y, and it is defined as:

Co(x, y) = ( ∑_{m,n ∈ MP(s)} Cdt(m, n) ) / |MP(s)|    (6.2)

where MP(s) is the set of all message pairs used by x and y; Cdt(m, n) is the cohesion between the messages m and n, computed as:

Cdt(m, n) = Com(m, n) / ( Com(m, n) + Uncom(m, n) )    (6.3)

where Com(m, n) represents the number of data types referenced by both messages m and n, and Uncom(m, n) is the number of data types referenced in only one of the two messages.
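A small sketch of how DTC could be computed from an interface model is shown below. The plain maps stand in for the parsed WSDL/XSD model, and the iteration over unordered operation pairs is one possible reading of Eq. (6.1); everything here is illustrative rather than the exact implementation used in our tool chain.

import java.util.*;

public class DtcMetric {

    // Eq. (6.1): sum of Co over operation pairs, normalized by the number of operations.
    static double dtc(Map<String, Set<String>> opToMessages,
                      Map<String, Set<String>> messageToTypes) {
        List<String> ops = new ArrayList<>(opToMessages.keySet());
        double sum = 0.0;
        for (int i = 0; i < ops.size(); i++) {
            for (int j = i + 1; j < ops.size(); j++) {
                sum += co(opToMessages.get(ops.get(i)),
                          opToMessages.get(ops.get(j)), messageToTypes);
            }
        }
        return ops.isEmpty() ? 0.0 : sum / ops.size();
    }

    // Eq. (6.2): average data-type cohesion over all message pairs used by x and y.
    static double co(Set<String> messagesOfX, Set<String> messagesOfY,
                     Map<String, Set<String>> messageToTypes) {
        double sum = 0.0;
        int pairs = 0;
        for (String m : messagesOfX) {
            for (String n : messagesOfY) {
                sum += cdt(messageToTypes.get(m), messageToTypes.get(n));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }

    // Eq. (6.3): data types shared by both messages over shared plus unshared data types.
    static double cdt(Set<String> typesOfM, Set<String> typesOfN) {
        Set<String> common = new HashSet<>(typesOfM);
        common.retainAll(typesOfN);
        Set<String> union = new HashSet<>(typesOfM);
        union.addAll(typesOfN);
        int uncommon = union.size() - common.size();
        return union.isEmpty() ? 0.0 : (double) common.size() / (common.size() + uncommon);
    }

    public static void main(String[] args) {
        // The situation of Figure 6.5b: two operations, two messages sharing type2 and type3.
        Map<String, Set<String>> opToMessages = Map.of(
                "operation1", Set.of("message1"),
                "operation2", Set.of("message2"));
        Map<String, Set<String>> messageToTypes = Map.of(
                "message1", Set.of("type1", "type2", "type3"),
                "message2", Set.of("type4", "type2", "type3"));
        System.out.println("DTC = " + dtc(opToMessages, messageToTypes));
    }
}

Under this reading, the Figure 6.5b example yields Cdt(message1, message2) = 2/4 = 0.5 and hence a non-zero DTC, whereas a purely message-level metric would report no cohesion between the two operations.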
To investigate quantitatively the change-proneness of web APIs with low internal cohesion we answer the following research question: • RQ3: To which extent does the DTC metric highlight change-prone WSDL APIs? Which data types declared in a WSDL API are more changeprone? Similar to RQ2 we investigate the change-proneness on two different levels: interface level and data-type level. The results from RQ3 are useful for both, providers and consumers, interested in measuring the internal cohesion in order to highlight change-prone web APIs and change-prone data types declared by them. 6.2.4 Research Approach To answer our research questions we adopt a mixed method approach [Creswell and Clark, 2010]. First, we answer RQ1 and RQ2 with a qualitative analysis consisting of an online survey. Then, following an exploratory sequential approach [Creswell and Clark, 2010], we refine the results from RQ2 with a quantitative analysis aimed at answering RQ3. Note, we do not quantitatively refine the results from RQ1 because the needed information ( i.e., how consumers invoke web APIs) are not available. We present the study, the analysis methods and the results of the qualitative and quantitative analyses respectively in Section 6.3 and Section 6.4. 6.3. Online Survey 6.3 109 Online Survey To answer the first two research questions RQ1 and RQ2, we performed an online survey consisting of three different parts. The first part of the survey introduces the terminology that might be used differently in academia and in industry. The next questions are on the background of participants. In particular, we asked information about their current position within their institutions/companies and their background in the areas involving web APIs (i.e., service-oriented, cloud computing, WSDL APIs, and RESTful APIs). In the second part, the questions are aimed at investigating the changeproneness of four SOA antipatterns (i.e., Fat, Multiservice, Tiny, and SandPile antipatterns). In this chapter, we focus on and report the results about the Fat antipattern (RQ1) and the Multiservice antipattern (RQ2). We do not report the results about the Tiny and the SandPile antipatterns because they are not symptoms of web APIs with low external and/or internal cohesion. In fact, they are symptoms of inadequate granularity [Moha et al., 2012] and subject of our future work. In the third and final part of the survey, we asked participants to share their experiences with other design practices that can affect the change-proneness of web APIs and have not been covered by the survey. They are meant to draw directions for future work on this subject. Then, we asked questions to assess their prior knowledge about the antipatterns presented in the survey. Before publishing our survey, we conducted three rounds of pilots with five software engineering researchers with a strong background in qualitative analyses. In each round we refined the survey questions and its structure based on their feedback. This step was necessary to attract participants in completing the survey. The complete survey is available on our website.3 In the following subsections, we first present information about the participants and their background and, then, we answer our research questions RQ1 and RQ2. For each research question we present the data used, the analysis method, and the results to answer it. 6.3.1 Participation Our survey was opened on July 1st, 2013 and closed on July 31st. 
We forwarded the survey to our industrial partners and academics working on research topics related to web API development. Moreover, we advertised it in google groups related to web APIs. During this time we collected responses from 79 participants among which 47 (59.5%) completed the entire survey an3 http://goo.gl/f0gi17 110 Chapter 6. Change-Prone Web APIs swering all questions. Given that participants needed to answer 36 questions, investing on average approximately 40 minutes of their time, we consider it a good number of participants and a high rate of completion [Smith et al., 2013]. Among the 79 participants 44 work in industry, 30 in academia, and 5 in both academia and industry. Participants rated their background on a 5-point Likert scale ranging from absent, weak, medium, good, to strong. Participants, who answered the questions of the second and third part, have at least a good background in at least one of the following areas: service-oriented, cloud computing, WSDL APIs, and RESTful APIs. Few participants have an absent or weak background in any of these topics and quit the survey just after the background questions. Interestingly, 72.7% of the participants answered that they do not know any metric/quality indicators to estimate the change-proneness of web APIs. The most common indicators used by the remaining 27.3% are the response time and information about the changes (e.g., number of changes between two versions, number of operations changed, etc.). 6.3.2 External Cohesion and Change-Proneness To investigate the answer to RQ1, we analyzed the change-proneness of web APIs with low external cohesion on two different levels: interface level and method level. Interface Level Change-Proneness Focusing on the interface level, we asked the participants to rank six scenarios that can lead a Fat web API to be changed. As discussed in Section 6.2, this antipattern is a symptom of web APIs with low external cohesion. Table 6.1 shows the list of scenarios. We derived them from our frequent discussions with our industrial and academic partners and colleagues. Furthermore, we asked the participants to state additional scenarios in a text box. For each scenario, 53 participants ranked the likelihood on a 5-point Likert scale: 0 (Won’t change), 1 (Might change), 2 (Likely to change), 3 (Very likely to change), and 4 (Sure will change). We first used the non-parametric Kruskal-Wallis rank sum test [Kruskal and Wallis, 1952] to analyze whether there is a difference between the scenarios to cause changes. Kruskal-Wallis tests whether samples originate from the same distribution comparing three or more sets of scores (i.e., the values of the 5-point Likert scale) that come from different groups (i.e., the different scenarios). We used the non-parametric Kruskal-Wallis test because the 6.3. Online Survey 111 Table 6.1: Scenarios that cause changes in Fat web APIs (i.e., web APIs with disjoint sets of operations that are invoked by different clients indicating low external cohesion). Id Fat1 Fat2 Fat3 Fat4 Fat5 Fat6 A Fat API is changed because ... its clients have troubles in understanding it. having a specific method for each client would introduce clones and the API would become hard to be maintained it is a bottleneck for the performance of the system. It should be split into APIs specific for each different client (i.e., Interface Segregation Principle [Martin, 2002]). different developers work on the specific functionalities for the different clients. 
Fat5: if the functional requirements of a client change, the other clients will be affected as well. Fat6: test cases for all clients should pass before the API can be deployed.

distributions of scores given by the participants are ordinal and non-normally distributed. Moreover, this test has been designed to compare three or more distributions, in contrast to the non-parametric Mann-Whitney test [Lehmann and D'Abrera, 1975], which compares two distributions. Performing the Kruskal-Wallis rank sum test on the scores given to the different scenarios resulted in a p-value < 0.01. This shows that the given scenarios cause changes to Fat web APIs with different probabilities. The distributions of the scores given to the different scenarios are reported in Figure 6.6.

Figure 6.6: Likelihood ranges from 0 (Won't change) to 4 (Sure will change) for the scenarios causing changes in Fat APIs listed in Table 6.1.

To analyze these probabilities, we ranked the scenarios by their median and mean values. According to this ranking, Fat2 is the most likely scenario with a median value of 3. This means that a Fat web API is very likely to be changed to reduce the amount of clones and ease maintainability. The second most likely scenario is Fat1. According to its median score of 2, a Fat web API is likely to be changed to improve its understandability for the clients. The other four scenarios have median values equal to 1, indicating that they might force a Fat web API to be changed. To conclude and answer the first part of RQ1, we can state that: 1) Fat web APIs are very likely to be changed to ease maintainability and reduce clones; 2) they are also likely to be changed to improve understandability.

Method Level Change-Proneness

Addressing the second part of RQ1, we focus on analyzing the change-proneness of methods declared in a web API with low external cohesion. As described in Section 6.2, Fat web APIs expose two classes of methods: 1) methods invoked by a specific client (i.e., non-shared methods) and 2) methods invoked by different clients (i.e., shared methods). In our survey, we asked the participants to state which class of methods is more likely to be changed. Out of 60 participants who answered this question, 33 found that shared methods are more likely to be changed while 27 found that non-shared methods are more change-prone.

In addition, we asked the participants to motivate their choice by filling in a text box. To analyze their motivations we manually clustered the answers into groups using the card sort technique [Barker]. This technique consists of sorting the cards (i.e., the provided motivations in our case) into meaningful groups and abstracting hierarchies to deduce general categories. We mined two frequent groups of answers from their motivations. On the one hand, 16 out of 33 participants found that shared methods are more likely to be changed because they have to satisfy multiple requirements from different clients. On the other hand, 12 out of 27 participants found that non-shared methods are more change-prone because changing them affects fewer clients. To conclude and answer the second part of RQ1, we note that the participants have two different views on the change-proneness of methods. Even though their opinions differ, they do not conflict and they provide two useful insights:
Online Survey 113 • shared methods are changed when the requirements of their different clients evolve differently. • otherwise developers tend to change non-shared methods because the impact of a change is lower. 6.3.3 Internal Cohesion and Change-Proneness Similar to RQ1, we answer RQ2 analyzing the change-proneness of web APIs with low internal cohesion on two different levels: interface level and datatype level. Interface Level Change-Proneness The first part of RQ2 aims at investigating scenarios that can cause changes in Multiservice web APIs. As discussed in Section 6.1, this antipattern is a symptom of web APIs with low internal cohesion. Similar to before, we provided the participants with seven scenarios to be ranked on the same 5-point Likert scale. Table 6.2 lists the seven scenarios stemming from discussions with our industrial and academic partners. Furthermore, we asked them to state additional scenarios in a text box. 51 participants ranked these scenarios. To analyze the results we followed the same approach used for analyzing the Fat web APIs’ results before. First, we used the Kruskal-Wallis rank sum test to verify whether there is a statistical difference between the distributions of scores given to the different scenarios. The test resulted in a p-value<0.01 indicating that these scenarios cause changes to Multiservice web APIs with different probabilities. Then, we ranked the scenarios based on the median and mean values of their scores. The distributions of the scores given to the different scenarios are reported in Figure 6.7. The ranking shows that a Multiservice web API is very likely to be changed because of the different entities encapsulated by these web APIs (MS1 ). These changes can affect different clients even though they are not interested in the changed entity (MS2 ). Multiservice web APIs are also very likely to be changed to improve their understandability (MS7 ). Furthermore, the scenarios MS3 , MS4 , MS5 and MS6 are likely to cause changes. To conclude and answer the first part of RQ2, we can state that Multiservice web APIs are very likely to be changed because: 1) every time it changes many clients are affected (MS2 ); 2) the web API can change for different reasons caused by the different entities (MS1 ); and 3) understanding the web API is complicated for its clients (MS7 ). 114 Chapter 6. Change-Prone Web APIs Table 6.2: Scenarios that cause changes in Multiservice web APIs (i.e., APIs that expose many operations that are related to different business entities). Id MS1 MS2 MS3 MS4 MS5 MS6 MS7 A Multiservice API is changed because ... every business entity can change for different reasons (e.g., different evolving requirements). A new version should be published every time one of these entities changes. changes to the API affect many clients (even though they do not use the changed business entity). all the tests involving the different entities should pass before the entire web API is deployed. the number of invocations to the Multiservice web API is high due to the different business entities. proper pool tuning techniques are needed to achieve adequate performance due to the numerous clients. different developers work on different business entities. many business entities are exposed complicating the understanding of the API. Data-Type Level Change-Proneness Addressing the second part of RQ2, we focus on analyzing the change-proneness of data types declared in Multiservice web APIs with low internal cohesion. 
In our survey, we asked participants to select which of the two classes of data types is more change-prone: shared data types (referenced more than once in a web API) or non-shared data types (referenced only once). Out of 48 participants who answered this question, 30 (62.5%) found that non-shared data types are more likely to be changed, while 18 (37.5%) found that shared data types are more change-prone.

Figure 6.7: Likelihood ranges from 0 (Won't change) to 4 (Sure will change) for the scenarios causing changes in Multiservice APIs listed in Table 6.2.

In addition, participants motivated their answers by filling in a text box. Applying the card sort technique, we manually clustered their motivations into two common groups of answers. On the one hand, 12 out of 18 participants stated that shared data types are likely to be changed because they are used by different messages and/or data types that can force them to change. In other words, they have multiple causes to change. On the other hand, 8 out of 30 participants stated that non-shared data types are more change-prone because developers prefer to share stable data types that represent generic business abstractions. Similarly to the change-proneness of methods, the participants have two different opinions. However, they do not conflict and give two relevant insights into the change-proneness of data types:

• shared data types are changed when their operations evolve differently.
• otherwise developers tend to change non-shared data types because the impact of a change is lower.

6.4 Quantitative Analysis

The goal of the quantitative analysis is to provide an answer to RQ3, and consequently refine the results from RQ2. To reach this goal, we analyzed the correlation between the DTC cohesion metric and the number of changes performed in the different versions of ten public WSDL APIs. Table 6.3 lists the selected WSDL APIs from Amazon (http://aws.amazon.com), eBay (http://developer.ebay.com), and FedEx (http://www.fedex.com/us/web-services) with their basic characteristics. WSDL is a standard interface description language used by many service-oriented systems to describe the functionality offered by a web API. We selected these WSDL APIs because they have sufficiently long histories, as indicated by the increase in the number of operations and data types. Furthermore, they have been used and discussed in similar studies in prior research [Fokaefs et al., 2011]. Even though a bigger data set is desirable, having access to WSDL APIs with long histories is not a trivial task. Most of them are used in a closed environment allowing access only to registered customers.

Table 6.3: WSDL APIs selected for the quantitative analysis showing the name (WSDL_API), the number of versions (Vers), the number of operations in the first and last versions (Ops), and the number of data types in the first and last version (Types).

WSDL_API                      Vers   Ops       Types
AmazonEC2                     22     14-118    60-463
AmazonFPS                     3      29-27     19-18
AmazonQueueService            4      8-15      26-51
AWSECommerceService           5      23-23     35-35
AWSMechanicalTurkRequester    6      40-44     86-102
eBay                          5      156-156   897-902
FedExPackageMovement          4      2-2       15-15
FedExRateService              11     1-1       43-140
FedExShipService              8      1-7       74-166
FedExTrackService             5      3-4       29-33
6.4.1 Interface Level Change-Proneness of WSDL APIs

For analyzing the change-proneness of the selected WSDL APIs, we first computed the values of the DTC metric for each version of each WSDL API. Next, we extracted the changes between each pair of subsequent versions of a WSDL API. The changes were extracted using our WSDLDiff tool (presented in Chapter 4), which loads the specifications of two versions of a WSDL API and compares them using the differencing algorithm provided by the Eclipse EMF Compare plugin (http://www.eclipse.org/modeling/emf/). In particular, WSDLDiff extracts the types of the elements affected by changes (e.g., Operation, Message, Data Type) and the types of changes (e.g., removal, addition, move, attribute value update). With this, WSDLDiff is capable of extracting changes such as "a message has been added to an operation" or "the name of an attribute in a data type has been modified". We refer to these changes as fine-grained changes. Using WSDLDiff, for each version of a WSDL API we counted the number of fine-grained changes that occurred between the current and the previous version.

We used the Spearman rank correlation for computing the correlation between the values of the DTC metric and the number of changes. Spearman compares the ordered ranks of two variables to measure a monotonic relationship. We chose the Spearman correlation because it does not make assumptions about the distribution, the variances, and the type of the relationship [S.Weardon and Chilko, 2004]. A Spearman value (i.e., rho) of +1 or -1 indicates a high positive or high negative correlation, whereas 0 indicates that the variables under analysis do not correlate at all. Values greater than +0.3 or lower than -0.3 indicate a moderate correlation; values greater than +0.5 or lower than -0.5 are considered to be strong correlations [Hopkins, 2000].

The result of the Spearman correlation analysis shows that the DTC metric has a significant and moderate negative correlation, with a rho value equal to -0.361 (i.e., rho < -0.3) and a p-value equal to 0.007. Moreover, we computed the values of the existing message-level metrics (i.e., LCOS, SFCI, and SIDC) on the same WSDL APIs. We found that their values are always 0 or 1. For instance, the value of LCOS is 1 in 62 out of 73 versions and 0 in 11 versions. Manually analyzing the WSDL APIs, we noticed that this is due to their design. As shown by the example in Figure 6.5b, messages reference different data types that are used as wrappers to isolate the data type declarations from the declarations of the operations and messages. For the WSDL APIs under analysis, this result confirms that the existing metrics suffer from the problem explained in Section 6.2 and discussed in previous work [Bansiya and Davis, 2002]. We can conclude that DTC shows a moderate correlation, indicating that an increase in the internal cohesion is associated with a decrease in the number of changes.

6.4.2 Data-Type Level Change-Proneness of WSDL APIs

To detail these results, we investigated the change-proneness of shared and non-shared data types. For each data type in each version, we computed the number of times it is referenced in the WSDL API and the number of changes as extracted by our WSDLDiff tool. Next, we used Spearman to compute the correlation between these two metrics. Table 6.4 presents the results of this analysis.
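As an illustration of the correlation analysis described above, the following sketch computes the Spearman rank correlation with the SpearmansCorrelation class of Apache Commons Math. The DTC values and change counts below are made-up placeholders, not the values measured on the ten WSDL APIs.

```java
import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;

public class DtcChangeCorrelation {
    public static void main(String[] args) {
        // One entry per WSDL API version: the DTC value of that version and the number of
        // fine-grained changes (as extracted by a tool such as WSDLDiff) to the next version.
        double[] dtc     = {0.62, 0.48, 0.55, 0.30, 0.71, 0.40};
        double[] changes = {   4,   12,    7,   25,    2,   15};

        double rho = new SpearmansCorrelation().correlation(dtc, changes);
        System.out.printf("Spearman rho = %.3f%n", rho); // a negative rho: higher cohesion, fewer changes
    }
}
```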
Looking at the p-values of the correlation analysis, we note that significant results were obtained for 5 WSDL APIs (i.e., p-value < 0.01). Among them, the values for three WSDL APIs show a strong correlation (i.e., rho < -0.5) and for one WSDL API they show a moderate correlation (i.e., rho < -0.3). These correlations indicate that the more a data type is referenced, the less change-prone it is.

Table 6.4: Results of the Spearman correlation analysis between the number of references and the number of changes of data types. Significant correlations (p-value < 0.01) are marked with an asterisk.

WSDL                          p-value   rho
AmazonEC2                     0.248     -0.048
AmazonFPS                     0.612     -0.104
AmazonQueueService            0.301     -0.130
AWSECommerceService           0.089      0.291
AWSMechanicalTurkRequester    0.000     -0.502 *
eBay                          0.638     -0.015
FedExPackageMovement          0.005     -0.512 *
FedExRateService              0.000     -0.418 *
FedExShipService              0.000      0.193 *
FedExTrackService             0.000     -0.559 *

Manually analyzing a sample set of shared data types, we found that they represent generic business entities or satellite data used by operations of the same domain. Hence, we assume that their requirements do not evolve differently. For instance, ClientDetail in FedExShipService is a shared data type referenced directly and indirectly on average 9 times by shipment operations that require information about the client. This data type encapsulates descriptive data about clients and it did not change across the releases. This result partially confirms the results of our survey, namely: shared data types are change-prone if referenced by operations with different requirements; otherwise, developers tend to change non-shared data types.

Based on these results, we can answer RQ3 stating that the DTC metric is able to highlight change-prone WSDL APIs. Moreover, we can partially confirm the insights of our participants about change-prone data types. However, to fully validate this result a bigger data set is needed. An ideal data set would consist of several WSDL APIs with long histories and from different domains or companies. This is needed to avoid that the results are WSDL or company specific. Unfortunately, as already discussed, getting access to these artifacts is challenging.

6.5 Discussion

In this section we summarize the results of our study and discuss the implications of the results and the threats to validity.

6.5.1 Summary of the Results

Summarizing the findings of our study, we found that Fat web APIs are very likely to be improved to reduce clones and ease maintainability, and they are likely to be changed to improve understandability (RQ1). Multiservice APIs are very likely to be improved because such a web API declares different business entities and a change in one entity typically affects all the clients. Similar to Fat web APIs, Multiservice APIs are also affected by understandability issues (RQ2). Analyzing the change-proneness of methods and data types, we found that both shared methods and shared data types are likely to be changed if they are shared by clients and operations with different requirements (RQ1 and RQ2). For instance, if two clients with different requirements invoke the same operations, these operations change every time one of the two clients' requirements change. Hence, they are more change-prone. If modification tasks are not driven by clients' or operations' requirements, then developers tend to modify non-shared operations and non-shared data types to keep the impact of a change low (RQ1 and RQ2).
To compute the internal cohesion and making the results of RQ2 actionable, useful metrics are needed [Bouwers et al., 2013]. This led us to introduce the DTC metric and to investigate its ability to highlight change-prone WSDL APIs. The quantitative study showed that DTC is able to highlight change-prone WSDL APIs. Moreover, we partially confirmed our survey participants’ insight: shared data types are change prone if they are referenced by operations with different requirements, otherwise non-shared data types are more likely to be changed (RQ3). 6.5.2 Implications of the Results The results of this study are useful for web API providers, web API consumers, and software engineering researchers. Providers & Consumers. Both, web API providers and consumers, can benefit from a new internal cohesion metric (DTC) that overcomes the problem of the message-level metrics. Using DTC they can measure the internal cohesion to estimate the interface level change-proneness of WSDL APIs (RQ3). Based on the metric values, consumers can select and subscribe to the most stable web API that shows the best internal cohesion, thereby reducing the risk 120 Chapter 6. Change-Prone Web APIs to continuously update their clients to new web API versions. Providers can use DTC to identify the set of most change-prone web APIs (with low values for DTC) that should undergo a refactoring. For example, in case of a Multiservice API, the provider should consider splitting the API into different web APIs each one encapsulating a different business entity. Providers. Furthermore, based on the values of the external cohesion metric they can estimate the change-proneness considering the maintenance scenarios likely to cause changes as suggested by our study. For instance, they can measure the SIUC metric as proposed by Perepletchikov [Perepletchikov et al., 2007, 2010] and refactor the web APIs with low values for external cohesion potentially affected by the Fat antipattern. They should refactor these APIs applying the Interface Segregation Principle described by Martin [2002]. According to this principle, Fat APIs should be split into different APIs so that clients only have to know about the methods they are interested in. Researchers. The results of this study are also valuable input to software engineering researchers. In this study we showed the impact of low external and internal cohesion on change-proneness of web APIs. As next step, researchers should investigate techniques for refactoring these kinds of web APIs. For instance, to the best of our knowledge there are no approaches able to apply the Interface Segregation Principle to refactor Fat APIs. Such an approach should mine the usage of a web API’ clients and, based on it, output the ideal sub APIs. This task is particularly challenging if a web API is invoked differently by many different clients. In general, the results of this study are a precious input for researchers interested in investigating the change-proneness of web APIs. Each maintenance scenario that causes changes in web APIs should be further investigated to further assist web API providers. 6.5.3 Threats to Validity In this study threats to construct validity concern the set of selected scenarios that we used to investigate changes in web APIs. This set is not complete. To mitigate this threat, we asked the participants of the survey to provide additional scenarios. Only three participants provided further scenarios. Hence, we cannot draw any statistical conclusion. 
Based on this result, we believe that we provided a good first set of scenarios that can be extended in future studies. With respect to internal validity, the main threat is the possibility that the structure of the survey could have affected the answers of participants. We mitigated this threat by randomly changing the order of the scenarios for each 6.6. Related Work 121 participant. While this randomization worked for the scenarios, the threat stemming from the order of the questions in our survey remains - participants could have gained knowledge from answering the earlier questions that could have affected the answers to latter questions. The threats to external validity have been mitigated thanks to our participants who work on software systems from different domains (e.g., banking systems, mobile applications, telecommunication systems, financial systems). Moreover, 18 participants are employed in international consulting companies with expertise in a wide range of software systems. Moreover, with regards to the quantitative analysis the set of WSDLs APIs should be enlarged in our future work to improve the generalization of the results. However, accessing WSDLs APIs with a long history is not an easy task. In fact, most of them are used in a closed environment and allowing access only to registered clients. 6.6 Related Work We identify three areas of related work: change-proneness, stability of APIs, and analysis of web APIs. Change-proneness. Khoshgoftaar and Szabo [1994] and Li and Henry [1993] were among the first researchers to investigate the impact of software structures on change-proneness. Khoshgoftaar et al. trained a regression model and a neural network using size and complexity metrics to predict change-prone components. The results show that the neural network is a stronger predictive model compared to the multiple regression model. Li et al. used the C&K metrics to predict maintenance effort improving the performance of prediction models. Girba et al. [2004] defined the Yesterday’s weather approach to predict change-prone classes based on values of metrics and the analysis of their evolution. Di Penta et al. [2008] showed that classes participating in antipatterns are more or less change-prone depending on the role they play in the antipattern. Khomh et al. [2009] investigated the impact of code smells on the change-proneness of Java classes. Their results show that classes affected by code smells are more change-prone and specific smells are more correlated than others to change-proneness. Zhou et al. [2009] examined the confounding effect of class size on the associations between metrics and change-proneness. They show that the size of a class is a relevant confounding variable to take into account to estimate its changeproneness. These studies represent a subset of existing work (e.g., [Tsantalis et al., 2005; Elish and Al-Khiaty, 2013]) that underlines the importance of our research on providing indicators for highlighting change-prone software components. However, no study exists that investigates such indicators for 122 Chapter 6. Change-Prone Web APIs highlighting change-prone web APIs. Stability of APIs. The stability of APIs is a well known problem in the research community. A recent study by Vásquez et al. [2013] shows that changeprone APIs negatively impact the success of Android apps. This work does not provide indicators for change-prone APIs but it shows the relevance of assuring an adequate stability of APIs. Recently, Raemaekers et al. 
[2012] analyzed the stability of third parties libraries using four metrics to show how third parties libraries evolve. Hou and Yao [2011] analyzed the evolution of AWT/Swing APIs and their findings show that the majority of the changes is performed in the early versions. Dig and Johnson [2006] analyzed four frameworks and one library finding that on average 80% of the API breaking changes are due to refactoring. Even though these studies show the relevance of investigating the stability of APIs there are few studies proposing metrics as indicators of change-prone APIs. In our previous work presented in Chapter 2, we investigated such indicators for interfaces. In Chapter 3 we analyzed the impact of antipatterns on the change-proneness of Java APIs. The results show that APIs are more change-prone if they participate in ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns. In Chapter 2 we showed that the external cohesion is the best performing metric to highlight and predict change-prone Java interfaces in the analyzed systems. Those studies were on Java APIs while the focus of this chapter is on web APIs analyzing metrics and antipatterns specifically defined for web APIs. Analyses of web APIs. In Chapter 4 we analyzed the evolution of four WSDL APIs. We proposed the WSDLDiff tool to extract automatically finegrained changes and we showed that it helps consumers in highlighting the most frequent changes in WSDL APIs. A similar analysis was performed by Fokaefs et al. [2011] in 2011. They manually extracted the changes from the different versions of the WSDL APIs. Several antipatterns for web APIs have been proposed in literature, however, none of them has been investigated to indicate change-prone web APIs. Moha et al. [2012] proposed an approach for specifying and detecting web API antipatterns. In their work they provide a complete and concise description of the most popular antipatterns. Perepletchikov et al. [2007] proposed five cohesion metrics, but an empirical evaluation of them for indicating change-prone APIs is missing. In their later study [Perepletchikov et al., 2010] they proposed three additional cohesion metrics and a controlled study. The results from this study show that the proposed metrics can help in predicting the analyzability of web APIs early in the software development life cycle, but not their stability. Our work is complementary to this existing work. Starting from the external and internal 6.7. Concluding remarks 123 cohesion defined by Perepletchikov et al. and the antipatterns described in Section 6.1, we present a qualitative and quantitative study of using cohesion metrics to indicate the change-proneness of web APIs. 6.7 Concluding remarks Assuring an adequate level of stability of web APIs is one of the key factors for deploying successful distributed systems [Erl, 2007; Daigneau, 2011; Vásquez et al., 2013]. While consumers want to rely on stable web APIs in order to prevent continuous updates of their systems, providers want to publish high quality web APIs in order to prevent such updates and to stay successful on the market. Previous work has shown that the cohesion of an API is an indicator for understandability and stability [Perepletchikov et al., 2007, 2010]. In this chapter, we extended this research to web APIs and investigated the relationship between internal and external cohesion and stability, measured as change-proneness. 
We first presented an online survey to rank a number of typical maintenance scenarios to improve web APIs affected by the Multiservice and Fat antipatterns, both symptoms of web APIs with low internal and external cohesion. The results narrow down the many possible scenarios to two scenarios for Fat APIs and three scenarios for Multiservice APIs in which changes are very likely to occur. Focusing on internal cohesion, we detailed these results in a quantitative study with ten public available web APIs specified in WSDL. Results showed that the DTC metric is able to highlight change-prone WSDL APIs. The results of our studies also open several directions for future work. Specifically, the method level and the data-type level change-proneness needs to be further investigated to better classify change-prone methods and data types. Furthermore, we plan to analyze the impact of granularity on the change-proneness of web APIs, for instance with the SandPile and Tiny antipatterns (both symptoms of APIs with inadequate granularity) [Moha et al., 2012]. . 7. Refactoring Fat APIs Recent studies have shown that the violation of the Interface Segregation Principle (ISP) is critical for maintaining and evolving software systems. Fat interfaces (i.e., interfaces violating the ISP) change more frequently and degrade the quality of the components coupled to them. According to the ISP the interfaces’ design should force no client to depend on methods it does not invoke. Fat interfaces should be split into smaller interfaces exposing only the methods invoked by groups of clients. However, applying the ISP is a challenging task when fat interfaces are invoked differently by many clients. In this chapter, we formulate the problem of applying the ISP as a multiobjective clustering problem and we propose a genetic algorithm to solve it. We evaluate the capability of the proposed genetic algorithm with 42,318 public Java APIs whose clients’ usage has been mined from the Maven repository. The results of this study show that the genetic algorithm outperforms other search based approaches (i.e., random and simulated annealing approaches) in splitting the APIs according to the ISP.1 7.1 7.2 7.3 7.4 7.5 7.6 7.7 Problem Statement and Solution Genetic Algorithm . . . . . . . . . Random and Local Search . . . . Study . . . . . . . . . . . . . . . . . Threats to Validity . . . . . . . . . Related Work . . . . . . . . . . . . Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 131 135 137 146 147 148 When designing interfaces developers should refactor fat interfaces [Martin, 2002]. Fat interfaces are interfaces whose clients invoke different subsets of their methods. Such interfaces should be split into smaller interfaces each one specific for a different client (or a group of clients). This principle has 1 This chapter was published in the in the 30th International Conference on Software Maintenance and Evolution (ICSME 2014) [Romano et al., 2014]. 125 126 Chapter 7. Refactoring Fat APIs been formalized by Martin [Martin, 2002] in 2002 and is also known as the Interface Segregation Principle (ISP). The rationale behind this principle is that changes to an interface break its clients. 
As a consequence, clients should not be forced to depend upon interface methods that they do not actually invoke [Martin, 2002]. This guarantees that clients are affected by changes only if they involve the methods they invoke. Recent studies have shown that violation of the ISP and, hence, fat interfaces can be problematic for the maintenance of software systems. First, in Chapter 2 we showed that such interfaces are more change-prone than nonfat interfaces. Next, Abdeen et al. [Abdeen et al., 2013] proved that violations of the ISP lead to degraded cohesion of the components coupled to fat interfaces. Finally, Yamashita et al. [Yamashita and Moonen, 2013] showed that changes to fat interfaces result in a larger ripple effect. The results of these studies, together with Martin’s insights [Martin, 2002], show the relevance of designing and implementing interfaces according to the ISP. However, to the best of our knowledge, there are no studies that propose approaches to apply the ISP. This task is challenging when fat interfaces expose many methods and have many clients that invoke differently their methods, as shown in [Mendez et al., 2013]. In this case trying to manually infer the interfaces into which a fat interface should be split is unpractical and expensive. In this chapter, we define the problem of splitting fat interfaces according to the ISP as a multi-objective clustering optimization problem [Praditwong et al., 2011]. We measure the compliance with the ISP of an interface through the Interface Cohesion Metric (IUC). To apply the ISP we propose a multi-objective genetic algorithm that, based on the clients’ usage of a fat interface, infers the interfaces into which it should be split to conform to the ISP and, hence, with higher IUC values. To validate the capability of the proposed genetic algorithm we mine the clients’ usage of 42,318 public Java APIs from the Maven repository. For each API, we run the genetic algorithm to split the API into sub-APIs according to the ISP. We compare the capability of the genetic algorithm with the capability of other search-based approaches, namely a random algorithm and a multi-objective simulated annealing algorithm. The goal of this study is to answer the following research questions: Is the genetic algorithm able to split APIs into sub-APIs with higher IUC values? Does it outperform the random and simulated annealing approaches? The results show that the proposed genetic algorithm generates sub-APIs with higher IUC values and it outperforms the other search-based approaches. 7.1. Problem Statement and Solution 127 These results are relevant for software practitioners interested in applying the ISP. They can monitor how clients invoke their APIs (i.e., which methods are invoked by each client) and they can use this information to run the genetic algorithm and split their APIs so that they comply with the ISP. The remainder of this chapter is organized as follows. Section 7.1 introduces fat APIs, the main problems they suffer, and formulates the problem of applying the ISP as a multi-objective clustering problem. Section 7.2 presents the genetic algorithm to solve the multi-objective clustering problem. Section 7.3 presents the random and local search (i.e., simulated annealing) approaches implemented to evaluate the capability of the genetic algorithm. The study and its results are shown and discussed in Section 7.4 while threats to validity are discussed in Section 7.5. Related work is presented in Section 7.6. 
We draw our conclusions and outline directions for future work in Section 7.7.

7.1 Problem Statement and Solution

In this section, first, we introduce fat APIs, their drawbacks, and the Interface Segregation Principle to refactor them. Then, we discuss the challenges of applying the Interface Segregation Principle to real world APIs. Finally, we present our solution to automatically apply the principle.

7.1.1 Fat APIs and Interface Segregation Principle

The Interface Segregation Principle (ISP) was originally described by Martin [Martin, 2002] and copes with fat APIs. Fat APIs are APIs whose clients invoke different sets of their methods. As a consequence, clients depend on interface methods that they do not invoke. These APIs are problematic and should be refactored because their clients can be broken by changes to methods which they do not invoke. To refactor fat APIs Martin [Martin, 2002] introduced the ISP. The ISP states that fat APIs need to be split into smaller APIs (referred to as sub-APIs throughout this chapter) according to their clients' usage. Any client should only know about the set of methods that it invokes. Hence, each sub-API should reflect the usage of a specific client (or of a class of clients that invoke the same set of methods).

To better understand the ISP consider the example shown in Figure 7.1. The API shown in Figure 7.1a is considered a fat API because the different clients (i.e., Client1, Client2, and Client3) invoke different methods (e.g., Client1 invokes only method1, method2, and method3 out of the 10 methods declared in the API). According to the ISP, this API should be split into three sub-APIs as shown in Figure 7.1b. These sub-APIs are specific to the different clients (i.e., Client1, Client2, and Client3) and, as a consequence, clients no longer depend on interface methods they do not invoke.

Figure 7.1: An example of applying the Interface Segregation Principle. (a) A fat API declaring method1() through method10(), with different clients (i.e., Client1, Client2, and Client3) invoking different sets of methods: clients depend on methods which they do not invoke. (b) The fat API is split into sub-APIs, each one specific for a client (SubAPI1 with method1()-method3() for Client1, SubAPI2 with method4()-method6() for Client2, and SubAPI3 with method7()-method10() for Client3): clients depend only on methods which they invoke.

7.1.2 Fat APIs and Change-Proneness

Fat APIs are also problematic because they change frequently. In Chapter 2 we showed empirically that fat APIs are more change-prone compared to non-fat APIs. In this work we used external cohesion as a heuristic to detect fat APIs. The external cohesion was originally defined by Perepletchikov et al. [Perepletchikov et al., 2007, 2010] for web APIs and it measures the extent to which the methods declared in an API are used by their clients. An API is considered externally cohesive if all clients invoke all methods of the API. It is not externally cohesive and is considered a fat API if the clients invoke different subsets of its methods. To measure the external cohesion we used the Interface Usage Cohesion metric (IUC) defined by Perepletchikov et al. [Perepletchikov et al., 2007, 2010]. This metric is defined as:

IUC(i) = \frac{\sum_{j=1}^{n} \frac{used\_methods(j, i)}{num\_methods(i)}}{n}
where j denotes a client of the API i; used_methods(j, i) is the function which computes the number of methods defined in i and used by the client j; num_methods(i) returns the total number of methods defined in i; and n denotes the number of clients of the API i. Note that the IUC values range between 0 and 1.

Consider the example shown in Figure 7.1. The FatAPI in Figure 7.1a shows a value of IUC_FatAPI = (3/10 + 3/10 + 4/10)/3 ≈ 0.33, indicating low external cohesion, which is a symptom of a fat API. The sub-APIs generated after applying the ISP (shown in Figure 7.1b) show higher external cohesion. They have the following values for IUC: IUC_SubAPI1 = (3/3)/1 = 1, IUC_SubAPI2 = (3/3)/1 = 1, and IUC_SubAPI3 = (4/4)/1 = 1.

In Chapter 2 we investigated to which extent the IUC metric can be used to highlight change-prone Java interface classes. The results showed that the IUC metric exhibits the strongest correlation with the number of source code changes performed in Java interface classes compared to other software metrics (e.g., the C&K metrics [Chidamber and Kemerer, 1994]). The IUC metric also improved the performance of prediction models in predicting change-prone Java interface classes. These results, together with Martin's insights [Martin, 2002] and the results of previous studies [Abdeen et al., 2013; Yamashita and Moonen, 2013], motivated us to investigate and develop an approach to refactor fat APIs using the ISP.

7.1.3 Problem

The problem an engineer can face in splitting a fat API is coping with API usage diversity. In 2013, Mendez et al. [Mendez et al., 2013] investigated how differently APIs are invoked by their clients. They provided empirical evidence that there is a significant usage diversity. For instance, they showed that Java's String API is used in 2,460 different ways by its clients. Clients do not invoke disjoint sets of methods (as shown in Figure 7.1a); rather, the sets of methods can overlap and can be significantly different. As a consequence, we argue that manually splitting fat APIs can be time consuming and error prone.

A first approach to find the sub-APIs consists in adopting brute-force search techniques. These techniques enumerate all possible sub-APIs and check whether they maximize the external cohesion and, hence, the value of the IUC metric. The problem with these approaches is that the number of possible sub-APIs can be prohibitively large, causing a combinatorial explosion. Imagine, for instance, adopting this approach to find the sub-APIs for the AmazonEC2 web API. This web API exposes 118 methods in version 23. The number of 20-combinations of the 118 methods in AmazonEC2 is equal to:

\binom{118}{20} = \frac{118!}{20! \, 98!} \approx 2 \times 10^{21}

This means that for evaluating all the sub-APIs with 20 methods the search would have to analyze at least 2 × 10^21 possible combinations, which can take several days on a standard PC. As a consequence, brute-force search techniques are not an adequate solution for this problem.

7.1.4 Solution

To overcome the aforementioned problems we formulate the problem of finding sub-APIs (i.e., applying the ISP) as a clustering optimization problem defined as follows. Given the set of n methods X = {X1, X2, ..., Xn} declared in a fat API, find the set of non-overlapping clusters of methods C = {C1, C2, ..., Ck} that maximizes IUC(C) and minimizes clusters(C), where IUC(C) computes the lowest IUC value of the clusters in C and clusters(C) computes the number of clusters.
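For illustration, the following sketch computes the IUC metric defined above and reproduces the values of the Figure 7.1 example (≈ 0.33 for the fat API and 1 for a sub-API). The class and method names (IucSketch, iuc) are illustrative assumptions, not part of the tooling used in this chapter.

```java
import java.util.*;

public class IucSketch {

    // IUC(i): average, over all clients, of the fraction of the API's methods each client invokes.
    static double iuc(Set<String> apiMethods, Collection<Set<String>> clientsUsage) {
        double sum = 0;
        for (Set<String> used : clientsUsage) {
            sum += (double) used.size() / apiMethods.size();  // used_methods(j, i) / num_methods(i)
        }
        return sum / clientsUsage.size();                      // divide by the number of clients n
    }

    public static void main(String[] args) {
        Set<String> fatApi = new TreeSet<>();
        for (int m = 1; m <= 10; m++) fatApi.add("method" + m);

        // The clients of Figure 7.1a invoke disjoint subsets of the fat API.
        List<Set<String>> clients = List.of(
                Set.of("method1", "method2", "method3"),               // Client1
                Set.of("method4", "method5", "method6"),               // Client2
                Set.of("method7", "method8", "method9", "method10"));  // Client3

        System.out.printf("IUC(FatAPI)  = %.2f%n", iuc(fatApi, clients)); // (0.3 + 0.3 + 0.4) / 3 = 0.33

        // After applying the ISP, each sub-API serves a single client and reaches IUC = 1.
        System.out.printf("IUC(SubAPI1) = %.2f%n",
                iuc(Set.of("method1", "method2", "method3"),
                    List.of(Set.of("method1", "method2", "method3"))));
    }
}
```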
In other words, we want to cluster the methods declared in a fat API into sub-APIs that show high external cohesion, measured through the IUC metric. This problem is an optimization problem with two objective functions, also known as multi-objective optimization problem. The first objective consists in maximizing the external cohesion of the clusters in C. Each cluster in C (i.e., a sub-API in our case) will have its own IUC value (like for the sub-APIs in Figure 7.1b). To maximize their IUC values we maximize the lowest IUC value measured through the objective function IUC(C). The second objective consists in minimizing the number of clusters (i.e., sub-APIs). This objective is necessary to avoid solutions containing as many clusters as there are methods declared in the fat API. If we assign each method to a different sub-API, all the sub-APIs would have an IUC value of 1, showing the highest external cohesion. However, such sub-APIs do not group together the methods invoked by the different groups of clients. Hence, the clients would depend on many sub-APIs each one exposing a single method. To solve this multi-objective clustering optimization problem we implemented a multi-objective genetic algorithm (presented in next section) that searches for the Pareto optimal solutions, namely solutions whose objective function values (i.e., IUC(C) and clusters(C) in our case) cannot be improved without degrading the other objective function values. 7.2. Genetic Algorithm 131 Moreover, to compare the performance of the genetic algorithm with random and local search approaches we implemented a random approach and a multi-objective simulated annealing approach that are presented in Section 7.3. 7.2 Genetic Algorithm To solve multi-objective optimization problems different algorithms have been proposed in literature (e.g., [Deb et al., 2000; Rudolph, 1998; Zitzler and Thiele, 1999]). In this chapter, we use the multi-objective genetic algorithm NSGA-II proposed by Deb et al. [Deb et al., 2000] to solve the problem of finding sub-APIs for fat APIs according to the ISP, as described in the previous section. We chose this algorithm because 1) it has been proved to be fast, 2) to provide better convergence for most multi-objective optimization problems, and 3) it has been widely used in solving search based software engineering problems, such as presented in [Deb et al., 2000; Yoo and Harman, 2007; Zhang et al., 2013; Li et al., 2013]. In the following, we first introduce the genetic algorithms. Then, we show our implementation of the NSGA-II used to solve our problem. For further details about the NSGA-II we refer to the work by Deb et al. [Deb et al., 2000]. Genetic Algorithms (GAs) have been used in a wide range of applications where optimization is required. Among all the applications, GAs have been widely studied to solve clustering problems [Hruschka et al., 2009]. The key idea of GAs is to mimic the process of natural selection providing a search heuristic to find solutions to optimization problems. A generic GA is shown in Figure 8.4. Different to other heuristics (e.g., Random Search, Brute-Force Search, and Local search) that consider one solution at a time, a GA starts with a set of candidate solutions, also known as population (step 1 in Figure 8.4). These solutions are randomly generated and they are referred to as chromosomes. Since the search is based upon many starting points, the likelihood to explore a wider area of the search space is higher than other searches. 
This feature reduces the likelihood to get stuck in a local optimum. Each solution is evaluated through a fitness function (or objective function) that measures how good a candidate solution is relatively to other candidate solutions (step 2). Solutions from the population are used to form new populations, also known as generations. This is achieved using the evolutionary operators. Specifically, first a pair of solutions (parents) is selected from the population through a selection operator (step 4). From these parents two offspring solutions are generated through the crossover operator (step 5). The crossover operator 132 Chapter 7. Refactoring Fat APIs Create initial population of chromosomes 1 2 Evaluate fitness of each chromosome 3 Max Iterations 4 Select next generation (Selection Operator) 5 Perform reproduction (Crossover operator) 6 Perform mutation (Mutation operators) 7 Output best chromosomes Figure 7.2: Different steps of a genetic algorithm. is responsible to generate offspring solutions that combine features from the two parents. To preserve the diversity, the mutation operators (step 6) mutate the offspring. These mutated solutions are added to the population replacing solutions with the worst fitness function values. This process of evolving the population is repeated until some condition (e.g., reaching the max number of iterations in step 3 or achieving the goal). Finally, the GA outputs the best solutions when the evolution process terminates (step 7). To implement the GA and adapt it to find the set of sub-APIs into which a fat API should be split we next define the fitness function, the chromosome (or solution) representation, and the evolutionary operators (i.e., selection, crossover, and mutation). 7.2.1 Chromosome representation To represent the chromosomes we use a label-based integer encoding widely adopted in literature [Hruschka et al., 2009] and shown in Figure 8.5. According to this encoding, a solution is an integer array of n positions, where n is the number of methods exposed in a fat API. Each position corresponds to a specific method (e.g., position 1 corresponds to the method method1() in Fig- 7.2. Genetic Algorithm 133 ure 7.1a). The integer values in the array represent the clusters (i.e., sub-APIs in our case) to which the methods belong. For instance in Figure 8.5, the methods 1,2, and 10 belong to the same cluster labeled with 1. Note that two chromosomes can be equivalent even though the clusters are labeled differently. For instance the chromosomes [1,1,1,1,2,2,2,2,3,3] and [2,2,2,2,3,3,3,3,1,1] are equivalent. To solve this problem we apply the renumbering procedure as shown in [Falkenauer, 1998] that transforms different labelings of equivalent chromosomes into a unique labeling. 1 2 3 4 5 6 7 8 9 10 1 1 2 3 2 4 5 3 6 1 Figure 7.3: Chromosome representation of our candidate solutions. 7.2.2 Fitness Functions The fitness function is a function that measures how good a solution is. For our problem we have two fitness functions corresponding to the two objective functions discussed in Section 7.1, namely IUC(C)) and clusters(C). IUC(C) returns the lowest IUC value of the clusters in C and clusters(C) returns the number of clusters in C. Hence, the two fitness functions are f1 =IUC(C) and f2 =clusters(C). While the value of f1 should be maximized the value of f2 should be minimized. 
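To illustrate the label-based encoding and the two fitness functions, the following sketch evaluates f1 (the lowest IUC value among the encoded sub-APIs) and f2 (the number of sub-APIs) for the example of Figure 7.1. It is a simplified stand-alone sketch, not the NSGA-II/JMetal implementation used in this chapter; in particular, it assumes that the IUC of a sub-API is computed only over the clients that invoke at least one of its methods.

```java
import java.util.*;

public class FitnessSketch {

    // f1: the lowest IUC value among the clusters (sub-APIs) encoded by the chromosome.
    // chromosome[m] = label of the cluster to which method m is assigned;
    // clientUsage[c][m] = true if client c invokes method m.
    static double minIuc(int[] chromosome, boolean[][] clientUsage) {
        Map<Integer, List<Integer>> byLabel = new HashMap<>();
        for (int m = 0; m < chromosome.length; m++)
            byLabel.computeIfAbsent(chromosome[m], k -> new ArrayList<>()).add(m);

        double min = 1.0;
        for (List<Integer> methods : byLabel.values()) {
            double sum = 0;
            int clientsOfCluster = 0;
            for (boolean[] client : clientUsage) {
                long used = methods.stream().filter(m -> client[m]).count();
                if (used > 0) {                                // only clients that invoke this sub-API
                    sum += (double) used / methods.size();
                    clientsOfCluster++;
                }
            }
            double iuc = clientsOfCluster == 0 ? 1.0 : sum / clientsOfCluster;
            min = Math.min(min, iuc);
        }
        return min;
    }

    // f2: the number of clusters (sub-APIs), to be minimized.
    static long clusters(int[] chromosome) {
        return Arrays.stream(chromosome).distinct().count();
    }

    public static void main(String[] args) {
        boolean[][] usage = {   // 3 clients x 10 methods, as in Figure 7.1a
            {true, true, true, false, false, false, false, false, false, false},
            {false, false, false, true, true, true, false, false, false, false},
            {false, false, false, false, false, false, true, true, true, true}};

        int[] fatApi = new int[10];                        // every method in a single cluster
        Arrays.fill(fatApi, 1);
        int[] split  = {1, 1, 1, 2, 2, 2, 3, 3, 3, 3};     // the ISP-compliant split

        System.out.println(minIuc(fatApi, usage) + " / " + clusters(fatApi)); // 0.33... / 1
        System.out.println(minIuc(split, usage)  + " / " + clusters(split));  // 1.0 / 3
    }
}
```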
Since we have two fitness functions, we need a comparator operator that, given two chromosomes (i.e., candidate solutions), returns the best one based on their fitness values. As comparator operator we use the dominance comparator as defined in NSGA-II. This comparator utilizes the idea of Pareto optimality and the concept of dominance for the comparison. Precisely, given two chromosomes A and B, the chromosome A dominates chromosome B (i.e., A is better than B) if 1) every fitness function value for chromosome A is equal or better than the corresponding fitness function value of the chromosome B, and 2) chromosome A has at least one fitness function value that is better than the corresponding fitness function value of the chromosome B. 7.2.3 The Selection Operator The selection operator selects two parents from a population according to their fitness function values. We use the Ranked Based Roulette Wheel (RBRW) that is a modified roulette wheel selection operator as proposed by Al Jadaan and Rajamani [2008]. RBRW ranks the chromosomes in the population by the fitness values: the highest rank is assigned to the chromosome with the best 134 Chapter 7. Refactoring Fat APIs fitness values. Hence, the best chromosomes have the highest probabilities to be selected as parents. 7.2.4 The Crossover Operator Once the GA has selected two parents (ParentA and ParentB) to generate the offspring, the crossover operator is applied to them with a probability Pc . As crossover operator we use the operator defined specifically for clustering problems by Hruschka et al. [2009]. In order to illustrate how this operator works consider the example shown in Figure 8.6 from [Hruschka et al., 2009]. The operator first selects randomly k (1≤k≤n) clusters from ParentA, where n is the number of clusters in ParentA. In our example assume that the clusters labeled 2 (consisting of methods 3, 5, and 10) and 3 (consisting of method 4) are selected from ParentA (marked red in Figure 8.6). The first child (ChildC) originally is created as copy of the second parent ParentB (step 1). As second step, the selected clusters (i.e., 2 and 3) are copied into ChildC. Copying these clusters changes the clusters 1, 2, and 3 in ChildC. These changed clusters are removed from ChildC (step 3) leaving the corresponding methods unallocated (labeled with 0). In the fourth step (not shown in Figure 8.6) the unallocated methods are allocated to an existing cluster that is randomly selected. The same procedure is followed to generate the second child ChildD. However, instead of selecting randomly k clusters from ParentB, the changed clusters of ChildC (i.e., 1,2, and 3) are copied into ChildD that is originally a copy of ParentA. 7.2.5 The Mutation Operators After obtaining the offspring population through the crossover operator, the offspring is mutated through the mutation operator with a probability Pm . This step is necessary to ensure genetic diversity from one generation to the next ones. The mutation is performed by randomly selecting one of the following cluster-oriented mutation operators [Falkenauer, 1998; Hruschka et al., 2009]: • split: a randomly selected cluster is split into two different clusters. The methods of the original cluster are randomly assigned to the generated clusters. • merge: moves all methods of a randomly selected cluster to another randomly selected cluster. • move: moves methods from one cluster to another. Both methods and clusters are randomly selected. 7.3. 
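As an illustration of the three cluster-oriented mutation operators listed above, the following sketch applies split, merge, and move to the label-based integer encoding. It is a simplified sketch under our own assumptions, not the JMetal-based implementation; the renumbering of labels after mutation is omitted.

```java
import java.util.*;

public class MutationSketch {
    static final Random RND = new Random();

    static int[] labels(int[] chromosome) {
        return Arrays.stream(chromosome).distinct().toArray();
    }

    // split: the methods of a randomly selected cluster are randomly reassigned
    // to either the original cluster or a newly created one.
    static void split(int[] c) {
        int[] ls = labels(c);
        int target = ls[RND.nextInt(ls.length)];
        int fresh = Arrays.stream(c).max().getAsInt() + 1;
        for (int i = 0; i < c.length; i++)
            if (c[i] == target && RND.nextBoolean()) c[i] = fresh;
    }

    // merge: moves all methods of a randomly selected cluster to another randomly selected cluster.
    static void merge(int[] c) {
        int[] ls = labels(c);
        if (ls.length < 2) return;
        int from = ls[RND.nextInt(ls.length)];
        int to;
        do { to = ls[RND.nextInt(ls.length)]; } while (to == from);
        for (int i = 0; i < c.length; i++)
            if (c[i] == from) c[i] = to;
    }

    // move: moves a randomly selected method to another randomly selected cluster.
    static void move(int[] c) {
        int[] ls = labels(c);
        if (ls.length < 2) return;
        int method = RND.nextInt(c.length);
        int to;
        do { to = ls[RND.nextInt(ls.length)]; } while (to == c[method]);
        c[method] = to;
    }
}
```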
Random and Local Search 135 ParentA 1 2 3 4 5 6 7 8 9 10 1 1 2 3 2 4 5 1 2 5 ParentB 1 2 3 4 5 6 7 8 9 10 4 2 1 2 3 3 2 1 2 4 1: copy ParentB into ChildC ChildC 4 2 1 2 3 3 2 1 2 4 2: copy clusters 2 and 3 from ParentA to ChildC ChildC 4 2 2 3 2 3 2 1 2 4 3: remove changed methods from B (i.e., 1,2,3) ChildC 4 0 2 3 2 0 0 0 2 4 4: unallocated objects are allocated to randomly selected clusters Figure 7.4: Example of crossover operator for clustering problems [Hruschka et al., 2009]. We implemented the proposed genetic algorithm on top of the JMetal2 framework that is a Java framework that provides state-of-the-art algorithms for optimization problems, including the NSGA-II algorithm. 7.3 Random and Local Search To better evaluate the performance of our proposed genetic algorithm we implemented a random algorithm and a local search algorithm (i.e., a multiobjective simulated annealing algorithm) that are presented in the following sub-sections. 7.3.1 Random Algorithm The random algorithm tries to find an optimal solution by generating random solutions. To implement the random algorithm we use the same solution representation (i.e., chromosome representation) used in the genetic algorithm described in Section 7.2. The algorithm iteratively generates a random solution and evaluates it using the same fitness functions defined for the genetic algorithm. When the maximum number of iterations is reached the best so2 http://jmetal.sourceforge.net 136 Chapter 7. Refactoring Fat APIs lution is output. This algorithm explores the search space randomly relying on the likelihood to find a good solution after a certain number of iterations. We use a random search as baseline because this comparison is considered the first step to evaluate a genetic algorithm [Sivanandam and Deepa, 2007]. 7.3.2 Multi-Objective Simulated Annealing As second step to evaluate the performance of our proposed genetic algorithm we implemented a local search approach. A local search algorithm (e.g., hill-climbing) starts from a candidate solution and then iteratively tries to improve it. Starting from a random generated solution the solution is mutated obtaining the neighbor solution. If the neighbor solution is better than the current solution (i.e., it has higher fitness function values) it is taken as current solution to generate a new neighbor solution. This process is repeated until the best solution is obtained or the maximum number of iterations is reached. The main problem of such local search approaches is that they can get stuck in a local optimum. In this case the local search approach cannot further improve the current solution. To mitigate this problem advanced local search approaches have been proposed like simulated annealing. The simulated annealing algorithm was inspired from the process of annealing in metallurgy. This process consists in heating and cooling a metal. Heating the metal alters its internal structure and, hence, its physical properties. On the other hand, when the metal cools down its new internal structure becomes fixed. The simulated annealing algorithm simulates this process. Initially the temperature is set high and then it is decreased slowly as the algorithm runs. While the temperature is high the algorithm is more likely to accept a neighbor solution that is worse than the current solution, reducing the likelihood to get stuck in a local optimum. At each iteration the temperature is slowly decreased by multiplying it by a cooling factor α where 0 < α < 1. 
When the temperature is reduced, worse neighbor solutions are accepted with a lower probability. Hence, at each iteration a neighbor solution is generated by mutating the current solution. If this solution has better fitness function values it is taken as the current solution. Otherwise it is accepted with a certain probability, called the acceptance probability. This acceptance probability is computed by a function based on 1) the difference between the fitness function values of the current and neighbor solutions and 2) the current temperature value.

To adapt this algorithm for solving our multi-objective optimization problem we implemented a Multi-Objective Simulated Annealing algorithm following the approach used by Shelburg et al. [2013]. To represent the solutions we use the same solution representation used in the genetic algorithm (i.e., label-based integer encoding). We generate the neighbor solutions using the mutation operators used in our genetic algorithm. We compare two solutions using the same fitness functions and dominance comparator of our genetic algorithm. The acceptance probability is computed as in [Shelburg et al., 2013] with the following function:

$$AcceptProb(i, j, temp) = e^{-\frac{|c(i, j)|}{temp}}$$

where $i$ and $j$ are the current and neighbor solutions, $temp$ is the current temperature, and $c(i, j)$ is a function that computes the difference between the fitness function values of the two solutions $i$ and $j$. This difference is computed as the average of the differences of each fitness function value of the two solutions according to the following equation:

$$c(i, j) = \frac{\sum_{k=1}^{|D|} \left( c_k(j) - c_k(i) \right)}{|D|}$$

where $D$ is the set of fitness functions and $c_k(j)$ is the value of the fitness function $k$ of the solution $j$. In our case the fitness functions are the IUC(C) and clusters(C) functions used in the genetic algorithm. Note that since this difference is computed as an average, it is important that the fitness function values are measured on the same scale. For this reason the values of the fitness function clusters(C) are normalized to the range between 0 and 1. For further details about multi-objective simulated annealing we refer to the work in [Nam and Park, 2000; Shelburg et al., 2013].
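As an illustration of the acceptance rule described above, the following is a small sketch that follows the two formulas; the method and variable names are ours and do not come from the thesis implementation. It assumes the fitness values of both solutions are already normalized to comparable scales.

```java
/**
 * Sketch of the acceptance step of a multi-objective simulated annealing
 * iteration, following the formulas above. Names are ours, not the thesis code.
 */
public class AcceptanceRule {

    /** Average difference c(i, j) over the fitness functions of two solutions. */
    static double averageDelta(double[] currentFitness, double[] neighborFitness) {
        double sum = 0.0;
        for (int k = 0; k < currentFitness.length; k++) {
            sum += neighborFitness[k] - currentFitness[k];
        }
        return sum / currentFitness.length;
    }

    /** AcceptProb(i, j, temp) = exp(-|c(i, j)| / temp). */
    static double acceptProb(double[] currentFitness, double[] neighborFitness, double temp) {
        return Math.exp(-Math.abs(averageDelta(currentFitness, neighborFitness)) / temp);
    }

    /** A worse neighbor is kept when a random draw falls below the acceptance probability. */
    static boolean acceptWorseNeighbor(double[] currentFitness, double[] neighborFitness,
                                       double temp, java.util.Random rnd) {
        return rnd.nextDouble() < acceptProb(currentFitness, neighborFitness, temp);
    }
}
```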
7.4 Study

The goal of this empirical study is to evaluate the effectiveness of our proposed genetic algorithm in applying the ISP to Java APIs. The quality focus is the ability of the genetic algorithm to split APIs into sub-APIs with higher external cohesion, which is measured through the IUC metric. The perspective is that of API providers interested in applying the ISP and in deploying APIs with high external cohesion. The context of this study consists of 42,318 public Java APIs mined from the Maven repository. In this study we answer the following research questions: Is the genetic algorithm able to split APIs into sub-APIs with higher IUC values? Does it outperform the random and simulated annealing approaches?

In the following, first, we show the process we used to extract the APIs and their clients' usage from the Maven repository. Then, we show the procedure we followed to calibrate the genetic algorithm and the simulated annealing algorithm. Finally, we present and discuss the results of our study.

7.4.1 Data Extraction

The public APIs under analysis and their clients' usage have been retrieved from the Maven repository (http://search.maven.org). The Maven repository is a publicly available data set containing 144,934 binary jar files of 22,205 different open-source Java libraries, which is described in more detail in [Raemaekers et al., 2013]. Each binary jar file has been scanned to mine method calls using the ASM Java bytecode manipulation and analysis framework (http://asm.ow2.org). The data set was processed on the DAS-3 supercomputer (http://www.cs.vu.nl/das3/) consisting of 100 computing nodes. To extract method calls we scanned all .class files of all jar files. Class files contain fully qualified references to the methods they call, meaning that the complete package name, class name, and method name of the called method is available in each .class file. For each binary file, we use an ASM bytecode visitor to extract the package, class, and method name of the callee.
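To illustrate this extraction step, the sketch below shows how an ASM class visitor could record the fully qualified callees found in a single .class file. It is a simplified illustration under our own naming, not the actual tool used to process the Maven data set.

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Collects "owner.name" identifiers of all methods called in one .class file. */
public class CalleeCollector extends ClassVisitor {
    private final List<String> callees = new ArrayList<>();

    public CalleeCollector() {
        super(Opcodes.ASM5);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String descriptor,
                                     String signature, String[] exceptions) {
        // Visit every method body and record each method call instruction.
        return new MethodVisitor(Opcodes.ASM5) {
            @Override
            public void visitMethodInsn(int opcode, String owner, String name,
                                        String descriptor, boolean isInterface) {
                callees.add(owner.replace('/', '.') + "." + name);
            }
        };
    }

    /** Parses the bytecode of one class and returns the called methods. */
    public static List<String> extract(InputStream classFile) throws IOException {
        CalleeCollector collector = new CalleeCollector();
        new ClassReader(classFile).accept(collector, ClassReader.SKIP_DEBUG);
        return collector.callees;
    }
}
```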
Once we extracted all calls from all .class files, we grouped together calls to the same API. As clients of an API we considered all classes declared in other jar files from the Maven repository that invoke public methods of that API. Note that different versions of the same class are considered different for both clients and APIs. Hence, if there are two classes with the same name belonging to two different versions of a jar file, they are considered different. To infer which version of a jar file a method call belongs to, we scanned the Maven build file (pom.xml) for dependency declarations.

In total we extracted the clients' usage for 110,195 public APIs stored in the Maven repository. We filtered out APIs not relevant for our analysis by applying the following filters:

• APIs should declare at least two methods.

• APIs should have more than one client.

• The IUC value of the APIs should be less than one.

After filtering out non-relevant APIs we ended up with a data set of 42,318 public APIs whose numbers of clients, methods, and invocations are shown by the box plots in Figure 7.5, where outliers have been removed for the sake of simplicity. The median number of methods exposed in the APIs under analysis is 4, while the biggest API exposes 370 methods. The median number of clients is 10, with a maximum of 102,445 (clients of the API org.apache.commons.lang.builder.EqualsBuilder; not shown in Figure 7.5). The median number of invocations to the APIs is 17, with a maximum of 270,569 (invocations to the API org.apache.commons.lang.builder.EqualsBuilder; not shown in Figure 7.5).

Figure 7.5: Box plots of the number of methods (#Methods), clients (#Clients), and invocations (#Invocations) for the public APIs under analysis. Outliers have been removed for the sake of simplicity.

7.4.2 GA and SA Calibration

To calibrate the GA and SA algorithms we followed a trial-and-error procedure with 10 toy examples. Each toy example consists of an API with 10 methods and 4 clients. For each of the 10 toy examples we changed the clients' usage. Then, we evaluated the IUC values output by the algorithms with different parameters. For each different parameter, we ran the algorithms ten times. We used the Mann-Whitney and Cliff's Delta tests to evaluate the difference between the IUC values output by each run. For the GA we evaluated the output with the following parameters:

• the population size was incremented stepwise by 10 from 10 to 200 individuals.

• the number of iterations was incremented stepwise by 1,000 from 1,000 to 10,000.

• the crossover and mutation probability were increased stepwise by 0.1 from 0.0 to 1.0.

We noticed a slower convergence of the GA only when the population size was less than 50, the number of iterations was less than 1,000, and the crossover and mutation probability was less than 0.7. Hence, we decided to use the default values specified in JMetal (i.e., a population of 100 individuals, 10,000 iterations, and a crossover and mutation probability of 0.9). Similarly, the output of the SA algorithm was evaluated with different values for the cooling factor. The cooling factor was incremented stepwise by 0.1 from 0.1 to 1.0. We did not register any statistically significant difference and we chose a starting temperature of 0.0003 and a cooling factor of 0.99965, as proposed in [Shelburg et al., 2013]. The number of iterations for the SA and RND algorithms is 10,000 to have a fair comparison with the GA.

7.4.3 Results

To answer our research questions, first, we compute the IUC value for each public API using the extracted invocations. We refer to this value as IUCbefore. Then, we run the genetic algorithm (GA), the simulated annealing algorithm (SA), and the random algorithm (RND) with the same number of iterations (i.e., 10,000). For each API under analysis, these algorithms output the set of sub-APIs into which the API should be split. Each sub-API will show a different IUC value. Among these sub-APIs we take the sub-API with the lowest IUC value, to which we refer as IUCafter. We chose the lowest IUC value because this gives us the lower boundary for the IUC values of the resulting sub-APIs.

Figure 7.6 shows the distributions of IUCafter values and the number of sub-APIs output by the different algorithms. The box plots in Figure 7.6a show that all the search-based algorithms produced sub-APIs with higher IUCafter values compared to the original APIs (ORI). The genetic algorithm (GA) produced sub-APIs that have higher IUCafter values than the original APIs (ORI) and the sub-APIs generated by the simulated annealing algorithm (SA) and by the random algorithm (RND). The second best algorithm is the random algorithm, which outperforms the simulated annealing. The higher IUCafter values of the genetic algorithm are associated with a higher number of sub-APIs, as shown in Figure 7.6b. These box plots show that the median number of sub-APIs is 2 for the genetic algorithm and the random algorithm. The simulated annealing generated a median number of 1 API, meaning that in 50% of the cases it kept the original API without being able to split it. We believe that the poor performance of the simulated annealing is due to its nature: even though it is an advanced local search approach, it is still a local search approach that can get stuck in a local optimum. To give a better view of the IUC values of the sub-APIs, we show the distributions of IUC values measured on the sub-APIs generated by the genetic algorithm in Figure 7.7. Min represents the distribution of IUC values of sub-APIs with the lowest IUC (i.e., IUCafter). Max represents the distribution of IUC values of sub-APIs with the highest IUC. Q1, Q2, and Q3 represent respectively the first, second, and third quartiles of the ordered set of IUC values of the sub-APIs.
Figure 7.6: IUC values and number of sub-APIs generated by the different search-based algorithms. (a) Box plots of IUC values measured on the original APIs (ORI) and IUCafter measured on the sub-APIs output by the genetic algorithm (GA), by the simulated annealing algorithm (SA), and by the random algorithm (RND). (b) Number of sub-APIs generated by the genetic algorithm (GA), the simulated annealing algorithm (SA), and the random algorithm (RND).

Figure 7.7: Box plots of IUC values measured on the sub-APIs output by the genetic algorithm (min, Q1, Q2, Q3, max). Outliers have been removed for the sake of simplicity.

The box plots in Figure 7.6 already give insights into the capability of the different search-based algorithms of applying the ISP. To provide statistical evidence of their capability we compute the difference between the distributions of IUCbefore and IUCafter generated by the different algorithms using the paired Mann-Whitney test [Mann and Whitney, 1947] and the paired Cliff's Delta d effect size [Grissom and Kim, 2005]. First, we use the Mann-Whitney test to analyze whether there is a significant difference between the distributions of IUCbefore and IUCafter. Significant differences are indicated by Mann-Whitney p-values ≤ 0.01. Then, we use the Cliff's Delta effect size to measure the magnitude of the difference. Cliff's Delta estimates the probability that a value selected from one group is greater than a value selected from the other group. Cliff's Delta ranges between +1, if all selected values from one group are higher than the selected values in the other group, and -1, if the reverse is true. A value of 0 expresses two overlapping distributions. The effect size is considered negligible for d < 0.147, small for 0.147 ≤ d < 0.33, medium for 0.33 ≤ d < 0.47, and large for d ≥ 0.47 [Grissom and Kim, 2005]. We chose the Mann-Whitney test and Cliff's Delta effect size because the distributions of IUC values are not normally distributed, as shown by the results of the Shapiro test. The Mann-Whitney test and Cliff's Delta effect size are suitable for non-normal distributions because they do not require assumptions about the variances and the types of the distributions (i.e., they are non-parametric tests).
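For reference, the following sketch shows how the (unpaired) Cliff's Delta and the magnitude thresholds above can be computed; the paired variant used in the study compares matched observations, but the underlying dominance idea is the same. The class name is ours.

```java
/**
 * Sketch of Cliff's Delta for two samples: the probability that a value from
 * one group exceeds a value from the other, minus the reverse probability.
 */
public final class CliffsDelta {

    static double compute(double[] groupA, double[] groupB) {
        long greater = 0;
        long smaller = 0;
        for (double a : groupA) {
            for (double b : groupB) {
                if (a > b) greater++;
                else if (a < b) smaller++;
            }
        }
        return (greater - smaller) / (double) (groupA.length * (long) groupB.length);
    }

    /** Thresholds reported by Grissom and Kim [2005], as used in the text. */
    static String magnitude(double d) {
        double abs = Math.abs(d);
        if (abs < 0.147) return "negligible";
        if (abs < 0.33)  return "small";
        if (abs < 0.47)  return "medium";
        return "large";
    }
}
```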
The results of the Mann-Whitney test and Cliff's Delta effect size are shown in Table 7.1.

Table 7.1: Mann-Whitney p-value (M-W p-value) and Cliff's delta between the distributions of IUCafter values measured on the sub-APIs generated by the genetic algorithm and measured on the original APIs (i.e., GA vs ORI) and on the sub-APIs generated by the simulated annealing (i.e., GA vs SA) and random algorithm (i.e., GA vs RND). The table reports the results for all the APIs under analysis (i.e., ALL) and for APIs with more than 2 methods (i.e., #Methods>2).

            | APIs       | M-W p-value | Cliff's delta | Magnitude
  GA vs ORI | ALL        | <2.20E-16   | 0.732         | large
            | #Methods>2 | <2.20E-16   | 1             | large
  GA vs SA  | ALL        | <2.20E-16   | 0.705         | large
            | #Methods>2 | <2.20E-16   | 0.962         | large
  GA vs RND | ALL        | <2.20E-16   | 0.339         | medium
            | #Methods>2 | <2.20E-16   | 0.463         | medium

The distribution of IUCafter values measured on the sub-APIs generated by the genetic algorithm is statistically different (M-W p-value < 2.20E-16) from the original IUC values (GA vs ORI). The Cliff's Delta is 0.732 if we consider all the APIs (ALL) and 1 if we consider only APIs with more than 2 methods (#Methods>2). In both cases the Cliff's delta is greater than 0.47 and, hence, the effect size is considered large. We obtained similar results comparing the distributions of IUCafter values of the sub-APIs generated by the genetic algorithm and by the simulated annealing algorithm (GA vs SA). The Mann-Whitney p-value is <2.20E-16 and the Cliff's delta is large (i.e., 0.705 for ALL and 0.962 for #Methods>2). The distributions of IUCafter values of the genetic algorithm and the random algorithm (GA vs RND) are also statistically different (M-W p-value < 2.20E-16). The effect size is medium (i.e., 0.339 for ALL and 0.463 for #Methods>2).

Moreover, from the results shown in Table 7.1 we notice that the Cliff's delta effect size is always greater when we consider only APIs with more than two methods. This result suggests that the effectiveness of the genetic algorithm, random algorithm, and simulated annealing algorithm might depend on the number of methods declared in the APIs, the number of clients, and the number of invocations. To investigate whether these variables have any impact on the effectiveness of the algorithms, we analyze the Cliff's Delta for APIs with increasing numbers of methods, clients, and invocations. First, we partition the data set by grouping together APIs with the same number of methods. Then, we compute the Cliff's Delta between the distributions of IUCbefore and IUCafter for each different group. Finally, we use the paired Spearman correlation test to investigate the correlation between the Cliff's Delta measured on the different groups and their number of methods. We use the same method to analyze the correlation between the Cliff's Delta and the number of clients and invocations. The Spearman test compares the ordered ranks of the variables to measure a monotonic relationship. We chose the Spearman correlation because the distributions under analysis are non-normal (normality has been tested with the Shapiro test). The Spearman test is a non-parametric test and, hence, it does not make assumptions about the distribution, the variances, and the type of the relationship [S.Weardon and Chilko, 2004].
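As a sketch of the correlation measure used here, Spearman's rho can be computed from the ranks of the paired observations; the simple closed form below assumes there are no ties (with ties, the Pearson correlation of the ranks is used instead). The class name is ours.

```java
import java.util.Arrays;

/**
 * Sketch of Spearman's rank correlation for two paired samples without ties:
 * rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), where d_i is the rank difference.
 */
public final class SpearmanRho {

    static double rho(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = rx[i] - ry[i];
            sumSquaredDiff += d * d;
        }
        int n = x.length;
        return 1.0 - (6.0 * sumSquaredDiff) / (n * ((double) n * n - 1));
    }

    /** 1-based ranks of the values (assumes no ties). */
    private static double[] ranks(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Arrays.binarySearch(sorted, values[i]) + 1;
        }
        return result;
    }
}
```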
A Spearman rho value of +1 or -1 indicates a high positive or high negative correlation, whereas 0 indicates that the variables under analysis do not correlate at all. Values greater than +0.3 or lower than -0.3 indicate a moderate correlation; values greater than +0.5 or lower than -0.5 are considered to be strong correlations [Hopkins, 2000].

The results of the Spearman correlation tests are shown in Table 7.2.

Table 7.2: P-values and rho values of the Spearman correlation test to investigate the correlation between the Cliff's Delta and the number of methods, clients, and invocations. Corr indicates the magnitude of the correlations.

            | #Methods                    | #Clients                     | #Invocations
            | p-value   | rho    | corr   | p-value   | rho   | corr     | p-value   | rho   | corr
  GA vs ORI | 0.6243    | 0.070  | none   | 5.199E-13 | 0.446 | moderate | <2.20E-16 | 0.541 | strong
  GA vs SA  | 0.8458    | -0.028 | none   | 8.872E-12 | 0.429 | moderate | <2.20E-16 | 0.520 | strong
  GA vs RND | 8.127E-06 | 0.617  | strong | 2.057E-08 | 0.424 | moderate | 9.447E-14 | 0.477 | moderate

We notice that the Cliff's Delta between the distributions of IUCafter values of the genetic algorithm and the random algorithm (i.e., GA vs RND) increases with larger APIs. The Cliff's Delta effect sizes are strongly correlated (i.e., rho=0.617) with the number of methods (#Methods). This indicates that the more methods an API exposes, the more the genetic algorithm outperforms the random algorithm, generating APIs with higher IUC. Moreover, with an increasing number of clients (i.e., #Clients) and invocations (i.e., #Invocations) the Cliff's Delta between the distributions of IUCafter values of the genetic algorithm and the other search algorithms increases as well. This is indicated by rho values that are greater than 0.3.

Based on these results we can answer our research questions stating that 1) the genetic algorithm is able to split APIs into sub-APIs with higher IUC values and 2) it outperforms the other search-based algorithms. The difference in performance between the genetic algorithm and the random algorithm increases with an increasing number of methods declared in the APIs. The difference in performance between the genetic algorithm and the other search-based techniques increases with an increasing number of clients and invocations.

7.4.4 Discussions of the Results

The results of our study are relevant for API providers. Publishing stable APIs is one of their main concerns, especially if they publish APIs on the web. APIs are considered contracts between providers and clients and they should stay as stable as possible so as not to break clients' systems. In Chapter 2 we showed empirically that fat APIs (i.e., APIs with low external cohesion) are more change-prone than non-fat APIs. To refactor such APIs Martin [2002] proposed the Interface Segregation Principle (ISP). However, applying this principle is not trivial because of the large API usage diversity [Mendez et al., 2013]. Our proposed genetic algorithm assists API providers in applying the ISP. To use our genetic algorithm, providers should monitor how their clients invoke their API. For each client they should record the invoked methods in order to compute the IUC metric. This data is used by the genetic algorithm to evaluate the candidate solutions through the fitness functions, as described in Section 7.2. The genetic algorithm is then capable of suggesting the sub-APIs into which an API should be split in order to apply the ISP.

This approach is particularly useful to deploy stable web APIs. One of the key factors for deploying successful web APIs is assuring an adequate level of stability. Changes in a web API might break the consumers' systems, forcing them to continuously adapt their systems to new versions of the web API. Using our approach providers can deploy web APIs that are more externally cohesive and, hence, less change-prone, as shown in Chapter 2. Moreover, since our approach is automated, it can be integrated into development and continuous integration environments to continuously monitor the conformance of APIs to the ISP. Providers regularly get informed when and how to refactor an API.
However, note that the ISP takes into account only the clients' usage and, hence, the external cohesion. As a consequence, while our approach assures that APIs are externally cohesive, it currently does not guarantee other quality attributes (e.g., internal cohesion). As part of our future work we plan to extend our approach in order to take into account other relevant quality attributes.

7.5 Threats to Validity

This section discusses the threats to validity that can affect the empirical study presented in the previous section.

Threats to construct validity concern the relationship between theory and observation. In our study this threat can be due to the fact that we mined the API usage through a binary analysis. In our analysis we have used binary jar files to extract method calls. The method calls that are extracted from compiled .class files are, however, not necessarily identical to the method calls that can be found in the source code. This is due to compiler optimizations. For instance, when the compiler detects that a certain call is never executed, it can be excluded. However, we believe that the high number of analyzed APIs mitigates this threat.

With respect to internal validity, the main threat is the possibility that the tuning of the genetic algorithm and the simulated annealing algorithm can affect the results. We mitigated this threat by calibrating the algorithms with 10 toy examples and evaluating statistically their performance while changing their parameters.

Threats to conclusion validity concern the relationship between the treatment and the outcome. Wherever possible, we used proper statistical tests to support our conclusions. In particular we used non-parametric tests, which do not make any assumption on the underlying data distribution, which was tested against normality using the Shapiro test. Note that, although we performed multiple Mann-Whitney and Spearman tests, p-value adjustment (e.g., Bonferroni) is not needed as we performed the tests on independent and disjoint data sets.

Threats to external validity concern the generalization of our findings. We mitigated this threat by evaluating the proposed genetic algorithm on 42,318 public APIs coming from different Java systems. The invocations to the APIs have been mined from the Maven repository. These invocations are not a complete set of invocations to the APIs because they do not include invocations from software systems not stored in Maven. However, we are confident that the data set used in this chapter is a representative sample.

7.6 Related Work

Interface Segregation Principle. After the introduction of the ISP by Martin [2002] in 2002, several studies have investigated the impact of fat interfaces on the quality of software systems. In 2013, Abdeen et al. [2013] investigated empirically the impact of interfaces' quality on the quality of implementing classes. Their results show that violations of the ISP lead to degraded cohesion of the classes that implement fat interfaces. In 2013, Yamashita and Moonen [2013] investigated the impact of inter-smell relations on software maintainability. They analyzed the interactions of 12 code smells and their relationships with maintenance problems. Among other results, they show that classes violating the ISP manifest higher afferent coupling. As a consequence, changes to these classes result in a larger ripple effect. In Chapter 2, we showed that violations of the ISP can be used to predict change-prone interfaces.
Among different source code metrics (e.g., the C&K metrics [Chidamber and Kemerer, 1994]) we demonstrated that fat interfaces (i.e., interfaces showing a low external cohesion measured through the IUC metric) are more change-prone than non-fat interfaces. Moreover, our results showed that the IUC metric can improve the performance of prediction models in predicting change-prone interfaces. The results of this related work show the relevance of applying the ISP and motivated us in defining the approach presented in this chapter.

Search Based Software Engineering. Over the last years genetic algorithms, and in general search-based algorithms, have become popular to perform refactorings of software systems. The approach closest to ours has been presented by Praditwong et al. [2011] in 2011. The authors formulated the problem of designing software modules that adhere to quality attributes (e.g., coupling and cohesion) as a multi-objective clustering search problem. Similarly to our work, they defined a multi-objective genetic algorithm that clusters software components into modules. Moreover, they show that multi-objective approaches produce better solutions than existing single-objective approaches. This work influenced us in defining the problem as a multi-objective problem instead of a single-objective problem. However, the problem we solve is different from theirs: our approach splits fat APIs according to the ISP and uses different fitness functions.

Prior to this work [Praditwong et al., 2011], many other studies proposed approaches to cluster software components into modules (e.g., [Mitchell and Mancoridis, 2006; Mancoridis et al., 1999, 1998; Mitchell and Mancoridis, 2002; Mahdavi et al., 2003; Harman et al., 2005]). These studies propose single-objective approaches that have been shown by Praditwong et al. [2011] to produce worse solutions. To the best of our knowledge there are no studies that propose approaches to split fat APIs according to the ISP as proposed in this chapter.

7.7 Conclusions and Future Work

In this chapter we proposed a genetic algorithm that automatically obtains the sub-APIs into which a fat API should be split according to the ISP. Mining the clients' usage of 42,318 Java APIs from the Maven repository, we showed that the genetic algorithm is able to split APIs into sub-APIs. Comparing the resulting sub-APIs based on the IUC values, we showed that the genetic algorithm outperforms the random and simulated annealing algorithms. The difference in performance between the genetic algorithm and the other search-based techniques increases for APIs with an increasing number of methods, clients, and invocations. Based on these results API providers can automatically obtain and refactor the set of sub-APIs based on how clients invoke the fat APIs.

While this approach is already actionable and useful for API providers, we plan to further improve it in our future work. First, we plan to evaluate qualitatively the sub-APIs generated by the genetic algorithm. The higher IUC values guarantee that sub-APIs are more externally cohesive and, hence, better conform to the ISP. However, we have not yet investigated what developers think about the sub-APIs. Hence, we plan to contact developers and perform interviews to investigate the quality of these sub-APIs. Next, we plan to extend our approach taking into account other quality attributes, such as internal cohesion.
Finally, we plan to slightly modify the genetic algorithm to generate overlapping sub-APIs (i.e., sub-APIs that share common methods).

8 Refactoring Chatty web APIs

The relevance of the granularity of service interfaces and its architectural impact have been widely investigated in the literature. Existing studies show that the granularity of a service interface, in terms of exposed operations, should reflect its clients' usage. This idea has been formalized in the Consumer-Driven Contracts (CDC) pattern. However, to the best of our knowledge, no studies propose techniques to assist providers in finding the right granularity and in easing the adoption of the CDC pattern. In this chapter, we propose a genetic algorithm that mines the clients' usage of service operations and suggests Façade services whose granularity reflects the usage of each different type of client. These services can be deployed on top of the original service and they become contracts for the different types of clients, satisfying the CDC pattern. A first study shows that the genetic algorithm is capable of finding Façade services and outperforms a random search approach. (This chapter was published in the 10th World Congress on Services (Services 2014) [Romano and Pinzger, 2014].)

Chapter outline: 8.1 Problem Statement and Solution; 8.2 The Genetic Algorithm; 8.3 Study; 8.4 Related Work; 8.5 Conclusion & Future Work.

One of the key factors for deploying successful services is assuring an adequate level of granularity [Hohpe and Woolf, 2003; Daigneau, 2011; Murer et al., 2010; Haesen et al., 2008; Kulkarni and Dwivedi, 2008]. The choice of how operations should be exposed through a service interface can have an impact on both performance and reusability [Hohpe and Woolf, 2003; Murer et al., 2010]. This level of granularity is also known in the literature as functionality granularity [Haesen et al., 2008]. For the sake of simplicity we refer to it simply as granularity throughout this chapter. Choosing the right granularity is not a trivial task. On the one hand, fine-grained services lead their clients to invoke their interfaces multiple times, worsening the performance [Hohpe and Woolf, 2003; Daigneau, 2011]. On the other hand, coarse-grained services can reduce reusability because their use is limited to very specific contexts [Hohpe and Woolf, 2003; Daigneau, 2011]. To find a trade-off between fine-grained and coarse-grained services the Consumer-Driven Contracts (CDC) pattern has been proposed [Daigneau, 2011]. This pattern states that the granularity of a service interface should reflect its clients' usage, satisfying their requirements and becoming a contract between clients and providers. In the literature several studies have investigated the impact of granularity (e.g., [Hohpe and Woolf, 2003; Daigneau, 2011; Murer et al., 2010; Haesen et al., 2008; Kulkarni and Dwivedi, 2008]), have classified the different levels of granularity (e.g., [Haesen et al., 2008]), and have proposed metrics to measure them (e.g., [Khoshkbarforoushha et al., 2010; Alahmari et al., 2011]).
However, to the best of our knowledge, there are no studies proposing techniques to assist service providers in finding the right granularity and adopting the CDC pattern. This task can be expensive because many clients invoke a service interface in different ways. Providers should, first, analyze the usage of many clients and, then, design a service interface that satisfies all the clients' requirements.

In this chapter, we propose a genetic algorithm to assist service providers in finding the adequate granularity and adopting the CDC pattern. This algorithm mines the clients' usage of a service interface and retrieves Façade services [Krafzig et al., 2004] whose interfaces have an adequate granularity for each different type of client. These Façade services become contracts that reflect clients' usage, easing the adoption of the CDC pattern. Moreover, providers can deploy them on top of the existing service, making this approach actionable without modifying it. The contributions of this chapter are as follows:

• a genetic algorithm designed to infer Façade services from clients' usage that represent contracts with the different types of clients.

• a study to evaluate the capability of the genetic algorithm compared to the capability of a random search approach. The results show that the genetic algorithm is capable of finding Façade services and it outperforms the random search.

The remainder of this chapter is organized as follows. Section 8.1 presents the problem and the proposed solution. Section 8.2 shows the proposed genetic algorithm. Section 8.3 presents the study and its results, and discusses them. Related work is presented in Section 8.4, while in Section 8.5 we draw our conclusions and outline directions for future work.

8.1 Problem Statement and Solution

In this section, first, we introduce the problem of finding the adequate granularity of service interfaces, presenting the Consumer-Driven Contracts pattern. Then, we present our solution to address this problem.

8.1.1 Problem Statement

Choosing the adequate granularity of a service is a relevant task and a widely discussed topic [Hohpe and Woolf, 2003; Daigneau, 2011; Murer et al., 2010; Haesen et al., 2008; Kulkarni and Dwivedi, 2008]. On the one hand, fine-grained services can lead to service-oriented systems with inadequate performance due to an excessive number of remote calls [Hohpe and Woolf, 2003]. Consider for instance the fragment of a service interface to order an item shown in Figure 8.1. Figure 8.1a shows a fine-grained design for this service that exposes methods to set shipment and billing information for ordering an item. This design is efficient if the methods' invocation happens in a local environment (e.g., in a software system deployed on a single machine) [Hohpe and Woolf, 2003]. In a distributed environment (e.g., in a service-oriented system) a client needs to invoke three methods (i.e., setBillingAddress(), setShippingAddress(), and addPriorityShipment()) to set the needed information. This causes a significant communication overhead since three methods need to be invoked over a network. On the other hand, the coarse-grained OrderItem (shown in Figure 8.1b) exposes only one method (i.e., setShipmentInfo()) to set all the information related to the shipment and the billing. In this way clients invoke the service only once, reducing the communication overhead.
However, if the services are too coarse-grained they can limit reusability because their use will be limited to very specific contexts [Hohpe and Woolf, 2003; Murer et al., 2010; Daigneau, 2011]. In our example in Figure 8.1, the clients of the coarse-grained service (Figure 8.1b) are constrained to set the billing address and the shipping address, and to add the priority shipment details. The service is not suitable for contexts where, for instance, priority shipments are not allowed. Maintenance tasks are needed to adapt coarse-grained services to different contexts. Hence, finding the adequate granularity of a service requires finding a trade-off between having a too fine-grained or a too coarse-grained service. This allows publishing a service with an acceptable communication overhead and an adequate level of reusability.

Figure 8.1: An example of fine-grained and coarse-grained service interfaces to set the shipping and the billing data for ordering an item. (a) Fine-grained version of OrderItem exposing a different method for each piece of needed information (setBillingAddress(), setShippingAddress(), addPriorityShipment()). (b) Coarse-grained version of OrderItem exposing a single method (setShipmentInfo()) to set all the needed information.

To find such an adequate level of granularity the Consumer-Driven Contracts (CDC) pattern has been defined for service interfaces [Daigneau, 2011]. The CDC pattern states that a service interface should reflect its clients' needs through its granularity. In this way the service interface is considered a contract that satisfies the clients' requirements. Applying the CDC pattern is not a trivial task. A service usually has several clients with different requirements invoking its interface differently. To deploy a service with an adequate granularity (using the CDC pattern) providers should know all these requirements. Within an enterprise or a corporate environment providers know their clients and they can understand how clients expect to use a service. However, clients are usually not known a priori and they bind a service only after it has been published and advertised. Moreover, the number of clients and their different requirements can be huge and change over time.

8.1.2 Solution

Our solution to the aforementioned problem consists in applying a cluster analysis. This analysis consists in clustering the set of methods in such a way that methods in the same cluster are invoked together by the clients. The goal of our cluster analysis is to find clusters that minimize the number of remote invocations to a service. To better understand the cluster analysis for the granularity problem consider the example in Figure 8.2.

Figure 8.2: An example of a service interface to order an item for an e-commerce system. The fine-grained OrderItem exposes ten methods: 1-setBillingAddress(), 2-setShippingAddress(), 3-setPriorityShipment(), 4-addPaymentDetails(), 5-addWishCardType(), 6-addWishCardMsg(), 7-trackShipmentByApp(), 8-trackShipmentByEmail(), 9-trackShipmentBySMS(), 10-notifyArrivalTime(). The rectangles represent independent methods that are invoked by each client (Client1-Client4).
The OrderItem in Figure 8.2 extends the service shown in Figure 8.1a, exposing further methods to 1) add payment details (addPaymentDetails()), 2) add a wish card to an order (addWishCardType() and addWishCardMsg()), and 3) track the shipment (trackShipmentByApp(), trackShipmentByEmail(), trackShipmentBySMS(), and notifyArrivalTime()). Imagine this service has four clients (Client1, Client2, Client3, and Client4). These clients invoke different sets of independent methods, denoted in Figure 8.2 by rectangles (e.g., Client1 invokes setBillingAddress(), setShippingAddress(), and setPriorityShipment()). These methods are considered independent because the invocation of one method does not require the invocation of the other ones [Wu et al., 2013]. In total there are 13 remote invocations: 3 performed by Client1, 3 by Client2, 3 by Client3, and 4 by Client4. In this example we can retrieve three clusters (shown in Figure 8.3a) that minimize the number of remote invocations:

• Cluster1 (i.e., Shipment): consists of setBillingAddress(), setShippingAddress(), and setPriorityShipment().

• Cluster2 (i.e., WishCard): consists of addWishCardType() and addWishCardMsg().

• Cluster3 (i.e., TrackShipment): consists of trackShipmentByApp(), trackShipmentByEmail(), trackShipmentBySMS(), and notifyArrivalTime().

Once we know the clusters we can combine the fine-grained methods belonging to a cluster into a single coarse-grained method. These coarse-grained methods can be exposed through Façade services [Krafzig et al., 2004], as shown in Figure 8.3a.

Figure 8.3: Two possible refactorings of the service interface shown in Figure 8.2 using the proposed cluster analysis and the Façade pattern. Black arrows indicate local invocations while non-black arrows indicate remote invocations. (a) The Shipment, WishCard, and TrackShipment Façade services have been introduced; this design has 9 local invocations and 6 remote invocations. (b) The Shipment, Client2, and TrackShipment Façade services have been introduced; this design has 10 local invocations and 6 remote invocations.

Façade services (i.e., Shipment, WishCard, and TrackShipment in our example) have been defined to provide different views of lower-level services (i.e., OrderItem in our example). Since the invocations from Façade services to lower-level services are local invocations (shown with black arrows in Figure 8.3), the total number of remote invocations (shown with non-black arrows in Figure 8.3) has been reduced from 9 to 6. Moreover, adopting this design choice allows keeping the fine-grained OrderItem public, so that it can still be invoked by current clients without breaking their behavior.
Choosing the clusters that minimize the number of remote invocations can lead to multiple solutions. Imagine, for instance, that we change Cluster2 by adding the method addPaymentDetails(), as shown in Figure 8.3b. This cluster is optimal for Client2, which then has to perform only one remote invocation. However, Client3 can no longer invoke the Façade service associated with Cluster2 because it contains a method (i.e., addPaymentDetails()) in which it is not interested. The number of remote invocations is still equal to 6. At this point an engineer should decide which architectural design is more suitable for her specific domain. The decision might be influenced by three different factors:

• Cohesion of Façade services: the design in Figure 8.3a might be preferred because the WishCard service is more cohesive than the Client2 service, since it exposes related methods (methods related to the wish card concern).

• Number of local invocations: the design in Figure 8.3a might be preferred because it has 9 local invocations while the design in Figure 8.3b has 10 local invocations.

• Relevance of different clients: the service provider might want to give a better service (e.g., upon a higher registration fee) to Client2 and, hence, adopt the design in Figure 8.3b.

8.1.3 Contributions

In this chapter we propose a search-based approach to retrieve the clusters of methods that minimize the number of remote invocations. As explained previously, the methods belonging to the same cluster can be exposed through a Façade service whose granularity reflects clients' usage and, hence, satisfies the CDC pattern.

A first approach to find these clusters consists in adopting brute-force search techniques. These techniques consist of enumerating all possible clusters and checking whether they minimize the number of invocations. The problem of these approaches is that the number of possible clusters can be prohibitively large, causing a combinatorial explosion. Imagine, for instance, adopting this approach to find the right granularity of the AmazonEC2 web service. This web service exposes 118 methods in version 23. The number of 20-combinations of the 118 methods in AmazonEC2 is equal to:

$$\binom{118}{20} = \frac{118!}{20!\,98!} \approx 2 \times 10^{22}$$

This means that only evaluating all the clusters with 20 methods would require executing on the order of $10^{22}$ computer instructions, far more than a typical PC can execute in any reasonable amount of time. Moreover, we should evaluate clusters with sizes ranging from 2 to 118, causing the number of computer instructions to further increase.
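The exact count can be checked with a few lines of integer arithmetic; the sketch below (with class and method names of our choosing) computes the binomial coefficient above without overflow.

```java
import java.math.BigInteger;

/**
 * Back-of-the-envelope check of the combinatorial explosion discussed above:
 * the exact number of 20-element subsets of the 118 AmazonEC2 methods.
 */
public class ClusterCountCheck {

    /** Binomial coefficient C(n,k), computed iteratively with exact divisions. */
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints a 23-digit number, i.e., on the order of 10^22 possible 20-method clusters.
        System.out.println(binomial(118, 20));
    }
}
```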
To solve this issue we propose a genetic algorithm (presented in Section 8.2) that, by mimicking the process of natural selection, finds optimal solutions (i.e., clusters that minimize the number of remote invocations) in acceptable time without requiring special hardware configurations (e.g., the use of supercomputers). Moreover, we perform a first study aimed at investigating the capability of the proposed approach in finding Façade services, which is presented in Section 8.3.

In this chapter we do not cover the problem of mining independent methods because it has already been the subject of related work [Wu et al., 2013] that can be integrated into our approach. Furthermore, related work [Wu et al., 2013] shows that 78.1% of the methods in their analyzed web services are independent. This percentage shows that most of the methods can be clustered into coarse-grained methods, further motivating the need to perform this task with a proper approach.

8.2 The Genetic Algorithm

Genetic Algorithms (GAs) have been used in a wide range of applications where optimization is required. Among all the applications, GAs have been widely studied to solve clustering problems [Hruschka et al., 2009]. GAs mimic the process of natural selection to provide a search heuristic able to solve optimization problems. A generic GA is shown in Figure 8.4 and consists of seven different steps.

Figure 8.4: Different steps of a genetic algorithm: 1) create the initial population of chromosomes; 2) evaluate the fitness of each chromosome; 3) check whether the maximum number of evaluations has been reached; 4) select the next generation (selection operator); 5) perform reproduction (crossover operator); 6) perform mutation (mutation operators); 7) output the best chromosomes.

In the first step, the GA creates a set of randomly generated candidate solutions (also known as chromosomes) called population (step 1 in Figure 8.4). In the second step, the candidate solutions are evaluated through a fitness function (step 2). This function measures the goodness of a candidate solution. Then, the population is evolved iteratively through evolutionary operators (steps 4, 5, and 6) until some conditions are satisfied (e.g., reaching the maximum number of fitness evaluations in step 3, or achievement of the goal). Each evolution iteration is performed through a selection operator (step 4), a crossover operator (step 5), and a mutation operator (step 6). The selection operator selects a pair of solutions (parents) from the population. The parents are used by the crossover operator to generate two offspring solutions (step 5). The offspring solutions are generated in such a way that they combine features from the two parents. The mutation operators (step 6) mutate the offspring in order to preserve diversity. The mutated solutions are added to the population, replacing the solutions with the worst fitness scores. Finally, the GA outputs the best solutions when the evolution process terminates (step 7).

To implement the GA and adapt it to find the set of clusters that minimize the number of remote invocations we have to define the fitness function, the chromosome (or solution) representation, and the evolutionary operators (i.e., selection, crossover, and mutation), which are shown in the following subsections.

8.2.1 Chromosome representation

The chromosomes are represented with a label-based integer encoding widely adopted in the literature [Hruschka et al., 2009] and shown in Figure 8.5. According to this encoding, a solution is represented by an integer array of n positions, where n is the number of methods exposed in a service. Each position corresponds to a specific method (e.g., position 1 corresponds to the method setBillingAddress() in Figure 8.2). The integer values in the array represent the cluster to which the methods belong. For instance, in Figure 8.5 the methods 1, 2, and 10 belong to the same cluster, labeled with 1. Note that two chromosomes can be equivalent even though the clusters are labeled differently. For instance, the chromosomes [1,1,1,1,2,2,2,2,3,3] and [2,2,2,2,3,3,3,3,1,1] represent the same clusters. To solve this problem we apply the renumbering procedure described in [Falkenauer, 1998], which transforms different labelings of equivalent clusterings into a unique labeling.
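One common way to implement such a renumbering, sketched below under our own naming, is to relabel clusters in the order of their first appearance, so that equivalent clusterings map to the same canonical array; the thesis follows the procedure of Falkenauer [1998], of which this is only an illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a renumbering step for label-based integer encodings:
 * clusters are relabeled in order of first appearance.
 */
public class Renumbering {

    static int[] canonical(int[] chromosome) {
        Map<Integer, Integer> mapping = new HashMap<>();
        int[] result = new int[chromosome.length];
        int nextLabel = 1;
        for (int i = 0; i < chromosome.length; i++) {
            Integer label = mapping.get(chromosome[i]);
            if (label == null) {
                label = nextLabel++;
                mapping.put(chromosome[i], label);
            }
            result[i] = label;
        }
        return result;
    }
}
```

For example, canonical([2,2,2,2,3,3,3,3,1,1]) yields [1,1,1,1,2,2,2,2,3,3], the same canonical form as the first chromosome above.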
Figure 8.5: Chromosome representation of our candidate solutions.
  Methods:    1 2 3 4 5 6 7 8 9 10
  Chromosome: 1 1 2 3 2 4 5 3 6 1

8.2.2 Fitness

The fitness function is a function that measures how "good" a solution is. Our fitness function counts, for each chromosome, the number of remote invocations needed by the clients. Imagine that the clients' usage information of Figure 8.2 is saved in the data set shown in Table 8.1. In this data set, each row contains the id of the client (i.e., ClientID) and the set of independent methods invoked by it (i.e., InvokedMethods). The InvokedMethods are sets of methods where each integer value corresponds to a different method in the service. We label the methods in the OrderItem (shown in Figure 8.2) from 1 to 10 depending on the order in which they appear in the service (e.g., setBillingAddress() is labeled with 1, setShippingAddress() is labeled with 2, etc.).

Table 8.1: Data set containing the independent methods invoked by each different client in Figure 8.2.
  ClientID | InvokedMethods
  Client1  | 1;2;3
  Client2  | 4;5;6
  Client3  | 5;6;7
  Client4  | 7;8;9;10

Once we have this data set, we compute the fitness function as the sum of the number of remote invocations required to invoke each InvokedMethods set in the data set. If the methods (or a subset of the methods) in an InvokedMethods set belong to a cluster containing no other methods, the methods in this cluster account for 1 invocation in total. Otherwise each different method accounts for 1. Consider for instance the chromosome [1,1,1,1,2,2,2,2,3,3]. This chromosome clusters together the methods 1, 2, 3, and 4 (i.e., cluster 1), the methods 5, 6, 7, and 8 (i.e., cluster 2), and the methods 9 and 10 (i.e., cluster 3). In this case the number of remote invocations to execute the InvokedMethods of Client1 (i.e., 1;2;3) is 3, because cluster 1 contains the method 4, which is not needed by Client1. Hence, Client1 cannot invoke the Façade service represented by the cluster labeled 1 and invokes the methods of the original service OrderItem. If we change the chromosome into [1,1,1,2,2,2,2,2,3,3], the total number of invocations for Client1 is equal to 1, because Client1 can execute the single operation declared in the Façade service represented by cluster 1. If the chromosome becomes [1,1,2,2,2,2,2,2,3,3], then the total number of remote invocations for Client1 is equal to 2: the client invokes the method of cluster 1 once to cover the methods 1 and 2, and then it invokes method 3 in the original service.
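To make the counting rule explicit, the following is a sketch of the fitness computation described above, under our own naming and simplified data structures (clients as sets of 1-based method indices); it is an illustration, not the thesis implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the fitness function: the total number of remote invocations needed
 * by all clients for a given clustering. A client can use the Façade of a cluster
 * (one remote call) only if that cluster contains no methods the client does not
 * need; otherwise each needed method in that cluster is a separate remote call.
 */
public class RemoteInvocationFitness {

    /** chromosome[i] = cluster label of method i+1; each client is a set of needed methods. */
    static int countRemoteInvocations(int[] chromosome, List<Set<Integer>> clients) {
        // Build cluster label -> set of methods in that cluster.
        Map<Integer, Set<Integer>> clusters = new HashMap<>();
        for (int i = 0; i < chromosome.length; i++) {
            clusters.computeIfAbsent(chromosome[i], k -> new HashSet<>()).add(i + 1);
        }
        int total = 0;
        for (Set<Integer> needed : clients) {
            Set<Integer> touchedClusters = new HashSet<>();
            for (int method : needed) {
                touchedClusters.add(chromosome[method - 1]);
            }
            for (int label : touchedClusters) {
                Set<Integer> clusterMethods = clusters.get(label);
                if (needed.containsAll(clusterMethods)) {
                    total += 1;                       // one call to the Façade service
                } else {
                    for (int m : clusterMethods) {    // individual calls to the original service
                        if (needed.contains(m)) total++;
                    }
                }
            }
        }
        return total;
    }
}
```

Calling countRemoteInvocations with the chromosome [1,1,1,1,2,2,2,2,3,3] and only Client1's set {1, 2, 3} returns 3, matching the worked example above.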
Copying these clusters changes the clusters 1, 2, and 3 in ChildC. These changed clusters are removed from ChildC (step 3) leaving the corresponding methods unallocated (labeled with 0). In the forth step the unallocated methods are allocated to the cluster with the nearest centroid. The same procedure is followed to generate the second child ChildD. However, instead of selecting randomly k clusters from ParentB, the changed clusters of ChildC (i.e., 1,2, and 3) are copied into ChildD that is originally a copy of ParentA. ParentA ParentB 1 1 2 3 2 4 5 1 2 5 4 2 1 2 3 3 2 1 2 4 1: copy ParentB into ChildC ChildC 4 2 1 2 3 3 2 1 2 4 2: copy clusters 2 and 3 from ParentA to ChildC ChildC 4 2 2 3 2 3 2 1 2 4 3: remove changed methods from B (i.e., 1,2,3) ChildC 4 0 2 3 2 0 0 0 2 4 4: unallocated objects are allocated to randomly selected clusters Figure 8.6: Example of crossover operator for clustering problems [Hruschka et al., 2009]. 8.2.5 The Mutation Operators Finally, the offspring is mutated through the mutation operator with a probability Pm . This step ensures genetic diversity from one generation to the next ones. We perform the mutation selecting one of the following cluster-oriented mutation operators (randomly selected) [Falkenauer, 1998; Hruschka et al., 2009]: • split: a randomly selected cluster is split into two different clusters. The methods of the original cluster are randomly assigned to the generated clusters. 8.3. Study 161 • merge: moves all methods of a randomly selected cluster to another randomly selected cluster. • move: moves methods between clusters. Both methods and clusters are randomly selected. 8.2.6 Implementation We implemented the proposed genetic algorithm on top of the JMetal2 framework. JMetal is a Java framework that provides state-of-the-art algorithms for optimization problems. We calibrated the genetic algorithm as follows: • the population is composed by 100 chromosomes. The initial population is randomly generated; • the crossover and mutation probability is 0.9; • the maximum number of fitness evaluation (step 3 in Figure 8.4) is 100,000. 8.3 Study The goal of this study is to evaluate the capability of our approach in finding Façade services that minimize the number of remote invocations and reflect clients’ usage. The perspective is that of service providers interested in applying the Consumer-Driven Contracts pattern using Façade services with adequate granularity. In this study we answer the following research question: To which extent is the propose GA capable of identifying Façade services that minimize the number of remote invocations and reflect clients’ usage? In the following subsections, first, we present the analysis we performed to answer our research question. Then, we show the results and answer the research question. Finally, we discuss the results and the threats to validity of our study. 8.3.1 Analysis To answer our research question we run the genetic algorithm (GA) defined in Section 8.2 to find the Façade services for the working example shown in Figure 8.2. To measure the performance of our GA we register the number 2 http://jmetal.sourceforge.net 162 Chapter 8. Refactoring Chatty web APIs of GA fitness evaluations needed to find the Façade services shown in Figure 8.3a and Figure 8.3b. Also, we compare the GA with a random search (RS), in which the solutions are randomly generated but no genetic evolution is applied. 
Both the GA and RS are executed 100 times and the number of fitness evaluations required to find the Façade services are compared through statistical tests. We use a random search as baseline because this comparison is considered the first step to evaluate a genetic algorithm [Sivanandam and Deepa, 2007]. Comparisons with other search-based approaches (e.g., local search algorithms) will be subject of our future work. First, we use the Mann-Whitney test to analyze whether there is a significant difference between the number of fitness evaluations required by the GA and the ones required by the RS. Significant differences are indicated by Mann-Whitney p-values ≤ 0.01. Then, we use the Cliff’s Delta d effect size to measure the magnitude of the difference. Cliff’s Delta estimates the probability that a value selected from one group is greater than a value selected from the other group. Cliff’s Delta ranges between +1 if all selected values from one group are higher than the selected values in the other group and -1 if the reverse is true. 0 expresses two overlapping distributions. The effect size is considered negligible for d < 0.147, small for 0.147≤ d < 0.33, medium for 0.33≤ d < 0.47, and large for d ≥ 0.47. We chose the Mann-Whitney test and Cliff’s Delta effect size because they do not require assumptions about the variances and the types of the distributions (i.e., they are non-parametric tests). Moreover, to analyze the capability of the GA in finding Façade services for bigger services, we increase stepwise the number of methods declared in OrderItem keeping unchanged the original methods (i.e., 1-10), their clients, and the clients’ usage (as shown in Figure 8.2). In this way we enlarge the search space and we analyze whether the GA is able to find the same Façade services. For each different size of the OrderItem we perform the same analysis: 1) we execute 100 times the GA and RS, 2) we register the number of fitness evaluations needed for finding the Façade services shown in Figure 8.3, and 3) we perform the Mann-Withney and Cliff’s Delta test to analyze statistically the differences between the distributions. We increment the size of the service up to 118 methods, that is the size of the biggest WSDL interface (AmazonEC2) analyzed in our previous work shown in Chapter 4. 8.3.2 Results Table 8.2 shows the percentage of executions in which GA and RS find the right Façade services shown in Figure 8.3. The results show that, while the 8.3. Study 163 #Methods 10 11 12 13 14 15 16 118 GA 100% 100% 100% 100% 100% 100% 100% 100% RS 82% 70% 65% 35% 20% 10% 0% 0% Table 8.2: Percentage of successful executions in which GA and RS find the Façade services shown in Figure 8.3. GA is always capable of finding the Façade services, the capability of the RS decreases with an increasing number of methods. For services with 16 or more methods the RS is not capable to find the Façade services. The number of fitness evaluations required by the GA and RS are shown in the form of box plots in Figure 8.7. The median number of fitness evaluations for the OrderItem with 118 methods required by the GA (not shown in Figure 8.7) is equal to 5754 (with a median execution time of 295 seconds3 ). Comparing it to the median number of fitness evaluations for the service with 10 methods (i.e., 1049 fitness evaluations with a median execution time of 34.5 seconds) shows that GA scales well with an increasing number of methods. 
Moreover, the distributions of the number of fitness evaluations required by the GA and the RS are statistically different, as shown by the Mann-Whitney p-values (< 0.01) in Table 8.3. The magnitude of these differences is always large, as shown by the Cliff's Delta d values (= 1) in Table 8.3. All the distributions, except RS12 in Figure 8.7, are not normally distributed (normality has been tested with the Shapiro-Wilk test and a confidence level of 0.05). As a consequence the non-parametric tests used in our analysis are the most suitable for these distributions. Based on these results, we can answer our research question stating that the GA is capable of finding the Façade services and outperforms the RS approach.

Figure 8.7: Box plots showing the number of fitness evaluations (#Evaluations) required by GA and RS. GAX and RSX label the box plots for the OrderItem with X methods.

#Methods     10          11          12          13          14          15
MW p-value   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16
Cliff d      1           1           1           1           1           1

Table 8.3: Mann-Whitney p-values (MW p-value) and Cliff's Delta d (Cliff d) between the distributions of #Evaluations required by the GA and RS.

8.3.3 Discussions

The results of this study show that the proposed GA, differently from the RS, is capable of assisting service providers in applying the Consumer-Driven Contracts pattern. Running the GA, providers can retrieve the Façade services that reflect the usage of their clients and minimize the number of remote invocations. Once the set of Façade services is retrieved, they should manually select the most appropriate Façade services as discussed in Section 8.1. These Façade services can be deployed on top of the existing service without modifying it, preserving the compatibility of existing clients. Furthermore, since this approach is semi-automatic, it can be executed over time to monitor the evolution of clients' usage. This allows service providers to co-evolve the granularity of their services, reflecting the evolving usage of their clients.

The main threats to validity that can affect our study are the threats to external validity. These threats concern the generalization of our findings. We evaluated our approach with a small working example. However, to the best of our knowledge, there are no available data sets that contain service usage information suitable for our analysis. In the literature, different data sets are available for research on QoS (e.g., [Al-Masri and Mahmoud, 2008; Zhang et al., 2010]). However, these data sets do not contain information about the operations invoked but only the service names and their URLs. As a consequence they are not suitable for our analysis.

8.4 Related Work

Granularity of services. The closest work to ours is the study developed by Jiang et al. [2011]. In this study the authors propose an approach to infer the granularity of services by mining the activities of business processes. The main idea consists of using frequent pattern mining algorithms to analyze the invocations to service interfaces. Our approach differs from theirs because it can mine the granularity of every kind of service and not only services involved in business processes.
Furthermore, we have not used the proposed frequent pattern mining algorithms because they require a special tuning of the support and confidence parameters, which is problem specific. Moreover, these parameters, together with other relevant details, are not reported in [Jiang et al., 2011], making a replication of this study impossible. To the best of our knowledge, there are no further studies aimed at inferring the right granularity of service interfaces. Related work has mostly proposed classifications for different levels of granularity and has investigated metrics for measuring the granularity.

Haesen et al. [2008] have proposed a classification of three service granularity types (i.e., functionality, data, and business value granularity). For each of these types they have discussed the impact on a set of architectural attributes (e.g., performance, reusability, and flexibility). In this chapter we adhered to their functionality granularity, which has been referred to as granularity for the sake of simplicity. Haesen et al. confirm that the functionality granularity can have an impact on both performance and reusability, as stated in [Hohpe and Woolf, 2003; Murer et al., 2010; Daigneau, 2011] and already discussed in Section 8.1.

Many other studies have investigated metrics to measure the granularity (e.g., [Khoshkbarforoushha et al., 2010; Alahmari et al., 2011]). For instance, Khoshkbarforoushha et al. [2010] measure the granularity appropriateness with a model that integrates four different metrics that measure: 1) the business value of a service, 2) the service reusability, 3) the service context-independency, and 4) the service complexity. Alahmari et al. [2011] proposed a set of metrics to measure the granularity based on internal structural attributes (e.g., number of operations, number of messages, complexity of data types). However, these studies are limited to measuring the granularity and do not provide suggestions on inferring the right granularity.

Refactoring through genetic algorithms. Over the last years genetic algorithms, and in general search-based algorithms, have become popular to perform refactorings of software artifacts. For instance, Ghannem et al. [2013] found appropriate refactoring suggestions using a set of refactoring examples. Their approach is based on an Interactive Genetic Algorithm, which enables interaction with users and integrates their feedback into a classic GA. Ghaith and Ó Cinnéide [2012] presented an approach to automate improvements of software security based on search-based refactoring. O'Keeffe and Ó Cinnéide [2008] have constructed a software tool capable of refactoring object-oriented systems. This tool uses search-based techniques to conform the design of a system to a given design quality model. These studies confirm that genetic algorithms are a useful technique to solve refactoring problems and satisfy desired quality attributes.

8.5 Conclusion & Future Work

In this chapter we have proposed a genetic algorithm to mine the adequate granularity of a service interface. According to the Consumer-Driven Contracts pattern, the granularity of a service should reflect its clients' usage. To adopt this pattern our genetic algorithm suggests Façade services whose granularity reflects the clients' usage. These services can be deployed on top of existing services, allowing an easy adoption of the Consumer-Driven Contracts pattern that does not require any modifications to existing services.
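As an illustration of how such a Façade service can be layered on top of the existing service, consider the following hypothetical sketch (service and method names are illustrative and not taken from the working example): the Façade exposes one coarse-grained operation that delegates locally to three fine-grained methods of the unchanged original service, so clients need a single remote invocation instead of three.

```java
// Hypothetical sketch of a Façade service deployed on top of an existing
// fine-grained service. The original service stays untouched; the Façade calls
// it locally and exposes one coarse-grained operation to remote clients.

// Fine-grained service as it already exists (names are illustrative).
interface OrderItemService {
    String getDescription(long itemId);
    double getPrice(long itemId);
    int getAvailability(long itemId);
}

// Coarse-grained result returned by the Façade in a single response.
class OrderItemSummary {
    final String description;
    final double price;
    final int availability;
    OrderItemSummary(String description, double price, int availability) {
        this.description = description;
        this.price = price;
        this.availability = availability;
    }
}

// Façade tailored to clients that always invoke the three methods together:
// one remote invocation instead of three, with local delegation inside.
class OrderItemFacade {
    private final OrderItemService delegate;
    OrderItemFacade(OrderItemService delegate) { this.delegate = delegate; }

    OrderItemSummary getSummary(long itemId) {
        return new OrderItemSummary(
                delegate.getDescription(itemId),   // local call
                delegate.getPrice(itemId),         // local call
                delegate.getAvailability(itemId)); // local call
    }
}
```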
Our approach is semi-automatic as discussed in Section 8.1. The genetic algorithm outputs different sets of Façade services that should be reviewed by providers. In our future work, first, we plan to further improve this approach to minimize the effort required from the user. Specifically, we plan to add parameters that can guide the search algorithm towards more detailed goals: giving more relevance to certain clients, satisfying other quality attributes (e.g., high cohesion of Façade services, low number of local invocations), etc. Then, we plan to compare our genetic algorithm with other search-based techniques (e.g., local search algorithms). Finally, we plan to improve the genetic algorithm so that it suggests overlapping Façade services, allowing a method to belong to different Façade services. However, an ad-hoc study is needed to investigate to what extent methods can be exposed through different Façade services, because this can be problematic for the maintenance of service-oriented systems.

9. Conclusion

The need to reuse existing software components has led to the emergence of a new programming paradigm called service orientation. According to service orientation, existing software systems (e.g., legacy systems) can be integrated with web services. The main goal of web services is to provide a standardized API that hides the technologies used to implement the legacy system. Other systems can reuse the business logic of legacy systems without knowing their implementation details, only binding to these APIs. As a consequence the coupling between integrated systems is reduced, as discussed in Chapter 1. However, such systems are still coupled through web APIs that specify the operations exposed by web services and the data structures needed to invoke them. These web APIs are considered contracts between web service providers and their clients and they should stay as stable as possible. Changes in the web APIs can break the client systems and damage their business.

In this dissertation we have focused on better understanding the change-proneness of APIs and web APIs. To that end, this work has investigated which indicators can be used to highlight change-prone APIs and web APIs, providing approaches to assist practitioners in refactoring them.

9.1 Contributions

The main contributions of this thesis can be summarized as follows:

• An external cohesion metric (i.e., IUC) capable of highlighting change-prone Java interfaces. We performed an empirical study aimed at investigating which software metrics can be used to highlight change-prone Java interfaces. We compared the capability of existing software metrics defined for object-oriented and service-oriented systems. Software metrics have been measured along the history of a software system and they have been correlated through statistical tests with the changes performed in the analyzed systems. These metrics have also been used to train prediction models aimed at predicting change-prone Java interfaces. The results of this study are useful for software engineers and software researchers. Software engineers can better measure the stability of their interfaces. This helps them in highlighting change-prone interfaces before they are bound to web APIs. Software researchers can use this first study to further investigate the change-proneness of Java interfaces and, more in general, of APIs.
• A set of antipatterns (i.e., ComplexClass, SpaghettiCode, and SwissArmyKnife) that highlight change-prone Java APIs. We performed an empirical study aimed at investigating which antipatterns can be used as indicators of changes in Java classes. We investigated which antipatterns are more likely to lead to changes and which types of changes are likely to appear in Java classes affected by certain types of antipatterns. Among other types of changes, we investigated changes that APIs undergo along their history and which antipatterns are more likely to cause these changes. As in the previous contribution, we measured the presence of antipatterns along the history of software systems and we statistically correlated them with the number and type of changes performed in the software systems. The perspective of this study is that of software engineers who want to estimate the stability of Java classes that participate in certain antipatterns. Among these antipatterns, they might be interested in antipatterns that cause changes to APIs. This is particularly relevant if APIs are bound to web APIs.

• An approach to mine dynamic dependencies among web services deployed in an enterprise. To that end, we used the vector clocks technique, originally conceived to order events in a distributed environment. We used this technique in the domain of web service systems by attaching the vector clocks to the header of SOAP messages. We modified the vector clocks' values along the execution of a service-oriented system and we used them to order service executions and to infer causal dependencies among the executions. The implementation of this approach is portable and it relies on well-known integration patterns. Moreover, we analyzed the impact of the attached vector clocks on the performance of a service-oriented system. This approach is useful for software engineers who want to monitor the dynamic chain of dependencies among web services, which might be useful for debugging and reverse engineering tasks.

• A tool called WSDLDiff that extracts fine-grained changes between different versions of WSDL APIs. Differently from existing approaches, our tool takes into account the syntax of WSDL and XSD, which are used to define operations and data structures in a WSDL API. This tool is useful for web service subscribers and researchers. WSDLDiff can be used by subscribers who want to analyze which elements are frequently added, changed, and removed in a WSDL API and which types of changes a WSDL API undergoes more frequently. Based on this information they can subscribe to the most stable WSDL APIs, reducing the likelihood that unstable APIs might break their systems. Researchers can use our tool to further investigate the change-proneness of WSDL APIs, automatically retrieving the fine-grained changes performed along their history.

• A set of maintenance scenarios that can affect web APIs with low internal and external cohesion. We performed an empirical study aimed at investigating the impact of internal and external cohesion on the change-proneness of web APIs. This analysis is performed using a mixed-method approach. First, we used an online survey to investigate the interface, method, and data-type level change-proneness of web APIs with low external and internal cohesion. The survey reports on maintenance scenarios that are likely to cause changes in such web APIs. Then, we analyzed the history of ten well-known WSDL APIs to investigate the impact of internal cohesion on the change-proneness.
Specifically, we introduced a new internal cohesion metric (DTC) and we statistically correlated the values of this metric with the fine-grained changes extracted with WSDLDiff from the WSDL APIs under analysis. The perspective of this study is that of web service providers, subscribers, and software researchers. Both web service providers and subscribers can benefit from the new metric to estimate the interface change-proneness of a WSDL API. Based on the values of the DTC metric, subscribers can subscribe to the most internally cohesive WSDL API to reduce the likelihood that changes break their systems. Providers can highlight WSDL APIs that should be refactored to avoid frequent changes. Moreover, they can estimate the change-proneness based on the value of external cohesion metrics (e.g., SIUC). Based on the internal and external cohesion, they can estimate the likelihood that certain maintenance scenarios cause changes to their APIs. Software researchers can also benefit from this first study on change-prone web APIs to further investigate the change-proneness of web APIs.

• An approach to automatically refactor APIs with low external cohesion, also known as fat APIs. We propose an approach to split fat APIs according to the Interface Segregation Principle (ISP). We defined the problem of splitting fat APIs as a multi-objective clustering optimization problem and we proposed a genetic algorithm to solve it. Based on the client usage of a fat API, the genetic algorithm infers the APIs into which it should be split to conform to the ISP and, hence, show a higher external cohesion. To validate the genetic algorithm we mined the clients' usage of 42,318 public Java APIs from the Maven repositories. We compared the capability of the genetic algorithm with the capabilities of other search-based techniques, namely a random approach and a multi-objective simulated annealing approach. The genetic algorithm is useful for software engineers who want to refactor APIs and web APIs with low external cohesion, which is symptomatic of change-prone APIs.

• An approach to refactor chatty web APIs. We proposed a genetic algorithm that assists web service providers in finding the right granularity for their web APIs. Based on the clients' usage of a web API, the genetic algorithm mines the Façade APIs that reflect the usage of the different clients according to the Consumer-Driven Contracts (CDC) pattern. These Façade APIs cluster together methods that are invoked together by the different clients, reducing the number of remote invocations. Façade APIs can be deployed on top of the original APIs, with which they communicate locally. As a consequence, the remote chattiness is reduced.

9.2 The Research Questions Revisited

In Chapter 1 we have formulated a set of high-level research questions whose answers can be found in the studies presented in the chapters of this dissertation. In this section we answer these high-level research questions based on the findings of our studies and we discuss them in the context of this PhD thesis.

9.2.1 Track 1: Change-Prone APIs

The first track of this PhD research was aimed at investigating indicators of changes for APIs that might be bound to web APIs.

Research Question 1: Which software metrics do indicate change-prone APIs?

This research question can be answered from the results of the study presented in Chapter 2. In this chapter we investigated the change-proneness
of Java interfaces, correlating the values of software metrics with the number of fine-grained changes performed in Java interfaces. As software metrics we selected the popular C&K metrics, already used to highlight change-prone Java classes, and a set of complexity and usage metrics defined for interfaces. The fine-grained changes have been extracted with ChangeDistiller, mining the software repositories of 10 well-known open-source Java projects. The results have shown that the Interface Usage Cohesion (IUC) metric exhibits the strongest correlation with the number of changes performed in Java interfaces. As a consequence, software engineers should design interfaces with high external cohesion (measured with the IUC metric) to avoid frequent changes. Low external cohesion is also known as a symptom of the violation of the Interface Segregation Principle (ISP). This principle was already popular amongst software practitioners before our study. However, our study provides first empirical evidence of the effects of the ISP on the stability of interfaces.

In the second part of this study we used prediction models (i.e., Support Vector Machine, Naive Bayes Network, and Neural Nets) to predict change-prone Java interfaces. First, we trained these models with the object-oriented metrics that showed the highest correlation with the number of fine-grained changes, namely CBO, RFC, LCOM, and WMC. Then, we added the IUC metric to these metrics. The results showed that when adding the IUC metric the precision and recall increased.

Based on these results, we can answer Research Question 1 stating that the IUC metric is the best metric in highlighting change-prone Java interfaces, as far as our research showed. This indicates that external cohesion is a required quality attribute to design stable interfaces. Interestingly, low external cohesion also highlights change-prone web APIs, as has been found in the study shown in Chapter 6. This result suggests that the clients' usage should be taken into account when we expose operations through an API.

Research Question 2: What is the impact of antipatterns on the change-proneness of APIs?

Previous studies have already shown the impact of antipatterns on the change-proneness of software artifacts. In the context of this PhD research we wanted to investigate whether antipatterns also impact the change-proneness of APIs. We can answer Research Question 2 with the results of the study reported in Chapter 3. In Chapter 3 we performed an empirical study to investigate 1) the impact of certain antipatterns on change-proneness and 2) the frequency of appearance of certain types of fine-grained changes in Java classes affected by certain antipatterns. The fine-grained changes have been extracted with ChangeDistiller from the repositories of 16 open-source Java projects. These changes have been clustered into 5 different categories depending on the entity of the change. Among these categories we defined a category that includes all changes performed on APIs (e.g., method renaming, changes of parameters, changes of return types). Besides extracting the changes performed in each class, we detected the list of antipatterns affecting each class with the DECOR tool [Moha et al., 2008a,b, 2010]. Based on this extracted data, we correlated the presence of certain antipatterns with the frequency of certain types of changes.
We showed empirically that changes to APIs are more likely to appear if APIs are affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns. These results allow us to answer Research Question 2 stating that these antipatterns have a greater impact on the change-proneness of APIs in the analyzed systems. Together with the results of the study shown in Chapter 2, they provide heuristics to detect change-prone APIs. If these APIs are made available through web services, engineers should resolve these antipatterns, and assure high external cohesion, to avoid frequent changes in the future.

9.2.2 Track 2: Change-Prone Web APIs

In the second track of this PhD research we focused on the change-proneness of web APIs. First, we defined two approaches to analyze service-oriented systems. Then, we analyzed the change-proneness, answering the research questions reported below.

Research Question 3: How can we extract fine-grained changes among subsequent versions of web APIs?

In Chapter 4 we have proposed the WSDLDiff tool to extract fine-grained changes from the history of WSDL APIs. The tool has been implemented on top of the Eclipse Modeling Framework (EMF). This framework allows parsing WSDL APIs into standardized models (i.e., Ecore models) that can be compared through its Matching and Differencing engines. Differently from previous work, our tool takes into account the syntax of the WSDL and XSD languages and outputs the elements affected by a change (e.g., XSDElement, WSDLMessage) and the type of change (i.e., addition, deletion, and modification).

In our first study (shown in Chapter 4) we used WSDLDiff to analyze the evolution of four well-known public WSDL APIs. The changes extracted in this study showed that WSDL APIs evolve differently and they do change frequently. This result further motivated us to investigate the change-proneness of web APIs. WSDLDiff is a useful tool that can help web service subscribers in analyzing which elements are frequently added, removed, and changed in a WSDL API. Based on this information they can subscribe to the most stable WSDL API to avoid continuously adapting their clients to new versions of a WSDL API. Researchers can benefit from this tool to further investigate the evolution of WSDL APIs. We can answer Research Question 3 stating that EMF provides a framework suitable for extracting changes between different versions of a WSDL API.

Research Question 4: How can we mine the full chain of dynamic dependencies among web services?

We can answer Research Question 4 based on the study presented in Chapter 5. We have reported on an approach to extract dynamic dependencies among web services based on vector clocks. We provided a non-intrusive, easy-to-implement, and portable implementation that relies on the well-known Pipes and Filters integration pattern. As a consequence this approach can be implemented in many enterprise service buses and web service frameworks such as Apache Axis2, Apache CXF, and MuleESB. This approach consists of attaching vector clocks to the headers of SOAP messages. When a web service is invoked, the vector clock is captured and updated, storing information about the invoked web service. Along the execution of a service-oriented system the vector clock stores the chains of invocations, which can be viewed at run-time or at a later time. This approach is particularly useful for reverse engineering and debugging service-oriented systems.
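As an illustration of the mechanism, the following simplified Java sketch shows the standard vector clock update rules applied to service invocations. The plumbing that reads and writes the clock from the SOAP header (e.g., a message handler in the Pipes and Filters chain) is omitted, and the class is only an approximation of the implementation described in Chapter 5, not a reproduction of it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the vector clock carried in the SOAP header.
// Each entry maps a service identifier to the number of invocation events
// that service has observed so far.
class VectorClock {
    private final Map<String, Integer> clock = new HashMap<>();

    // Called by a service when it produces an event (e.g., before sending a request):
    // increment the service's own component.
    void tick(String serviceId) {
        clock.merge(serviceId, 1, Integer::sum);
    }

    // Called when a service receives a message: merge the incoming clock
    // (component-wise maximum) and then record the local receive event.
    void onReceive(String serviceId, Map<String, Integer> received) {
        received.forEach((id, value) -> clock.merge(id, value, Integer::max));
        tick(serviceId);
    }

    // The clock travels in the SOAP header; here it is simply exposed as a map.
    Map<String, Integer> snapshot() {
        return new HashMap<>(clock);
    }
}
```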
A first analysis of the overhead due to this approach showed that the extra overhead is negligible. To summarize, we can answer Research Question 4 stating that the existing vector clocks technique can be used to retrieve dependencies among web services and that its overhead is negligible.

Research Question 5: What are the scenarios in which developers change web APIs with low internal and external cohesion?

In Chapter 6 we have presented a mixed-method approach to investigate the change-proneness of web APIs with low internal and low external cohesion. The survey we performed gives insights into the maintenance scenarios that can lead such web APIs to change. Specifically, we can state that low externally cohesive web APIs change frequently to 1) improve understandability and 2) ease maintainability and reduce clones in the APIs. Low internally cohesive web APIs change frequently to 1) reduce the impact of changes on the many clients they have, 2) avoid that all the clients lead the APIs to change frequently, and 3) improve understandability. We complemented the findings of our survey by performing a quantitative analysis of low internally cohesive web APIs. First, we defined a new internal cohesion metric (DTC) to properly measure the internal cohesion. Then, we correlated the values of DTC with the number of changes performed in ten well-known WSDL APIs. The changes have been extracted with our WSDLDiff tool presented in Chapter 4. The results confirm that web APIs with low internal cohesion are more change-prone than internally cohesive web APIs.

9.2.3 Track 3: Refactoring Web APIs

The last track of this dissertation is dedicated to approaches to refactor change-prone web APIs.

Research Question 6: Which search-based techniques can be used to apply the Interface Segregation Principle?

Both studies presented in Chapter 2 and Chapter 6 showed that low externally cohesive APIs and web APIs are more change-prone. To refactor such APIs we presented an approach to apply the Interface Segregation Principle (ISP) in Chapter 7. We defined the problem of splitting fat APIs into smaller APIs specific for each client (i.e., ISP) as a multi-objective clustering optimization problem. To solve this problem we used two state-of-the-art search-based approaches, namely a genetic algorithm and a simulated annealing algorithm. The results of this study showed that the genetic algorithm is able to infer more externally cohesive APIs for 42,318 public APIs whose usage has been mined from the Maven repositories. This approach is useful for API and web API providers. To use our genetic algorithm, API providers should monitor how their clients invoke their API. This data is then used by the genetic algorithm to split the API into smaller APIs according to the ISP.

Research Question 7: Which search-based techniques can transform a fine-grained API into multiple coarse-grained APIs, reducing the total number of remote invocations?

As discussed in Section 1.1.3, fine-grained web APIs should be refactored into coarse-grained web APIs to avoid performance problems. In Chapter 8 we defined a genetic algorithm to infer coarse-grained Façade APIs from the clients' usage of a fine-grained API. The genetic algorithm looks for Façade APIs that cluster together the fine-grained methods of the original API. Fine-grained methods are clustered into a single coarse-grained method if they are invoked consecutively by the clients.
In this way the clients can invoke the coarse-grained methods in the Façade APIs, reducing the number of remote invocations. A first study showed that the genetic algorithm outperforms the random search technique and is always able to suggest the right Façade APIs for the working example shown in Chapter 8. The capability of the random approach decreases with larger fine-grained APIs. This approach can be used every time there is the need to reduce the chattiness of web APIs. In such cases the Façade APIs retrieved by the genetic algorithm can be deployed on top of the original APIs. This allows the clients to interact with the APIs with fewer invocations while keeping the original APIs.

9.3 Recommendations for Future Work

The work presented in this dissertation provides relevant insights into the change-proneness of web APIs. However, this is only a first step in this area of research, which certainly needs to be incrementally enriched and revised. In this section we present the recommendations for future work for each of the different tracks of this PhD project.

To investigate the change-proneness of APIs we have performed quantitative studies. These studies provide statistical evidence of heuristics to highlight change-prone APIs. This track should be enriched by performing qualitative analyses. These qualitative analyses should include questionnaires, surveys, and interviews allowing developers and engineers to further refine our findings. Moreover, it is desirable to perform a more extended quantitative analysis that analyzes software systems implemented in different programming languages and paradigms, also including commercial software systems.

The recommendations for the future work of Track 2 are threefold. First, a quantitative analysis of the change-proneness of low externally cohesive web APIs is desirable. This analysis should refine and revise the insights we collected in our survey. However, performing this analysis requires access to the clients' usage of web APIs, which might not be publicly available. As a consequence, this analysis should be performed in an industrial environment where this data is available.

A second important step to understand why web APIs change over time is understanding their purpose. Track 2 should be extended taking into account the web service typologies. Heuristics to classify web services into different typologies, as suggested by Krafzig et al. [2004], should be defined. We expect that some web service typologies change less frequently and for different reasons than others. For instance, the web API of a web service that is meant to bridge a technological gap would change only when the bridged technologies change. On the other hand, the interface of a web service that provides search functionalities can change every time the search criterion changes. To automatically classify web services we can analyze two sources of information. First, we can analyze the documentation that is usually available in natural language and published on websites. For instance, Google Maps web services are documented on their website (https://developers.google.com/maps/documentation/webservices/). The second source of information consists of the web API, which is composed of: 1) method declarations, 2) data types needed to invoke the methods and to retrieve the results, and 3) comments to ease the comprehension of a service interface.
To obtain relevant information from these two sources, future work should be based on information retrieval techniques, widely used in the software engineering community for similar purposes.

Finally, future work should investigate the change-proneness of REST APIs separately. In this dissertation we have focused on RPC APIs such as WSDL APIs. As discussed in Chapter 1, REST APIs are different because they are Resource APIs that expose resources through HTTP as application protocol. As a consequence, the operations they expose are fixed but the resource itself can change. Dedicated studies to investigate the change-proneness of resources are desirable to understand why REST APIs change.

As future work of Track 3, both the genetic algorithms presented in Chapter 7 and Chapter 8 can be further improved. For instance, the sub-APIs generated by the genetic algorithm in Chapter 7 expose disjoint sets of methods. These sub-APIs might expose overlapping sets of methods to show higher values of external cohesion. However, this causes the introduction of clones and further studies are needed to investigate how they impact other quality attributes such as maintainability.

9.4 Concluding Remarks

The work presented in this dissertation was aimed at investigating the change-proneness of APIs and web APIs. This work by no means covers all the aspects of change-prone APIs and web APIs nor provides a complete guideline on designing stable APIs and web APIs. However, we advanced the state-of-the-art in 1) validating software metrics (i.e., internal and external cohesion) that highlight change-prone APIs and web APIs, 2) analyzing service-oriented systems, and 3) refactoring fine-grained and low externally cohesive APIs and web APIs. Our contributions are aimed at giving new insights into the change-proneness of APIs and web APIs that allow the research community to further advance and refine our findings.

Bibliography

Marwen Abbes, Foutse Khomh, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension. In Tom Mens, Yiannis Kanellopoulos, and Andreas Winter, editors, CSMR, 15th European Conference on Software Maintenance and Reengineering, pages 181–190. IEEE Computer Society, 2011.

Hani Abdeen, Houari A. Sahraoui, and Osama Shata. How we design interfaces, and how to assess it. In ICSM, pages 80–89, 2013.

Omar Al Jadaan and Lakishmi Rajamani. Improved selection operator for ga. Journal of Theoretical and Applied Information Technology, 4(4), 2008.

Eyhab Al-Masri and Qusay H. Mahmoud. Investigating web services on the world wide web. WWW, pages 795–804, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-085-2.

Saad Alahmari, Ed Zaluska, and David C. De Roure. A metrics framework for evaluating soa service granularity. SCC, pages 512–519, Washington, DC, USA, 2011. ISBN 978-0-7695-4462-5.

Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay Machiraju. Web Services: Concepts, Architectures and Applications. Springer Publishing Company, Incorporated, 1st edition, 2010. ISBN 3642078885, 9783642078880.

Mohammad Alshayeb and Wei Li. An empirical validation of object-oriented metrics in two different iterative software processes. Transactions on Software Engineering, 29:1043–1049, November 2003.

Lerina Aversano, Marcello Bruno, Massimiliano Di Penta, Amedeo Falanga, and Rita Scognamiglio.
Visualizing the evolution of web services using formal concept analysis. In IWPSE, pages 57–60, 2005. Jagdish Bansiya and Carl G. Davis. A hierarchical model for object-oriented design quality assessment. IEEE Trans. Softw. Eng., 28(1):4–17, January 2002. ISSN 0098-5589. I. Barker. What is information architecture? URL http://www.steptwo. com.au. Victor R. Basili, Lionel C. Briand, and Walcélio L. Melo. A validation of objectoriented design metrics as quality indicators. IEEE Trans. Software Eng., 22 (10):751–761, 1996. Sujoy Basu, Fabio Casati, and Florian Daniel. Toward web service dependency discovery for soa management. In Proceedings of the 2008 IEEE International Conference on Services Computing - Volume 2, pages 422–429, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3283-7-02. Abraham Bernstein, Jayalath Ekanayake, and Martin Pinzger. Improving defect prediction using temporal features and non linear models. In Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting, IWPSE ’07, pages 11–18, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-722-3. Shawn A. Bohner. Software change impacts - an evolving perspective. In ICSM, pages 263–272, 2002. Bart Du Bois, Serge Demeyer, Jan Verelst, Tom Mens, and Marijn Temmerman. Does god class decomposition affect comprehensibility? In Proceedings of the IASTED International Conference on Software Engineering, pages 346–355. IASTED/ACTA Press, 2006. Eric Bouwers, Arie van Deursen, and Joost Visser. Evaluating usefulness of software metrics: an industrial experience report. In Proceedings of the International Conference on Software Engineering, pages 921–930, 2013. Marcus A. S. Boxall and Saeed Araban. Interface metrics for reusability analysis of components. In Proceedings of the 2004 Australian Software Engineering Conference, ASWEC ’04, pages 40–, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2089-8. BIBLIOGRAPHY 181 Lionel Briand, Walcelio Melo, and Juergen Wuest. Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans. Softw. Eng., 28:706–720, July 2002. Lionel C. Briand, John W. Daly, and Jürgen Wüst. A unified framework for cohesion measurement in object-oriented systems. Empirical Software Engineering, 3(1):65–117, July 1998. ISSN 1382-3256. Lionel C. Briand, Yvan Labiche, and Johanne Leduc. Toward the reverse engineering of uml sequence diagrams for distributed java software. IEEE Trans. Softw. Eng., 32:642–663, September 2006. ISSN 0098-5589. Peter F Brown and Rebekah Metz Booz Allen Hamilton. Reference model for service oriented architecture 1.0, 2006. William J. Brown, Raphael C. Malveau, Hays W. McCormikk III, and T.J. Mowbray. Anti Patterns: Refactoring Software, Architectures, and Projects in Crisis. Wiley, 1998. Cedric Brun and Alfonso Pierantonio. Model differences in the eclipse modelling framework. UPGRADE The European Journal for the Informatics Professional, IX:29–34, 2008. John Businge. Co-evolution of the eclipse SDK framework and its third-party plug-ins. In 17th European Conference on Software Maintenance and Reengineering, CSMR 2013, Genova, Italy, March 5-8, 2013, pages 427–430, 2013. John Businge, Alexander Serebrenik, and Mark van den Brand. An empirical study of the evolution of eclipse third-party plug-ins. 
In Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), IWPSE-EVOL ’10, pages 63–72, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-01282. John Businge, Alexander Serebrenik, and Mark van den Brand. Analyzing the eclipse API usage: Putting the developer in the loop. In 17th European Conference on Software Maintenance and Reengineering, CSMR 2013, Genova, Italy, March 5-8, 2013, pages 37–46, 2013. Gerardo Canfora, Michele Ceccarelli, Luigi Cerulo, and Massimiliano Di Penta. Using multivariate time series and association rules to detect logical change coupling: An empirical study. In Proceedings of the 2010 IEEE International Conference on Software Maintenance, ICSM ’10, pages 1–10, Washington, DC, USA, 2010. IEEE Computer Society. ISBN 978-1-4244-8630-4. 182 BIBLIOGRAPHY Luba Cherbakov, Mamdouh Ibrahim, and Jenny Ang. Soa antipatterns: the obstacles to the adoption and successful realization of service-oriented architecture, 2006. URL http://www.ibm.com/developerworks/ webservices/library/ws-antipatterns/. Shyam R. Chidamber and Chris F. Kemerer. Towards a metrics suite for object oriented design. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 197–211, 1991. Shyam R. Chidamber and Chris F. Kemerer. A metrics suite for object oriented design. Transactions on Software Engineering, 20(6):476–493, June 1994. ISSN 0098-5589. James O. Coplien and Neil B. Harrison. Organizational Patterns of Agile Software Development. Prentice-Hall, Upper Saddle River, NJ (2005), 1st edition, 2005. Steve Counsell, Stephen Swift, and Jason Crampton. The interpretation and utility of three cohesion metrics for object-oriented design. Transactions on Software Engineering and Methodology, 15(2):123–149, April 2006. ISSN 1049-331X. John W. Creswell and Vicki L.P. Clark. Designing and Conducting Mixed Methods Research. SAGE Publications, 2010. ISBN 9781412975179. Robert Daigneau. Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services. Pearson Education, 2011. ISBN 032154420X. Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimisation: Nsga-ii. In PPSN, volume 1917, pages 849–858, 2000. ISBN 3-540-410562. Karim Dhambri, Houari Sahraoui, and Pierre Poulin. Visual detection of design anomalies. In Proceedings of the 12 th European Conference on Software Maintenance and Reengineering, Tampere, Finland, pages 279–283. IEEE CS Press, April 2008. Danny Dig and Ralph E. Johnson. How do apis evolve? a story of refactoring. Journal of Software Maintenance, 18(2):83–107, 2006. BIBLIOGRAPHY 183 Bill Dudney, Joseph Krozak, Kevin Wittkopf, Stephen Asbury, and David Osborne. J2EE Antipatterns. John Wiley & Sons, Inc., New York, NY, USA, 1 edition, 2002. ISBN 0471146153. Mahmoud O. Elish and Mojeeb-Al-Rahman Al-Khiaty. A suite of metrics for quantifying historical changes to predict future change-prone classes in object-oriented software. Journal of Software: Evolution and Process, 25 (5):407–437, 2013. Thomas Erl. SOA Principles of Service Design (The Prentice Hall Service-Oriented Computing Series from Thomas Erl). Prentice Hall PTR, Upper Saddle River, NJ, USA, 2007. Len Erlikh. Leveraging legacy system dollars for e-business. IT Professional, 2 (3):17 –23, may/jun 2000. ISSN 1520-9202. Emanuel Falkenauer. Genetic Algorithms and Grouping Problems. 
John Wiley & Sons, Inc., New York, NY, USA, 1998. ISBN 0471971502. Zaiwen Feng, Keqing He, Rong Peng, and Yutao Ma. Taxonomy for evolution of service-based system. In SERVICES, pages 331–338, 2011. Colin J. Fidge. Timestamps in message-passing systems that preserve partial ordering. In Proceedings of the 11th Australian Computer Science Conference, pages 56–66, 1988. Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, 2000. AAI9980887. Jr. Floyd J. Fowler. Survey Research Methods (4th ed.). SAGE Publications, Inc., 0 edition, 2009. Beat Fluri and Harald C. Gall. Classifying change types for qualifying change couplings. In Proceedings of the 14th IEEE International Conference on Program Comprehension, ICPC ’06, pages 35–45, Washington, DC, USA, 2006. IEEE Computer Society. ISBN 0-7695-2601-2. Beat Fluri, Michael Wuersch, Martin PInzger, and Harald Gall. Change distilling: Tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng., 33:725–743, November 2007. Marios Fokaefs, Rimon Mikhaiel, Nikolaos Tsantalis, Eleni Stroulia, and Alex Lau. An empirical study on web service evolution. In Proceedings of the International Conference on Web Services, pages 49–56, 2011. 184 BIBLIOGRAPHY Martin Fowler. Refactoring – Improving the Design of Existing Code. AddisonWesley, 1st edition, June 1999. ISBN 0-201-48567-2. Harald C. Gall, Beat Fluri, and Martin Pinzger. Change analysis with evolizer and changedistiller. IEEE Softw., 26:26–33, January 2009. ISSN 0740-7459. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995. ISBN 0-201-63361-2. Shadi Ghaith and Mel Ó Cinnéide. Improving software security using searchbased refactoring. In SSBSE, pages 121–135, 2012. Adnane Ghannem, Ghizlane El-Boussaidi, and Marouane Kessentini. Model refactoring using interactive genetic algorithm. In SSBSE, pages 96–110, 2013. Emanuel Giger, Martin Pinzger, and Harald C. Gall. Comparing fine-grained source code changes and code churn for bug prediction. In Proceedings of the 8th Working Conference on Mining Software Repositories, MSR ’11, pages 83–92, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0574-7. Tudor Girba, Stéphane Ducasse, and Michele Lanza. Yesterday’s weather: Guiding early reverse engineering efforts by summarizing the evolution of changes. In Proceedings of the International Conference on Software Maintenance, pages 40–49, 2004. David M. Green and John A. Swets. Signal detection theory and psychophysics, volume 1. Wiley, 1966. Robert J. Grissom and John J. Kim. Effect sizes for research: A broad practical approach. Lawrence Earlbaum Associates, 2nd edition edition, 2005. Raf Haesen, Monique Snoeck, Wilfried Lemahieu, and Stephan Poelmans. On the definition of service granularity and its architectural impact. CAiSE, pages 375–389, Berlin, Heidelberg, 2008. ISBN 978-3-540-69533-2. Mark Harman, Stephen Swift, and Kiarash Mahdavi. An empirical study of the robustness of two module clustering fitness functions. In GECCO, pages 1029–1036, 2005. ISBN 1-59593-010-8. Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search-based software engineering: Trends, techniques and applications. ACM Comput. Surv., 45(1):11:1–11:61, December 2012. ISSN 0360-0300. BIBLIOGRAPHY 185 Brian Henderson-Sellers. Object-oriented metrics: measures of complexity. 
Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996. ISBN 0-13-2398729. Brian Henderson-Sellers, Larry L. Constantine, and Ian M. Graham. Coupling and cohesion (towards a valid metrics suite for object-oriented analysis and design). Object Oriented Systems, 3:143–158, 1996. Gregor Hohpe and Bobby Woolf. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003. ISBN 0321200683. Will G. Hopkins. A new view of statistics. Internet Society for Sport Science, 2000. Daqing Hou and Xiaojia Yao. Exploring the intent behind api evolution: A case study. In Proceedings of the Working Conference on Reverse Engineering, pages 131–140, 2011. Curtis E. Hrischuk and Murray C. Woodside. Logical clock requirements for reverse engineering scenarios from a distributed system. IEEE Trans. Softw. Eng., 28:321–339, April 2002. ISSN 0098-5589. Eduardo Raul Hruschka, Ricardo J. G. B. Campello, Alex A. Freitas, and André C. Ponce Leon F. De Carvalho. A survey of evolutionary algorithms for clustering. Trans. Sys. Man Cyber Part C, 39(2):133–155, March 2009. ISSN 1094-6977. Deligiannis Ignatios, Stamelos Ioannis, Angelis Lefteris, Roumeliotis Manos, and Shepperd Martin. A controlled experiment investigation of an object oriented design heuristic for maintainability. Journal of Systems and Software, 65(2), February 2003. Deligiannis Ignatios, Shepperd Martin, Roumeliotis Manos, and Stamelos Ioannis. An empirical investigation of an object-oriented design heuristic for maintainability. Journal of Systems and Software, 72(2), 2004. Daniel Jacobson. Embracing the differences : Inside the netflix api redesign. http://techblog.netflix.com/2012/07/embracingdifferences-inside-netflix.html, 2012. [Online; accessed May2014]. 186 BIBLIOGRAPHY Jinlei Jiang, Yongwei Wu, and Guangwen Yang. Making service granularity right: An assistant approach based on business process analysis. CHINAGRID, pages 204–210, Washington, DC, USA, 2011. ISBN 978-0-7695-44724. Nicolai Josuttis. Soa in Practice: The Art of Distributed System Design. O’Reilly Media, Inc., 2007. ISBN 0596529554. Huzefa Kagdi, Michael L. Collard, and Jonathan I. Maletic. A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J. Softw. Maint. Evol., 19(2):77–131, March 2007. ISSN 1532-060X. Foutse Khomh, Massimiliano Di Penta, and Yann-Gael Gueheneuc. An exploratory study of the impact of code smells on software change-proneness. In Proceedings of the Working Conference on Reverse Engineering, pages 75– 84, 2009. Foutse Khomh, Stephane Vaucher, Yann-Gaël Guéhéneuc, and Houari Sahraoui. Bdtex: A gqm-based bayesian approach for the detection of antipatterns. Journal of Systems and Software, 84(4):559 – 572, 2011. ISSN 0164-1212. <ce:title>The Ninth International Conference on Quality Software</ce:title>. Foutse Khomh, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empirical Software Engineering, 17(3):243– 275, 2012. Taghi M. Khoshgoftaar and Robert M. Szabo. Improving code churn predictions during the system test and maintenance phases. In Proceedings of the International Conference on Software Maintenance, pages 58–67, 1994. Alireza Khoshkbarforoushha, R. Tabein, Pooyan Jamshidi, and Fereidoon Shams Aliee. Towards a metrics suite for measuring composite service granularity level appropriateness. 
In SERVICES, pages 245–252, 2010. ISBN 978-0-7695-4129-7. Dirk Krafzig, Karl Banke, and Dirk Slama. Enterprise SOA: Service-Oriented Architecture Best Practices (The Coad Series). Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004. ISBN 0131465759. BIBLIOGRAPHY 187 Jaroslav Král and Michal Zemlicka. The most important service-oriented antipatterns. In Proceedings of the International Conference on Software Engineering Advances, page 29, 2007. WH Kruskal and WA Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260):583–621, 1952. Naveen N. Kulkarni and Vishal Dwivedi. The role of service granularity in a successful soa realization - a case study. In SERVICES I, pages 423–430. IEEE Computer Society, 2008. ISBN 978-0-7695-3286-8. Avadhesh Kumar, Rajesh Kumar, and P. S. Grover. Unified cohesion measures for aspect-oriented systems. International Journal of Software Engineering and Knowledge Engineering, 21(1):143–163, 2011. Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978. Guillaume Langelier, Houari A. Sahraoui, and Pierre Poulin. Visualizationbased analysis of quality for large-scale software systems. In proceedings of the 20 t h international conference on Automated Software Engineering. ACM Press, Nov 2005. Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice. Springer-Verlag, 2006. ISBN 3-540-24429-8. Erich Leo Lehmann and H.J.M D’Abrera. Nonparametrics : Statistical Methods Based on Ranks. Holden-Day Series in Probability and Statistics. Holden-Day New York Dusseldorf Johannesbourg, 1975. ISBN 0-07-037073-7. Philipp Leitner, Anton Michlmayr, Florian Rosenberg, and Schahram Dustdar. End-to-end versioning support for web services. In 2008 IEEE International Conference on Services Computing (SCC), pages 59–66, 2008. Stefan Lessmann, Bart Baesens, Christophe Mues, and Swantje Pietsch. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Softw. Eng., 34:485–496, July 2008. ISSN 0098-5589. Wei Li and Sallie M. Henry. Object-oriented metrics which predict maintainability. Technical report, Virginia Polytechnic Institute & State University, Blacksburg, VA, USA, 1993. 188 BIBLIOGRAPHY Wei Li and Raed Shatnawi. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. Journal of Systems and Software, 80(7), 2007. Zheng Li, Yi Bian, Ruilian Zhao, and Jun Cheng. A fine-grained parallel multiobjective test case prioritization on gpu. In SSBSE, pages 111–125, 2013. Fangfang Liu, Yuliang Shi, Jie Yu, Tianhong Wang, and Jingzhe Wu. Measuring similarity of web services based on wsdl. In ICWS, pages 155–162, 2010. Kiarash Mahdavi, Mark Harman, and Robert M. Hierons. A multiple hill climbing approach to software module clustering. In ICSM, pages 315–324, 2003. ISBN 0-7695-1905-9. Spiros Mancoridis, Brian S. Mitchell, Chris Rorres, Yih-Farn Chen, and Emden R. Gansner. Using automatic clustering to produce high-level system organizations of source code. In IWPC, pages 45–52, 1998. ISBN 0-81868560-3. Spiros Mancoridis, Brian S. Mitchell, Yih-Farn Chen, and Emden R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In ICSM, pages 50–59, 1999. Henry B. Mann and Whitney D. R. On a test of whether one of two random variables is stochastically larger than the other. 
Annals of Mathematical Statistics, 18(1):50–60, 1947. Mika Mantyla. Bad Smells in Software - a Taxonomy and an Empirical Study. PhD thesis, Helsinki University of Technology, 2003. Radu Marinescu. Detection strategies: Metrics-based rules for detecting design flaws. In Proceedings of the 20 th International Conference on Software Maintenance, pages 350–359. IEEE CS Press, 2004. Robert C. Martin. Agile Software Development, Principles, Patterns, and Practices. Prentice-Hall, Inc, 2002. Friedemann Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms, pages 215–226. North-Holland, 1989. Bernhart M. Mauczka A., Grechenig T. Predicting code change by using static metrics. In Software Engineering Research, Management and Applications, pages 64–71, 2009. BIBLIOGRAPHY 189 Diego Mendez, Benoit Baudry, and Martin Monperrus. Empirical evidence of large-scale diversity in api usage of object-oriented software. In SCAM, pages 43–52, 2013. Tim Menzies, Jeremy Greenwald, and Art Frank. Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng., 33:2–13, January 2007. ISSN 0098-5589. Brian S. Mitchell and Spiros Mancoridis. Using heuristic search techniques to extract design abstractions from source code. In GECCO, pages 1375–1382, 2002. ISBN 1-55860-878-8. Brian S. Mitchell and Spiros Mancoridis. On the automatic modularization of software systems using the bunch tool. IEEE Trans. Software Eng., 32(3): 193–208, 2006. Naouel Moha, Yann-Gaël Guéhéneuc, Anne-Françoise Le Meur, and Laurence Duchien. A domain analysis to specify design defects and generate detection algorithms. In Proceedings of the Theory and practice of software, 11th international conference on Fundamental approaches to software engineering, FASE’08/ETAPS’08, pages 276–291, Berlin, Heidelberg, 2008a. SpringerVerlag. ISBN 3-540-78742-9, 978-3-540-78742-6. Naouel Moha, Amine Mohamed Rouane Hacene, Petko Valtchev, and YannGaël Guéhéneuc. Refactorings of design defects using relational concept analysis. In Proceedings of the 6th international conference on Formal concept analysis, ICFCA’08, pages 289–304, Berlin, Heidelberg, 2008b. SpringerVerlag. ISBN 3-540-78136-6, 978-3-540-78136-3. Naouel Moha, Yann-Gael Gueheneuc, Laurence Duchien, and Anne-Francoise Le Meur. Decor: A method for the specification and detection of code and design smells. IEEE Trans. Softw. Eng., 36(1):20–36, January 2010. ISSN 0098-5589. Naouel Moha, Francis Palma, Mathieu Nayrolles, Benjamin Joyen Conseil, Yann-Gael.Gueheneuc@polymtl.Ca Yann-Gael, Guéhéneuc, Benoit Baudry, and Jean-Marc Jézéquel. Specification and detection of soa antipatterns. In Proceedings of the International Conference on Service Oriented Computing, pages 1–16, Shanghai, China, 2012. Matthew James Munro. Product metrics for automatic identification of “bad smell" design problems in java source-code. In Proceedings of the 11 th International Software Metrics Symposium. IEEE Computer Society Press, September 2005. 190 BIBLIOGRAPHY Stephan Murer. 13 years of soa at credit suisse: Lessons learned-remaining challenges. In Proceedings of the European Conference on Web Services, page 12, Sept 2011. Stephan Murer, Bruno Bonati, and Frank Furrer. Managed Evolution - A Strategy for Very Large Information Systems. Springer, 2010. ISBN 3-642-01632-4. Nachiappan Nagappan, Andreas Zeller, Thomas Zimmermann, Kim Herzig, and Brendan Murphy. Change bursts as defect predictors. In ISSRE, pages 309–318, 2010. 
Dongkyung Nam and Cheol Hoon Park. Multiobjective simulated snnealing: a comparative study to evolutionary algorithms. International Journal of Fuzzy Systems, 2(2):87–97, 2000. Hans Neukom. Early use of computers in swiss banks. IEEE Annals of the History of Computing, 26(3):50–59, 2004. Mark O’Keeffe and Mel í Cinnéide. Search-based refactoring for software maintenance. J. Syst. Softw., 81(4):502–516, April 2008. ISSN 0164-1212. Steffen Olbrich, Daniela S. Cruzes, Victor Basili, and Nico Zazworka. The evolution and impact of code smells: A case study of two open source systems. In Third International Symposium on Empirical Software Engineering and Measurement, 2009. Rocco Oliveto, Foutse Khomh, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. Numerical signatures of antipatterns: An approach based on b-splines. In Rafael Capilla, Rudolf Ferenc, and Juan Carlos Dueas, editors, Proceedings of the 14 th Conference on Software Maintenance and Reengineering. IEEE Computer Society Press, March 2010. Mike P. Papazoglou. The challenges of service evolution. In Proceedings of the international Conference on Advanced Information Systems Engineering, pages 1–15, 2008. Cesare Pautasso and Erik Wilde. Why is the web loosely coupled?: A multifaceted metric for service design. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, pages 911–920, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-487-4. Cesare Pautasso, Olaf Zimmermann, and Frank Leymann. Restful web services vs. "big"’ web services: Making the right architectural decision. In Proceedings of the 17th International Conference on World Wide Web, WWW ’08, pages 805–814, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-085-2. BIBLIOGRAPHY 191 Massimiliano Di Penta, Luigi Cerulo, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. An empirical study of the relationships between design pattern roles and class change proneness. In Proceedings of the International Conference on Software Maintenance, pages 217–226, 2008. Mikhail Perepletchikov, Caspar Ryan, and Keith Frampton. Towards the definition and validation of coupling metrics for predicting maintainability in service-oriented designs. In OTM Workshops (1), pages 34–35, 2006. Mikhail Perepletchikov, Caspar Ryan, and Keith Frampton. Cohesion metrics for predicting maintainability of service-oriented software. In Proceedings of the International Conference on Quality Software, pages 328–335, 2007. ISBN 0-7695-3035-4. Mikhail Perepletchikov, Caspar Ryan, and Zahir Tari. The impact of service cohesion on the analyzability of service-oriented software. Transactions on Services Computing, 3(2):89–103, April 2010. ISSN 1939-1374. Pierluigi Plebani and Barbara Pernici. Urbe: Web service retrieval based on similarity evaluation. IEEE Trans. on Knowl. and Data Eng., 21:1629–1642, November 2009. ISSN 1041-4347. Daryl Posnett, Christian Bird, and Prem Dévanbu. An empirical study on the influence of pattern roles on change-proneness. Empirical Software Engineering, 16(3):396–423, June 2011. ISSN 1382-3256. Colin Potts. Software-engineering research revisited. IEEE Softw., 10(5):19– 28, September 1993. ISSN 0740-7459. Kata Praditwong, Mark Harman, and Xin Yao. Software module clustering as a multi-objective search problem. IEEE Trans. Software Eng., 37(2):264–282, 2011. Steven Raemaekers, Arie van Deursen, and Joost Visser. Measuring software library stability through historical version analysis. In Proceedings of the International Conference on Software Maintenance, pages 378–387, 2012. 
Steven Raemaekers, Arie van Deursen, and Joost Visser. The Maven repository dataset of metrics, changes, and dependencies. In MSR, pages 221–224, 2013.
Romain Robbes, Damien Pollet, and Michele Lanza. Logical coupling based on fine-grained change information. In Proceedings of the 2008 15th Working Conference on Reverse Engineering, pages 42–46, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3429-9.
Daniele Romano and Martin Pinzger. Using source code metrics to predict change-prone Java interfaces. In ICSM, pages 303–312, 2011a. ISBN 978-1-4577-0663-9.
Daniele Romano and Martin Pinzger. Using vector clocks to monitor dependencies among services at runtime. In Proceedings of the International Workshop on Quality Assurance for Service-Based Applications, QASBA '11, pages 1–4, 2011b. ISBN 978-1-4503-0826-7.
Daniele Romano and Martin Pinzger. Analyzing the evolution of web services using fine-grained changes. In ICWS, pages 392–399, 2012. ISBN 978-1-4673-2131-0.
Daniele Romano and Martin Pinzger. A genetic algorithm to find the adequate granularity for service interfaces. In 2014 IEEE World Congress on Services, Anchorage, AK, USA, June 27 - July 2, 2014, pages 478–485, 2014.
Daniele Romano, Martin Pinzger, and Eric Bouwers. Extracting dynamic dependencies between web services using vector clocks. In SOCA, pages 1–8, 2011.
Daniele Romano, Paulius Raila, Martin Pinzger, and Foutse Khomh. Analyzing the impact of antipatterns on change-proneness using fine-grained source code changes. In Proceedings of the Working Conference on Reverse Engineering, pages 437–446, 2012.
Daniele Romano, Maria Kalouda, and Martin Pinzger. Analyzing the impact of external and internal cohesion on the change-proneness of web APIs. Technical Report TUD-SERG-2013-018, Software Engineering Research Group, Delft University of Technology, 2013. URL http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2013-018.pdf.
Daniele Romano, Steven Raemaekers, and Martin Pinzger. Refactoring fat interfaces using a genetic algorithm. Technical Report TUD-SERG-2014-007, Software Engineering Research Group, Delft University of Technology, 2014. URL http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2014-007.pdf.
Dieter H. Rombach. A controlled experiment on the impact of software structure on maintainability. IEEE Trans. Softw. Eng., 13:344–354, March 1987. ISSN 0098-5589.
Dieter H. Rombach. Design measurement: Some lessons learned. IEEE Softw., 7:17–25, March 1990. ISSN 0740-7459.
Arnon Rotem-Gal-Oz. SOA Patterns. Manning Publications, 1st edition, 2012. ISBN 9781933988269.
Günter Rudolph. Evolutionary search for minimal elements in partially ordered finite sets. In Evolutionary Programming, volume 1447 of Lecture Notes in Computer Science, pages 345–353, 1998. ISBN 3-540-64891-7.
Shawn A. Bohner and R. S. Arnold. Software Change Impact Analysis. IEEE Computer Society Press, 1996.
Jeffery Shelburg, Marouane Kessentini, and Daniel R. Tauritz. Regression testing for model transformations: A multi-objective approach. In SSBSE, volume 8084 of Lecture Notes in Computer Science, pages 209–223, 2013. ISBN 978-3-642-39741-7.
David J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 4th edition, 2007. ISBN 1584888148, 9781584888147.
Jelber Sayyad Shirabad, Timothy C. Lethbridge, and Stan Matwin. Mining the maintenance history of a legacy software system. In Proceedings of the International Conference on Software Maintenance, ICSM '03, pages 95–, Washington, DC, USA, 2003. IEEE Computer Society. ISBN 0-7695-1905-9.
Frank Simon, Frank Steinbrückner, and Claus Lewerentz. Metrics based refactoring. In Proceedings of the Fifth European Conference on Software Maintenance and Reengineering (CSMR'01), page 30. IEEE CS Press, 2001. ISBN 0-7695-1028-0.
Renuka Sindhgatta, Bikram Sengupta, and Karthikeyan Ponnalagu. Measuring the quality of service oriented design. In Proceedings of the International Joint Conference on Service-Oriented Computing, pages 485–499, Berlin, Heidelberg, 2009. Springer-Verlag.
S. N. Sivanandam and S. N. Deepa. Introduction to Genetic Algorithms. Springer Publishing Company, Incorporated, 1st edition, 2007. ISBN 354073189X, 9783540731894.
Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and Thomas Zimmermann. Improving developer participation rates in surveys. In Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, pages 89–92, 2013.
Ramanath Subramanyam and M. S. Krishnan. Empirical analysis of CK metrics for object-oriented design complexity: Implications for software defects. IEEE Trans. Softw. Eng., 29:297–310, April 2003. ISSN 0098-5589.
S. Dowdy, S. Wearden, and D. Chilko. Statistics for Research. Probability and Statistics. John Wiley and Sons, 2004.
Suresh Thummalapenta, Luigi Cerulo, Lerina Aversano, and Massimiliano Di Penta. An empirical study on the maintenance of source code clones. Empirical Software Engineering, 15(1):1–34, 2010.
Sander Tichelaar, Stéphane Ducasse, and Serge Demeyer. FAMIX and XMI. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE'00), WCRE '00, pages 296–, Washington, DC, USA, 2000. IEEE Computer Society. ISBN 0-7695-0881-2.
Guilherme Travassos, Forrest Shull, Michael Fredericks, and Victor R. Basili. Detecting defects in object-oriented designs: Using reading techniques to increase software quality. In Proceedings of the 14th Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 47–56. ACM Press, 1999.
Martin Treiber, Hong Linh Truong, and Schahram Dustdar. On analyzing evolutionary changes of web services. In ICSOC Workshops, pages 284–297, 2008.
Nikolaos Tsantalis, Alexander Chatzigeorgiou, and George Stephanides. Predicting the probability of change in object-oriented systems. IEEE Transactions on Software Engineering, 31(7):601–614, 2005. ISSN 0098-5589.
Nikolaos Tsantalis, Natalia Negara, and Eleni Stroulia. WebDiff: A generic differencing service for software artifacts. In ICSM, pages 586–589, 2011.
Eva van Emden and Leon Moonen. Java quality assurance by detecting code smells. In Proceedings of the 9th Working Conference on Reverse Engineering (WCRE'02). IEEE CS Press, October 2002.
Mario Linares Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. API change and fault proneness: A threat to the success of Android apps. In Proceedings of the ESEC/SIGSOFT Foundations of Software Engineering, pages 477–487, 2013.
W3C. Web services architecture. http://www.w3.org/TR/ws-arch/, 2004. [Online; accessed May 2014].
William C. Wake. Refactoring Workbook. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003. ISBN 0321109295.
Shuying Wang and Miriam A. M. Capretz. A dependency impact analysis model for web services evolution. In ICWS, pages 359–365, 2009.
Bruce F. Webster. Pitfalls of Object-Oriented Development. M & T Books, 1st edition, February 1995. ISBN 1558513973.
Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. ISBN 0120884070.
Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000. ISBN 0-7923-8682-5.
Qian Wu, Ling Wu, Guangtai Liang, Qianxiang Wang, Tao Xie, and Hong Mei. Inferring dependency constraints on parameters for web services. In WWW, pages 1421–1432, 2013.
Zhenchang Xing and Eleni Stroulia. UMLDiff: An algorithm for object-oriented design differencing. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, pages 54–65, 2005a.
Zhenchang Xing and Eleni Stroulia. Analyzing the evolutionary history of the logical design of object-oriented software. IEEE Trans. Software Eng., 31(10):850–868, 2005b.
Aiko Fallas Yamashita and Leon Moonen. Exploring the impact of inter-smell relations on software maintainability: An empirical study. In ICSE, pages 682–691, 2013. ISBN 978-1-4673-3076-3.
Shin Yoo and Mark Harman. Pareto efficient multi-objective test case selection. In ISSTA, pages 140–150, 2007. ISBN 978-1-59593-734-6.
Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18:1245–1262, December 1989.
Yilei Zhang, Zibin Zheng, and Michael R. Lyu. WSExpress: A QoS-aware search engine for web services. In ICWS, pages 91–98. IEEE Computer Society, 2010. ISBN 978-0-7695-4128-0.
Yuanyuan Zhang, Mark Harman, and Soo Ling Lim. Empirical evaluation of search based requirements interaction management. Information & Software Technology, 55(1):126–152, 2013.
Jianjun Zhao and Baowen Xu. Measuring aspect cohesion. In Proceedings of the Fundamental Approaches to Software Engineering, pages 54–68, 2004.
Yuming Zhou and Hareton Leung. Predicting object-oriented software maintainability using multivariate adaptive regression splines. J. Syst. Softw., 80:1349–1361, August 2007. ISSN 0164-1212.
Yuming Zhou, Hareton Leung, and Baowen Xu. Examining the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness. Transactions on Software Engineering, 35(5):607–623, 2009.
Thomas Zimmermann, Peter Weisgerber, Stephan Diehl, and Andreas Zeller. Mining version histories to guide software changes. In Proceedings of the 26th International Conference on Software Engineering, ICSE '04, pages 563–572, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2163-0.
Thomas Zimmermann, Rahul Premraj, and Andreas Zeller. Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering, PROMISE '07, pages 9–, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2954-2.
Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, and Brendan Murphy. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE '09, pages 91–100, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-001-2.
Eckart Zitzler and Lothar Thiele. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Trans. Evolutionary Computation, 3(4):257–271, 1999.

Summary

Analyzing the Change-Proneness of APIs and web APIs

APIs and web APIs are used to expose existing business logic and, hence, to ease the reuse of functionality across multiple software systems. Software systems can use the business logic of legacy systems by binding to their APIs and web APIs. With the emergence of the service-oriented programming paradigm, APIs are exposed as web APIs that hide the technologies used to implement legacy systems. As a consequence, web APIs establish contracts between legacy systems and their consumers, and they should stay as stable as possible so as not to break consumers' systems.

This dissertation aims at better understanding the change-proneness of APIs and web APIs. To that end, we investigated which indicators can be used to highlight change-prone APIs and web APIs, and we provided approaches to assist practitioners in refactoring them. To perform this analysis we adopted a research approach consisting of three tracks: analysis of change-prone APIs, analysis of change-prone web APIs, and refactoring of change-prone APIs and web APIs.

Change-Prone APIs

Service-oriented systems are composed of web services. Each web service is implemented by an implementation logic that is hidden from its clients behind its web APIs. Over the history of a software system, the implementation logic can change, and such changes can propagate to and affect the web APIs. Among all the software units composing the implementation logic, APIs are likely to be mapped directly onto web APIs. This scenario is especially likely when a legacy API is made available through a web service.

In this first track we focused on analyzing the change-proneness of APIs (i.e., the set of public methods declared in a software unit). Among all the metrics we analyzed, we have shown that the Interface Usage Cohesion (IUC) metric is the most suitable metric to highlight change-prone Java interfaces. This result suggests that software engineers should design interfaces with high external cohesion (measured with the IUC metric) to avoid frequent changes.

Moreover, we analyzed the impact of specific antipatterns on the change-proneness of APIs. We showed empirically that changes to APIs are more likely to appear if the APIs are affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns. As a consequence, software engineers should refactor APIs affected by these antipatterns.

Change-Prone Web APIs

In the second track we analyzed the change-proneness of web APIs. First, we developed two tools to analyze software systems composed of web APIs. The first tool, called WSDLDiff, extracts fine-grained changes between subsequent versions of WSDL APIs. The second tool extracts the full chains of dependencies among web APIs at run time.

Second, we performed an empirical study to investigate which scenarios can cause changes to web APIs. We showed that low externally cohesive APIs change frequently to 1) improve understandability and 2) ease maintainability and reduce clones in the APIs. Low internally cohesive APIs change frequently to 1) reduce the impact of changes on the many clients they have, 2) prevent the many clients from driving frequent changes to the APIs, and 3) improve understandability. A minimal sketch of the usage-based cohesion idea underlying both tracks is given below.
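The sketch below is only an illustration of how a usage-based (external) cohesion score can be computed from the methods each client actually invokes, assuming the common IUC-style formulation that averages, over all clients, the fraction of the API's methods used by each client. It is not the dissertation's tooling; the class, method, and client names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch of a usage-based (external) cohesion score in the
 * spirit of the Interface Usage Cohesion (IUC) metric: for every client,
 * take the fraction of the API's methods it invokes, then average over
 * all clients. Values near 1 mean clients use the API uniformly; low
 * values hint at a "fat", change-prone interface.
 */
public final class UsageCohesionSketch {

    private UsageCohesionSketch() {
    }

    /**
     * @param apiMethods          all methods declared by the API
     * @param methodsUsedByClient for each client, the API methods it invokes
     * @return a value in [0, 1]; 0 if there are no methods or no clients
     */
    public static double iuc(Set<String> apiMethods,
                             Map<String, Set<String>> methodsUsedByClient) {
        if (apiMethods.isEmpty() || methodsUsedByClient.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (Set<String> used : methodsUsedByClient.values()) {
            long declaredAndUsed = used.stream()
                                       .filter(apiMethods::contains)
                                       .count();
            sum += (double) declaredAndUsed / apiMethods.size();
        }
        return sum / methodsUsedByClient.size();
    }

    public static void main(String[] args) {
        // Hypothetical API with four methods and two clients that each use
        // only part of it: (1/4 + 3/4) / 2 = 0.5, i.e. low external cohesion.
        Set<String> api = Set.of("create", "read", "update", "delete");
        Map<String, Set<String>> clients = Map.of(
                "ReportingClient", Set.of("read"),
                "AdminClient", Set.of("create", "update", "delete"));
        System.out.printf("IUC = %.2f%n", iuc(api, clients));
    }
}
```

Under this formulation, splitting the hypothetical API above into a read-only interface and an administrative interface would raise the score of each resulting interface to 1.0, which is the intuition behind the ISP-based refactoring approach described next.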
Moreover, we proposed a new internal cohesion metric (DTC) to measure the internal cohesion of WSDL APIs.

Refactoring APIs and Web APIs

Based on the results of the studies performed in the first and second track, we defined two approaches to refactor APIs and web APIs.

The first approach assists software engineers in refactoring APIs with low external cohesion based on the Interface Segregation Principle (ISP). We defined the problem of splitting low externally cohesive APIs into smaller APIs specific to each client (i.e., ISP) as a multi-objective clustering optimization problem. To solve this problem we proposed a genetic algorithm that outperforms other search-based approaches.

The second approach assists software engineers in refactoring fine-grained web APIs. These APIs should be refactored into coarse-grained web APIs to reduce the number of remote invocations and avoid performance problems. To achieve this goal we proposed a genetic algorithm that looks for Façade APIs that cluster together the fine-grained methods of the original API.

Conclusion

We believe that these results advance the state of the art in designing, analyzing, and refactoring software systems composed of web APIs (i.e., service-oriented systems) and provide the research community with new insights into the change-proneness of APIs and web APIs.

Daniele Romano

Samenvatting

Analyse van Veranderlijke Web APIs en APIs

APIs en web APIs helpen om bestaande business-logica aan te bieden en vereenvoudigen het hergebruik van functionaliteit in meerdere software systemen. Software systemen kunnen de business-logica van legacy systemen gebruiken door elkaars APIs en web APIs te verbinden. Met de opkomst van het service-georiënteerde programmeerparadigma worden APIs geëxposeerd als web APIs die de technologie waarmee de legacy systemen geïmplementeerd zijn verbergen. Als gevolg hiervan sluiten web APIs contracten af tussen legacy systemen en hun gebruikers en dienen ze zo stabiel mogelijk te zijn zodat ze de systemen van deze gebruikers niet kapot maken.

Het doel van dit proefschrift is om een beter begrip te krijgen van de veranderlijkheid van APIs en web APIs. Hiervoor hebben we onderzocht welke indicatoren gebruikt kunnen worden om APIs en web APIs met een hoge veranderlijkheid te identificeren en we hebben ontwikkelaars van methodes voorzien om deze APIs te herschrijven. Om deze analyse uit te voeren hebben we een onderzoeksmethode gehanteerd die is opgedeeld in drie delen: analyse van veranderlijke APIs, analyse van veranderlijke web APIs en het herschrijven van veranderlijke APIs en web APIs.

Veranderlijke APIs

Service-georiënteerde systemen zijn opgebouwd uit web services. Elke web service is geïmplementeerd met behulp van een logica die verborgen is voor zijn gebruikers door middel van web APIs. Gedurende de geschiedenis van een software systeem kunnen veranderingen in deze logica doorwerken naar web APIs. Van alle software componenten waaruit de implementatie logica bestaat, is de API de meest waarschijnlijke om direct gekoppeld te worden aan web APIs. Dit scenario komt vaak voor als een legacy API beschikbaar wordt gemaakt als web service.

In dit eerste deel van het onderzoek focusten wij op het analyseren van de veranderlijkheid van APIs (i.e., de set publieke methoden in een software component). We hebben aangetoond dat de Interface Usage Cohesion (IUC) metriek de meest geschikte metriek is om veranderlijke Java-interfaces te identificeren.
Dit resultaat suggereert dat software engineers interfaces met een hoge mate van externe cohesie (gemeten met de IUC metriek) zouden moeten ontwerpen om frequente veranderingen te vermijden.

Ook hebben we de impact op de veranderlijkheid van APIs van specifieke antipatronen geanalyseerd. We hebben empirisch aangetoond dat veranderingen van APIs waarschijnlijker zijn wanneer ze slachtoffer zijn van het ComplexClass, SpaghettiCode of SwissArmyKnife antipatroon. Daarom dienen software engineers APIs die geraakt worden door deze antipatronen te herschrijven.

Veranderlijke web APIs

In het tweede deel van het onderzoek hebben we de veranderlijkheid van web APIs geanalyseerd. Ten eerste hebben we twee tools ontwikkeld om software systemen die opgebouwd zijn uit web APIs te analyseren. De eerste tool, WSDLDiff, extraheert zeer kleine veranderingen tussen opeenvolgende versies van WSDL APIs. De tweede tool extraheert de volledige reeks van afhankelijkheden tussen web APIs tijdens run-time.

Daarnaast hebben we een empirische studie uitgevoerd om te onderzoeken welke scenario's veranderingen in web APIs kunnen veroorzaken. We hebben aangetoond dat APIs met een lage externe cohesie vaak veranderen om 1) de begrijpelijkheid te verbeteren en 2) onderhoud te vereenvoudigen en het aantal clones binnen de API te verkleinen. APIs met een lage interne cohesie veranderen vaak om 1) de impact van veranderingen op het grote aantal klanten dat ze hebben te verkleinen, 2) te vermijden dat veranderende eisen van klanten leiden tot veranderingen in de APIs en 3) om de begrijpelijkheid te verbeteren.

Daarnaast hebben we een nieuwe interne cohesie metriek (DTC) voorgesteld voor het meten van interne cohesie van WSDL APIs.

Het herschrijven van APIs en Web APIs

Gebaseerd op de resultaten van de studies uit het eerste en tweede deel van dit onderzoek hebben we twee methodes voor het herschrijven van APIs en web APIs gepresenteerd.

De eerste methode assisteert software engineers met het herschrijven van APIs met een lage externe cohesie en is gebaseerd op het Interface Segregation Principle (ISP). We hebben het probleem van het opdelen van APIs met een lage externe cohesie in kleinere APIs specifiek voor elke klant (i.e., ISP) gedefinieerd als een multi-objective clustering optimalisatie probleem. Om dit probleem op te lossen hebben we een genetisch algoritme voorgesteld dat beter presteert dan andere search-based methodes.

De tweede methode assisteert software engineers met het herschrijven van fine-grained web APIs. Deze APIs dienen herschreven te worden als coarse-grained APIs om het aantal aanroepen van buitenaf te verkleinen en hiermee performance problemen te vermijden. Om dit te bereiken hebben we een genetisch algoritme voorgesteld dat zoekt naar Façade APIs die de fine-grained methodes van de originele API samenvoegen.

Conclusie

We geloven dat deze resultaten de state-of-the-art in het ontwerpen, analyseren en herschrijven van software systemen die bestaan uit web APIs (i.e., service-georiënteerde systemen) vooruit helpen. Daarnaast bieden ze de onderzoeksgemeenschap nieuwe inzichten in de veranderlijkheid van APIs en web APIs.

Daniele Romano

Curriculum Vitae

Education

2010 – 2014: Ph.D., Computer Science, Delft University of Technology, Delft, The Netherlands. Under the supervision of prof. dr. M. Pinzger.

2007 – 2010: M.Sc., Computer Science, University of Sannio, Benevento, Italy. Master's thesis title: An Approach for Search Based Testing of Null Pointer Exceptions.
2001 – 2006: B.Sc., Computer Science, University of Sannio, Benevento, Italy. Bachelor's thesis title: Development and testing of a GUI tool for the creation and modification of nomadic applications.

Work Experience

2014 – present: Advisory IT Specialist / Continuous Delivery Product Owner, ING Nederland, Amsterdam, The Netherlands.

2010: Software Engineering Researcher, internship at École Polytechnique de Montréal, Canada.

2005 – 2007: Java and SOA Software Developer as a freelancer, Benevento, Italy.

2006: Software Engineering Researcher, RCOST (Research Centre On Software Technology), Benevento, Italy.