Quality Metrics for Maintainability of Standard Software
Master Thesis
Dipl.-Ing. Oleksandr Panchenko, Matr.Nr. 724084
Mentors: Dr.-Ing. Bernhard Gröne, Hasso-Plattner-Institute for IT Systems Engineering; Dr. Albert Becker, SAP AG, Systems Applications Products in Data Processing
23.02.2006, Potsdam, Germany

Abstract

The handover of software from development to the support department is accompanied by many tests and checks, which prove the maturity of the product and its readiness to "go to market". However, these quality gates are not able to assess the complexity of the entire product or to predict the maintenance effort. This work investigates metric-based quality indicators that make it possible to assess the most important maintainability aspects of standard software. Static source code analysis is selected as the method for mining information about complexity. The research of this thesis is restricted to the ABAP and Java environments. The quality model used is derived from the Goal Question Metric approach and extends it for the purposes of this thesis. After literature research, the quality model was expanded by standard metrics and some newly invented metrics. The selected metrics were validated theoretically, against numerical properties, using Zuse's software measurement framework, and practically, against their ability to predict maintainability, using experiments. After experiments with several SAP projects, some metrics were recognized as reliable indicators of maintainability. Some other metrics can be used to find non-maintainable code and to provide additional metric-based audits. For semi-automated analysis, a few tools were suggested and an XSLT converter was developed in order to process the measurement data and prepare reports. This thesis should prepare the basis for further implementation and usage of the metrics.
Zusammenfassung (German summary)

Before software is handed over from development to maintenance, numerous tests and checks are carried out to verify whether the product is mature enough to go to market. Although these quality controls are very extensive, the overall complexity of the software and the effort of future maintenance have so far hardly been taken into account. This thesis therefore sets out to investigate metric-based quality indicators that assess the most important aspects of the maintainability of standard software. Static source code analysis was chosen as the method of complexity analysis. The investigation is restricted to the ABAP and Java environments. The quality model is derived from the Goal Question Metric method and adapted to the requirements of this thesis. After an extensive literature review, the quality model was extended with existing and newly developed metrics. The numerical properties of the selected metrics were validated theoretically using Zuse's measurement framework. Practical studies were carried out to assess the expressiveness of the metrics. Experiments with selected SAP projects confirmed some metrics as reliable maintainability indicators. Other metrics can be used to find non-maintainable program code and to provide additional metric-based audits. For a semi-automated procedure, several tools were selected and, in addition, an XSLT was developed to aggregate measurement data and prepare reports. This thesis is intended to serve as a basis both for further research and for future implementations.
Abbreviations

A         Abstractness
ABAP      Advanced Business Application Programming (Language)
AMC       Average Method Complexity
AST       Abstract Syntax Tree
Ca        Afferent Coupling
CBO       Coupling between Objects
CDEm      Class Definition Entropy (modified)
Ce        Efferent Coupling
COBISOME  Complexity Based Independent Software Metrics
CLON      Clonicity
CQM       Code Quality Management
CR        Comments Rate
CYC       Cyclic Dependencies
D         Distance from Main Sequence
D2IMS     Development to IMS
DCD       Degree of Cohesion (direct)
DCI       Degree of Cohesion (indirect)
DD        Defect Density
DIT       Depth of Inheritance Tree
DOCU      Documentation Rate
FP        Function Points
FPM       Functions Point Method
GQM       Goal Question Metric
GVAR      Number of Global Variables
H         Entropy
I         Information
IF        Inheritance Factor
IMS       Installed Base Maintenance & Support
In        Instability
ISO       International Standards Organization
KPI       Key Performance Indicators
LC        Lack of Comments
LCC       Loose Class Cohesion
LCOM      Lack of Cohesion of Methods
LOC       Lines Of Code
LOCm      Average LOC in methods
m         Structure Entropy
MCC       McCabe Cyclomatic Complexity
MEDEA     Metric Definition Approach
MI        Maintainability Index
MTTM      Mean Time To Maintain
NAC       Number of Ancestor Classes
NDC       Number of Descendent Classes
NOC       Number Of Children in inheritance tree
NOD       Number Of Developers
NOM       Number of Methods
NOO       Number Of Objects
NOS       Number of Statements
OO-D      OO-Degree
PIL       Product Innovation Lifecycle
RFC       Response For a Class
SAP       Systems, Applications and Products in Data Processing
SMI       Software Maturity Index
TCC       Tight Class Cohesion
U         Reuse Factor
UML       Unified Modeling Language
XML       eXtensible Markup Language
XSLT      eXtensible Stylesheet Language Transformation
V         Halstead volume
WMC       Weighted Methods per Class
ZCC       ZIP-Coefficient of Compression

Table of contents

1. Introduction
2. Research problem description
    Different methods for source analysis
    Metrics vs. audits
    Classification of the metrics
    Types of maintenance
    Goal of the work
3. Related works and projects
    Maintainability index (MI)
    Functions point method (FPM)
    Key performance indicators (KPI)
    Maintainability assessment
    Abstract syntax tree (AST)
    Complexity based independent software metrics (COBISOME)
    Kaizen
    ISO/IEC 9126 quality model
4. Quality model – goals and questions
    Goal Question Metric approach
    Quality model
    Size-dependent and quality-dependent metrics
5. Software quality metrics overview
    Model: Lexical model
        Metric: LOC – Lines Of Code
        Metric: CR – Comments Rate, LC – Lack of Comments
        Metric: CLON – Clonicity
        Short introduction into information theory and Metric: CDEm – Class Definition Entropy (modified)
    Model: Flow-graph
        Metric: MCC – McCabe Cyclomatic Complexity
    Model: Inheritance hierarchy
        Metric: NAC – Number of Ancestor Classes
        Metric: NDC – Number of Descendant Classes
        Geometry of Inheritance Tree
        Metric: IF – Inheritance Factor
    Model: Structure tree
        Metric: CBO – Coupling Between Objects
        Metric: RFC – Response For a Class
        Metric: m – Structure entropy
        Metric: LCOM – Lack of Cohesion Of Methods
        Metric: D – Distance from main sequence
        Metric: CYC – Cyclic dependencies
        Metric: NOM – Number Of Methods and WMC – Weighted Methods per Class
    Model: Structure chart
        Metric: FAN-IN and FAN-OUT
        Metric: GVAR – Number of Global Variables
    Other models
        Metric: DOCU – Documentation Rate
        Metric: OO-D – OO-Degree
        Metric: SMI – Software Maturity Index
        Metric: NOD – Number Of Developers
    Correlation between metrics
    Metrics selected for further investigation
    Size-dependent metrics and additional metrics
6. Theoretical validation of the selected metrics
    Problem of misinterpretation of metrics
    Types of scale
    Types of metrics
    Conversion of the metrics
    Other desirable properties of the metrics
    Visualization
7. Tools
    ABAP-tools
        Transaction SE28
        Z_ASSESSMENT
        CheckMan, CodeInspector
        AUDITOR
    Java-tools
        Borland Together Developer 2006 for Eclipse
        Code Quality Management (CQM)
        CloneAnalyzer
        Tools for dependency analysis
        JLin
        Free tools: Metrics and JMetrics
    Framework for GQM-approach
8. Results
    Overview of the code examples to be analyzed
    Experiments
    Admissible values for the metrics
    Interpretation of the results
    Measurement procedure
9. Conclusion
10. Outlook
References
Appendix

1. Introduction

The Product Innovation Lifecycle (PIL) of SAP is divided into five phases with a set of milestones. A brief overview of the PIL can be seen in figure 1.1. Consider the milestone (a so-called Quality-Gate) "Development to IMS" (D2IMS) in more detail. The "Installed Base Maintenance and Support Department" (IMS) receives the next release through this Quality-Gate at the start of the Main Stream phase and supports it for the rest of its lifecycle.
The Quality-Gate is a formal decision to hand over the release responsibility to IMS. It is based on a readiness check, which proves the quality of the release by a set of checks. However, this check aims to establish the overall quality and the absence of errors; it is not intended to determine how easy it is to maintain the release. For correct planning of maintenance resources, IMS needs additional information about those attributes of the software that impact maintainability. This information will not influence the decision of the Quality-Gate, but it will help IMS developers in planning resources and analyzing the release. Such information can also support code reviews and allows earlier feedback to development.

This thesis aims at filling this gap by providing a set of indicators which describe the release from the viewpoint of its maintainability, together with instructions on how these should be interpreted. The second goal is a set of indicators which can find poorly maintainable code. Detailed descriptions of the PIL concept can be found at SAPNet Quicklink /pil or in [SAP04].

Figure 1.1: Goals of Quality-Gate "Development to IMS". [SAP05 p.43]

Maintainability here means how easily and quickly the software can be maintained. High maintainability means smooth and well-structured software, which can be easily maintained. Other definitions of maintainability, such as "likelihood of errors", are out of the scope of this work.

At the time of the Quality-Gate D2IMS, the product has already been completely developed and tested. Thus the complete source code is accessible. However, the product is only about to "go to market", and no data about customer messages or errors is available. Consequently, only internal static properties of the software can be analyzed at this point in time.
One way of approaching this problem is to investigate the dependency between the maintainability of the software and its design, with the goal of finding design properties that can be used as maintainability indicators. Since standard software is usually very large and cannot be analyzed by a human, such findings must be produced automatically and must be objective. Thus only objective measures can be used.

The subject of this thesis is the complexity of the software, which often leads to poorly maintainable code. Metrics provide a mathematical means of purposefully describing certain properties of the object under study. After establishing the basis of maintainability and finding the design peculiarities which impact it, this thesis proposes a way to describe these design properties using metrics. Consequently, several selected metrics should be able to indicate the most important aspects of maintainability and the overall quality of the software.

Moreover, it is commonly accepted that bad code or a lack of design is much easier to discover than good code. Therefore it should not be a big challenge to find code that could cause problems for maintenance.

All in all, the solution of this task allows:
• a deep understanding of the essence of maintainability and its factors,
• estimating the quality of the product from the viewpoint of maintainability,
• appropriate planning of the resources for maintenance, and
• earlier feedback to development.

A more detailed problem description and the goals of this thesis are presented in chapter 2. The rest of this thesis is organized as follows. Chapter 3 gives an overview of related work. The quality model, which is used to determine the essence of maintainability, is discussed in chapter 4. Chapter 5 provides short descriptions of the metrics that are candidates for extending the quality model. Chapter 6 supplements the metric descriptions with theoretical validation.
Tools which can be used for software measurement are discussed in chapter 7. Experiments and results are discussed in chapter 8. Conclusions are given in chapter 9, and a short outlook in chapter 10 finishes this thesis.

2. Research Problem Description

Different Methods for Source Code Analysis

All methods for the analysis of source code can be divided into two groups: static and dynamic methods. Static methods work with the source code directly and do not require running the product. This property allows static methods to be used in earlier phases of the lifecycle and is one of the important requirements for this thesis. Static methods comprise metrics and audits. Dynamic methods comprise metrics, tests and Ramp-Up.

Dynamic methods can also consider the dynamic complexity of the product, for example not only how the connections between modules are organized, but also how often these are actually used. Here and throughout this thesis, a module means any compilable unit: a class in Java, and a program, a class, a function module etc. in ABAP. Note that dynamic methods can show different results in different environments. This is especially important for applications which provide only a generic framework from which the customer composes his own product using the provided functions (for example solutions for business warehousing).

Metrics for dynamic complexity are usually based on a UML specification of the project, for instance state diagrams, and are analyzed using colored Petri nets. Another possibility is collecting statistical information about the running application. Several experts believe that improving only a few modules (those most often used) can significantly improve the quality of the entire system. Note also that modules which are often used by other modules are more sensitive to changes and should have better quality.

All methods except metrics are relatively well investigated at SAP and are also supported by tools.
The author believes that the main causes of a program's maintainability lie directly in the code and that indicators can be extracted from the static code without supplementary dynamic analysis. The three main activities of maintenance are analyzing the source code, changing the program and testing the changes. Therefore, the maintainer also works with the static code most of the time.

Metrics vs. Audits

Two main types of static code analysis are distinguished: metrics and audits. Metrics provide a description of the measured code, which means a homomorphic mapping from empirical objects to real numbers in order to describe certain properties of these objects. Homomorphism in this case means "a mapping from the empirical relational system to the formal relational system which preserves all relations and structures between the considered objects" [ZUSE98 p.641]. Empirical objects can stand in different relations to each other, for example: one program is larger than another, or one program is more understandable than another. Of course, the researcher wants the metric to preserve such relations. Such a mapping also means that metrics should be considered only in the context of some relation between empirical objects. A common example of a homomorphic mapping is presented in figure 2.1. For more on the theoretical framework of software measurement see [ZUSE98, in particular pp. 103-130].

An example of a metric is LOC (Lines Of Code). This metric preserves the relation "Analyzability", since smaller programs are in general easier to understand than larger programs. Another metric, NOC (Number Of Children in inheritance tree), preserves the relation "Changeability", since a class with many sub-classes is more difficult to change than a class with only a few or no sub-classes.
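The order-preservation requirement can be illustrated with a minimal sketch. It assumes that an empirical relation is available as an expert ranking of programs and checks whether a candidate metric orders every pair of programs the same way. The function name `preserves_order`, the program names and all numbers are hypothetical and serve only as illustration; this is not Zuse's formal procedure.

```python
from itertools import combinations


def preserves_order(empirical_rank, metric_values):
    """Check that a metric orders every pair of objects like the empirical ranking.

    empirical_rank: dict mapping object name -> rank (higher = harder to analyze)
    metric_values:  dict mapping object name -> measured number
    """
    for a, b in combinations(empirical_rank, 2):
        emp = empirical_rank[a] - empirical_rank[b]
        num = metric_values[a] - metric_values[b]
        # The metric must agree with the empirical relation in both directions,
        # so ties and strict orderings have to match as well.
        if (emp > 0) != (num > 0) or (emp < 0) != (num < 0):
            return False
    return True


# Hypothetical expert ranking of three programs by difficulty of analysis.
expert_rank = {"P1": 1, "P2": 2, "P3": 3}

# Hypothetical measurements: LOC follows the ranking, the second metric does not.
loc = {"P1": 120, "P2": 450, "P3": 900}
other_metric = {"P1": 5, "P2": 9, "P3": 7}

loc_is_consistent = preserves_order(expert_rank, loc)
other_is_consistent = preserves_order(expert_rank, other_metric)
```

A metric that fails such a check for the relation of interest cannot be interpreted as a measure of that relation, however easy it is to compute.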
Figure 2.1: Metric - mapping between empirical and numerical objects which preserves all relations. (The figure shows empirical objects P1, P2, ..., PN with their empirical relations, mapped by a metric onto numerical objects M1, M2, ..., MN with their numerical relations.)

According to Zuse's measurement framework [ZUSE98], specifying a metric includes the following steps:
• Identify attributes for real world entities
• Identify empirical relations for such attributes
• Identify numerical relations corresponding to each empirical relation
• Define a mapping from real world entities to numbers
• Check that numerical relations preserve and are preserved by empirical relations

In contrast to metrics, audits are just a verification of adherence to some rules or development standards. Usually an audit is a simple count of violations of these rules or patterns. SAP uses a wide range of audit-based tools for ABAP (CHECKMAN, Code Inspector, Advanced Syntax Check, SLIN) and for Java (JLin, Borland Together). Audits help developers find and fix code that potentially contains errors, and they increase awareness of quality in development. However, audits are bad predictors of maintainability, because an application can conform to the development standards and still be poorly maintainable. Moreover, audits give concrete recommendations to developers, but are not able to characterize the quality of the product in general. The second reason for rejecting audits is the absence of complexity analysis, which is the main part of maintainability analysis.

In the rest of this work only metrics are considered. Approaches for finding appropriate metrics are discussed in chapter 4. The numerical properties of metrics are examined in chapter 6.

Based on the metric definition, the following usage scenarios are conceivable:
• Compare certain attributes of two or several software systems
• Formally describe certain attributes of the software
• Prediction. If a strong correlation between metrics is found, the value of one metric can be predicted based on the values of another metric. For example, if a relation between some complexity metric (product metric) and the fault probability (process metric) is found, one can predict the probability of a fault in a certain module based on its complexity
• Keep track of the evolution of the software. Comparing different versions of the same software allows drawing conclusions about the evolution and trend of the product

Classification of the Metrics

In [DUMK96 pp. 4, 8] Dumke considers three improvement (measurement) areas of software development: software products, software processes and resources, and gives a metrics classification for each area.

Product metrics describe properties of the product (system) itself and thus depend only on internal qualities of the product. Examples of product metrics are the number of lines of code or the comments rate.

Process metrics describe an interaction process between the product and its environment; the environment can also be the people who develop or maintain the product. Examples of process metrics are the number of problems closed during a month or the mean time to maintain (MTTM). Obviously, maintainability can be measured directly in the process of maintenance using process metrics like MTTM. However, the maintainability assessment of this thesis has to be made before maintenance begins and before these metrics are available. Thus this thesis tries to predict maintainability in earlier phases of the lifecycle using product metrics only.

Resource metrics describe properties of the environment. Examples of resource metrics are the number of developers in a team or the amount of available memory on a server.

This work concentrates purely on software product measurement.
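The prediction scenario listed above, in which a product metric is related to a process metric, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-module complexity values and MTTM hours are invented, and a Pearson correlation with a least-squares line is only one of several possible ways to quantify such a relation; the thesis does not prescribe this procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical data: a complexity metric per module (product metric) and the
# mean time to maintain in hours observed later (process metric).
complexity = [3, 5, 8, 12, 20]
mttm_hours = [1.5, 2.0, 3.5, 5.0, 8.5]

r = pearson(complexity, mttm_hours)
slope, intercept = fit_line(complexity, mttm_hours)

# Predict the process metric for a new module from its product metric alone.
predicted_mttm = slope * 15 + intercept
```

Such a prediction is only as trustworthy as the correlation it rests on, which is why the product metrics need empirical validation against process data.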
Process metrics can, however, also be used for empirical validation of the product metrics, because they measure maintainability directly and can confirm the prediction made by the product metrics. Once an appropriate correlation between the product metrics and the process metrics is established, one can speak of empirically validated product metrics.

Types of Maintenance

There are three main types of maintenance (based on [RUTH]):
• Corrective: making it right (also called repairs)
  o To correct residual faults: specification, design, implementation, documentation, or any other types of faults
  o Time consuming, because each repair must go through the full development lifecycle
  o On average, ~20% of the overall maintenance time (however, at IMS it reaches 60%, and with the beginning of the Extended Maintenance even up to 100%)
• Adaptive: making it different (functional changes)
  o Responses to changes in the environment in which the product operates
  o Changed hardware
  o Changed connecting software, e.g. a new database system
  o Changed data, e.g. new phone dialing codes
  o On average, ~20% of maintenance time (at IMS 30%, over time 10%)
• Perfective: making it better
  o Change software to improve it, usually requested by the client
  o Add functionality
  o Improve efficiency, for example performance (also called polishing)
  o Improve maintainability (also called preventative maintenance)
  o On average, ~60% of the maintenance time (at IMS only 10-20%)

For IMS, corrective maintenance is the most important and most time consuming type. However, this thesis does not distinguish between the types of maintenance, because the general process is the same for all of them. Nevertheless, the results of this analysis can be used especially for the planning of preventative maintenance.

Goal of the Work

This thesis is going to answer the question: What are metrics able to do?
To turn this question into a more practical task, the following formulation is used: a set of metric-based indicators for the maintainability of standard software should be found, in order to be able to assess or predict maintainability based on the internal qualities of the software in earlier phases of the lifecycle.

No single metric can adequately and exhaustively evaluate the software, and too many metrics may lead to information overload. Thus a well-chosen subset of about 12 measures should be selected and analyzed. For each metric, admissible boundaries and recommended values should be defined. Moreover, possible interpretations of the results and their meaning for maintainability should be prepared. Since the measurement is going to be automated, an overview of the most suitable tools and a description of the measurement process should be provided. A detailed description, implementation hints and examples should be prepared for each metric. In order to fulfill all these requirements, theoretical and empirical validation of the selected metrics should also be done.

The approach must not use additional information sources (other than the source code) such as requirements, business scenarios, documentation etc. In the current work only metric-based static code analysis is considered.

3. Related Works and Projects

In this chapter several relevant projects are introduced. This description should give an idea of what has been done in this field so far. Besides the selected projects, a wide range of articles and books has been written on single metrics and measurement frameworks. These are not included in this chapter, but are mentioned or referenced later in this thesis.

Maintainability Index (MI)

Hybrid measures are not measured directly from the software, but are a combination of other metrics. The most popular form of combination is polynomial; nevertheless, there are also other forms.
Such a combination is used to obtain one resulting number for the whole evaluation. However, this desire leads researchers to the problem of hiding information: hybrid measures show attributes of the empirical object incompletely and imperfectly.

One attempt to present maintainability as a single hybrid measure is the Maintainability Index by Welker and Oman [WELK97], which includes several models with various member metrics and coefficients. One of them is the improved, four-metric MI model:

MI = 171 - 5.2 ln(Ave-V) - 0.23 Ave-MCC - 16.2 ln(Ave-LOC) + 50 sin(√(2.4 Ave-CR))

where:
• Ave-V is the average Halstead volume per module,
• Ave-MCC is the average McCabe Cyclomatic Complexity per module,
• Ave-LOC is the average number of lines of code per module,
• Ave-CR is the average percentage of comments per module.

Many of the metrics used here are discussed in chapter 5. The research in [WELK97] gives the following indications of the meaning of the MI values:
• MI < 65: poor maintainability
• 65 < MI < 85: fair maintainability
• MI > 85: excellent maintainability

Nevertheless, all metrics used are intra-modular and do not consider inter-modular dependencies, which strongly impact maintainability. Thus the Maintainability Index was excluded from further investigation. However, using this approach led to an interesting observation: MI was measured at two different points in time for the modules of the same system, and it was shown that less maintainable modules became more difficult to maintain, while well maintainable modules kept their good quality over time.

Functions Point Method (FPM)

The Functions Point Method suggests assigning to each module, class, input form etc. a certain number of function points (FP) depending on its complexity. The sum of all points predicts the development or maintenance effort for the whole application. Assuming that a developer can implement a certain number of FP per day on average, a manager can predict the number of developers and the time needed.
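The effort estimate described here can be written as a small worked equation. The symbols $E$ (effort in developer-days), $FP_i$ (function points of module $i$), $p$ (average productivity in FP per developer-day) and $d$ (number of developers), as well as all numbers, are hypothetical and only illustrate the arithmetic; they are not taken from the thesis.

```latex
% Total effort: sum of the function points of all n modules divided by productivity
E = \frac{\sum_{i=1}^{n} FP_i}{p}

% Hypothetical example: three modules with 12, 30 and 18 FP, p = 4 FP per developer-day
E = \frac{12 + 30 + 18}{4} = \frac{60}{4} = 15 \ \text{developer-days}

% Calendar time T with d = 3 developers, assuming the work can be split evenly
T = \frac{E}{d} = \frac{15}{3} = 5 \ \text{working days}
```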
FPM is well suited to early project phases and allows predicting the development and maintenance effort when source code is not yet available. It also suits the strongly data-oriented concept of SAP applications. Nevertheless, in the case of this work the source code is already available, and it could be difficult to calculate in reverse the number of FPs that were implemented. This could be especially difficult for a product which has been bought from outside and for which no project or design documentation is available. To make matters worse, FPs are subjective and do not suit the required objective model. Also, these measures were designed for cost estimation (before source code is available) rather than for measurement. Thus it is best to collect information from the source code directly, without using FPM as an additional layer of abstraction. For readers who are interested in FPM, the following sources are recommended: [AHN03], [ABRA04b].

Key Performance Indicators (KPIs)

The goal of this project is the definition, measurement and interpretation of basic data and KPIs for the quality of the final product. For the assessment of product quality several (ca. 30) direct indicators were selected, which means that the data is collected directly in the process of maintenance. Examples of the selected indicators are:
• Number of messages: number of incoming customer messages per quarter
• Changed coding: sum of inserted and deleted lines in all notes with category "program error" divided by the total number of lines of coding (counted per quarter)
• Callrate: number of weekly incoming messages per number of active installations
• Defect Density (DD): the number of defects (weighted by severity and latency) identified in a product divided by the size and complexity of the product

Nevertheless, the earliest possible availability of such indicators is at the end of the Ramp-Up phase.
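Two of the ratio indicators listed above can be sketched in a few lines. This is a minimal illustration that follows the verbal definitions given in the list; the function names, parameter names and all sample numbers are hypothetical and do not reflect the actual KPI implementation at SAP.

```python
def changed_coding(inserted_lines, deleted_lines, total_lines):
    """Share of coding changed by notes of category "program error" in a quarter."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return (inserted_lines + deleted_lines) / total_lines


def call_rate(weekly_messages, active_installations):
    """Weekly incoming messages per active installation."""
    if active_installations <= 0:
        raise ValueError("active_installations must be positive")
    return weekly_messages / active_installations


# Hypothetical quarter: 300 inserted and 200 deleted lines in a product
# of 100,000 lines of coding.
quarter_changed = changed_coding(300, 200, 100_000)

# Hypothetical week: 50 incoming messages for 2,000 active installations.
week_call_rate = call_rate(50, 2_000)
```

Both indicators need data from the running maintenance process, so they cannot be computed from the source code alone.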
Because of this late availability, these KPIs can only be used in the context of the current thesis for validating the developed metrics. For more details about KPIs see [SAP05c].

Maintainability Assessment

This project aims at assessing the maintainability of SAP products shortly before handover to IMS. Thus the goal is nearly the same as that of the current thesis. However, the assessment chosen in this project is audit-based. Several aspects of maintainability are inspected and a list of questions is prepared. An expert has to analyze the product manually, answer the questions and fill out a special form. After that, the final conclusion about maintainability can be reported automatically. The main drawbacks of the suggested method are the manual character of the assessment and the single resulting value, which is difficult to interpret. In this project some primitive metrics like lines of code and comments rate are also suggested, and a tool supporting these metrics is provided.

Abstract Syntax Tree (AST)

In this project ABAP code is analyzed and a method for building an abstract syntax tree is suggested. A plug-in for Eclipse was also developed in order to automate this method. The plug-in allows saving the AST into an XML document and analyzing it. Based on this technique, some metrics for ABAP can be implemented. Another way to use the AST is to find clones, i.e. copied fragments of coding.

Complexity Based Independent Software Metrics

This is a master thesis about Complexity Based Independent Software Metrics (short: COBISOME). The main point of that work is to find an algorithm for converting a set of correlated metrics into a set of independent metrics. Such a conversion creates a list of virtual independent (orthogonal) metrics, which allows examining different aspects of the software independently and thus more effectively. Nevertheless, the complicated transformations and the aggregation of several metrics into one make the analysis more difficult at the same time.
For more details see [SAP05b].

Kaizen

The objective of the project Kaizen is to analyze selected SAP code in order to understand it better and to look for ways to continually improve it. Three possible objectives of code improvement are:
1. Improve readability and general maintainability
2. Reduce the cost of service enabling
3. Enable future enhancements in functionality (when well understood)

Kaizen will focus on objectives #1 and #2, as these are applicable to most SAP code. One of the first steps of the project is the analysis of maintainability metrics.

ISO 9126 – Standard Quality Model

The ISO 9126 quality model was proposed as an international standard for software quality measurement in 1992. It is a derivation of the McCall model (see appendix A). This model assigns attributes and sub-characteristics of the software to one of several areas (so-called characteristics) in a hierarchical manner. For the area "Maintainability" the following attributes are arranged: analyzability, changeability, stability, testability and compliance. Although these attributes are given, measuring the quality is still not easy. The model is customizable, but not very flexible and in many cases not applicable. Hence it is not commonly accepted, and only a few tools are based on the ISO model.

4. Quality Model – Goals and Questions

Goal Question Metric Approach

A quality model is a model explaining quality from a certain point of view. The object of a quality model can be products, processes or projects. Most models suggest a decomposition principle, where a more general characteristic is decomposed into several sub-characteristics and further into metrics.

Various metric definition approaches (MEDEAs) have been developed. The most effective are hierarchical quality models organized in a top-down fashion: such a model must be focused, based on goals and models, and at the same time provide an appropriate level of detail.
“A bottom-up approach will not work because there are many observable characteristics in software, but which metrics one uses and how one interprets them it is not clear without the appropriate models and goals to define the context” [BASI94 p.2]. Nevertheless, the bottom-up approach is useful for metrics validation, when the metrics are already selected. The most flexible and very intuitive approach is the Goal Question Metric MEDEA (GQM), which suggests a hierarchical top-down model for selecting the appropriate metrics. This model has at least three levels:
- Conceptual level (goals): This level presents a measurement goal, which is derived from business goals. In the case of this thesis the measurement goal would be “well maintainable software”. However, in order to facilitate formalizing the top goal, the GQM specifies a template, which includes a purpose, an object, a quality issue, a viewpoint and a context. The formalized goal is given in the next section (see p. 21). Since the top goal is usually very complex, it can be broken down into several sub-goals in order to ease the interfacing with the underlying levels.
- Operational level (questions): As the goals are presented on the very abstract conceptual level, each goal should be refined into several quantifiable questions, which introduce a more operational level and hence are more suitable for interpretation. Answers to these questions have to determine whether the corresponding goal is being met. “Questions try to characterize the object of measurement with respect to a selected quality issue and to determine its quality from the selected viewpoint” [BASI94 p. 3]. Hence, questions help to understand the essence of the measurement objective and to find the most appropriate indicators for it.
  These indicators could be explicitly formalized within an optional Formal level.
- Quantitative level (metrics): Metrics placed on this level should provide all quantitative information needed to adequately answer the questions. Hence, metrics are a refinement of the questions into quantitative product measures. The same metric can be used to answer multiple questions.
- An optional Tools level can be included into the model in order to show the tool assignment for the metrics.

An abstract example of the GQM model is illustrated in figure 4.1. A more detailed description of the GQM and a step-by-step procedure for using it are given in [SOLI99]. “GQM is useful because it facilitates identifying not only the precise measures required, but also the reasons why the data are being collected” [PARK96, p. 53]. It is possible to rank the impact of metrics on questions using weight coefficients, in order to make clear which metric is more important. However, the model used in this thesis does not aim to describe which weights the metrics have. The author believes that the best way is to give the analyst full freedom in this decision. The analyst can decide, depending on the situation, which indicator is more important.

Figure 4.1: The Goal Question Metric approach (abstract example)

The measurement process using the GQM approach includes four main steps:
1. Definition of the top goal and the goal hierarchy
2. Definition of the list of questions which explain the goals
3. Selection of the appropriate metric set; theoretical and empirical analysis of each metric; selection of the measurement tools
4. Collecting measurement data and interpreting the results
The first three steps are intended for the definition of the GQM quality model; the last step is the actual measurement and interpretation and can be repeated many times.
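The goal, question and metric levels described above can be pictured as a small tree. The following is only an illustrative sketch: the class names, the example questions and the assignment of metric names to questions are hypothetical and are not taken from the quality model of this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)  # quantitative level


@dataclass
class Goal:
    name: str
    questions: list[Question] = field(default_factory=list)  # operational level
    subgoals: list["Goal"] = field(default_factory=list)      # goal hierarchy


def metrics_for(goal: Goal) -> set[str]:
    """Collect every metric reachable from a goal, top-down."""
    found: set[str] = set()
    for question in goal.questions:
        found.update(question.metrics)
    for sub in goal.subgoals:
        found |= metrics_for(sub)
    return found


# Hypothetical fragment of a model; the same metric may answer several questions.
analyzability = Goal("Analyzability", questions=[
    Question("Are the modules small?", ["Ave-LOC"]),
    Question("Is the code documented?", ["LC"]),
])
testability = Goal("Testability", questions=[
    Question("How many test cases are needed?", ["MCC", "Ave-LOC"]),
])
top = Goal("Maintainability", subgoals=[analyzability, testability])

print(sorted(metrics_for(top)))
```

Walking the tree from the top goal yields the metric set to be measured, which corresponds to the third step of the measurement process.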
Quality Model
According to the GQM goal specification, the major goal for the maintainability purpose is: to assess (purpose) maintainability (quality issue) of standard software (object) from IMS’s viewpoint (viewpoint) in order to manage it and find possible ways to improve it (purpose) in the ABAP and Java environment (context). The question for the major goal could be “How easy is the location and fixing of an error in the software?”, but this question is very vague and can only be answered with process metrics like MTTM. As mentioned before, measuring such process metrics is only possible during maintenance and is thus inappropriate for the purposes of this work. Let us call such goals external goals, because the degree of goal achievement also depends on external factors. The degree of achievement of internal goals depends only on internal properties of the software and hence can be described relatively early in the lifecycle. The major goal is highly complex and it is difficult to create appropriate questions for it, thus a complex hierarchy of goals should be used, including the top goal, goals and sub-goals. Moreover, only internal goals should be placed at the bottom of the hierarchy, so that questions are addressed only to internal goals. The goal hierarchy is depicted in figure 4.2, where blue boxes present external goals. Such a decomposition allows a sensible selection of questions and the necessary granularity. The full model is presented in appendix B.

Figure 4.2: Mapping of external and internal goals

The quality model used is based on several validated and acknowledged quality models: the ISO 9126 standard quality model, the McCall quality model, the software quality characteristics tree from Boehm and Fenton’s decomposition of maintainability. Corresponding parts of these models can be found in appendix A or in [KHAS04]. Several sub-goals and metrics were also taken from [MISR03].
After examination of these quality models, theoretical consideration and research of the literature in this field, the following areas (goals) were recognized as important for the maintainability of software:
- Maturity
- Clonicity
- Analyzability
- Changeability
- Testability
The goals Maturity and Clonicity are described together with the corresponding metrics in chapter 5 (see p. 55 and p. 26, respectively). Next, the aspects Analyzability, Changeability and Testability are discussed.

Analyzability is probably the most important factor of maintainability. Nearly all metrics used in the model are also present in the Analyzability area. At first, the author also wanted to include the goal Localizing in the model, which would characterize how easy it is to localize (find) a fault in the software. Later it was found that most of the metrics for this goal are already included in Analyzability, and Localizing was removed from the model. The following sub-goals should be fulfilled in order to create easily comprehensible software:
- Algorithm Complexity: keeping the internal (algorithm) complexity low
- Selfdescriptiveness: providing appropriate internal documentation (naming conventions and comments)
- Modularity: keeping the modules small and encapsulating the functionality appropriately into the modules (cohesiveness)
- Structuredness: proper organization of the modules in the entire structure
- Consistency: keeping the development process simple and well organized. Much research has tried to determine whether a well-organized development process leads to good quality of the product. However, no evident relation was found. Nevertheless, the maintainer is sometimes confused if he sees that a module was changed many times by different developers. Consistency in this context means a clear distribution of tasks between developers.
For changeability (the ease of making changes in the software) it is important to have a proper software design, which allows maintenance without side effects. The quality model includes the goals Structuredness, Modularity and Packaging in this area. Structuredness has several different aspects:
- Coupling describes the connectivity between classes
- Cohesiveness describes the functional unity of a class
- Inheritance describes the properties of inheritance trees

Testability means the ease of testing and of maintaining test cases. Bruntink in [BRUN04] investigates testability from the perspective of unit testing and distinguishes between two categories of source code factors: factors that influence the number of test cases required for testing the system (let us call the goal for these factors “Value”), and factors that influence the effort required to develop each individual test case (let us call the goal for these factors “Simplicity”). Note that “Value” means the number of necessary test cases and not the number of actually available test cases. Consequently, for high maintainability it is important to keep the “Value” small. Nevertheless, most efforts in the field of test coverage are concentrated on the low procedure level, for example the percentage of tested statements within a class. The quality model includes several metrics for testability validated in [BRUN04].

In the SAP system an important part of the complexity is contained in the parameters for customization; however, the experts argue that most of the customization complexity is already included in the source code, where the parameters are read and processed. The impact of individual metrics on maintainability is discussed in greater detail in chapter 5.

Size-dependent and Quality-dependent Metrics
Before the individual metrics can be discussed, one important property of a metric, namely size-dependency, should be introduced.
Some metrics are highly sensitive to the project size. That means such metrics will show higher values whenever the software grows. Such metrics are size-dependent. Other metrics are quality-dependent and measure purely quality, independent of size. That means larger software can have smaller values of such metrics. A good example of a size-dependent metric is LOC (Lines Of Code), because it continuously grows with each new statement. The metric Ave-LOC (average number of LOC per module or method) is, on the contrary, independent of the size and conveys an important characteristic of quality: the modularity. In order to be able to compare software of very different size, the usage of quality-dependent metrics is preferable. Many size-dependent metrics convey qualitative attributes of software as well, but they are too sensitive to the size and need to be converted before usage in order to reinforce the quality constituent of the metric. Moreover, a few size-dependent metrics should be included in the quality model in order to gain some insight into the size of the considered system. For this purpose the metrics Total-LOC (total lines of code) and Total-NOO (number of all objects, i.e. modules) are suggested. “Although size and complexity are truly two different aspects of software, traditionally various size metrics have been used to help indicate complexity” [ETZK02, p. 1]. Consequently, a metric that assesses the code complexity of a software component by the use of a size analysis alone will never provide a complete view of complexity.

5. Software Quality Metrics Overview
In this chapter all metrics which are supposed to be used in the quality model are discussed in more detail. Many metrics are complex and difficult to measure directly, thus it is usual to build some abstraction of the system, called a model, and to measure the attributes of this model. There are five main models that are suitable for software product measurement in the context of this thesis.
Since the properties of a metric depend on the model in which the metric is measured, all metrics are grouped in sets according to the model they belong to. In the literature two major classes of software measures can be found. They are based on modules and on entire software systems and are called intra-modular and inter-modular measures, respectively. Metrics based on the lexical model and the flow-graph are intra-modular; metrics based on the inheritance hierarchy model, the structure tree and the structure chart are usually inter-modular.

Model: Lexical Model
This model is intended for intra-modular measures and consists of plain text in the programming language. It is also possible to partition the text into simple tokens and to analyze the frequencies of usage of these tokens.

Metric: LOC – Lines Of Code
The metric LOC counts the number of lines of code, excluding blank lines and comments which take a whole line. The total LOC of the system has a quantitative meaning and shows first of all the size of the system. It has no qualitative meaning, because both a small and a huge system can be maintainable or not. In the qualitative sense the metric Ave-LOC (average amount of LOC per module or class) can be used. This metric shows how well the system is split into parts. It is widely accepted that small modules are in general easier to understand, change and test than bigger ones. However, a system with a large number of small modules has a large number of relations between them and is complex as well. See chapter “Correlation between metrics” for more details. In order to compare code written by different people, one might want to adjust for different styles. One simple way of doing this is to count only open braces and semicolons (or full stops for ABAP); this works fairly well for ABAP and Java. From this point of view the metric NOS (Number of Statements) is more universal.
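A minimal sketch of the two counts for Java-like source text is given below. It is not the tool used in the thesis: the function names are hypothetical, only `//` and `/* ... */` comments are recognized, a block comment that opens after code on the same line is not handled, and the NOS approximation also counts braces and semicolons inside string literals.

```python
def code_lines(source: str):
    """Yield stripped lines that contain code (no blank or whole-line comment lines)."""
    in_block = False  # True while inside a /* ... */ comment spanning several lines
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            if "*/" not in line:
                continue
            in_block = False
            line = line.split("*/", 1)[1].strip()  # keep code after the comment end
        if line.startswith("/*"):
            if "*/" in line[2:]:
                line = line[2:].split("*/", 1)[1].strip()
            else:
                in_block = True
                continue
        if not line or line.startswith("//"):
            continue
        yield line


def count_loc(source: str) -> int:
    return sum(1 for _ in code_lines(source))


def approx_nos(source: str, marks: str = "{;") -> int:
    """Style-independent size: count statement marks on code lines only."""
    return sum(line.count(m) for line in code_lines(source) for m in marks)


SAMPLE = """\
// Hypothetical sample class
public class Counter {
    /* block
       comment */
    private int n;

    public void inc() { n++; }
}
"""

print(count_loc(SAMPLE), approx_nos(SAMPLE))
```

For ABAP the same idea would need the ABAP comment rules and the full stop as statement mark; that variant is not shown here.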
However, in large systems both metrics are strongly correlated because of a mixture of different programming styles, and they have the same empirical and numerical properties. Note that Java has a more compact syntax. In [WOLL03, p.5] it is shown that a program written in Java has about 1,4 times more functionality than an ABAP program of equal length. This should be considered when estimating admissible values for LOC. LOC is probably the most important metric, because many other metrics correlate with it. Therefore the approximate value of other, more complicated metrics can easily be estimated from LOC. For example, figure 5.14 depicts the correlation between LOC and WMC (Weighted Methods per Class).

Metrics: CR – Comments Rate, LC – Lack of Comments
It is obvious that comments in code help to understand it. Hence the metric CR (Comments Rate) is a good candidate for analyzability. CR is the ratio of the sum of all kinds of comments (full-line comments, part-line comments, JavaDoc, etc.) to LOC (Lines Of Code). CR is easy to calculate and interpret. However, many comments are created automatically and do not provide any additional information about the functionality. These comments help to lay out the source code better and make it more readable, but do not help the maintainer in understanding the code. Additionally, a piece of code can be commented out and will then be counted as a comment. Despite modern versioning systems, many developers leave such fragments in the code. In this case CR can reach 70 – 80 %, which is much overstated. A metric that takes such “comments” and automatically generated comments into account is no longer trivial. Therefore CR should be considered critically and the maintainer should understand that CR can be overstated. During the experiments it was detected that interfaces and abstract classes have a very high amount of comments and only few LOC.
Hence many interfaces and abstract classes increase the overall percentage of comments. CR is the only metric in the quality model whose values become better as they increase. All other metrics should be minimized. Thus one new metric is suggested. LC (Lack of Comments) indicates the deficiency of CR with respect to the optimal value and is calculated as follows: LC = 100 – Median-CR. Since CR is a percentage measure, the arithmetic mean must not be used. The difference should be calculated for the aggregated value of the entire system. This substitution will not worsen the numerical properties of the metric, since CR already has relatively bad numerical properties (see chapter 6). Now all metrics in the quality model should be minimized.

Metric: CLON – Clonicity
Code cloning, i.e. the act of copying code fragments, is a widespread implementation technique, and the acceleration of development it brings should not be underestimated. But cloning is also a well-known problem for maintenance. Clones increase the work and the cognitive load of maintainers for many reasons [RIEG05]:
- The amount of code that has to be maintained is increased
- When maintaining or enhancing a piece of code, duplication multiplies the work to be done
- Since usually no record of the duplications exists, one cannot be sure that a defect has been eliminated from the entire system without performing a clone analysis
- If large pieces of software are copied, parts of the code may be unnecessary in the new context. Lacking a thorough analysis of the code, they may however not be identified as such. It may also be the case that they are not removable without a major refactoring of the code. This may, firstly, result in dead code, which is never executed, and, secondly, such code increases the cognitive load of future maintainers
- Larger sequences repeated multiple times within a single function make the code unreadable, hiding what is actually different in the mass of code.
  Code is then also likely to be on different levels of detail, slowing down the process of understanding
- If all copies are to be enhanced collectively at one point, the necessary enhancements may require varying measures in the cases where copies have evolved differently. As an extreme case, one can imagine that a fix introduced in the original code actually breaks the copy

Exact and parameterized clones are distinguished. Finding exact clones is easier and is language independent. Parameterized clones are more difficult to find, but decidedly more helpful, because clones are often slightly changed already while copying. In [RYSS] various techniques for clone finding are classified. These techniques can be roughly classified into three categories:
- string-based, i.e. the program is divided into a number of strings (typically lines) and these strings are compared against each other to find sequences of duplicated strings
- token-based, i.e. a lexer tool divides the program into a stream of tokens and then searches for series of similar tokens
- parse-tree based, i.e. after building a complete parse tree one performs pattern matching on the tree to search for similar sub-trees. The parse-tree based technique was also considered during the AST project.

The technique should be chosen according to the goal of the measurement. Finding all possible clones for subsequent audits will preferably use the token-based or parse-tree based technique. In the context of this thesis it is more interesting to know only the approximate number of clones, and thus the simple and quick string-based technique can be used. Another important property of the string-based technique is language independence, since both the ABAP and the Java environments are considered. As the most important indicator the metric CLON (Clonicity) is suggested, which is the ratio of LOC in all detected clones to the Total-LOC.
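The string-based technique and the CLON ratio can be sketched as follows. This is a simplified illustration under stated assumptions, not the clone detector used for the experiments: the function name, the minimal clone length of three lines and the decision to count every copy (including the first occurrence) as cloned are all hypothetical choices, and only exact clones after whitespace normalization are found.

```python
from collections import defaultdict


def clonicity(files: dict[str, str], min_len: int = 3) -> float:
    """Ratio of lines lying in duplicated windows of min_len lines to all non-blank lines."""
    lines = []  # (file name, whitespace-normalized line)
    for name, text in files.items():
        for raw in text.splitlines():
            norm = " ".join(raw.split())
            if norm:
                lines.append((name, norm))

    windows = defaultdict(list)  # window content -> start positions
    for start in range(len(lines) - min_len + 1):
        chunk = lines[start:start + min_len]
        if len({name for name, _ in chunk}) != 1:
            continue  # a window must not span two files
        windows[tuple(text for _, text in chunk)].append(start)

    cloned = set()
    for starts in windows.values():
        if len(starts) > 1:  # the same sequence of lines occurs more than once
            for s in starts:
                cloned.update(range(s, s + min_len))
    return len(cloned) / len(lines) if lines else 0.0


# Hypothetical input: two files sharing three consecutive lines.
FILES = {
    "a.txt": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.txt": "x = 1\n  y = 2\nz = x + y\nreturn z\n",
}
print(clonicity(FILES))
```

Because the comparison works on plain lines, the same sketch applies to ABAP and Java sources alike, which is the language independence mentioned above.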
This metric should give an idea about the usage of copy-paste in the development process and consequently about the redundancy of the final product.

Short Introduction into Information Theory and Metric: CDEm – Class Definition Entropy (modified)

Methods for describing complexity
There are many methods which allow describing the complexity of a system. Only a few of them are listed below (partially taken from [CART03]):
- Human observation and (subjective) rating. The weakness of such an evaluation is its subjective nature and the required human involvement.
- Number of parts or distinct elements. Size and complexity are truly two different aspects of software. Despite this fact, various size metrics have traditionally been used to help indicate complexity. However, many such metrics are size-dependent and do not allow comparing systems of different size. It is also not always clear what should be counted as a distinct part.
- Number of parameters controlling the system. Here the same comments apply as for the number of parts.
- Minimal description in some model or language, which presents some kind of abstraction. Obviously a system which has a smaller minimal description is simpler than a system with a larger minimal description. In this method a model (a description) includes only relevant information, thus redundant information, which increases size without increasing complexity, is avoided.
- Information content (how is information defined and measured?)
- Minimal generator/constructor (what machines/methods can be used?)
- Minimum energy/time to construct. Several experts argue that a system which needs more time to be designed (implemented) is more complex.
Obviously, the study of complex systems demands that the analyst use some kind of statistical method. Next, after a short introduction into information theory, entropy-based metrics supporting some of the above-mentioned methods are discussed.
Information
Remark: all following statements are considered in terms of probability. Consider the process of reading a random text, where it is supposed that the alphabet is initially known to the reader. The reading of each next symbol can be seen as an event. The probability of this event depends on the symbol and its place within the text. Consider a measure of how surprising or unexpected an observation or event is, and let us call this measure information. Thus the information which is obtained from each new symbol is, in this context, the amount of new knowledge which the reader gets from this symbol. It is obvious that information is inversely related to the probability of the event: if the probability of occurrence of “i” is small, the reader would be quite surprised if the outcome actually was “i”. Conversely, if the probability of a certain symbol is high (for example, the probability of occurrence of the symbol “i” after “t” in the word “information” tends to 1), the reader will not get much information from this symbol. Let us describe the information measure more formally. For that Shannon proposed four axioms:
- Information is a non-negative quantity: I(p) ≥ 0
- If two independent events occur (whose joint probability is the product of their individual probabilities), then the information the reader gets from observing the events is the sum of the two informations: I(p1*p2) = I(p1)+I(p2)
- I(p) is a continuous and monotonic function of the probability (slight changes in probability should result in slight changes in information)
- If an event has probability 1, the reader gets no information from the occurrence of the event: I(1) = 0
From these axioms one can derive the definition of information in terms of probability: I(p) = −log2(p). A more detailed description of this derivation can be found in [CART03] or [FELD02]. The base 2 of the logarithm reflects the binary character of the events; in this case the unit of information is the bit. However, other bases are also possible.
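The definition can be checked numerically against the axioms. The snippet below is only an illustration; the function name `information` is chosen here and does not come from the cited sources.

```python
import math


def information(p: float) -> float:
    """Information in bits of an event with probability p, i.e. -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log2(1 / p)  # same value as -log2(p)


print(information(1.0))    # a certain event carries no information
print(information(0.5))    # one fair yes/no outcome: one bit
print(information(0.25))   # rarer event, more information

# Additivity for independent events: I(p1*p2) = I(p1) + I(p2)
print(math.isclose(information(0.5 * 0.25), information(0.5) + information(0.25)))
```

The rarer the event, the larger the value, which matches the intuition that an unexpected symbol surprises the reader more.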
Entropy
Each symbol in the text brings a different amount of information. Of interest is the average amount of information within the text. For this purpose the term entropy is introduced. After a simple transformation the following expression for entropy can be derived:

$$H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Note that H(P) is not a function of X. It is a function of the probability distribution P = (p_1, ..., p_n) of the random variable X. Entropy has the following important property: 0 ≤ H(P) ≤ log2(n). H(P) = 0 when exactly one of the probabilities is one and all the rest are zero (only one symbol is possible). H(P) = log2(n) only when all of the events have the same probability 1/n. It is to be expected that H is maximized by a uniform distribution: everything is equally likely to occur, and one cannot get much more uncertain than that. Since the maximal possible entropy is known, the normalized entropy can be introduced:

$$H_{norm}(P) = \frac{H(P)}{\log_2 n}$$

This is important, since entropy depends on the project size. Remarkably, entropy depends logarithmically on size: doubling the size increases the maximal entropy by one bit. Some possible interpretations of entropy are listed next:
- The entropy of a probability distribution is just the expected value of the information of the distribution
- Entropy is also related to how difficult it is to guess the value of a random variable X [FELD02, pp. 5-7]. One can show that H(X) ≤ average number of yes-no questions needed to determine X ≤ H(X) + 1
- Entropy indicates the best possible compression for the distribution, i.e. the average number of bits needed to store the value of the random variable X. Note that entropy provides only the theoretical basis; some practical algorithm has to be used for the actual coding (for example Huffman codes).
Next, some applications of entropy to software measurement are discussed.

Average Information Content Classification
In [ETZK02, p 295] the work of Harrison is mentioned.
Harrison and other scientists proposed to extend Halstead's number of operators and to measure the distribution of different operators and operands within a program. This should allow assessing the analyzability of one single chunk of code. However, such a method is not very useful, since the main complexity is contained within user-defined strings. Remarkably, the syntactical rules of a language decrease entropy. For example, it is not possible to have two operands without an operator in between, and the compiler enforces this. Hence the probabilities of occurrence of the operands in syntactically correct program text depend on the syntactical rules. Consequently, the entropy of a syntactically correct program will never reach the maximum, and normalizing with respect to the syntactical rules becomes much more difficult.

Metric: CDEm – Class Definition Entropy (modified)
This metric reduces the alphabet of the text to the user-defined strings used in a class, because these contain most of the complexity. Examples of user-defined strings are:
- Names of classes, attributes and methods
- Package and class names within an import section
- Types of public attributes and types of return values of methods
- Types of parameters
- Method calls, etc.
By such a restriction one obtains another level of granularity. Let us illustrate this metric by an example. Consider a maintainer looking through the source code. How surprised would the maintainer be if he sees a reference to another class? Suppose that:
- Maintainers work easily if they are confronted with the same object again and again
- Maintainers work with difficulty if they often have to analyze new, unknown objects
Consider the two programs presented in figure 5.1. Assume that the maintainer has to fix two faults in the modules B and C. Both modules use the functionality provided by the modules A and E.
In the first program the module A plays the role of an interface and the maintainer can work easily, because he has to keep in mind only one pattern of collaboration. In the second program the modules B and C have references to different modules, thus such a model is more multifarious and more difficult to comprehend. Figure 5.2 shows different patterns for the frequency of occurrence of module names in an abstract program. The frequently used modules play the role of an interface for their containing package. The entropy of the frequency distribution is an indicator for the presence of interfaces. As expected, P1 has a lower entropy value than P2. Note that other metrics will show that P2 is much easier to comprehend: less coupling between modules means less complexity. Consequently, high entropy of the distribution of the user-defined strings indicates text that is difficult to comprehend. Different variants of this metric have been proposed. A very simple and intuitive variant is an analysis of the import section only and the calculation of the distribution of occurrences of class names in the import sections. The classes which occur most often in the import sections are also often used from outside the package where they are defined, and thus are the interface of this package. Clear (small) package interfaces are an indicator of good design. This metric is called CDEm – Class Definition Entropy (modified). Readers interested in other implementations of this kind of metric are referred to [ETZK99], [ETZK02] and [Yi04]. Incidentally, some entropy-based metrics also use semantic analysis to improve their expressiveness.

[Figure 5.1: two module graphs over the modules A to F; P1 with H(P1) = 2,37 and P2 with H(P2) = 2,5]
Figure 5.1: The uniform (left) and the multifarious pattern for communication between modules

[Figure 5.2: two frequency diagrams for user-defined strings; x-axis: name of module (A ... Z), y-axis: frequency / probability of occurrence; left: the system with pronounced interfaces, right: the system with ulterior interfaces]
Figure 5.2: The evidence of classes which play the role of interface for the packages

For the calculation of CDEm two programs were developed:
- Class Entropy.java prepares the list of all classes in the project and after that scans the source code in order to find references to classes from this list. Next, the list is filled with data about frequencies and the entropy is calculated.
- Class EntropyImport.java does not have a list of classes to be found; this tool scans the source files and calculates the entropy of the import clauses. In this case the list of user-defined strings (import clauses) is prepared dynamically.
It is argued that both tools measure the same aspect, since the results of both tools are correlated. Since the entropy based on the analysis of the import section is easier to calculate, it will be used for the further research.

As a very first indicator of entropy the coefficient of compression (for example of a ZIP archive) can be taken. The author believes that ZIP practically implements an algorithm whose ZIP-coefficient of compression (ZCC) tends to the best possible compression defined by means of entropy. However, ZIP works with symbols, while CDEm works with tokens. ZCC = size of the ZIP archive / size of the project before the compression. Thus, high ZCC indicates high complexity of the project, low ZCC indicates a simple project with high redundancy. High CDEm indicates complex design, low CDEm indicates simple design. The next simple experiment tries to find out whether these two metrics are correlated. Figure 5.3 shows the dependency between the ZIP-compression coefficient and the import-based CDEm. The input for this experiment were the examples of code described in chapter 8.
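The import-based variant and the compression coefficient can be sketched in a few lines. This is not a reproduction of Entropy.java or EntropyImport.java: the function names are hypothetical, reporting CDEm as normalized entropy in percent is an assumption made here, and zlib's DEFLATE compression is used as a stand-in for a real ZIP archive (archive headers are ignored).

```python
import math
import re
import zlib
from collections import Counter

# Matches Java import statements such as "import java.util.List;" or "import a.b.*;".
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE)


def entropy_bits(counts) -> float:
    """Shannon entropy in bits of a frequency distribution given as counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return math.fsum((c / total) * math.log2(total / c) for c in counts)


def cdem_percent(sources) -> float:
    """Normalized entropy (in %) of the names imported over all source texts."""
    freq = Counter(name for src in sources for name in IMPORT_RE.findall(src))
    if len(freq) < 2:
        return 0.0  # zero or one distinct name: no uncertainty
    return 100.0 * entropy_bits(freq.values()) / math.log2(len(freq))


def zcc(data: bytes) -> float:
    """Compressed size divided by original size."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical sources: in the second set one class dominates the imports.
uniform = ["import a.A;\nimport b.B;\nclass X {}", "import a.A;\nimport b.B;\nclass Y {}"]
skewed = ["import a.A;\nclass X {}", "import a.A;\nclass Y {}",
          "import a.A;\nimport b.B;\nclass Z {}"]

print(round(cdem_percent(uniform), 1))  # 100.0
print(round(cdem_percent(skewed), 1))   # 81.1
print(zcc(b"a" * 10000) < 0.1)          # highly redundant input compresses well
```

A wildcard import such as `import a.b.*;` is counted as one name in this sketch, so heavy use of wildcards distorts the distribution.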
[Figure 5.3: Correlation between ZCC (x-axis) and CDEm in % (y-axis); the plotted points are the value pairs listed in table 5.1]
Figure 5.3: ZCC and CDEm do not have evident dependence

During this experiment 4 pairs of projects were analyzed, where each pair presents two versions of the same project, an old and a new one. Each newer project is supposed to have better values than the older one. In figure 5.3 arrows connecting two measurement points indicate the trend of the values within one project. Since the directions of the arrows are quite different, one can state the absence of any connection between these metrics. An overview of all considered projects is also shown in table 5.1. The trend of the metrics (improvement or degradation) is shown using arrows. According to the experts’ opinion all newer versions should show improvement; however, the metrics often show opposite results. Note that ZCC measures not purely the entropy of the code, but also the entropy of the comments, most of which are generated or consist of the same phrases. Consequently, ZCC shows lower values than the actual entropy. A more accurate experiment should exclude comments before compression. Besides, high CLON can cause low ZIP-coefficient values as well.

Table 5.1: Dependence between ZCC and CDEm

Project              Version   Size on disk   Size of ZIP archive   ZCC    CDEm (%)
ObjMgr               old             788202                144018   0,18   82,4
ObjMgr               new             833213                209549   0,25   83,3
SLDClient            old            1916940                409760   0,21   82,5
SLDClient            new            1454570                300455   0,21   83,0
JLin 630             old            1896690                522499   0,28   85,9
JLin dev             new            1822725                590507   0,32   84,2
Mobile Client 7.0    old             722159                216222   0,30   87,1
Mobile Client 7.1    new            1725178                503508   0,29   88,6

The analysis of the examples also shows that many developers use “*” in import sections. Such an imprecise definition leads to an inexact CDEm calculation. Hence the contradiction between ZCC and CDEm is most probably caused by an improper computation of the metrics.
The author argues that more accurate experiments should be made in order to ascertain the ability of these metrics to predict the maintainability. It is noteworthy that some peculiar properties of the software design can influence CDEm. For example, the project Ant (www.apache.org) has a very low CDEm value, because almost every class uses the following classes: BuildException, Project, Task and some others. Such a distribution of the user-defined strings leads to an underestimation of the entropy.

Complexity of the development process

An interesting approach is suggested by Hassan in [HASS03]. He argues that if developers have to modify many files intensively at the same time, the cognitive load increases and problems for managers can arise. Such a strategy can also lead to poor quality of the product. As a measure of the chaos of software development, an entropy-based process metric was suggested. The input data for the measurement is the history of code development. Time is divided into periods, and for each period the frequency of changes is calculated for each source file (see the illustration in figure 5.4). By its main property, the entropy is maximal for a uniform distribution. Thus a high entropy of the distribution of the source code changes indicates a situation in which many files are changed very actively. A low entropy indicates a normal development process, in which only a few files are edited actively and the rest is left untouched or changed only insignificantly. The evolution of the entropy during the development process is illustrated in figure 5.5. Hence a high entropy can warn the project manager about insufficient organization of the development process.

The next entropy-based metric is discussed in the sub-chapter “Metric: m – Structure Entropy” after the introduction of an appropriate model.
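The entropy of changes within one development period, as described above for [HASS03], can be sketched as follows. This is a minimal illustration that assumes plain Shannon entropy in bits over the change frequencies of the files; any normalization or weighting applied in [HASS03] is not reproduced here. The class name ChangeEntropySketch and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: entropy of the distribution of changes over source files within one period. */
final class ChangeEntropySketch {

    /** Each list element is the name of a file touched by one change in the period. */
    static double entropyOfPeriod(List<String> changedFiles) {
        if (changedFiles.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> changesPerFile = new HashMap<>();
        for (String file : changedFiles) {
            changesPerFile.merge(file, 1, Integer::sum);
        }
        double total = changedFiles.size();
        double entropy = 0.0;
        for (int count : changesPerFile.values()) {
            double p = count / total;                   // relative change frequency of one file
            entropy -= p * (Math.log(p) / Math.log(2)); // Shannon entropy in bits
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Assumed sample periods: one focused on a single file, one spread over many files.
        List<String> focused = Arrays.asList("Parser.java", "Parser.java", "Parser.java", "Lexer.java");
        List<String> scattered = Arrays.asList("Parser.java", "Lexer.java", "Ast.java", "Main.java");
        System.out.println("focused period:   " + entropyOfPeriod(focused));
        System.out.println("scattered period: " + entropyOfPeriod(scattered));
    }
}
```

A period in which all changes hit one file has entropy 0, and a period in which n files are changed equally often reaches the maximum of log2(n) bits, which is the uniform-distribution property mentioned above.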
As a short conclusion on the usage of information theory in software measurement one can say: it is a powerful non-counting method for describing the semantic properties of software, but before it can be used, more experiments with exact and perceptive tools should be made.

Figure 5.4: The Entropy of a Period of Development [HASS03, p. 3]

Figure 5.5: The Evolution of the Entropy of Development [HASS03, p.4]

Model: Flow-graph

The flow-graph model represents the intra-modular control complexity in the form of a graph. The flow-graph consists of edges and nodes, where nodes represent operators and edges represent possible control steps between the operators. A flow-chart is a kind of flow-graph in which decision nodes are marked with a different symbol. Figure 5.6 provides an example of both notations. A region is an area within a graph that is completely bounded by nodes and edges.

Figure 5.6: Example of Flow-graph and corresponding flow-chart [ALTU06, p. 15] (labels in the figure: nodes, edges, regions R1 to R4)

Metric: MCC – McCabe Cyclomatic Complexity

The cyclomatic number from graph theory gives the number of regions in the graph, or the number of linearly independent paths through the graph. Initially McCabe suggested using the cyclomatic number to assess the number of test cases needed for sufficient testing of a module and called this metric MCC (McCabe Cyclomatic Complexity). Since all independent paths through the module should be tested independently, it is advisable to have at least one test case for each path within the module. Thus MCC gives the minimal number of test cases for sufficient test coverage. Later this metric was also suggested for the assessment of comprehension complexity, and now MCC is also used as a recommendation for modularity in the development process.
Empirical research has shown that the probability of faults increases in modules with MCC > 10. Thus it is advisable to split modules with MCC > 10 into several modules. Many experts argue that a large number of decision operators (IF, FOR, CASE, etc.) increases the algorithmic complexity of a module. Obviously, a program in which all operations are executed sequentially is easy to understand independently of its size. Consequently, MCC is included in the quality model in the areas Analyzability and Testability.

One possible way to calculate MCC is: MCC = E - N + 2, where E is the number of edges and N is the number of nodes. It has also been shown that for a program with binary decisions only (all nodes have out-degree <= 2), MCC = P + 1, where P is the number of predicate (decision) nodes (operators: if, case, while, for, do, etc.).

Usage of MCC in the object-oriented environment

This intra-modular metric can be used both in the procedural and in the object-oriented context. However, the usage of this popular metric in the object-oriented context has some peculiarities. Typical object-oriented programs show understated values of MCC, because up to 90% of the methods may have MCC = 1. In [SER05] it is hypothesized that part of the complexity is hidden behind object-oriented mechanisms such as inheritance, polymorphism or overloading. These mechanisms are in fact hidden decision nodes. The following example illustrates this phenomenon for overloading:

Listing 5.1: Illustration of the hidden decision node in case of the overloading

    class A {
        void method1(int arg) {};
        void method1(String arg) {}
    };
    …
    public A a;
    a.method1(b);

The decision node hidden in the last statement could be represented in a procedural way by checking the type of the argument and calling the corresponding method. Additional decision nodes for polymorphism and inheritance could be represented similarly. The hypothesis is: the fewer OO mechanisms are used, the more complex the methods should be.
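Returning to the two formulas for MCC, the following sketch computes both MCC = E - N + 2 and MCC = P + 1 for a flow graph given as a list of directed edges. It assumes a connected flow graph with a single entry and a single exit; the class name CyclomaticSketch and the sample graph (an if/else followed by a loop) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: cyclomatic complexity of a flow graph given as directed edges {from, to}. */
final class CyclomaticSketch {

    /** MCC = E - N + 2; nodes are taken to be all edge endpoints. */
    static int byEdgesAndNodes(int[][] edges) {
        Set<Integer> nodes = new HashSet<>();
        for (int[] edge : edges) {
            nodes.add(edge[0]);
            nodes.add(edge[1]);
        }
        return edges.length - nodes.size() + 2;
    }

    /** MCC = P + 1; only valid if every node has out-degree <= 2. */
    static int byPredicates(int[][] edges) {
        Map<Integer, Integer> outDegree = new HashMap<>();
        for (int[] edge : edges) {
            outDegree.merge(edge[0], 1, Integer::sum);
        }
        int predicates = 0;
        for (int degree : outDegree.values()) {
            if (degree > 2) {
                throw new IllegalArgumentException("formula P + 1 requires binary decisions only");
            }
            if (degree == 2) {
                predicates++; // a node with two successors is a decision node
            }
        }
        return predicates + 1;
    }

    public static void main(String[] args) {
        // 1: if, 2: then-branch, 3: else-branch, 4: loop condition, 5: loop body, 6: exit
        int[][] edges = {{1, 2}, {1, 3}, {2, 4}, {3, 4}, {4, 5}, {5, 4}, {4, 6}};
        System.out.println("E - N + 2 = " + byEdgesAndNodes(edges));
        System.out.println("P + 1     = " + byPredicates(edges));
    }
}
```

For the sample graph (7 edges, 6 nodes, 2 decision nodes) both formulas yield 3, and a purely sequential graph yields 1.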
The experiment described in [SER05] tried to find an inverse correlation between the inheritance factor (Depth in Inheritance Tree) and MCC, but did not show significant results. Nevertheless, polymorphism or overloading could be better factors to correlate with. Additional experiments are needed.

Since MCC can be calculated only for a single chunk of code, in the OO environment a further metric is introduced in order to aggregate the values into a metric for the entire class; see the metric WMC (Weighted Methods per Class) for more details.

Model: Inheritance Hierarchy

For the next group of metrics different models of an inheritance hierarchy can be used. The empirical objects for all these models are classes and interfaces, which are connected into hierarchies by the “extends” or “implements” relation:

- In the Simple Inheritance Hierarchy nodes represent classes and edges represent inheritance connections between classes; only single inheritance is possible
- The Extended Inheritance Hierarchy can have more than one root and allows multiple inheritance. The additional edges represent “implements” connections between an interface and the class that implements this interface. The Extended Inheritance Hierarchy is a directed acyclic graph
- The Advanced Inheritance Hierarchy supplements the Extended Inheritance Hierarchy by adding the attributes and methods of each class

Because interfaces are widely used in ABAP and Java and have a great impact on the analyzability and changeability, the Simple Inheritance Hierarchy was rejected. On the other hand, the very detailed level of granularity provided by the Advanced Inheritance Hierarchy is not very useful in the context of maintainability. Consequently, the Extended Inheritance Hierarchy is chosen as the most appropriate basis for the maintainability metrics.
For this model the following metrics were proposed. Chidamber and Kemerer proposed the Depth of Inheritance Tree (DIT) metric, which is the length of the longest path from a class to the root of the inheritance hierarchy [CHID93 p.p. 14-18], and the Number of Children (NOC) metric, which is the number of classes that directly inherit from a given class [CHID93 p.p. 18-20]. Later, Li suggested two substitute metrics: the Number of Ancestor Classes (NAC), which measures how many classes may potentially affect the design of a class because of inheritance, and the Number of Descendent Classes (NDC), which measures how many descendent classes a class may affect because of inheritance. These two metrics are good candidates for the quality model in the areas Analyzability and Changeability respectively and are discussed in more detail below.

Metric: NAC – Number of Ancestor Classes

This metric indicates the analyzability from the viewpoint of inheritance. In general the following holds: the deeper a class is placed in the hierarchy, the more ancestors it has, and the more additional classes have to be analyzed and understood by the developer in order to understand the given class. It can also be shown that a class with a high NAC implements more complicated behavior. Several guides recommend avoiding classes with DIT greater than 3 or NAC greater than 6.

Metric: NDC - Number of Descendent Classes

This metric indicates the changeability of a class: it shows how many classes could be affected when the developer changes the given class. It is noteworthy that experiments with the Chidamber and Kemerer metrics set [BASI95] show that larger NOC values correlate with a smaller defect probability. This can be explained by the fact that classes with many subclasses are subject to much testing and most errors are found during the implementation of the subclasses. “Inheritance introduces significant tight coupling between super classes and their subclasses” [ROSE, p.5]. Thus the importance of NAC and NDC is high.
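NAC and NDC on the Extended Inheritance Hierarchy can be sketched as reachability counts in a directed acyclic graph. The following is a minimal illustration that assumes the “extends” and “implements” relations have already been extracted from the source code; the class name InheritanceSketch, its methods and the sample types are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: NAC and NDC as transitive ancestor and descendant counts. */
final class InheritanceSketch {
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();

    /** Registers a class or interface with its direct "extends"/"implements" parents. */
    void addType(String name, String... directParents) {
        parents.computeIfAbsent(name, key -> new HashSet<>());
        children.computeIfAbsent(name, key -> new HashSet<>());
        for (String parent : directParents) {
            addType(parent);
            parents.get(name).add(parent);
            children.get(parent).add(name);
        }
    }

    int nac(String name) { return reachable(name, parents).size(); }

    int ndc(String name) { return reachable(name, children).size(); }

    Set<String> types() { return parents.keySet(); }

    /** Collects every type reachable from start; the set avoids double counting in diamonds. */
    private static Set<String> reachable(String start, Map<String, Set<String>> edges) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(edges.getOrDefault(start, Collections.<String>emptySet()));
        while (!todo.isEmpty()) {
            String type = todo.pop();
            if (seen.add(type)) {
                todo.addAll(edges.getOrDefault(type, Collections.<String>emptySet()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        InheritanceSketch hierarchy = new InheritanceSketch();
        hierarchy.addType("A", "I"); // class A implements I
        hierarchy.addType("B", "A"); // class B extends A
        hierarchy.addType("C", "A"); // class C extends A
        System.out.println("NAC(B) = " + hierarchy.nac("B"));
        System.out.println("NDC(I) = " + hierarchy.ndc("I"));
    }
}
```

In the sample hierarchy NAC(B) = 2 and NDC(I) = 3. Summing either metric over all types counts every ancestor-descendant pair exactly once, so both sums are 5 here.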
Geometry of Inheritance Hierarchy

Above, the metrics NAC and NDC were introduced and it was shown that they are good descriptors of a single class. In this subsection the author tries to use these metrics to describe the entire inheritance hierarchy. Inheritance hierarchies can be classified into subtypes based on their geometrical properties. The most important geometric characteristics are the width and the weight distribution.

The width is the ratio of super-classes to the total number of classes. An indicator of the width is the metric U (reuse factor), where U = super-classes / classes = (CLS - LEAFS) / CLS. A super-class is a class that is not a leaf class. U measures reuse via inheritance. A high U indicates a deep class hierarchy with high reuse. The reuse ratio varies in the range 0 <= U < 1.

The weight distribution describes where the main functionality tends to be implemented. However, there is no appropriate metric for the weight distribution. The best way to show it is a histogram in which the vertical axis represents DIT and the horizontal axis represents the number of classes, the number of methods or the sum of WMC. Figure 5.8 depicts an example of a top-heavy hierarchy; the metric WMC is selected as the indicator of functionality. Different designs of inheritance hierarchies are presented in figure 5.7.

The next experiment tries to estimate the best geometry of an inheritance hierarchy from the viewpoint of maintainability using the metrics NAC and NDC.

Figure 5.7: Types of Inheritance Hierarchies.

First of all, the values of the metrics are calculated for each class and then aggregated using the arithmetic mean. The analyzability and changeability of each type of hierarchy are then estimated based on the average values. The comments can also be seen in figure 5.7.
Figure 5.8: Weight distribution (bar chart “Distribution of Average WMC in Levels of Inheritance Tree”; Ave-WMC per DIT level: DIT 1: 44,53; DIT 2: 75,43; DIT 3: 69,12; DIT 4: 27,11; DIT 5: 21,21; DIT 6: 15,12)

Top-heavy hierarchies may not take advantage of the reuse potential. Here, however, the design is discussed from the viewpoint of maintainability. Top-heavy means that the classes with the main functionality are placed near the root; hence such a hierarchy should be easy to understand, because the classes have a small number of ancestors. However, if classes have a large number of descendents they are difficult to change. A bottom-heavy hierarchy is easy to change, because many classes have no children. Narrow bottom-heavy designs are difficult to understand because of many unnecessary levels of abstraction.

Nevertheless this consideration has several problems:

- Though the metrics NAC and NDC seem to be comprehensive, the mean values of these metrics are interchangeable and yield the same numerical values. Ave-NAC can be calculated as the number of descendent-ancestor relations divided by the number of classes. Ave-NDC can be calculated as the number of ancestor-descendent relations divided by the number of classes. Because each descendent-ancestor relation is the reversed ancestor-descendent relation, Ave-NAC = Ave-NDC. The numbers in figure 5.7 confirm this. Note that in a simple inheritance hierarchy DIT equals NAC, so Ave-DIT = Ave-NDC holds as well; for NOC, which counts only direct children, the equality Ave-DIT = Ave-NOC holds only for hierarchies with at most two levels. Therefore the aggregated values of these metrics are redundant
- In some cases the metrics NAC and NDC cannot distinguish between different types of hierarchies; in the given example a top-heavy narrow hierarchy has approximately the same values as a bottom-heavy wide hierarchy with the same number of classes. To distinguish different designs an additional metric is needed
- In general it is not possible to assess the maintainability based on the geometrical properties of the hierarchy, because it is more important to know how the inheritance is used
- Some experts hold that the inheritance hierarchy should be optimized, for example using balanced trees. However, the theory of balanced trees is intended for search or change operations; such a tree will always be wide and bottom-heavy, because 50% of the nodes are leafs. Thus such an optimization is misleading for the goals of maintainability

Many experts agree that inheritance is a very important attribute of software, which also has an impact on maintainability. However, there are different points of view: some experts recommend using a deep hierarchy, others prefer a wide hierarchy. Nevertheless, the author does not see any possibility to assess the entire inheritance hierarchy from the maintainability point of view using the metrics. Consequently, the suggestion is to use the metrics NAC and NDC for finding classes that could be difficult to maintain because of erroneous usage of inheritance. A simple example of such an audit is a report that includes all classes with more than 3 super-classes or more than 10 sub-classes.

Metric: IF – Inheritance Factor

The metric IF (Inheritance Factor) shows the percentage of classes that belong to any inheritance hierarchy. A stand-alone class does not belong to any inheritance hierarchy and thus has no ancestor or descendant classes. Localizing faults in large stand-alone classes is difficult, because such a class implements complete functionality and is large. Classes that belong to an inheritance hierarchy provide only partial functionality, and thus it is relatively easy to find which part has caused a fault, irrespective of the size.
Additionally, classes within an inheritance tree can be maintained using the inheritance concept, so that new functionality can be added while the old functionality is preserved.

Model: Structure Tree

This model is represented by a directed graph composed of two different types of nodes, leaf nodes and interior nodes, and two different types of edges, structural edges and dependency edges. A leaf node corresponds to a function module, global variable, program or form in ABAP, or to a method or public attribute in Java. An interior node corresponds to either:

- an aggregate of leaf nodes (function pool, program or class)
- an aggregate of other interior nodes (directory or package)

Structural edges, attached to interior and leaf nodes, create a tree that corresponds to the package and file structure of the source. Note that a structural edge may connect two interior nodes, or an interior node with a leaf node, but may never connect two leaf nodes. Figure 5.9 (see p. 44) shows an example of the structure tree. In this example the system has a package A, which has two classes (B and C). Points marked with small letters (leafs) are methods or attributes. Dotted edges between leafs are dependency edges and represent calls.

Metric: CBO - Coupling Between Objects

Coupling is a quality that characterizes the number and strength of connections between modules. In the scope of maintainability, the software elements A and B are interdependent if:

- Some change to A requires a change to B to maintain correctness, or
- Some change elsewhere requires both A and B to be changed

Obviously, the first case is much easier to find. In general, objects can be coupled in many different ways. The following list presents several important types of coupling, resulting from theoretical considerations of the author and partially taken from [ZUSE98]:

- Content coupling: one module directly references the code of another module. This type of coupling is very strong, because almost any change in the referred module will affect the referring module. In Java this type of coupling is impossible; in ABAP it is implemented through the INCLUDE directive
- Common coupling: two modules share a global data structure. In ABAP this is most commonly done through the DATA DICTIONARY. Such coupling is not very dangerous, because data structures are changed very seldom
- External coupling: two modules share a global variable. This coupling deserves attention, because excessive usage of global variables can lead to maintenance problems. To handle external coupling a metric GVAR (number of global variables) is suggested. However, this metric is duplicated by the metrics FAN-IN and FAN-OUT and is thus excluded from further investigation (see p. 54)
- Data coupling is the most commonly used type and is unavoidable. In his work Yourdon stated that any program can be written using only data coupling [ZUSE98, p. 524]. Two modules are data coupled if one calls the other. In an object-oriented environment there are even more possibilities for data coupling:
  o Class A has a method with local variable of type B
  o Class A has a method with return type B
  o Class A has a method with argument of type B
  o Class A has an attribute of type B
  There are several metrics for data coupling: FAN-IN and FAN-OUT for the procedural environment, and RFC (Response For a Class) and CBO for the object-oriented environment. For the metrics FAN-IN and FAN-OUT see section “Structure Chart” (p. 54)
- Inheritance coupling appears in an inheritance hierarchy between classes or interfaces. The metrics for this type of coupling have been discussed in the previous section
- Structural coupling appears between all units that are combined in a container. For example, all methods within a class are structurally coupled into the class; all classes within a package are coupled into the package. In order to qualify such coupling, the term cohesion is introduced in one of the next sections (p. 44)
- Logical coupling is an unusual kind of coupling, because the modules are not coupled physically, but changing one of them will cause a change of the other. Since there is no representation of such coupling in the source code, logical coupling is very difficult to detect. A reader interested in this type of coupling can find a reference to research on logical coupling at the end of chapter 10 (p. 91)
- Indirect coupling: if class A has direct references to A1, A2, …, An, then class A has indirect references to those classes directly and indirectly referenced by A1, A2, …, An. In this thesis (except for inheritance) only direct coupling is considered

Content, common, logical and indirect coupling are not considered in this thesis; structural coupling in the form of cohesion of methods is discussed in one of the next sections. The metrics for inheritance coupling have been discussed in the previous section.

Coupling Between Objects (CBO) is the most important metric for data coupling in the object-oriented paradigm. “CBO for a class is a count of the number of other classes to which it is coupled” [CHID93, p. 20]. However, it would be more precise to call this metric Coupling Between Classes, because at the time of the measurement no objects have been created yet. “In order to improve modularity and promote encapsulation, inter-object class couples should be kept to a minimum. The larger the number of couples, the higher the sensitivity to changes in other parts of the design, and therefore maintenance is more difficult” [CHID93, p. 20]. A class with many relations to other classes is also difficult to test. Hence CBO impacts the Changeability and the Testability. CBO can indicate the Analyzability as well, but the RFC metric indicates it more precisely.
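The counting behind CBO can be sketched as follows. This is a minimal illustration under two assumptions that are not taken from [CHID93]: the types referenced by each class (attribute types, argument types, return types and local variable types, as listed above) have already been extracted, and only outgoing references to classes of the measured project are counted. Some tools also count incoming references. The class name CboSketch and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: CBO as the number of distinct other project classes a class refers to. */
final class CboSketch {

    /**
     * @param cls             the class to measure
     * @param referencedTypes project class -> types it references (duplicates allowed)
     */
    static int cbo(String cls, Map<String, List<String>> referencedTypes) {
        Set<String> coupled = new HashSet<>();
        for (String type : referencedTypes.getOrDefault(cls, Collections.<String>emptyList())) {
            boolean isOtherProjectClass = !type.equals(cls) && referencedTypes.containsKey(type);
            if (isOtherProjectClass) {
                coupled.add(type); // the set counts each coupled class only once
            }
        }
        return coupled.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B", "B", "C", "A", "String")); // String is not a project class
        refs.put("B", Collections.<String>emptyList());
        refs.put("C", Arrays.asList("B"));
        System.out.println("CBO(A) = " + cbo("A", refs));
    }
}
```

In the sample data class A references B twice, C once, itself and a library type, so it is coupled to two other project classes.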
Metric: RFC - Response For a Class

The response of a class is the number of methods that can potentially be executed in response to a message received by an object of that class. It can be expressed as the number of public methods of the class plus the number of methods called by the methods of the given class. An example of the calculation of RFC is shown in listing 5.3. For more details see [CHID93, p.p. 22-24]. However, some implementations of RFC (for example Borland Together) count private methods as well. A class can implicitly call the methods of its ancestors, for example in a constructor, but in this case the constructor is not called from outside; thus only explicit calls of foreign methods are counted in this metric.

This metric indicates the analyzability of a class:

- A class with more methods is more difficult to understand than a class with fewer methods
- A method that calls many other methods is more difficult to understand than a method calling fewer foreign methods

In Java it is possible to use enclosed (chained) method calls; an example is shown in the following listing:

Listing 5.2: Example of enclosed method calls

    a = b.getFactory().getBrige().getWidth(c.getXYZ(), 15);

Such calls repeatedly hamper the understanding of the program and should be counted as separate method calls. Note that in ABAP this is not possible.

Listing 5.3: Example of calculation of the metric RFC

    public class RFCexample {
        public ClassC c = new ClassC();  // constructor is not counted: RFC = 0

        public int meth1() {             // RFC = 1
            int temp = 0;
            temp += c.D();               // RFC = 2
            temp += c.D();               // duplicate call: RFC = 2
            return c.getClassB().        // RFC = 3
                    D() +                // RFC = 4
                    meth2();             // RFC = 5
        };

        private int meth2() {            // private methods are counted: RFC = 6
            return c.D();                // duplicate calls, which appear
                                         // in different methods are counted: RFC = 7
        }
    }

“If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes more complicated since it requires a greater level of understanding required on the part of the tester” [CHID93 p.22]. From the definition of RFC it is clear that it consists of two parts: the number of methods within a class and the number of calls of other methods. Hence, RFC correlates with NOM and FAN-OUT; this has been shown in [BRUN04, p.9]. RFC is an OO metric and corresponds to FAN-OUT in the procedural context.

Metric: m – Structure Entropy

An interesting metric was proposed by Hewlett-Packard Laboratories in [SNID01]. A simplified version follows. The main question discussed here is “how can you measure the degree of conformance of a large software system to the principles of maximal cohesion and minimal coupling?” The input to the model is the source code of the system to be measured. The output is a numeric measure of the degree of conformance.

Before the model is created, the following assumptions are made:

- Since engineers work with source code when modifying a system, it is interesting to analyze the structure of the application at the lexical level
- It is more interesting to analyze the global relationships than local ones
- The more dependencies a module has to other parts of the system, the harder it is to modify
- “Remote” dependencies are more expensive (in terms of comprehensibility) than “local” dependencies (restatement of cohesion and coupling principle)

An example of the model used is depicted in figure 5.9. Here some calls are short (within one class), others are middle (between classes) or long (between packages). In agreement with the assumptions, a system with many short calls and only a few long ones has a good design.
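The distinction between short, middle and long calls can be sketched as a comparison of the positions of two leafs in the structure tree. The following is a minimal illustration that assumes every leaf is named by exactly three dot-separated parts, package.Class.member, which is a simplification of the general structure tree (nested packages are not handled). The class name CallLengthSketch and the sample leaf names are hypothetical.

```java
/** Sketch: classifying a dependency edge between two leafs of the structure tree. */
final class CallLengthSketch {

    enum Length { SHORT, MIDDLE, LONG }

    /** Leaf names are assumed to have the form "package.Class.member". */
    static Length classify(String fromLeaf, String toLeaf) {
        String[] from = fromLeaf.split("\\.");
        String[] to = toLeaf.split("\\.");
        if (from.length != 3 || to.length != 3) {
            throw new IllegalArgumentException("expected names of the form package.Class.member");
        }
        boolean samePackage = from[0].equals(to[0]);
        boolean sameClass = samePackage && from[1].equals(to[1]);
        if (sameClass) {
            return Length.SHORT;                           // call within one class
        }
        return samePackage ? Length.MIDDLE : Length.LONG;  // between classes or between packages
    }

    public static void main(String[] args) {
        System.out.println(classify("A.B.a", "A.B.b"));
        System.out.println(classify("A.B.a", "A.C.d"));
        System.out.println(classify("A.B.a", "X.Y.z"));
    }
}
```

Counting how often each of the three categories occurs gives the frequency distribution of call lengths for a system.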
Let us find an optimal method for describing the character of the calls. Initially each dotted edge can be described by a pair of numbers: the start leaf and the end leaf. Each leaf needs log2(F) bits, where F is the number of leafs, so 2*log2(F) bits are needed for the description of each call. However, it is possible to reduce the number of bits by indicating a relative path to the end leaf. Hence short calls need a shorter description and long calls a longer one. If one describes all calls of the system in this way and calculates the average number of bits needed for each call, one can draw conclusions about the design of the system. A higher number of bits needed for the description of an average relation indicates poor design.

Figure 5.9: Example of structure tree (two systems, System I and System II, each with the levels packages (A), classes (B, C), and methods and attributes (a, b, c, d, e))

Nevertheless the analyst does not need to actually encode all these calls. Information theory says that the average number of bits needed can easily be estimated based on the entropy. As the probability basis for the entropy one can use the frequencies of the call lengths. In addition, long calls can be penalized further by coefficients. For the background on entropy see section “Introduction into Information Theory”. A more detailed description of this metric can be found in [SNID01, p.p. 7-9]. Here just one simple example is given in order to illustrate the ability of this metric.

Consider the two small systems depicted in figure 5.9. Both systems have an equal number of classes, leafs (methods and attributes, F=5) and calls (E=4). However, most of the calls in the first system are long; this disadvantage was fixed in the second system by better encapsulation: method c provides an interface to attributes d and e in its class. Thus it is supposed that the second system is more maintainable because of its simpler and better structured design. According to the formulas given in [SNID01, p.p. 7-9], the structure entropies of the given systems are:

m(I) = - (3/5*log2(3/5) + 1/5*log2(1/5) + 0 + 1/5*log2(1/5)) + 4/5 * (1/4*log2(5*8/20) + 3/4*log2(5*12/20)) ≈ 2,52

m(II) = - (2/5*log2(2/5) + 2/5*log2(2/5) + 1/5*log2(1/5)) + 4/5 * (3/4*log2(5*8/20) + 1/4*log2(5*12/20)) ≈ 2,44

Hence, the second system needs fewer bits for its description and has fewer long calls. Consequently the metric m (Structure Entropy) can indicate the tendency of the system to have short or long calls.

Metric: LCOM - Lack of Cohesion Of Methods

Cohesion is one of the structural properties of a class. It is the degree to which the elements in a class are logically related. Most often it is estimated by the degree of similarity of the functions provided by the methods. With respect to object-oriented design, a class should consist only of methods and attributes that have common functionality. If the class can be split into parts without breaking intra-modular calls, as shown in figure 5.10, the class is considered not cohesive.

Figure 5.10: A non-cohesive class can be divided into parts

However, “coupling and cohesion are also interesting because they have been applied to procedural programming languages as well as OO languages” [DARC05, p.28]. In the case of the procedural paradigm, the procedures and functions of a module should implement a single logical function. Consider the example of a function pool in ABAP. It is an analogue of a class: it has internal global data (attributes) and functions (methods). Note that when one function from the pool is called, the entire function group is loaded into memory. Consequently, if you create new function modules, you should consider how they will be organized into function groups. In one function group you should combine only function modules that use common components of this function group, so that loading them into memory is not wasted (translation from [KELL01, p. 256]).
Finally, low cohesion can indicate potential performance problems.

For the maintenance, low cohesion means that the maintainer has to understand additional code that is not related to the main part and may be badly structured. This has an impact on the analyzability. Additionally, a component with low cohesion, which implements several different functionalities, will be affected more by maintenance, because changing one logical part of the component can break other parts. “Components with low cohesion are modified more often since they implement multiple functions. Such components are also more difficult to modify, because a modification of one functionality may affect other functionalities. Thus, low cohesion implies lower maintainability. In contrast, components with high cohesion are modified less often and are also easier to modify. Thus, high cohesion implies higher maintainability” [NAND99]. This has an impact on the changeability.

High cohesion indicates good class subdivision. The cohesion of a component is high if it implements a single logical function. Objects with high cohesiveness cannot be split apart. Lack of cohesion or low cohesion increases complexity and thereby the effort needed to comprehend unnecessary parts of a component. Classes with low cohesion could probably be subdivided into two or more subclasses with increased cohesion. It is widely recognized that highly cohesive components tend to have high maintainability and reusability. The cohesion of a component allows the measurement of its structure quality.

“There are at least two different ways of measuring cohesion:
1. Calculate for each attribute in a class what percentage of the methods use that attribute. Average the percentages then subtract from 100%. Lower percentages mean greater cohesion of data and methods in the class.
2. Methods are more similar if they operate on the same attributes. Count the number of disjoint sets produced from the intersection of the sets of attributes used by the methods” [ROSE, p.4].

In [BADR03] the most frequently used cohesion metrics are briefly described; see the brief definitions in table 5.2. Metrics for cohesion are not applicable to classes and interfaces with:

- no attributes
- one or no methods
- only attributes with get and set methods for these (data-container classes)
- abstract classes
- numerous attributes for describing internal states, together with an equally large number of methods for individually manipulating these attributes
- multiple methods that share no variables but perform related functionality. Such a situation can arise from the usage of several patterns

Classes for which the calculation of cohesion is not possible are accepted as cohesive. To overcome these limitations, the following variants of the implementation of the LCOM metric are possible:

- Considering inherited attributes and/or methods in the calculation or not
- Considering the constructor in the calculation or not
- Considering only public methods or all methods in the calculation
- Considering get and set methods or not

These variants are independent of which definition is used. According to the recommendations from [ETZK97], [LAKS99] and [KABA] and theoretical considerations, the following options were selected:

- Inherited attributes and methods are excluded from the calculation
- Constructors are excluded from the calculation
- Get and set methods are excluded from the calculation
- Methods with all types of visibility are included in the calculation

It is also possible to find and remove all data-container classes from the study of cohesion. This can easily be done with the additional metric NOM (Number Of Methods): for a data-container class NOM = WMC.

Table 5.2: The major existing cohesion metrics [BADR, p. 2]

Metric   Description
LCOM1    The number of pairs of methods in a class using no attribute in common.
LCOM2    Let P be the pairs of methods without shared instance variables, and Q be the pairs of methods with shared instance variables. Then LCOM2 = |P| - |Q|, if |P| > |Q|. If this difference is negative, LCOM2 is set to zero.
LCOM3    The Li and Henry definition of LCOM. Consider an undirected graph G, where the vertices are the methods of a class, and there is an edge between two vertices if the corresponding methods share at least one instance variable. Then LCOM3 = |connected components of G|
LCOM4    Like LCOM3, where graph G additionally has an edge between vertices representing methods Mi and Mj, if Mi invokes Mj or vice versa.
Co       Connectivity. Let V be the vertices of graph G from LCOM4, and E its edges. Then Co = 2 * (|E| - (|V| - 1)) / ((|V| - 1) * (|V| - 2))
LCOM5    Consider a set of methods {Mi} (i = 1, …, m) accessing a set of instance variables {Aj} (j = 1, …, a). Let µ(Aj) be the number of methods that reference Aj. Then LCOM5 = ((1/a) Σ µ(Aj) - m) / (1 - m)
Coh      Cohesiveness is a variation on LCOM5: Coh = Σ µ(Aj) / (m*a)
TCC      Tight Class Cohesion. Consider a class with N public methods. Let NP be the maximum number of public method pairs: NP = [N*(N - 1)]/2. Let NDC be the number of direct connections between public methods. Then TCC is defined as the relative number of directly connected public methods: TCC = NDC / NP.
LCC      Loose Class Cohesion. Let NIC be the number of direct or indirect connections between public methods. Then LCC is defined as the relative number of directly or indirectly connected public methods: LCC = NIC / NP.
DCD      Degree of Cohesion (direct) is like TCC, but taking into account the Methods Invocation Criterion as well. DCD gives the percentage of method pairs that are directly related.
DCI      Degree of Cohesion (indirect) is like LCC, but taking into account the Methods Invocation Criterion as well.

In [LAKS99] and [ETZK97] various implementations of the cohesion metrics (LCOM2 and LCOM3) are compared on example C++ classes.
The best results were shown by the following metrics:
- LCOM3 without inherited variables but with the constructor function included in the calculation [ETZK97], [LAKS99]
- LCOM3 with both inheritance and the constructor taken into account [LAKS99]

The metrics LCOM5 and Coh are not robust and are excluded from further investigation. The simple example in table 5.3 shows this.

Table 5.3: Example for LCOM5 and Coh
(Usage matrix of five methods M1 to M5 against five attributes A1 to A5; every attribute is used by exactly two methods, so µ(Aj) = 2 for each attribute and Σ µ(Aj) = 10.)

Obviously, the class is relatively cohesive (all pairs of methods have one common variable), but the metrics show the opposite:
LCOM5 = ((1/a) Σ µ(Aj) – m) / (1 – m) = (10 / 5 – 5) / (1 – 5) = 0.75
Coh = Σ µ(Aj) / (m*a) = 10 / (5*5) = 0.4

In [BADR03] the authors argue that methods can be connected in several ways:
- Attributes Usage Criterion: two methods are connected if they use at least one attribute in common.
- Methods Invocation Criterion: two methods are connected if one calls the other.

Only three metrics (LCOM4, DCD, and DCI) consider both types of connection; all other metrics consider only the attribute connection. The metrics also differ in their empirical meaning:
- Number of pairs of methods (LCOM1, LCOM2)
- Number of connected components (LCOM3, LCOM4, Co)
- Relative number of connections (TCC, LCC, DCD, DCI)

The most logical and interesting measure for the goals of this thesis is the number of connected components, which can be interpreted as the number of parts into which the class could be split. Note that the values of the normalized metrics (TCC, LCC, DCD, DCI, Co) are difficult to aggregate into a result for the entire system, because averaging percentages yields a value with poor numerical and empirical properties. For more precise results the weighted mean should be used. For the size-dependent metrics (LCOM1, LCOM2, LCOM3, LCOM4) the simple average can be used. Hence LCOM4 is the most appropriate metric.
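The difference between counting connected components and the normalized LCOM5 and Coh values can be illustrated with a minimal Python sketch. The function names, the input format (a dictionary from method name to the set of attributes it uses) and the two example classes are assumptions made for illustration only; in particular, the ring-shaped class below is a hypothetical layout in which every attribute is used by exactly two methods, and it is not the matrix of table 5.3.

```python
from itertools import combinations


def lcom4(uses, calls=None):
    """Count connected components of the method graph (LCOM4).

    uses:  dict, method name -> set of attribute names it accesses
    calls: optional dict, method name -> set of method names it invokes
    Without `calls` only the attribute usage criterion is applied (LCOM3).
    """
    calls = calls or {}
    adjacent = {method: set() for method in uses}
    for a, b in combinations(uses, 2):
        shares_attribute = bool(uses[a] & uses[b])
        invokes = b in calls.get(a, ()) or a in calls.get(b, ())
        if shares_attribute or invokes:
            adjacent[a].add(b)
            adjacent[b].add(a)
    # Iterative depth-first search over the undirected graph.
    seen, components = set(), 0
    for start in adjacent:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacent[node] - seen)
    return components


def _counts(uses):
    m = len(uses)                                     # number of methods
    a = len(set().union(*uses.values()))              # number of attributes
    mu_sum = sum(len(attrs) for attrs in uses.values())  # equals sum of mu(Aj)
    return m, a, mu_sum


def lcom5(uses):
    """LCOM5 = ((1/a) * sum(mu(Aj)) - m) / (1 - m); needs m > 1 and a > 0."""
    m, a, mu_sum = _counts(uses)
    return (mu_sum / a - m) / (1 - m)


def coh(uses):
    """Coh = sum(mu(Aj)) / (m * a)."""
    m, a, mu_sum = _counts(uses)
    return mu_sum / (m * a)


# Hypothetical class: five methods in a ring, each attribute used by two methods.
ring = {
    "M1": {"A1", "A2"}, "M2": {"A2", "A3"}, "M3": {"A3", "A4"},
    "M4": {"A4", "A5"}, "M5": {"A5", "A1"},
}
print(lcom4(ring), lcom5(ring), coh(ring))

# Hypothetical class where a method call joins two otherwise separate methods.
split = {"load": {"path"}, "parse": {"path"}, "draw": {"canvas"}, "refresh": set()}
print(lcom4(split), lcom4(split, calls={"refresh": {"draw"}}))
```

For the ring class the component count is 1, meaning the class cannot be split, while LCOM5 and Coh take the same values as in the calculation above. The second class shows how the methods invocation criterion reduces the number of components.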
LCOM4 is essentially the well-established metric LCOM3 extended by the methods invocation criterion.

“A non-cohesive class means that its components tend to support different tasks. According to common wisdom, this kind of class has more interactions with the rest of the system than classes encapsulating one single functionality. Thus, the coupling of this class with the rest of the system will be higher than the average coupling of the classes of the system. This relationship between cohesion and coupling means that a non-cohesive class should have a high coupling value” [KABA, p.2]. However, an experiment in [KABA, p.6] shows that “in general, there is no relationship between these (LCC, LCOM) cohesion metrics and coupling metrics (CBO, RFC)”. Hence one cannot say that less cohesive classes are more strongly coupled to other classes.

In [DARC05] Darcy argues that metrics for coupling and cohesion should only be used together and expects that “for more highly coupled programs, higher levels of cohesion increase comprehension performance”. He motivates this view with the following thought experiment (figure 5.11).

Figure 5.11: Interaction of coupling and cohesion (according to [DARC05, p. 17])

“If a programmer needs to comprehend program unit 1, then the programmer must also have some understanding of the program units to which program unit 1 is coupled. In the simplest case, program unit 1 would not be coupled to any of the other program units. In that case, the programmer need only comprehend a single chunk (given that program unit 1 is highly cohesive). In the second case, if program unit 1 is coupled to program unit 2, then just 1 more chunk needs to be comprehended (given that program unit 2 also shows high cohesion). If program unit 1 is also coupled to program unit 3, then it can be expected that Short-Term Memory (STM) may fill up much more quickly because program unit 3 shows low cohesion and thus represents several chunks. But, the primary driver of what needs to be comprehended is the extent to which program unit 1 is coupled to other units. If coupling is evident, it is only then that the extent of cohesion becomes a comprehension issue.”

Darcy then confirmed his hypotheses in an experiment on the maintenance of a test application. However, because the experiment was highly artificial, the hypothesis should not be applied in practice without further experiments.

Other types of cohesion – Functional Cohesion.

Zuse [ZUSE98, p.525] distinguishes seven types of cohesion:
- “Functional Cohesion: A functionally cohesive module contains elements that all contribute to the execution of one and only one problem related task
- Sequential Cohesion: A sequentially cohesive module is one whose elements are involved in activities such that output data from one activity serves as input data to the next
- Communicational Cohesion: A communicational cohesive module is one whose elements contribute to activities that use the same input or output data
- Procedural Cohesion: As we reach procedural cohesion, we cross the boundary from the easily maintainable modules to the higher levels of cohesion to the less easily maintainable modules of the middle levels of cohesion. A procedurally cohesive module is one whose elements are involved in different and possibly unrelated activities in which control flows from each activity to the next
- Temporal Cohesion: A temporally cohesive module is one whose elements are involved in activities that are related in time.
- Logical Cohesion: A logically cohesive module is one whose elements contribute to activities of the same general category in which the activity or activities to be executed are selected from outside the module.
- Coincidental Cohesion: A coincidentally cohesive module is one whose elements contribute to activities with no meaningful relationship to one another”.
Since functional cohesion is the most desirable type, some researchers ([BIEM94]) have tried to develop a metric to measure it. Nevertheless, “Functional Cohesion is actually an attribute of individual procedures or functions, rather than an attribute of a separately compliable program unit or module” [BIEM94 p.1], and it is therefore out of the scope of this work. Inter-modular metrics are more important, since they are better indicators of maintainability. Intra-modular cohesion seems to be too complicated to calculate and too weak in predicting the maintainability of the entire system.

The next type of cohesion is package cohesion, that is, the partition of classes into packages. This kind of cohesion is also important, but difficult to analyze. Hence package cohesion is a topic for separate research.

LCOM Essential:
- It is the degree of relatedness of the methods within a class.
- Cohesion can be used in the procedural as well as in the object-oriented model.
- It has an impact on analyzability and changeability.
- Cohesion may be considered together with coupling.
- LCOM4 seems to be the most appropriate metric from the theoretical point of view; additional experiments are needed.

Metric: D – Distance from Main Sequence

This set of metrics was suggested by Martin in [MART95] and measures the responsibility, independence and stability of packages. Martin proposes to consider the ratio between the number of abstract classes within a package and its stability. A package is responsible if a large number of classes outside the package depend upon classes within it. This number is called Afferent Coupling (Ca). A package is independent if only a small number of classes outside the package are depended upon by classes inside it. This number is called Efferent Coupling (Ce). A responsible and independent package is stable: such a package has no reason to change, and lots of reasons not to change.
To measure the stability of a package Martin suggests the Instability metric: In = Ce / (Ca+Ce). This metric has the range [0,1]. In=0 indicates a maximally stable package. In=1 indicates a maximally unstable package. The special case of a package coupled to no external classes (not mentioned by Martin) is considered to have an instability of 0 [REIS].

If all the packages in a system were maximally stable, the system would be unchangeable. In fact, the designer wants portions of the design to be flexible enough to withstand a significant amount of change. A package should also have a sufficient number of classes that are flexible enough to be extended without requiring modification, that is, abstract classes. To measure this, Martin suggests the Abstractness metric: A = # of abstract classes in the package / total # of classes in the package. This metric also has the range [0,1]: 0 means a completely concrete and 1 a completely abstract package. The more stable a package is, the more abstract classes it should have in order to remain extensible.

These metrics are presented graphically in figure 5.12. Each dot in the coordinate frame represents one package with two characteristics: stability and abstractness. Packages placed in area A are highly stable and concrete. Such packages are not desirable because they are rigid: they cannot be extended because they are not abstract, and they are very difficult to change because of their high stability. Packages from area C are also undesirable, because they are maximally abstract and yet no other package depends on them. Packages from area B are partially extensible, because they are partially abstract. Moreover, they are partially stable, so that the extensions are not subject to maximal instability. Such a category seems to be "balanced": its stability is in balance with its abstractness. The size of a dot in figure 5.12 indicates the size of the corresponding package.
As the final metric, the distance from the dot that represents the package to the line A+In=1 was suggested. Because of its similarity to a graph used in astronomy, Martin calls this line the Main Sequence. The perpendicular distance of a package from the main sequence is D = |A+In-1|/√2. This metric ranges over [0, ~0.707]. One can normalize it to the range [0, 1] by using the simpler form D = |A+In-1|.

A large distance from the main sequence does not necessarily mean that the package was badly designed. It also depends on the place of the package in the architecture. Packages that work with the database or offer tools usually have high afferent coupling and low efferent coupling and are therefore highly stable and difficult to change. Thus it is useful to have more abstract classes here, so that these packages can be extended and in this way maintained. Packages for the user interface depend on many other packages; thus they have low afferent coupling and high efferent coupling and are mostly unstable. Hence designers do not need many abstract classes here, because these packages can easily be changed. This statement still has to be confirmed empirically.

Figure 5.12 presents the analysis of the project Mobile Client 7.1 (described in detail in chapter 8). As can be seen, the packages are distributed evenly over the whole square, and it is impossible to conclude whether the entire system has a good or a bad design. The same situation can be seen in all other analyzed projects. Hence the D-metric is a bad indicator for the maintainability of the entire project. One can notice, however, that single packages from areas A and C may be difficult to maintain. Thus the D-metric could be used for metric-based audits. However, experiments and discussions with the designers show that audits based on the D-metric find only evident design errors (for example, unused abstract classes). Consequently, the D-metric is rejected from the quality model.
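The three package metrics of Martin can be put together in a short Python sketch. The function names and the two example packages with their coupling counts are hypothetical; the treatment of a package without external coupling follows the [REIS] convention mentioned in the text, and returning 0 for an empty package is an additional assumption of this sketch.

```python
import math


def instability(ca, ce):
    """In = Ce / (Ca + Ce); a package with no external coupling gets 0 [REIS]."""
    if ca + ce == 0:
        return 0.0
    return ce / (ca + ce)


def abstractness(abstract_classes, total_classes):
    """A = number of abstract classes / total number of classes."""
    if total_classes == 0:
        return 0.0  # assumption of this sketch: an empty package counts as concrete
    return abstract_classes / total_classes


def distance(a, i, normalized=True):
    """Distance from the main sequence A + In = 1.

    normalized=True  -> |A + In - 1|            (range [0, 1])
    normalized=False -> |A + In - 1| / sqrt(2)  (perpendicular distance)
    """
    d = abs(a + i - 1)
    return d if normalized else d / math.sqrt(2)


# Hypothetical packages: (Ca, Ce, abstract classes, total classes)
packages = {
    "persistence": (20, 2, 1, 10),  # many dependents, few dependencies
    "ui": (1, 9, 0, 12),            # few dependents, many dependencies
}
for name, (ca, ce, n_abstract, n_total) in packages.items():
    i = instability(ca, ce)
    a = abstractness(n_abstract, n_total)
    print(f"{name}: In={i:.2f} A={a:.2f} D={distance(a, i):.2f}")
```

In this invented example the stable but almost entirely concrete persistence package lies far from the main sequence (towards area A), whereas the unstable and concrete user-interface package lies close to it.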
Figure 5.12: Demonstration of the analysis of Martin on project Mobile Client 7.1

Metric: CYC - Cyclic Dependencies

The metric CYC determines the number of mutual coupling dependencies between packages, that is, the number of other packages that a package depends upon and which in turn depend on that package. Cyclic dependencies are difficult to maintain and point to code that is a candidate for refactoring, since cyclically dependent packages are not only harder to comprehend and compile individually, but also cannot be packaged, versioned, and distributed independently. Thus, they violate the idea that a package is the unit of release. Unfortunately, this metric depends on the project size, and it is impossible to compare two projects on the basis of it. Nevertheless, audits based on this metric can be useful to catch cyclic package dependencies before they make it into a software baseline.

Metric: NOM - Number of Methods and WMC - Weighted Methods per Class

Consider a class with n methods. Let c1...cn be the complexities of the methods. Then WMC = c1 + c2 + ... + cn. If all method complexities are considered to be unity (equal to 1), then WMC = NOM = n, the number of methods. In most cases, however, the complexity of the methods is estimated by MCC.

The metric WMC was introduced by Chidamber and Kemerer [CHID93, p. 12] and criticized by Churcher, Shepperd and Etzkorn. In particular, Etzkorn suggested a new metric for the complexity of methods [ETZK99]: Average Method Complexity (AMC). The argument is that WMC overstates the complexity of classes with many simple methods. For example, a class with 10 attributes has 10 get-methods, 10 set-methods and the constructor, thus WMC = 21, which is a very high value for such a primitive class. AMC, on the contrary, will understate the complexity of classes with a few really complex methods (MCC > 100) and many simple methods.
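The contrast between WMC and AMC described above can be reproduced with a few lines of Python. This is only a sketch: it assumes that a class is given as a list of MCC values of its methods and that AMC is computed as WMC divided by NOM; the second example class is invented.

```python
def nom(method_complexities):
    """NOM: number of methods."""
    return len(method_complexities)


def wmc(method_complexities):
    """WMC: sum of the method complexities c1..cn (here: MCC values)."""
    return sum(method_complexities)


def amc(method_complexities):
    """AMC, assumed here to be the average method complexity WMC / NOM."""
    return wmc(method_complexities) / nom(method_complexities)


def looks_like_data_container(method_complexities):
    """Heuristic from the text: NOM = WMC when every method has MCC = 1."""
    return nom(method_complexities) == wmc(method_complexities)


# Class from the text: 10 get-methods, 10 set-methods and the constructor.
container = [1] * 21
# Invented class: one very complex method among 19 trivial ones.
hot_spot = [120] + [1] * 19

for name, cls in (("container", container), ("hot_spot", hot_spot)):
    print(name, "NOM =", nom(cls), "WMC =", wmc(cls), "AMC =", round(amc(cls), 2))
```

For the container class WMC is 21 although every method is trivial, while for the invented second class the average hides a single method with MCC = 120.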
“Thus, AMC is not intended primarily as a replacement for the WMC metric, but rather as an additional way to examine particular classes for complexity” [ETZK99, p. 12]. In this thesis WMC is preferred to AMC, because WMC depends on the class size but is independent of the project size. Additionally, it has a very clear meaning: the number of all decision statements in the class plus the number of methods. Consequently, WMC is a good metric for estimating the overall algorithmic complexity of a class.

For data-container classes NOM = WMC, because such classes have only get- and set-methods, which have MCC = 1. Thus, NOM can be used as an additional metric for finding data-container classes. This is important for excluding the data-container classes from the cohesion analysis.

Structure Chart

This model describes the communication between modules in the procedural environment and is suitable for illustrating processes in non-OO ABAP programs. An example of a structure chart is depicted in figure 5.13. Boxes represent modules (function modules, programs, includes, etc.), circles represent global variables, and arcs represent calls; the parameters of a call can be depicted as well. The direction of the arrows distinguishes between importing and exporting parameters.

Figure 5.13: Example of structure chart

Metric: FAN-IN and FAN-OUT

These metrics describe the data and external coupling in the procedural environment. FAN-IN and FAN-OUT describe opposite directions of coupling:
- Parameters passed by value count toward Fan In
- External variables used before being modified count toward Fan In
- External variables modified in the block count toward Fan Out
- Return values count toward Fan Out
- Parameters passed by reference depend on their use

A drawback of these metrics is the assumption that all pieces of information have the same size; distinguishing the complexity of procedure calls would require a much more detailed analysis.
All in all, these metrics can give quite a thorough idea of the coupling. In the ABAP environment the “where-used” and “usage” functions can be used for reporting FAN-IN and FAN-OUT respectively. Based on these metrics, a long list of hybrid metrics has been suggested in order to aggregate them into one single value for the entire system. One example is D-INFO = (SUM(FAN-IN*FAN-OUT))^2, where SUM means the sum over all modules (see [ZUSE98] for more details). However, these derived metrics depend on the project size and carry less meaning.

Metric: GVAR - Number of Global Variables

This metric gives the number of global variables used in a system. Usually, to overcome its size dependency, the number of global variables is normalized by dividing it by the number of modules. Nevertheless, this metric is indirectly included in FAN-IN and FAN-OUT; hence it makes no sense to include this redundant metric in the quality model, even though it is easy to calculate.

Other Models

Here some simple metrics that do not fit any of the previously introduced models are discussed.

Metric: DOCU – Documentation Rate

This quantitative metric indicates the percentage of modules that have external documentation: DOCU for ABAP or JavaDoc for Java. However, the quality of the documentation itself is not considered and is very difficult to assess automatically at all. Moreover, this metric is a part of the Maintainability Assessment. Thus this metric is excluded from the model.

Metric: OO-D – OO-Degree

This is an additional metric for ABAP (for Java applications it is always 100%). It shows the percentage of compilable units created using the object-oriented paradigm (classes or interfaces) relative to the total number of compilable units. Like the other additional metrics, it does not have any qualitative meaning, but it indicates the importance of the OO-metrics: if only a small part of a system is created using the OO-paradigm, the analyst will pay less attention to the OO-metrics.
Metric: SMI – Software Maturity Index

It is possible that the customer changes some modules in order to customize his system. Before the maintainers can start analyzing and updating the customer’s system, they should make sure that the modules to be maintained are not affected by the customization. It is important to know how different from the standard release the system actually is. For this reason a list of newly created, changed or deleted objects should be kept. The metric SMI is suggested as the measure of the degree of modification. It relates the number of newly created (Ma), changed (Mc) or deleted (Md) objects to the total number of objects (M), regardless of who made the changes, IMS or the customer:

SMI = (M − (Ma + Mc + Md)) / M

If SMI is less than 1, it is very likely that the maintainers should compare the current customer version with the standard release before the update. SMI approaches 1 as the product begins to stabilize. The empirical meaning of SMI is the percentage of modules that have not changed with respect to the last standard release. This metric is of the type Percentage and thus has poor numerical properties; see the next chapter for more details. For the ABAP environment the number of changed LOC can be calculated by the Note Assistant. Note that in the ABAP environment only a small part of the system cannot be changed by the customer, whereas in the Java environment the unchangeable part is much bigger.

Metric: NOD – Number Of Developers

This metric shows the average number of developers who have ever touched an object. The author believes that modules that were changed many times by different developers have complicated behavior and are hard to modify. Moreover, such modules are likely to mix very different styles and naming conventions. All these factors decrease the maintainability. The interpretation of the values depends on the methodology used for the development process.
For example, eXtreme Programming does not recognize code ownership, and in this case the metric NOD is meaningless.

Correlation between Metrics

Various methods can be used to check whether one metric depends on another. Which methods are applicable depends on the numerical properties of the metrics (see next chapter). Pearson’s correlation coefficient is applicable to measures on an interval or ratio scale, whereas the rank correlation coefficients of Spearman and Kendall can be used when the measure has only an ordinal scale. For a small amount of data the covariance can be used:

Covariance = SUM((x - ave(x)) * (y - ave(y))) / (n - 1)

Several experts argue that a positive correlation result does not necessarily imply a causal relationship between the correlated metrics. In this thesis the relation between metrics is established using both correlation and deduction. This procedure is used to find correlated metrics in the quality model and to reject the less important metrics, which do not provide additional information. The second scenario is the empirical validation of metrics; in this case it is checked whether the selected product metrics are correlated with process metrics. Unfortunately, such a study is impossible in this thesis because of the lack of data for process metrics.

A diagram can be used to illustrate the correlation and to emphasize its important properties. The example depicted in figure 5.14 presents the correlation between LOC and WMC. The area marked with “1” contains several generated message classes, which have a lot of LOC but no methods and thus no WMC. The area marked with “2” contains interfaces and abstract classes, which have few methods and correspondingly few LOC.

Figure 5.14: Correlation between LOC and WMC (x-axis: LOC, y-axis: WMC)

The next possible relation between metrics is not so obvious. Each product has a minimal inherent complexity, which depends only on the problem statement.
If the complexity from one perspective is reduced, the complexity from another perspective will increase. For example, reducing a high intra-modular complexity by increasing the total number of classes will increase the inter-modular complexity. An example of such a relation is depicted in figure 5.15. The numbers “1” and “2” mark two releases with the same functionality. It can be seen that the decrease of the average MCC is accompanied by an increase of the total number of classes in the second release.

Figure 5.15: Example of dependency between MCC and NOO

Metrics Selected for Further Investigation

Some parts of the quality model were already rejected during the creation of the model and its expansion with metrics:
- The questions “Does the system have data flow anomalies?” and “Is a code that is unreachable or that does not affect the program avoided?” were removed from the quality model, because they have no great impact on the maintainability and are difficult to calculate.
- For the question “Are the naming conventions followed?” no appropriate simple metric was found, and this question was removed from the quality model. Checking naming conventions is not a trivial task.
- The question “How complex is the DB-schema?” is not pressing for the SAP architecture and was removed.
- The question about the complexity of the data types was removed because of a lack of metrics.

However, the quality model is still redundant: different metrics lead to the same qualitative statement. Therefore the metrics that cannot provide new qualitative information or are difficult to calculate should be discarded. These metrics may be put on a waiting list for implementation in the future. For the selection of the most important metrics, three additional criteria were provided for each metric: its importance, its assessment in the literature and its ease of implementation.
The following list enumerates all rejected metrics together with the reason for rejection:
- A (Abstractness): is also used in D; the aggregated value doesn't provide appreciable qualitative meaning
- CN (Control Nesting): is included in and correlates with MCC
- CR (Comments Rate): is replaced by LC (Lack of Comments)
- CYC (Cyclic Dependencies): is size-dependent; can optionally be used for additional audits
- D (Distance from Main Sequence): its aggregated value doesn't provide appreciable qualitative meaning
- DIT (Depth of Inheritance Tree): is extended and replaced by NDC
- DOCU (Documentation Rate): is difficult to analyze; is a part of the Maintainability Assessment
- D-INFO: is size-dependent and included in FAN-IN and FAN-OUT
- GVAR (Number of Global Variables): is included in FAN-IN and FAN-OUT
- NOC (Number of Children): is extended by NAC
- NOF (Number of Fields): correlates with LOC
- NOM (Number of Methods): correlates with LOC
- NOS (Number of Statements): correlates with LOC and MCC
- U (Reuse Factor): is not very important for the maintenance

After this reduction of the quality model, the following metrics were regarded as maintainability indicators and thus selected for further research:
- CBO - Coupling between objects
- CDEm - Class Definition Entropy (Modified)
- CLON - Clonicity
- LC - Lack of Comments
- FAN-IN (substitutes CBO in the non-OO ABAP environment)
- FAN-OUT (substitutes RFC in the non-OO ABAP environment)
- LCOM - Lack of Cohesion of Methods
- LOC - Lines Of Code
- LOCm - Average LOC in methods
- m - Structure Entropy
- MCC - McCabe Cyclomatic Complexity (substitutes WMC in the non-OO ABAP environment)
- NOD - Number of Developers
- RFC - Response For a Class
- SMI - Software Maturity Index
- WMC - Weighted Methods per Class

The metrics NAC (Number of Ancestor Classes) and NDC (Number of Descendent Classes) are suggested to support metric-based audits.
The selected metrics are expected to describe the maintainability of the software and should cover the following aspects:
- Incoming and outgoing connections between programming objects
- Quantity of the internal code documentation
- Cohesion of programming objects
- Degree of conformance to the principle of high cohesion and low coupling
- Modularity
- Algorithmic complexity
- Number of developers
- Usage of the inheritance
- Maturity
- Clonicity

Size-dependent Metrics and Additional Metrics

The size-dependent metrics do not provide any qualitative statement about the system, but give some idea of its size. The metrics Total-LOC (Total Lines of Code) and NOO (Number of Objects) are suggested.

Additional metrics help to judge the importance of some other metrics. The metrics OO-D and IF are suggested. OO-D is relevant only for ABAP, because for Java OO-D is always 1. OO-D indicates whether the OO-metrics (such as NAC, NDC, RFC, CBO, D and others) should be included in the maintainability assessment or not. If less than 15% of the entire system is built using the OO approach, the OO-metrics have no evident impact on the maintainability. IF (Inheritance Factor) indicates whether the inheritance metrics (NAC and NDC) should be included in the assessment or not. Nevertheless, IF also carries some qualitative meaning: stand-alone classes cannot be changed by OO means, hence such classes are more difficult to change.

6. Theoretical Validation of the Selected Metrics

Problem of Misinterpretation of Metrics

Once the metric values have been calculated for concrete source code, one can analyze, compare or transform them. Nevertheless, in order to exclude misinterpretation, the metrics should only be applied after they have been shown to be theoretically valid, in the sense that their numerical properties are well known and all operations performed on the collected results are deliberate.
The next example shows a misleading usage of metrics. Maintenance can be seen as the process of exchanging one component of the system for another (newer) component. Figure 6.1 presents a simple illustration of this process.

Figure 6.1: The maintenance as the process of exchanging components (modules M1 to M5, where M3 is replaced by M3’).

Obviously the new component M3’ is in some way better than the replaced component M3, and one of the metrics should show it. In this example it is the Defects Density (DD). However, an improvement of a single part (or even of each part) of the system does not always lead to an improvement of the entire system. Often this is not a problem of the improvement itself, but of its description. Thus the right metrics for the estimation and the right operations on the metrics should be used.

Table 6.1 is taken from [ZUSE98, p. 47] and shows two versions of one system with five modules. Each module in the newer version is better than the corresponding module in the old version: it has a smaller DD. However, the overall DD of the system becomes worse. This can happen because DD is a density measure (errors per LOC) and thus depends on the size of the module. The overall DD also depends on the distribution of size between the modules, and the analyst must interpret such metrics very carefully.

It is helpful to follow a number of steps to ensure the reliability of the proposed metrics. Several approaches exist for checking the numerical properties of metrics in order to find the admissible transformations and to prepare hints for the analyst on how to handle the metrics. One of them is the axiomatic approach proposed by Weyuker [WEYU88], which provides a framework based on a set of nine axioms.

Table 6.1: Trend of DD for two versions of the system [ZUSE98, p. 47]

Module | Version 1: # of errors, LOC, DD | Version 2: # of errors, LOC, DD | Trend of DD
1 | 3, 55, 0.0545 | 12, 777, 0.0154 | improvement
2 | 6, 110, 0.0545 | 5, 110, 0.0454 | improvement
3 | 3, 110, 0.0272 | 2, 110, 0.0181 | improvement
4 | 70, 10000, 0.0070 | 6, 1000, 0.0060 | improvement
5 | 4, 110, 0.0363 | 3, 110, 0.0272 | improvement
SUM | 86, 10385, 0.0082 | 28, 2107, 0.0132 | degradation!

Zuse’s framework for software measurement also provides a set of axioms, the so-called extensive structure. Depending on which of them are fulfilled, one can determine the type of scale of the metric and hence the admissible transformations that can be applied to it. This framework was used in this thesis for the examination of the selected metrics, as it is more thorough, commonly accepted and simple.

Types of Scale

Admissible transformations, and hence types of scale, are probably the most important properties of metrics, because the other properties follow from the type of scale. In the early 1940s Stevens introduced a hierarchy of measurement scales and classified statistical procedures according to the scales for which they were “permissible”. A brief description and criticism can be found in [VELL]. Here some basics are introduced. All types of scale are summarized in table 6.2.

The first, very primitive scale is the nominal scale. The values of metrics on this scale have no qualitative meaning and only characterize membership in one class or another. The only possible operation is equality: one can determine whether two values belong to the same class. Examples of the nominal scale are labels or classifications such as:
f(P) = 1, if Program is written in ABAP
f(P) = 2, if Program is written in Java

The ordinal scale introduces a qualitative relation between values, so that they can be compared. Expert ratings are on an ordinal scale, and one can use the empirical operation “more maintainable”. Values on the interval scale have equal distances between them. The Ratio scale allows comparing ratios between the values.
The absolute scale is a special case of the ratio scale and presents the actual count. The metrics used in this thesis are placed on the ordinal and ratio scales. Higher scales (such as the ratio scale) offer more possibilities for interpretation, because a wider range of empirical and statistical operations is available, but they are more sensitive to the admissible transformations. Using an inappropriate transformation lowers the scale type of the resulting value or even leads to wrong conclusions.

The main idea of analyzing the types of scale is to help choose the appropriate model and to analyze the results correctly by using only the appropriate operations.

Table 6.2: Types of Scales (partially taken from [PARK96, p.9])

Scale type: Nominal
Admissible transformation: any one-to-one transformation
Basic empirical operations: determination of equality
Statistical operations: mode, histograms, non-parametric statistics (frequency counts, …)
Examples: labels or classifications such as f(P) = 1, if Program is written in ABAP, f(P) = 2, if Program is written in Java; activities (analyzing, designing, coding, testing); problem types; numbering of football players

Scale type: Ordinal
Admissible transformation: y2 > y1 iff x2 > x1 (strictly monotone increasing transformation)
Basic empirical operations: the above, plus determination of greater or less
Statistical operations: the above, plus rank order statistics (Spearman and Kendall Tau correlation coefficient, Median), Maximum, Minimum
Examples: rankings or orderings such as severity and priority assignments; f(P) = 1, if Program is easy to read, f(P) = 2, if Program is not hard to read; NAC; NDC; CR; LC; OO-D; SMI; D; NOD; DOCU; IF; m; CDEm; GVAR; CLON; DD

Scale type: Interval
Admissible transformation: y = ax + b, a > 0 (positive linear transformation)
Basic empirical operations: the above, plus determination of the equality of intervals or differences
Statistical operations: the above, plus comparisons of arithmetic means, the Pearson correlation coefficient
Examples: the absolute time when an event occurred; calendar date; temperature in degrees Fahrenheit or Celsius; intelligence scores (“standard scores”)

Scale type: Ratio
Admissible transformation: y = ax, a > 0 (similarity transformation)
Basic empirical operations: the above, plus determination of the equality of ratios
Statistical operations: the above, plus comparison of percentage calculations, Variance
Examples: time intervals; cost, effort (staff-hours), length, weight and height; temperature in degrees Kelvin; LOC; LCOM; CBO; RFC; WMC; FAN-IN; FAN-OUT; NOM

Scale type: Absolute
Admissible transformation: y = x (identity)
Basic empirical operations: the above, plus determination of equality with values obtained from other scales of the same type
Statistical operations: the above
Examples: counting; probability

Types of Metrics

Measures can be divided into different groups according to how the information for the metric is obtained. See [ZUSE98, p.p. 242 – 246] for the full list. Only the types of measure used in this thesis are presented below:

Counting: simple counting of objects or their artifacts. The following operations can be applied: Range, Sum (for additive metrics), Average (for additive metrics, carefully), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful). Examples are: LOC, WMC, RFC, CBO, LCOM4, FAN-IN, FAN-OUT, NOM, MCC, NOD.

Density: one metric value is divided by another, independent metric value. Examples are: GVAR, DD. The following operations can be applied: Range (very careful), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful).

Percentage: a metric expressed as the ratio of one part of the empirical objects or their artifacts to their total number. Examples are: CR, OO-D, SMI, DOCU. The following operations can be applied: Range (very careful), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful). In particular, this means that the arithmetic mean must not be applied to percentage metrics. For example, if one module has CR(P1) = 50% and another module has CR(P2) = 10%, one must not average these to 30%. Depending on the size of the modules, the real CR(P1 + P2) could be anywhere between 10% and 50%.
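The CR example can be made concrete with a small sketch. It assumes that CR is the share of comment lines in the total lines of a module; the module sizes (100 and 900 lines) and the function name are hypothetical and chosen only for illustration.

```python
def combined_cr(modules):
    """Comment rate of several modules taken together.

    Each module is a (total_loc, cr) pair, where cr is the share of
    comment lines in total_loc (0.0 .. 1.0).
    """
    total_loc = sum(loc for loc, _ in modules)
    comment_loc = sum(loc * cr for loc, cr in modules)
    return comment_loc / total_loc


# Hypothetical sizes: P1 is small, P2 is large.
p1 = (100, 0.50)
p2 = (900, 0.10)

arithmetic_mean = (p1[1] + p2[1]) / 2
weighted = combined_cr([p1, p2])

print(f"arithmetic mean: {arithmetic_mean:.0%}")
print(f"CR(P1 + P2):     {weighted:.0%}")
```

With these sizes the combined value is 14%, not 30%, because the large module dominates. The combined CR is the same as the mean of the individual CR values weighted by module size.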
For CR in particular, the weighted mean will be smaller than the arithmetic mean, because smaller classes usually have a larger CR and the weighted mean gives small classes smaller weights. The distribution of LOC and CR, using data from the project ObjMgr (new), is shown in figure 6.2.

Figure 6.2: Distribution between LOC and CR, smaller classes usually have larger CR.

Minimum, Maximum: the minimal or maximal value of a population (the metric values of all empirical objects). Only the Range operation can be used.

Hybrid metric: a metric that combines other metrics using addition or multiplication. Examples are: LC, m, CDEm, D, MI. A hybrid metric inherits the weakest numerical properties of its components. Hence such metrics usually have relatively poor numerical properties, and only a few operations can be applied.

The concatenation operation is undefined for inheritance hierarchies, because new nodes can be added at any place in the hierarchy. Thus all metrics based on this model have only an ordinal scale. This is discussed in detail in [ZUSE98 p.p. 273 - 335].

Conversion of Metrics

The conversion of metrics is a numerical operation on one or more metrics that yields metrics with new numerical or qualitative properties.

The first type of conversion is aggregation. Some metrics, such as SMI, CLON or OO-D, are calculated for the project as a whole and do not need to be aggregated. Many other metrics, however, describe a single class or even a single method and must be aggregated in some way into one value that characterizes the entire group of empirical objects (package, inheritance hierarchy or whole system). Depending on the type of scale, different methods are possible. The first method is the range: only the maximal (or minimal) value is taken. However, one extreme value is a bad indicator for the entire system and can only be used together with other methods.
The range can be used for metrics on the ordinal or a higher scale. The second method is averaging using the arithmetic mean: the values of all modules are summed and divided by the number of modules. Keep in mind that this method can be applied only to metrics on the interval and higher scales. This is probably the simplest and most popular way of averaging. Nevertheless, it changes the empirical statement of the resulting metric and its properties.

Consider some features of the arithmetic mean applied to inter-modular metrics, taking the metrics FAN-IN and FAN-OUT as an example. FAN-IN and FAN-OUT are good indicators of the analyzability and changeability of a single module. But how can the values of single modules be combined into one common indicator for the entire system? The example system in figure 6.3 illustrates this.

Figure 6.3: Example system for weighted mean

Module  LOC  FAN-IN  FAN-OUT
A       200  0       2
B       100  0       2
C       100  2       2
D       150  1       2
E       50   2       2
F       50   3       3
G       200  1       3
H       50   2       0
I       50   2       1
J       50   4       0

The average values of FAN-IN and FAN-OUT are:

Ave-FAN-IN = (0+0+2+1+2+3+1+2+2+4)/10 = 1,7
Ave-FAN-OUT = (2+2+2+2+2+3+3+0+1+0)/10 = 1,7

This is not a coincidence. For any closed system the average values are equal, because each relation is directional and is counted twice, once in FAN-IN and once in FAN-OUT, while the number of modules is the same. Thus for the arithmetic mean in closed systems the simple formula

Ave-FAN-IN = Ave-FAN-OUT = Number of Relations / Number of Objects

can be used. The problem is that all modules have equal weight, so all empirical objects have equal impact on the average value. In the real world, large and complex objects have more impact on an attribute of the whole system, but averaging treats complex and simple objects alike.
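The arithmetic means above can be reproduced with a few lines of Python. The sketch uses the module data of figure 6.3 and also computes a mean weighted by module size in LOC, which is one possible way of giving larger modules more influence. The function and variable names are illustrative only.

```python
# Modules of the example system in figure 6.3: name -> (LOC, FAN-IN, FAN-OUT)
modules = {
    "A": (200, 0, 2), "B": (100, 0, 2), "C": (100, 2, 2), "D": (150, 1, 2),
    "E": (50, 2, 2), "F": (50, 3, 3), "G": (200, 1, 3), "H": (50, 2, 0),
    "I": (50, 2, 1), "J": (50, 4, 0),
}


def arithmetic_mean(values):
    """Every module counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Every module counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


loc = [m[0] for m in modules.values()]
fan_in = [m[1] for m in modules.values()]
fan_out = [m[2] for m in modules.values()]

ave_fan_in = arithmetic_mean(fan_in)
ave_fan_out = arithmetic_mean(fan_out)
mean_fan_in = weighted_mean(fan_in, loc)
mean_fan_out = weighted_mean(fan_out, loc)

print(ave_fan_in, ave_fan_out)    # both arithmetic means
print(mean_fan_in, mean_fan_out)  # LOC-weighted means
```

Both arithmetic means are 17/10, since the same 17 relations are counted once as incoming and once as outgoing. The LOC-weighted means differ from each other (1,2 for FAN-IN and 2 for FAN-OUT), because the weights depend on which modules the relations are attached to.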
It is therefore reasonable to calculate a weighted mean, which characterizes a quality attribute of the entire system more precisely. Different weighting schemes can be used. Suppose that a larger module is more likely to be changed than a smaller one, because it has more LOC that could be changed. Consider again the example in figure 6.3, where the darkness of the rectangles represents the size of the module. The mean values of FAN-IN and FAN-OUT weighted by size (in LOC) are:

Mean-FAN-IN = 0*0,2 + 0*0,1 + 2*0,1 + 1*0,15 + 2*0,05 + 3*0,05 + 1*0,2 + 2*0,05 + 2*0,05 + 4*0,05 = 1,2
Mean-FAN-OUT = 2*0,2 + 2*0,1 + 2*0,1 + 2*0,15 + 2*0,05 + 3*0,05 + 3*0,2 + 0*0,05 + 1*0,05 + 0*0,05 = 2

The results show that, on average, a developer analyzing the system has to analyze 2 modules, and a developer changing the system has to keep 1,2 modules stable. This means the system is relatively stable, but difficult to analyze. Hence the weighted mean not only allows a more precise calculation of aggregated values, but also distinguishes systems in which one direction of relations predominates. Note that predominance here means the probability of having to analyze relations in this direction.

Nevertheless, a large module generally uses more other modules than a smaller one. That means the weighted mean for FAN-OUT tends to be larger than the weighted mean for FAN-IN. Figure 6.4 shows that large classes (larger LOC) usually have more connections with other classes (larger RFC). This observation is based on 8 SAP Java projects. Hence the weighted FAN-OUT is expected to be larger than the weighted FAN-IN.

Since WMC is a good indicator of class complexity and should be correlated with the fault probability, this metric can also be used for weighting. In this case the weighted mean for FAN-OUT can be interpreted as the average number of modules the developer has to analyze when localizing a fault.

The second type of conversion is normalizing.
Metrics have different ranges of values, and for comparison or presentation it is useful to bring all metrics to the same range (usually the interval [0; 1]). This can be achieved by normalizing. An example of this conversion is the normalizing of entropy: because the maximal entropy is known, the normalizing is easy. Moreover, since the entropy metric is on the ordinal scale, normalizing does not worsen its numerical properties.

Figure 6.4: Correlation between LOC and RFC

The third type of conversion is the composition of a set of metrics into one hybrid metric. Polynomial composition is very popular, but other types are also possible. An example is the Maintainability Index.

The fourth type of conversion is percentage grouping. All modules are divided into three groups based on their metric values (normal values, high but still acceptable, inadmissible). The percentage of modules in each group is then presented in a pie chart, as shown in figure 6.5. Such diagrams also give an idea of the distribution of values within the system and complement the aggregated value well.

Figure 6.5: Example of percentage grouping. Comparison of two versions of Mobile Client
Percentage grouping for LOC, Mobile Client 7.0: green 99 (58%), yellow 10 (6%), red 62 (36%)
Percentage grouping for LOC, Mobile Client 7.1: green 240 (62%), yellow 32 (8%), red 118 (30%)

The percentage grouping can also be done at a finer level of granularity, namely in LOC. For that, all values have to be recalculated in LOC and the distribution of LOC presented, as shown in figure 8.6 (see p. 82). Such a detailed description allows analyzing not only how the modules are distributed over the normal, high and inadmissible areas, but also how many LOC the inadmissible modules contain.
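The two levels of percentage grouping can be sketched as follows. The thresholds and the class sizes are made up for illustration; the thesis does not state the limits used for figure 6.5 in this section.

```python
from collections import Counter

# Hypothetical thresholds for class LOC (assumptions, not taken from the thesis).
ACCEPTABLE_LIMIT = 200    # up to this value: "green"
INADMISSIBLE_LIMIT = 400  # above this value: "red", in between: "yellow"

GROUPS = ("green", "yellow", "red")


def group(loc):
    """Assign a class to one of the three groups by its LOC."""
    if loc <= ACCEPTABLE_LIMIT:
        return "green"
    if loc <= INADMISSIBLE_LIMIT:
        return "yellow"
    return "red"


def percentage_grouping(class_loc):
    """Return (share of classes, share of LOC) per group, in percent."""
    by_count = Counter()
    by_loc = Counter()
    for loc in class_loc:
        g = group(loc)
        by_count[g] += 1
        by_loc[g] += loc
    n, total = len(class_loc), sum(class_loc)
    return (
        {g: 100 * by_count[g] / n for g in GROUPS},
        {g: 100 * by_loc[g] / total for g in GROUPS},
    )


# Made-up class sizes.
sizes = [50, 80, 120, 150, 180, 250, 300, 450, 600, 820]
share_of_classes, share_of_loc = percentage_grouping(sizes)
print(share_of_classes)
print(share_of_loc)
```

In this made-up sample half of the classes are green but hold less than a fifth of the code, while the 30% of classes in the red group hold more than 60% of all LOC. This is the additional information that the grouping by LOC provides.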
Aggregation and normalizing are also used to convert size-dependent metrics into quality-dependent metrics. Usually the average value of a size-dependent metric is no longer size-dependent. Nevertheless, any conversion should be used very carefully, because such a transformation can change the numerical properties and the qualitative meaning of the metric.

Other Desirable Properties of the Metrics

Meta-metrics, or properties of the metrics, are discussed in [MUST05]:

Compliance: The ability to cover all aspects of quality factors and the design characteristics
Orthogonality: The ability to represent different aspects of the system under measurement. This property was studied in detail in the COBISOME project
Formality: The ability to get the same value for the same systems for different people at different times through precise, objective and unambiguous specification
Minimality: The ability to be used with the minimum number of metrics
Implementability/Usability: The ability to be implemented independently of the implementation technology
Accuracy: A quantitative measure of the magnitude of error, preferably expressed as a function of relative error
Validity: The degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure
Reliability: The probability of failure-free software operation for a specified period of time in a specified environment
Interpretability: The ease with which the user may understand and properly use and analyze the metrics results

Visualisation

A very popular method for presenting metric results is the Kiviat diagram. Such a diagram for the dimensions of maintainability is presented in figure 6.6. These diagrams are very pictorial and present information simply and intuitively. However, this intuition can be misleading. Usually all dimensions are on the ordinal scale, and any ratios between numbers obtained by comparing two systems are meaningless.
But the numbers are presented graphically on each dimension using proportional intervals, which can lead the analyst to interpret the ratio between intervals on the Kiviat diagram as a ratio between metrics that are only on the ordinal scale. This is a misinterpretation. The same holds for column diagrams as well.

The next drawback is the hiding of information: several metrics are hidden behind each dimension, and after aggregation it is not clear which metric caused the deviation. It is also not clear which weight should be assigned to each metric.

Figure 6.6: The Kiviat-diagram for the maintainability dimensions

Consequently, the best way of presenting multidimensional information remains a table in which all used metrics are listed with their aggregated values. Additionally, color markers highlight indicators with high or inadmissible values. When multiple releases of one system are compared, trends of the values can be indicated with arrows. An example of such a presentation is given in table 6.3.

Table 6.3: Example of output table (Mobile Client 7.0 vs. 7.1)

Another possibility is the usage of a Business Warehousing system. Here different reports can be prepared and the history can be saved. However, this is not possible at present and can only be planned for the more distant future. In this work simple tables are used.

It is interesting to inspect how the values of a metric are distributed among modules. A distribution chart offers a deeper look into the nature of a metric. One such chart is depicted in figure 6.7. It shows that most methods have few LOC and only a few methods are very large. Thus averaging of LOC can lead to underestimating.
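A small sketch with made-up method sizes illustrates why a single average says little about the few very large methods. The numbers are hypothetical and are not taken from the measured project.

```python
import statistics

# Made-up method sizes in LOC: many small methods, a few very large ones.
method_loc = [3] * 40 + [8] * 30 + [15] * 20 + [40] * 7 + [120, 150, 176]

mean_loc = statistics.mean(method_loc)
median_loc = statistics.median(method_loc)
largest = max(method_loc)
share_above_50 = sum(1 for loc in method_loc if loc > 50) / len(method_loc)

print(f"mean:    {mean_loc:.2f} LOC")
print(f"median:  {median_loc} LOC")
print(f"largest: {largest} LOC")
print(f"methods above 50 LOC: {share_above_50:.0%}")
```

For this sample the mean is below 14 LOC and the median is 8 LOC, although the largest method has 176 LOC. An aggregated value alone therefore hides the large methods, which is why the distribution is worth inspecting alongside it.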
Figure 6.7: Distribution of LOC in Methods (small part of the project Mobile Client 7.1)

The class blueprint presents information graphically in the form of a chart in which simple metrics and trivial class diagrams are combined. Elements of a class are represented as boxes. Their shape, size and color reflect semantic information. The two dimensions of a box are given by two selected metrics. An example is given in figure 6.8. Three-dimensional diagrams are also possible.

Figure 6.8: Example of visualization of class diagram.

Other class graphs for the visualization of software metrics are discussed in detail in [LANZ99]. For his work Lanza used the tool CodeCrawler, a language-independent reverse engineering tool that combines metrics and software visualization. See also http://www.iam.unibe.ch/~scg/Research/CodeCrawler/ .

7. Tools

In this chapter several tools for software measurement and analysis are discussed. The first section presents tools for ABAP, the second section introduces Java tools, and the third section discusses tools for automating the GQM-approach and for integrating several tools in order to automate experiments.

ABAP-tools

Transaction SE28

The transaction SE28, which uses the package SCME, can be used to calculate metrics and visualize them in the form of a hierarchy. The program SAPRCODD can be used to calculate a set of metrics for a single program or for a set of programs. In the latter case the mask “*” should be used for the parameter object name. SE28 is not a standard tool that can be found in every standard installation of the basis. It is available, for example, in the BC0 and B20 systems. For parsing the source code the ABAP command SCAN ABAP-SOURCE INTO TOKENS is used.
However, only a few metrics are implemented: LOC, MCC, Comparative complexity (the number of ORs and ANDs within IF statements), DIF (Halstead Difficulty), Number of comments, etc. Unfortunately, it is impossible to show more than 4 metrics at the same time. Note that this transaction only presents the results; the actual calculation is made by the job EU_PUT.

Additionally, some standard ABAP tools can be used for the measurement. The transaction DECO and its components can be used to calculate the FAN-IN and FAN-OUT metrics. See the tables BUMF_DECO and TADIR, which contain the entire structure of the system, the package SEWA, and the functions MRMY_ENVIR_CHECK, RS_EU_CROSSREF and REPOSITORY_ENVINRONMENT_CHECK. These functions return a list of “where-used” and “uses” objects. Before these functions can be used, a special job has to be started in order to fill the tables. Questions refer to Martin Runte or Andreas Borchardt.

Z_ASSESSMENT

This small report calculates the metrics needed for the Maintainability Assessment project. They are: Total number of forms, Forms with > 150 lines, Percent with > 150 lines, and Ratio comments to code. It is a very small but useful report. Please contact Alpesh Patel for further questions.

CheckMan, CodeInspector

These tools increase the awareness of quality in development by checking code against a large number of rules (audits). In contrast to CheckMan, CodeInspector also provides the possibility of counting some software attributes and hence allows metrics to be implemented. However, no metric has been implemented in this way so far. ABAP Test Cockpit is the successor of Code Inspector and CheckMan. It is also suited to counting metrics, but its productive start is only planned. For more details about CheckMan see [SAP03], for more details about CodeInspector see [SAP03b].

AUDITOR

This tool from a third-party vendor (www.caseconsult.de) is intended for audits and is not suited to the calculation of sums.
Each ABAP program is processed separately and the result is stored in an HTML file. Thus, if one wants information about the whole system, or some of the inter-modular metrics, these HTML files have to be processed additionally in order to collect this information. The second problem is that AUDITOR can read only flat files, so the entire system has to be exported before the measurement. Additional audits can be implemented in the form of C++ libraries. All the listed problems make this tool awkward for the current task.

At present the transaction SE28, the report Z_ASSESSMENT and some standard tools can be used. Nevertheless, the ABAP environment lacks tools for the following metrics: CDEm, m, SMI, LCOM4, RFC, CBO, NAC and NDC. Other metrics can be measured directly or with a small work-around. For the metric CLON the tool CloneAnalyzer can be used.

Java-tools

Some tools for Java need the binary code or compilable source code, because they have to parse the classes first and only then calculate the metrics. Other tools do not need compilable code. This allows metrics to be calculated for code that contains syntax errors or references to missing classes, which makes it possible to use such a tool for analyzing only a part of the application. Nevertheless, in this case the results can be slightly distorted and the analyst should draw conclusions with great caution.

Borland Together Developer 2006 for Eclipse

Among other features, Borland Together Developer provides functionality for the measurement of metrics. This tool collects measurement data at the source code level, which is important when only a part of the software is analyzed. However, not all selected metrics are provided by Together. This tool is used for the initial research, with some metrics being substituted. For further usage the missing metrics should be implemented. Next, several particularities that are important for the experiments are introduced.
LOC is the number of lines of code in a method or a class. LOC counts all rows within the class body only. Comments, the package specification and imports before the class body are not counted. There are two options for LOC:

Documentation and implementation comments as well as blank lines and full-row comments can optionally be interpreted as code
Empty lines and full-row comments are rejected. Note that if several statements are written in one line, they are counted separately. This option has been chosen for the experiments

The metric for CR is called TCR and is the ratio of the documentation and implementation comments to the Total-LOC. Comments inside the file that contains the class, but outside the class (before the class body), are not counted.

The metric for WMC is called WMPC1. The metric for DIT is called DOIH. For NOM (number of methods) the metric NOO, Number of operations (except constructors), is used.

The tool provides several metrics for cohesion. LCOM is calculated using the coupling between attributes and methods only. The coupling between methods and inherited attributes is not considered. Thus no proper metric for cohesion was found.

Borland Together allows classes with missing “imports” to be processed, but this can lead to erroneous results. For example, DIT will be equal to zero if the parent class is missing. The same can be said about many other inter-modular metrics. Thus, after the calculation a sample of the results should be verified manually.

Code Quality Management (CQM)

The CQM supports the entire development process by providing a landscape for measurement and analysis. Measurement data is stored in the repository. After that, web-based quality reports can be prepared. Other tools (such as Together or JDepend) can be integrated into the CQM using adapters. For more details see [SAP05d].

CloneAnalyzer

CloneAnalyzer is a free Eclipse plug-in for software quality analysis.
It allows finding, displaying and inspecting clones, which are fragments of duplicated source code resulting from a lack of proper reuse. Note that this tool finds only exact clones. That means the tool is language independent and can be used for both the Java and the ABAP environment. Nevertheless, CloneAnalyzer works with flat files only. Thus the corresponding part of the ABAP system has to be exported and the file filter has to be set in the options. In the options it is also possible to set the minimal size of the clones to be found: in the context of this thesis, clones with a length of 15 LOC and more were searched for. The clones found can be saved in CSV format, specifying the number of clones in each clone set and the length of a clone. This data allows the total LOC in the clones, and thus the Clonicity, to be calculated. For more details see http://cloneanalyzer.sourceforge.net/ .

Tools for Dependency Analysis

All tools discussed in this subsection calculate packaging metrics, including: Afferent Couplings (Ca), Efferent Couplings (Ce), Abstractness (A), Instability (In) and Distance from the Main Sequence (D).

JDepend is a free stand-alone tool and is also available as a plug-in for Eclipse. Only some basic packaging metrics are provided. For more information see http://www.clarkware.com/software/JDepend.html and http://andrei.gmxhome.de/jdepend4eclipse/ .

OptimalAdvisor 4.0 is a commercial tool that presents the results for each package graphically; if a package has sub-packages, their classes are taken into account as well. Despite the good graphical representation, there is no possibility of saving the information. For more details see http://javacentral.compuware.com/products/optimaladvisor/ .

Code Analysis Plug-In (CAP) is a free plug-in for Eclipse. It provides a handy interface for browsing classes and showing dependencies. Unfortunately, it is not possible to save the results. This plug-in does not take the standard Java libraries into account.
CodePro is a commercial plug-in for Eclipse which, in addition to dependency analysis, provides a few other metrics and other functionality. CodePro metrics take dependencies on the standard Java libraries into account. For more information see http://www.instantiations.com/codepro/ .

A problem of many tools is the aggregation: all tools do it in different ways, and the methods used for the aggregation are not even documented. For example, CodePro just takes the value of the top package (for example com or org). Since this package usually has no responsibility, the D-metric will be equal to the abstractness, which is quite confusing. The best way to get the average value for D (or another metric) is to calculate it ourselves using XSLT.

JLin

The SAP-internal tool JLin performs static tests. Possible applications include:

Identification of potential error sources
Enforcing of code conventions
Evaluation of metrics and statistics
Enforcing of architectural patterns
Monitoring

In order to fulfill these requirements, JLin can be used in the following environments:

As an Eclipse plug-in
Within the SAP make process (which currently uses ant, in the near future the Component Build Server) via an API

Nevertheless, only a few metrics are implemented. For more details see [JLIN05].

Free tools: Metrics and JMetrics

Metrics 1.3.6 is a free plug-in for Eclipse. The calculation is based on binary code, thus the source code has to be compilable. The most important metrics, such as Number of Classes, Number of Children (subclasses) of a class, DIT, NOM, LOC, Total-LOC, LOCm, WMC, LCOM1 and LCOM2, are provided. It is possible to export the measurement data in XML format. For more details see http://metrics.sourceforge.net/

JMetric is a free stand-alone tool for metrics collection and analysis. JMetric collects information from Java source files and compiles a metrics model.
This model is then populated and extended with metrics such as LOC, Total-LOC, LOC in Methods, Number Of Classes, Lack of Cohesion Of Methods, WMC, Number Of Children classes, DIT and NOM. JMetric also provides a few analysis methods in the form of drill-down and cross-section tables, charts, and raw metrics text. For more details see http://www.it.swin.edu.au/projects/jmetric/ .

Despite the wide range of tools, several metrics for Java still have to be implemented additionally: CDEm, m, SMI, LCOM4, NAC, NDC and NOD.

Framework for GQM-approach

In the current thesis the quality model is presented in MS Visio. However, there are tools for automated support of model building, measurement and interpretation, which could be integrated into the development landscape. GQM-DIVA, GQMaspect and MetriFlame are some of these. For more details on tools supporting the GQM-approach, readers are referred to [LAVA00]. However, these tools are not handy and do not support an appropriate input source of information. Thus, for the goals of the experiments presented in this thesis, some XML and XSLT documents were prepared in order to generate the reports automatically.

In general, the measurement process works in the following way: third-party tools provide the measurement data in the form of XML files. The quality model is also saved as an XML file. All these XML files are the input for an XSL transformation, which generates the output report according to the GQM model. See figure 7.1 for more architectural details. An example of usage is given on p. 85.

Figure 7.1: Architecture of the metric report generator

In the experiments the trial version of Borland Together Developer 2006 for Eclipse, the trial version of CodePro 4.2 and CloneAnalyzer 0.0.2 were used. This selection is explained by the large number of provided metrics and the suitable output format. However, provided that the analyzed project does not contain compiler errors, free tools such as Metrics 1.3.6 or JMetric can be used in order to save costs.
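The idea of aggregating a packaging metric from tool output can be illustrated with a short sketch. It is written in Python rather than XSLT, and the XML layout, the attribute names and the package names are invented for this example; real tools export their own formats. The formulas are the commonly stated definitions of the packaging metrics, instability I = Ce / (Ca + Ce) and distance D = |A + I - 1|, and treating a package without any couplings as I = 0 is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; real tools use their own XML layouts.
SAMPLE = """
<packages>
  <package name="com.example.api" ca="6" ce="1" abstractness="0.8"/>
  <package name="com.example.core" ca="3" ce="3" abstractness="0.2"/>
  <package name="com.example.ui" ca="0" ce="5" abstractness="0.1"/>
</packages>
"""


def instability(ca, ce):
    """I = Ce / (Ca + Ce); taken as 0 here for a package without couplings."""
    return ce / (ca + ce) if (ca + ce) else 0.0


def distance(abstractness, inst):
    """Normalized distance from the main sequence: D = |A + I - 1|."""
    return abs(abstractness + inst - 1)


def average_distance(xml_text):
    """Arithmetic mean of D over all packages in the document."""
    root = ET.fromstring(xml_text)
    distances = []
    for pkg in root.findall("package"):
        inst = instability(int(pkg.get("ca")), int(pkg.get("ce")))
        distances.append(distance(float(pkg.get("abstractness")), inst))
    return sum(distances) / len(distances)


average_d = average_distance(SAMPLE)
print(f"average D: {average_d:.3f}")
```

The third package has no incoming dependencies (Ca = 0), so its instability is 1 and its D equals its abstractness. This is the same effect that makes the value of a top-level package such as com or org uninformative.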
8. Results

Several experiments were made for the empirical validation of the selected metrics. This chapter describes the experiments, their results and the conclusions drawn.

Overview of the Code Examples to be Analyzed

Several SAP projects were selected for the empirical validation of the selected metrics. An overview is given in table 8.1.

Table 8.1: List of the projects analyzed during this work

Java

Components: Two versions of Object Manager (SLD):
perforce3008:\\sdt\com.sap.lcr\620_SP_COR\src\java\com\sap\lcr\objmgr\... (old)
perforce3301:\\engine\j2ee.lm.sld\dev\src\tc~sld~sldcimom_lib\_tc~sld~sldcimom_lib\java\com\sap\lcr\objmgr\... (new)
Contact person or organization: Thorsten Himmel

Components: Two versions of SLD-Client (WBEM API):
perforce3301:\\base\common.sld\dev\src\tc~sld~lcrclient_lib\_tc~sld~lcrclient_lib\java\com\sap\lcr\api\cim\... and \cimclient\... (old)
perforce3301:\\base\common.sld\dev\src\tc~sld~sldclient_lib\_tc~sld~sldclient_lib\java\com\sap\sld\api\wbem\... (new)
Contact person or organization: Thorsten Himmel

Components: JLin/ATX project comparison: "perforce3002\tc\jtools\630_SP_COR" vs. "perforce3227\buildenv\BE.JLin\dev"
Contact person or organization: Georg Lokowandt

Components: Mobile Client routine comparison: 7.0 (2.1) vs. 7.1 (2.5)
Contact person or organization: Janosz Smilek

ABAP

Components: Package CRM_DNO
Contact person or organization: Richard Himmelsbach

Components: Package ME
Contact person or organization: Joachim Seidel

To assess the actual maintainability of the selected projects, either process metrics or an expert’s opinion are needed. The process metrics could easily be extracted from the system for the management of customer messages. Examples are MTTM or the Backlog Management Index (number of problems closed during the month / number of problem arrivals during the month). Further examples of the available process metrics can be found in [SAP05c]. However, for the new projects no information about the maintenance is available yet, and for the other projects it was a bureaucratic problem to get the data. Hence the alternative option, the expert’s opinion, has to be used.
A very popular method for the evaluation of software metrics is finding the correlation between an expert’s estimation and automatically calculated metrics. As the statistical method, the convergence or other similar methods can be used. However, this works only if enough estimated sources are available. Furthermore, it is desirable to have all estimations made by one expert in order to ensure the uniformity of the evaluation. In the context of the current experiment this is impossible. Hence a simplified procedure is suggested for the initial evaluation.

Several experiments will be made. In each experiment only two releases of one software component will be compared. The older release is supposed to have an improper design, which is improved in the later release. It is assumed that the releases have nearly the same functionality and thus the same minimal complexity to be achieved. Comparing the metric values for the two releases should give an idea of whether the selected metrics are robust enough to indicate the improvement in maintainability. This methodology is simple, but powerful. One of the experts remarked that he could not characterize any of the provided examples as well or badly maintainable, but that the ranking among the examples is obvious.

Because of the lack of tools for ABAP, only Java projects participate in the experiments. However, a short description of the selected ABAP projects is presented here, which can be used for further research.

The first example comes from Richard Himmelsbach. Package CRM_DNO presents a classical usage scenario of ABAP and OO-ABAP. The example can be found in the system TSL 001. This package contains several reports for the DNO monitor and also includes the classes and functions used. The advantages of the design are: the clear structure and readability, the function encapsulation, modularity, exception handling, comments and naming conventions, and customizability through the use of parameters.

The second example was suggested by Joachim Seidel and includes several objects from the package ME.
The following objects are worthy of notice, because they have been continuously changed by many notes in different releases; the history of changes in release 4.6C is the longest one:

the function module ME_READ_LAST_GR in the function group EINR
the function module ME_CONFIRMATION_MAINTAIN_AVIS and include LEINBF0Q in the function group EINB
the includes LMEDRUCKF17, LMEDRUCKF06 and the function module ME_READ_PO_FOR_PRINTING in the function group MEDRUCK
the program SAPMM06E and include MM06EF0B_BUCHEN
the function group MEPO

Such a high number of faults indicates a bad design of the earlier releases; however, no opinion about the maintainability of the new version of ME is available.

Experiments

An overview of all analyzed projects can be found in table 8.2. The arrows in the cells of the newer versions indicate improvement or degradation. It can be seen that most pairs of metrics (old and new version) show an improvement. Because of the lack of tools, the fulfillment of the following goals was not verified: Consistency, Maturity and Packaging.
Table 8.2: Overview of analyzed projects

Metric      ObjMgr   ObjMgr   SLDClient  SLDClient  JLin/ATX  JLin/ATX  Mobile      Mobile
            old      new      old        new        630       dev       Client 7.0  Client 7.1
LOC         378,8    224,2    268,3      328,9      132,9     108,1     157,9       153,0
LOCm        14,9     8,9      4,2        8,0        9,5       7,7       6,3         6,2
WMC         50,1     19,4     35,4       25,7       15,9      13,8      16,8        16,6
RFC         87,0     30,1     54,1       45,3       33,0      30,2      24,3        23,7
CBO         7,4      6,9      9,9        8,8        7,0       6,4       4,7         5,1
LC          77,5     81,0     70,0       43,5       54,0      40,0      35,0        31,0
CLON        11%      1%       2%         1%         7,2%      2,2%      6%          3%
CDEm        0,82     0,833    0,825      0,830      0,859     0,842     0,871       0,886
Total-LOC   20,455   24,888   48,824     48,020     50,249    44,116    27,000      59,664
Total-NOC   54       111      182        146        378       408       171         390

Project: ObjMgr

Figure 8.1: Evolution of the project ObjMgr (bar chart of LOC, LOCm, WMC, RFC, CBO, LC, CLON*10, CDEm*100, Total-KLOC and Total-NOC for both versions)

The measurement data for the project ObjMgr is presented in figure 8.1. The red columns (the left column in each pair) present the old version, the yellow columns the new one. The two metrics on the right side are additional and represent the size of the system in Total-LOC and Total-NOC. The new version has twice as many classes, but the total amount of code in LOC rose only insignificantly. This caused a reduction of the average number of LOC per class. In spite of that, the inter-modular metrics also improved. The old version had redundant complexity, as shown by the metrics WMC and CLON. In the new version the significant reduction in the number of clones leads to a decrease of the intra-modular complexity. The metric WMC has greatly improved, from 50,1 in the old version to 19,4 in the new version. Such a large difference could be explained by the distribution of the complexity over twice as many classes. However, the total amount of complexity was also reduced: in the old version the sum of WMC over all classes is about 2700, in the newer version about 2150.
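The two sums can be checked from table 8.2, assuming that the WMC row holds the average per class, so that the total is the average multiplied by the number of classes (Total-NOC). A minimal sketch:

```python
# Values taken from table 8.2 for the project ObjMgr.
old = {"avg_wmc": 50.1, "classes": 54}
new = {"avg_wmc": 19.4, "classes": 111}


def total_wmc(version):
    """Sum of WMC over all classes = average WMC * number of classes."""
    return version["avg_wmc"] * version["classes"]


total_old = total_wmc(old)
total_new = total_wmc(new)
reduction = (total_old - total_new) / total_old

print(f"total WMC old: {total_old:.1f}")
print(f"total WMC new: {total_new:.1f}")
print(f"reduction:     {reduction:.1%}")
```

The products are about 2705 and 2153, which matches the rounded values of 2700 and 2150, and correspond to a reduction of the total complexity by roughly 20%.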
It is assumed that about 10% of the complexity was removed by reducing the clonicity and that the remaining difference results from a better design. Only the metrics LC and CDEm show an insignificant degradation. ObjMgr (new) is a newly developed version of ObjMgr (old). During the development the design of the application was greatly simplified. This led to a handier usage of the API and therefore to a higher maintainability.

Project: SLDClient
The same can be said about the evolution of the SLDClient, although here the new version also provides additional functionality. The new version may appear more complex because of the new features; however, from the viewpoint of maintainability the old version was very poor, because changes in one part often caused faults in another part. Redundancy was also reduced in the newer version. It is noteworthy that the extensive usage of patterns, in particular Visitor, increases the LOC of the classes in the newer version.

Figure 8.2: Evolution of the project SLDClient (bar chart of LOC, LOCm, WMC, RFC, CBO, LC, CLON*10, CDEm*100, Total-KLOC and Total-NOC for both versions)

Project: JLin/ATX
This project is special, because all metrics without exception show an improvement of the maintainability and thus fully agree with the expert's opinion. In particular, the newer version has a smaller Total-LOC although it provides more functionality. Moreover, Total-NOO increased slightly and the clonicity was reduced; together this decreases the average LOC and WMC of classes and methods. It is noteworthy that, despite the increase of the total number of classes, the inter-modular metrics (RFC, CBO) also show an improvement. See figure 8.3 for an overview.
Figure 8.3: Evolution of the project JLin/ATX (bar chart of LOC, LOCm, WMC, RFC, CBO, LC, CLON*10, CDEm*100, Total-KLOC and Total-NOC for both versions)

Project: Mobile Client
This project is considered in more detail. In this sub-section the short code names 7.0 and 7.1 are used. Version 7.1 (the new version of the Mobile Client) provides a lot of new functionality, which is also reflected by the doubling of Total-LOC and Total-NOO. Nevertheless, the inter-modular metrics remain in the recommendable area.

Algorithm Complexity
The algorithm complexity is relatively high, but still acceptable in both releases. The average WMC has been slightly reduced in 7.1:
Ave-WMC(7.0) = 16,754
Ave-WMC(7.1) = 16,592

Figure 8.4: Evolution of the project Mobile Client (bar chart of LOC, LOCm, WMC, RFC, CBO, LC, CLON*10, CDEm*100, Total-KLOC and Total-NOC for both versions)

Selfdescriptiveness
In general, 7.1 is better commented than 7.0.
LC(7.0) = 35. This means 65 comment lines for each 100 LOC.
LC(7.1) = 31. This means 69 comment lines for each 100 LOC.
However, a manual examination shows that many comments are automatically generated (JavaDoc), are not very meaningful, or are just code that has been commented out. It is noteworthy that 7.1 has more interfaces, abstract classes and data-container classes, which have a very high comments/code ratio. In this case such small changes of LC are difficult to interpret. This metric deserves attention only in the case of inadmissible values.

Modularity
The most important metrics for modularity are presented in table 8.3. Version 7.1 is about twice as large as 7.0 in terms of size on disk, number of classes and LOC. On average, 7.1 has smaller classes (153 lines of code). The typical class of 7.1 is also smaller than the typical class of 7.0 (see the medians in table 8.3).
This means that in 7.1 only a few classes are large, whereas 7.0 has more large classes. See appendix D with the lists of complex classes and complex methods for more details. Version 7.1 has slightly smaller methods as well: only 22 methods are larger than 80 LOC, whereas in 7.0 there are 40 such methods. Based on these facts the modularization of 7.1 is better.

Table 8.3: Modularity analysis

| Metric | 7.0 | 7.1 |
|---|---|---|
| Size on disk | 772 159 | 1 725 178 |
| NOC (Number Of Classes) | 139 | 326 |
| NOC (Number of all Classes, including internal) | 171 | 390 |
| LOC (Lines Of Code) | 27 000 | 59 664 |
| Median-LOC | 92 | 83,5 |
| Ave-LOC | 157,89 | 152,98 |
| Ave-LOCm (of methods) | 6,33 | 6,22 |

Structuredness
Two metrics were selected for coupling: RFC and CBO. Version 7.1 has a smaller RFC but a bigger CBO. For 7.0 this means that on average its classes are coupled to fewer other classes, but use these relations more actively. A qualitative statement about the coupling requires the input of experts.
Ave-RFC(7.0) = 24,292
Ave-RFC(7.1) = 23,667
Ave-CBO(7.0) = 4,737
Ave-CBO(7.1) = 5,15

For the assessment of cohesiveness the metric LCOM4 is proposed. However, Borland Together is not able to calculate cohesion in the intended form; thus four other available metrics were analyzed, whose average values are presented in table 8.4. The arrows in the cells of the newer version indicate improvement or degradation.

Table 8.4: Comparison of different cohesion metrics for all classes

| Component | LCOM1 | LCOM2 | LCOM3 | TCC |
|---|---|---|---|---|
| Mobile Client 7.0 | 31,94 | 39,74 | 34,33 | 11,45 |
| Mobile Client 7.1 | 51,35 | 38,91 | 30,36 | 8,73 |

Thus, based on 3 of 4 metrics, 7.1 is appreciably more cohesive. A more detailed analysis has shown that many classes that serve as data storage are not cohesive because of their get and set methods. Nevertheless, such classes do not have to be cohesive in the sense of intersecting attribute usage. It is noteworthy that 7.1 has more data containers than 7.0: 148 of 390 classes against only 52 of 171. Cohesion in 7.1 is relatively high, which is an indicator of good design.
However, in order to be able to give suggestions for improvement, a new implementation of the cohesion metric (LCOM4) is needed.

For the assessment of inheritance usage, the metrics NAC (Number of Ascendant Classes) and NDC (Number of Descendant Classes) are suggested. However, Together does not calculate these metrics, and the metrics DIT (Depth in Inheritance Tree) and NOC (Number Of direct Children) are used instead. Since these metrics are intended for a single class, any aggregation leads to uninterpretable results; thus these metrics are used in the form of audits. Percentage grouping is used in order to aggregate values for the entire system. Any other method of aggregation needs additional human input.

Figure 8.5: Percentage of classes with more than 3 parents (pie charts; 7.0: 13 classes with DIT > 3 (8%) and 158 other classes (92%); 7.1: 24 classes with DIT > 3 (6%) and 366 other classes (94%))

Figure 8.6: Percentage of LOC in classes with more than 3 parents (pie charts; 7.0: 2224 LOC in classes with DIT > 3 (8%) and 24776 other LOC (92%); 7.1: 5662 LOC in classes with DIT > 3 (9%) and 54002 other LOC (91%))

Figure 8.5 shows the percentage of classes with more than 3 parents; the list of such classes for 7.1 can be found in appendix D. However, a check at a more detailed level of granularity, namely in LOC, shows that the share of LOC in classes with DIT > 3 has slightly increased from 8% to 9%. This is shown in figure 8.6.

Version 7.0 has a higher IF and therefore fewer stand-alone classes:
IF(7.0) = 0,77
IF(7.1) = 0,68
The list of the complex stand-alone classes, which probably should be broken into small hierarchies, can be found in appendix D.

Clonicity
7.0 has 4 clone sets with 1636 LOC in total. Thus the clonicity is 6,1%.
7.1 has 8 clone sets with 1850 LOC in total. Thus the clonicity is 3,1%.
Both components have a quite small clonicity; the difference is insignificant. The list of clones for 7.1 can be found in appendix D.
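The percentage values of this sub-section (clonicity and the share of classes with DIT > 3) are simple ratios and can be checked as follows. This is a minimal sketch with hypothetical names; it assumes that clonicity is the number of LOC in all clones divided by Total-LOC and uses the totals reported for the Mobile Client.

```java
final class ShareMetrics {
    // Percentage of 'part' within 'total', e.g. clone LOC within Total-LOC.
    static double percentage(long part, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        // Clonicity: LOC in all clone sets relative to Total-LOC.
        System.out.printf("CLON(7.0) = %.1f%%%n", percentage(1636, 27000));
        System.out.printf("CLON(7.1) = %.1f%%%n", percentage(1850, 59664));
        // Percentage grouping for the DIT audit: classes with DIT > 3.
        System.out.printf("DIT>3 (7.0) = %.1f%%%n", percentage(13, 171));
        System.out.printf("DIT>3 (7.1) = %.1f%%%n", percentage(24, 390));
    }
}
```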
Entropy
An archive compression rate can be used as a very primitive indicator of entropy, the average amount of information within the text. Based on the compression rate, both components have approximately equal entropy of the source code. The import-based CDEm shows that 7.1 has a slightly higher entropy of package name usage and is therefore expected to place a higher cognitive load on the maintainer. However, 7.0 has only 21 packages, while 7.1 has 44 packages, so 7.1 has many more possibilities for composing the import section. This fact, together with the inexact calculation of CDEm, calls for additional research.

Table 8.5: Comparison of compression coefficients

| Metric | 7.0 | 7.1 |
|---|---|---|
| Size on disk | 772 159 | 1 725 178 |
| Size of ZIP-archive | 216 556 | 501 998 |
| Coefficient of compression (ZIP) | 0,280 | 0,291 |
| Normalized CDEm (import-based) | 0,85 | 0,89 |
| CDEm (import-based) | 4,48 | 6,63 |

Value
Version 7.1 has slightly smaller average values for the metrics LOC and WMC, which affect the number of test-cases. Thus it is expected that 7.1 needs on average a slightly smaller number of test-cases per class.

Simplicity
Version 7.1 has slightly smaller average values for the metrics LOC and RFC, which affect the simplicity of the test-cases. Thus it is expected that 7.1 has on average slightly easier test-cases.

Summary
Based on the investigated metrics, the Mobile Client 7.1 is less complex than the Mobile Client 7.0. In most investigated aspects 7.1 is more maintainable than 7.0. This conclusion is based on the analysis of the set of selected metrics. Only CDEm (the entropy metric) and CBO (Coupling Between Objects) have shown a degradation in the newer release. Here additional research is needed. Given that 7.1 is about twice as large as 7.0 and also provides new functionality, such a small degradation is insignificant and expected.
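The compression coefficient of table 8.5 is the size of the ZIP archive divided by the size on disk. The sketch below recomputes it from the reported sizes and, as an illustration only, compresses a byte array in memory with DEFLATE (the algorithm commonly used by ZIP). The class name is hypothetical, and the in-memory variant ignores archive headers, so it only approximates what a ZIP tool would report.

```java
import java.util.zip.Deflater;

final class CompressionCoefficient {
    // Ratio of compressed size to original size; a smaller value means more redundancy.
    static double ratio(long compressedSize, long originalSize) {
        if (originalSize <= 0) {
            throw new IllegalArgumentException("originalSize must be positive");
        }
        return (double) compressedSize / originalSize;
    }

    // Number of bytes produced by DEFLATE for the given data (zlib format, no ZIP headers).
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Sizes taken from table 8.5.
        System.out.printf("7.0: %.3f%n", ratio(216556, 772159));
        System.out.printf("7.1: %.3f%n", ratio(501998, 1725178));
        // Illustration: highly repetitive source text compresses very well.
        StringBuilder repetitive = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            repetitive.append("public int getValue() { return value; }\n");
        }
        byte[] bytes = repetitive.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.printf("repetitive sample: %.3f%n", ratio(deflatedSize(bytes), bytes.length));
    }
}
```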
Admissible Values for the Metrics
This section establishes the recommendable and admissible values for the validated metrics in the following way: two boundary values partition all possible values of each metric into three areas, the recommendable area (green), the admissible area (yellow) and the inadmissible area (red). Inadmissible values should attract the attention of the analyst and indicate possible problems during maintenance.

Multiple studies confirm an optimal value for LOC, based on the defect density (DD) of classes with different LOC. According to these studies, small classes usually have a high DD because of their low number of LOC, and large classes have a high DD because of their high complexity and thus the high probability of a fault. Consequently, from the viewpoint of the DD, the optimal size for a class is approximately 100 LOC. A similar method can be used for most other metrics. In the current experiment, however, no process data (like DD) is available; hence a method based on the expert's judgments is used. This method tries to set the boundaries between the areas so that the distinction between the old and new projects is maximized and the values of the more maintainable project are placed in the better areas. Table 8.6 presents the boundary values for the validated metrics. Let us call the jump of a metric from a less desirable area to a more desirable area a confirmed improvement of the release. Ideally there would be 32 confirmed improvements: 8 metrics, 4 projects. With the chosen boundary values the metrics indicate 21 improvements, which is a quite good recognition rate.

Table 8.6: The admissible values for Java projects (G = recommendable, Y = admissible, R = inadmissible)

| Metric | Boundary G/Y | Boundary Y/R |
|---|---|---|
| LOC | 120 | 155 |
| LOCm | 9 | 12 |
| WMC | 15 | 26 |
| RFC | 24 | 32 |
| CBO | 6 | 9,5 |
| LC | 32 | 53 |
| CLON | 5% | 10% |
| CDEm | 0,85 | 0,9 |

It is desirable to have equal boundaries for all projects. However, small differences can be found between different types of projects and especially between ABAP and Java projects.
For example, it is expected that methods in classes are smaller than procedures, because of a different kind of encapsulation and inheritance. In [WOLL03, p. 5] it has also been shown that a program in Java is expected to have more functionality than a program in ABAP of equal size. This peculiarity is caused by the more compact syntax of Java.

Interpretation of the Results
As already mentioned in chapter 4, the quality model can be used not only for the definition and validation of metrics, but also for the interpretation of the measurement results. This section gives several instructions on how to interpret the results. For this purpose the model is analyzed in a bottom-up manner: the analysis starts on the metric layer and aggregates the measurement data into the higher layers. After the metrics are calculated and the admissibility of the values is determined, the corresponding questions can be answered. A metric value in the recommendable area signifies a positive answer. Depending on the answers to the corresponding questions, the achievement of the goals can be estimated.

The following is a very short excerpt of the interpretation for the project Mobile Client 7.1: The metrics LOC and WMC are in the admissible area. Thus the goal “Low algorithm complexity” is only partially achieved. However, LOCm is in the recommendable area, and hence in spite of the high LOC one can conclude that the goal “Modularity” is achieved. The goal “Structuredness” is also achieved because of the proper values of RFC and CBO; however, there is a weakness in the design of the package interfaces, which is shown by the somewhat higher value of CDEm. The goal “Low test effort” is achieved only partially, because the high LOC and WMC indicate a high number of complicated test-cases. The goals “Selfdescriptiveness” and “Clonicity” are completely achieved. All in all, one can conclude that the maintainability of this project is relatively high.
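The assignment of metric values to the three areas, on which the interpretation above relies, can be sketched as follows. The class is hypothetical and not part of the thesis tooling. It assumes that smaller values are better for all eight metrics and that a value lying exactly on a boundary belongs to the better area; the text does not state how boundary values are treated. The boundaries are taken from table 8.6 and the values of Mobile Client 7.1 from table 8.2.

```java
final class MetricAreas {
    enum Area { RECOMMENDABLE, ADMISSIBLE, INADMISSIBLE }

    // Assumption: lower is better; boundary values count toward the better area.
    static Area classify(double value, double greenLimit, double yellowLimit) {
        if (value <= greenLimit) {
            return Area.RECOMMENDABLE;
        }
        if (value <= yellowLimit) {
            return Area.ADMISSIBLE;
        }
        return Area.INADMISSIBLE;
    }

    public static void main(String[] args) {
        String[] metrics = {"LOC", "LOCm", "WMC", "RFC", "CBO", "LC", "CLON", "CDEm"};
        // Boundary values from table 8.6 (CLON in percent).
        double[][] limits = {
            {120, 155}, {9, 12}, {15, 26}, {24, 32}, {6, 9.5}, {32, 53}, {5, 10}, {0.85, 0.9}
        };
        // Values of Mobile Client 7.1 from table 8.2.
        double[] mobileClient71 = {153.0, 6.2, 16.6, 23.7, 5.1, 31.0, 3.0, 0.886};
        for (int i = 0; i < metrics.length; i++) {
            Area area = classify(mobileClient71[i], limits[i][0], limits[i][1]);
            System.out.println(metrics[i] + ": " + area);
        }
    }
}
```

With these assumptions LOC, WMC and CDEm fall into the admissible area and the remaining metrics into the recommendable area, which matches the interpretation given for Mobile Client 7.1.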
Measurement Procedure
For readers who are interested in repeating this experiment with their own data, the following instructions might be helpful. For the measurement it is recommended to use Borland Together Developer 2006 for Eclipse, because this tool provides most of the needed metrics and allows the results to be saved in XML format. It is possible to use the trial version in order to save costs. Visit http://www.borland.com/us/products/together/ in order to download and install the tool. After the installation a new entry “Quality Assurance” should appear in the context menu for projects in your Eclipse platform. Go to the Java perspective and choose “Quality Assurance”->”Metrics” from the context menu of your project.

Before the calculation of the metrics can start, some options should be set. Choose “Option” and select the following metrics from the list: LOC, NOO, LCOM1, LCOM2, LCOM3, TCC, RFC, WMPC1 (in the current work this metric is called WMC), CBO, DOIH (DIT), NOCC (NOC), TCR (CR). Additional settings can be made for each metric. However, it is recommended to leave all settings at their default values, because Together does not calculate average values properly. Confirm your choice with “OK” and start the calculation. After that the measurement data appears in hierarchical form in a new Eclipse view called “Metric”.

For the analysis it is useful to present the metric data in tabular form, as shown in table 6.3. This can be done automatically using an XSLT transformation. Before the automatic filling, export the measurement data from Together to a file in XML format.
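An XSLT transformation of this kind can be run from Java with the standard JAXP API (javax.xml.transform). The sketch below is not the transformation class used in this thesis; the class name, the XML element names and the stylesheet are hypothetical and only illustrate how an average value could be derived from exported per-class data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // Applies the stylesheet to the XML document and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical export format: one element per class with its LOC value.
        String xml = "<metrics><class name='A' loc='120'/><class name='B' loc='80'/></metrics>";
        // Hypothetical stylesheet: average LOC over all classes, written as plain text.
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='sum(/metrics/class/@loc) div count(/metrics/class)'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println("Ave-LOC = " + transform(xml, xslt));
    }
}
```

Computing averages in the stylesheet rather than taking them from the measurement tool is one possible way to avoid relying on tool-side aggregation.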
Put your XML file into the directory with the XSLT files and start the XSLT transformation using the following sequence of two commands:

    java XslTransformator              // Java class for the transformation
         myproject.xml                 // your saved XML file
         MMtogether2xml_average.xsl    // XSLT adapter for Together
         MetricXml.xml                 // temporary file

    java XslTransformator              // Java class for the transformation
         MMGQM.xml                     // GQM quality model in XML format
         MMGQM2table.xsl               // XSLT for the output table
         MetricTable.html              // output file

Make sure that your CLASSPATH includes a link to the JAXP-compliant XSLT processor, for example “xerces”. The output table is saved in the file given as the third parameter of the second transformation (in the given example it is “MetricTable.html”).

After this procedure several metrics are still missing in the output table. Some other tools are suggested to obtain these values. One of them is CloneAnalyzer, discussed in chapter 7. Download and install this Eclipse plug-in; a new menu element called “CloneAnalyzer” should appear. Select “CloneAnalyzer” -> “Build”, and a new Eclipse view “CloneTreeViewer” should appear, in which all clones are recorded together with the size of the clone and the source file where the clone was found. Note that CloneAnalyzer searches in all open projects, so please close unnecessary projects. Unfortunately, this tool does not provide the metric CLON: the number of LOC in all clones has to be calculated manually and divided by Total-LOC in order to get the aggregated value. After that the metric CLON can be included in the output table.

For the metric CDEm a special Java class was developed. Use the following command to start it:

    java com.sap.olek.EntropyImport
         "C:\Program Files\workspace\objmgr"    // path to the directory with the source code
         import_objmgr_old.txt                  // output file

After the execution of this command two files are generated: the output file and a statistics file. At the end of the output file find the value for “Norm. Entropy” and put it into the output table as the CDEm value.

It is also possible to prepare the metric-based audits automatically using the command:

    java XslTransformator                  // Java class for the transformation
         myproject.xml                     // your saved XML file
         MMtogether2xml_list_reports.xsl   // XSLT for audits
         MetricTable.html                  // output file

The output file presents an HTML report with the classes that violate one of the following audits (an example of this report for Mobile Client 7.1 is given in appendix D):
- the list of complex methods (LOC > 80)
- the list of classes with DIT > 3
- the list of classes with NOC > 10
- the list of complex stand-alone classes (WMC > 50)
- the list of large classes (LOC > 500)
The files used for the transformations can be found on the CD for this master thesis.

It is also possible to use free tools in order to save costs. The most appropriate candidate is the tool Metrics (see the corresponding section in chapter “Tools”). However, XSLT adapters for new tools have to be implemented.

9. Conclusion
The experiments have shown that most of the selected and validated metrics can be used as reliable maintainability indicators. Nevertheless, many metrics provided by the available tools are implemented differently than initially assumed. Such a deviation is acceptable for an initial examination; for further usage, however, the metric implementations should be corrected. After the analysis of their ability to assess the maintainability, the following groups of metrics have been distinguished:

1. Metrics – possible indicators of the maintainability
These metrics consistently agree with the expert's opinions. This means that in most experiments they indicated the improvement of the newer (better) version. Nevertheless, the difference between the values is sometimes not evident (only 3-5%). In this case human input is needed to draw the conclusion. Additionally, all these metrics can be used in the form of audits for finding potentially poorly maintainable code.
Metrics: WMC (Weighted Method Complexity), LOC (Lines Of Code), RFC (Response For a Class), CBO (Coupling Between Objects), LC (Lack of Comments), CLON (Clonicity), MCC (McCabe Cyclomatic Complexity)
Decision: admit into the quality model for the maintainability assessment

2. Metrics – candidates for audits or code reviews
The metrics NAC (Number of Ancestor Classes) and NDC (Number of Descendant Classes) are very good descriptors of a single class and can thus indicate the classes which are likely to cause problems for maintenance, but the aggregated values are not representative. The experiments show that the aggregated values of these metrics often differ from the expert's opinions. Thus the inheritance metrics are poor indicators of the maintainability of the entire application. Therefore, these metrics are kept in the quality model, but as an optional component for code reviews.

3. Metrics which didn’t show appropriate results
The packaging metrics are able to find only evident design errors (for example, unused abstract classes) and are not suitable even for audits. As expected, the metrics A (Abstractness), In (Instability) and D (Distance from the main sequence) are bad indicators of the maintainability. These metrics were removed from the quality model. The metric CDEm (Class Definition Entropy - modified) also didn’t show a high correlation with the expert's opinions. However, the bad results are probably caused by the inexact calculation and by the many “*” in import sections. More experiments with this metric are needed.

4. Metrics that have not participated in the experiments, but are supposed to be good indicators of the maintainability
Metrics: m (Entropy), LCOM (Lack of Cohesion Of Methods), NOD (Number Of Developers), FAN-IN, FAN-OUT and SMI (Software Maturity Index) have not participated in the experiments because of a lack of tools or data. These metrics can be admitted into the quality model only with restrictions.
The result of this thesis is a deeper knowledge about maintainability, which is formalized in the form of the quality model. Based on this model it is possible to understand the substance of maintainability and also to measure its most important indicators. Based on the theoretical considerations and the experiments presented above, the following conclusions can be drawn:
- It is possible to describe the different maintainability-related aspects of the software using metrics-based indicators. Several metrics chosen in this study appear to be useful in predicting the maintainability.
- Since only limited aggregation is possible and the output of this research is a list of maintainability indicators, only a semi-automated process is possible. The metrics can provide only a description of the system. The final decision should be made by a human.
- Because thousands of metrics exist, it should not be a problem to find the appropriate metrics among them. Nevertheless, two new metrics were suggested during this thesis.
- Metrics have different levels of granularity: some of them describe a single class or method, others can characterize the entire system. Since the final indicators have to describe the system, all metric values should be aggregated.
- Most problems occur when aggregating the data in order to characterize the entire system. They are caused by data garbling, information hiding or inadmissible operations, since the metrics are good indicators of a single module. Therefore the aggregation should be done very carefully.
- Since poor code quality can be found much more easily than well designed code, metrics can be used in the form of audits. Metric-based audits are a good support for code reviews.
- In this work the admissible values for each practically investigated metric were determined. On the other hand, these values depend on the programming paradigm and language used.
- Metrics are able to show the trend (improvement or degradation) within the list of releases of one component.
- Metrics are of limited use for comparing different components.
Nevertheless, metrics are just one possibility to describe certain properties of the software. Whether such a description means well or poorly maintainable code depends on the design and the goals. The final conclusion about the maintainability of software can only be made by a human.

10. Outlook
The current thesis gives an idea of the abilities of metrics from the viewpoint of maintainability assessment. In parallel with the theoretical introduction into software measurement, a practical example of usage has been produced. Hence the results of this work can already be used in practice. Nevertheless, several open issues should be settled before successful usage.

The most important issue is finding an appropriate tool for the measurement. In chapter 7 several tools are discussed, but none of them provides all required metrics. Several metrics for Java have to be implemented additionally. The ABAP environment has an even larger deficit in tool support. Also, several metrics were not validated because of a lack of data or tools. The author believes that all selected metrics are reliable, but additional experiments should be made to confirm the results.

The next important step is to research how the usage of patterns affects the values of the metrics. For example, the usage of the pattern “Visitor” can increase the LOC of the visitor class, and this would not be undesirable. The exact impact of different patterns and their consequences for the metrics have to be researched. In [GARZ02] some metrics for patterns are discussed. In [KHOS04] an assessment of 23 patterns from the viewpoint of simplicity, modularity, understandability etc. is provided.
Khosravi argues that patterns should be used very carefully; for example, the pattern Proxy makes debugging much harder and increases the number of classes.

In the second part of the outlook, several interesting approaches are mentioned which make a further expansion of the metric-based quality mechanisms possible. One of the problems of integrating the measurement tools and automating the measurement procedure is the handling of a heterogeneous and changing environment, because various tools can be used during the lifecycle. In [AUER02] a simple metric data exchange format and a data exchange protocol for communicating the metric data are proposed. This approach aims at filling the gap between frameworks and tools by offering detailed instructions on how to implement metric data collection, together with an open and simple standard which allows easy integration of existing tools and their data handling processes.

A flexible, easy and fast implementation of new metrics is also important. In [MARI] a Static Analysis Interrogative Language is introduced. This language is dedicated to the aforementioned type of static analysis of source code and allows the implementation of various metrics in a homogeneous manner. After the source code has been parsed, simple but powerful queries can be written in order to obtain information about certain properties of the code and to calculate the metrics.

In this thesis the quality model aims only at the collection and presentation of metrics data to an expert, who should make decisions about the maintainability of the product based on the measurement data and his or her experience. However, the processing of the metrics data can also be fully automated, for example by using fuzzy logic or neural networks. In [THWI] Thwin uses neural networks to demonstrate the ability of object-oriented metrics to predict the number of software defects and the maintenance effort.

The next approach is not metric-based, but nonetheless very interesting and useful.
In [GALL02] Gall uses the CVS history to detect non-obvious logical relations between classes. Classes that are often changed together by a single change request may have a logical relation that is not necessarily reflected in physical relations. After the study of the change history, a list of the classes that were often changed together by one change request is generated, and hints about such relations can be given to the maintainer.

References
[ABRA04] Alain Abran, Miguel Lopez, Naji Habra, An Analysis of the McCabe Cyclomatic Complexity Number, in 14th International Workshop on Software Measurement (IWSM), IWSM-Metrikon 2004, Königs Wusterhausen, Magdeburg, Germany, Shaker-Verlag, 2004, pp. 391-405
[ABRA04b] Alain Abran, Olga Ormandjieva, Manar Abu Talib, Information Theory-based Functional Complexity Measures and Functional Size with COSMIC-FFP, 2004
[AHN03] Yunsik Ahn, Jungseok Suh, Seungryeol Kim and Hyunsoo Kim, The software maintenance project effort estimation model based on function points, J. Softw. Maint. Evol.: Res. Pract. 2003; 15:71–85
[ALTU06] Yusuf Altunel, Component-Based Software Engineering, Chapter 9: Component-Based SW Testing, Lecture Notes, 26.01.2006
[AUER02] Martin Auer, Measuring the Whole Software Process: A Simple Metric Data Exchange Format and Protocol, 2002
[BADR03] Linda Badri and Mourad Badri, A New Class Cohesion Criterion: An empirical study on several systems, 7th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE'2003), July 22nd, 2003
[BASI94] Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach, The Goal Question Metric Approach, 1994
[BASI95] Victor R. Basili, Lionel Briand and Walcélio L. Melo, A validation of object-oriented design metrics as quality indicators, Technical Report, Univ. of Maryland, Dep. of Computer Science, College Park, MD, 20742 USA, April 1995
[BIEM94] James M. Bieman and Linda M. Ott, Measuring Functional Cohesion, IEEE Transactions on Software Engineering, Vol. 20, No. 8, August 1994, pp. 644–657
[BRUN04] Magiel Bruntink, Arie van Deursen, Predicting Class Testability using Object-Oriented Metrics, 2004
[CART03] Tom Carter, An introduction to information theory and entropy, Complex Systems Summer School, June 2003
[CHID93] Shyam R. Chidamber, Chris F. Kemerer, A metrics suite for object-oriented design, M.I.T. Sloan School of Management, Revised December 1993
[DARC05] David P. Darcy, Chris F. Kemerer, Sandra A. Slaughter, The Structural Complexity of Software: Testing the Interaction of Coupling and Cohesion, January 22, 2005
[DOSP03] Jana Dospisil, Measuring Code Complexity in Projects Designed with Aspect/J, Informing Science InSITE - “Where Parallels Intersect”, June 2003
[DUMK96] Reiner R. Dumke, Erik Foltin, Metrics-based Evaluation of Object-Oriented Software Development Methods, 1996
[ETZK97] Letha Etzkorn, Carl Davis, and Wei Li, "A Statistical Comparison of Various Definitions of the LCOM Metric," Technical Report TR-UAH-CS-1997-02, Computer Science Dept., Univ. Alabama in Huntsville, 1997
[ETZK99] Letha Etzkorn, Jagdish Bansiya, and Carl Davis, Design and Code Complexity Metrics for OO Classes, Journal of Object Oriented Programming 1999; 12(1):35–40
[ETZK02] Letha H. Etzkorn, Sampson Gholston and William E. Hughes, A semantic entropy metric, J. Softw. Maint. Evol.: Res. Pract. 2002; 14:293–310
[FELD02] David Feldman, A Brief Introduction to: Information Theory, Excess Entropy and Computational Mechanics, April 1998 (Revised October 2002)
[GALL02] Harald Gall, Mehdi Jazayeri and Jacek Krajewski, CVS Release History Data for Detecting Logical Couplings, Technical University of Vienna, Distributed Systems Group, Proceedings of the Sixth International Workshop on Principles of Software Evolution (IWPSE’03)
[GARZ02] Javier Garzás and Mario Piattini, Analyzability and Changeability in Design Patterns, SugarloafPLoP 2002 Conference
[HASS03] Ahmed E. Hassan and Richard C. Holt, The Chaos of Software Development, 2003
[JLIN05] SAP-internal documentation. See in SAPnet http://bis.wdf.sap.corp:1080/twiki/bin/view/Techdev/JavaTestTools -> JLin
[KABA] Hind Kabaili, Rudolf K. Keller, François Lustman and Guy Saint-Denis, Class Cohesion Revisited: An Empirical Study on Industrial Systems
[KAJK] Mira Kajko-Mattsson, Software Evolution and Maintenance
[KELL01] Horst Keller, Sascha Krüger, ABAP Objects, Einführung in die SAP-Programmierung, 2001, SAP PRESS
[KHOS04] Khashayar Khosravi, Yann-Gael Gueheneuc, A Quality Model for Design Patterns, Summer 2004
[LAKS99] Anuradha Lakshminarayana and Timothy S. Newman, "Principal Component Analysis of Lack of Cohesion in Methods (LCOM) metrics," Technical Report TR-UAH-CS-1999-01, Computer Science Dept., Univ. Alabama in Huntsville, 1999
[LANZ99] Michele Lanza, Combining Metrics and Graphs for Object Oriented Reverse Engineering, 1999
[LAVA00] Luigi Lavazza, Providing Automated Support for the GQM Measurement Process, IEEE Software, May/June 2000, pp. 56-62
[MARI] Cristina Marinescu, Radu Marinescu, Tudor Gîrba, A Dedicated Language for Object-Oriented Design Analyses
[MART95] Robert Martin, OO Design Quality Metrics - An Analysis of Dependencies, August 14, 1994 (revised June 20, 1995)
[MISR03] Subhas C. Misra, Virendrakumar C. Bhavsar, Measures of Software System Difficulty, SQP Vol. 5, No. 4/2003, ASQ
[MUST05] K. Mustafa and R. A. Khan, Quality Metric Development Framework (qMDF), Journal of Computer Science 1 (3): 437-444, 2005
[NAND99] Jagadeesh Nandigam, Arun Lakhotia and Claude G. Cech, Experimental Evaluation of Agreement among Programmers in Applying the Rules of Cohesion, Journal of Software Maintenance: Research and Practice, 11, 35–53 (1999)
[PARK96] Robert E. Park, Wolfhart B. Goethert, William A. Florac, Goal-Driven Software Measurement - A Guidebook, August 1996, Software Engineering Institute
[PIAT] Mario Piattini and Antonio Martínez, Measuring for Database Programs Maintainability
[REIS] Ralf Reißing, Towards a Model for Object-Oriented Design Measurement
[RIEG05] Matthias Rieger, Effective Clone Detection Without Language Barriers, Inauguraldissertation der Philosophisch-naturwissenschaftlichen Fakultät der Universität Bern, 10.06.2005
[ROSE] Dr. Linda H. Rosenberg and Lawrence E. Hyatt, Software Quality Metrics for Object-Oriented Environments, NASA
[RUTH] Ian Ruthven, Maintenance
[RYSS] Filip Van Rysselberghe, Serge Demeyer, Evaluating Clone Detection Techniques
[SAP03] Cüneyt Çam, W. Hagen Thümmel, Philip J. Zhang, Essentials of CheckMan, SAP AG 2003
[SAP03b] Randolf Eilenberger and Andreas Simon Schmitt, Evaluating the Quality of Your ABAP Programs and Other Repository Objects with the Code Inspector, SAP Professional Journal, 2003
[SAP04] Product Innovation Lifecycle, From Ideas to Customer Value, Whitepaper Version 1.1, July 2004, Mat. Nr. 500 70 026
[SAP05] Dr. Eckart Spitzberg, Process Description: Quality Gates, Version 4.1, 31.03.2005
[SAP05b] Pieter Bloemendaal, SAP Code Quality Management Newsflash, June 15, 2005
[SAP05c] Thomas Haertlein, Ulrich Weber, Neelakantan Padmanabhan, Horst Pax, Project ‚Quality Indicators‘, 23 May 2005
[SAP05d] Pieter Bloemendaal, Code Quality Management (CQM), 2005, SAP SI AG
[SERO05] Gregory Seront, Miguel Lopez, Valerie Paulus, Naji Habra, On the Relationship between Cyclomatic Complexity and the Degree of Object Orientation, 2005
[SHEL02] Frederick T. Sheldon, Kshamta Jerath and Hong Chung, Metrics for maintainability of class inheritance hierarchies, J. Softw. Maint. Evol.: Res. Pract. 2002; 14:147–160
[SNID01] Greg Snider, Measuring the Entropy of Large Software Systems, HP Laboratories Palo Alto, HPL-2001-221, September 10th, 2001
[SOLI99] Rini van Solingen and Egon Berghout, The Goal/Question/Metric Method: a practical guide for quality improvement of software development, 1999, McGraw-Hill Publishing Company, London
[THWI] Mie Mie Thet Thwin, Tong-Seng Quah, Application of Neural Networks for Software Quality Prediction Using Object-Oriented Metrics
[VELL] Paul Velleman and Leland Wilkinson, Nominal, Ordinal, Interval, and Ratio Typologies are Misleading
[WELK97] Kurt D. Welker, Paul W. Oman and Gerald G. Atkinson, Development and Application of an Automated Source Code Maintainability Index, Software Maintenance: Research and Practice, Vol. 9, 127–159 (1997)
[WEYU88] E. J. Weyuker, Evaluating Software Complexity Measures, IEEE Transactions on Software Engineering, Vol. 14, No. 9, pp. 1357–1365, 1988
[WOLL03] Björn Wolle, Analyse von ABAP- und Java-Anwendungen im Hinblick auf Software-Wartung, CC GmbH, Wiesbaden, published in MetriKon 2003 „Software-Messung in der Praxis“
[Yi04] Tong Yi, Fangjun Wu, Empirical Analysis of Entropy Distance Metric for UML Class Diagrams, ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 5, September 2004
[ZUSE98] Horst Zuse, A Framework of Software Measurement, Walter de Gruyter, Berlin, 1998, 755 pages, ISBN 3-11-015587-7

Final Declaration

I hereby declare that I have written this master thesis independently, without impermissible help from third parties and without using any aids other than those stated. Ideas taken directly or indirectly from other sources are marked as such.

Potsdam, 23 February 2006

Appendix A. Software Quality Models (Taken from [KAJK])

[Figure: McCall Quality Model, 1977 (Part)]
[Figure: Fenton’s decomposition of the maintainability]
[Figure: Software Quality Characteristics Tree (Boehm, 1978)]

Appendix B. GQM – Quality Model

[Figure: GQM quality model drawn as a tree.
Legend: Goal (External Attributes), Goal (Internal Properties), Question, Metrics. Metrics are marked as intra-modular, inter-modular, additional or process metrics, non-counting metrics (entropy), metrics for audits, or rejected metrics; the labels "Additional metrics" and "Size-dependent metrics" also appear.
Root goal: Maintainability ("How easy is it to maintain the SW?").
External attributes: Analysability (Understandability), Changeability (Modifiability), Testability.
Internal properties: Algorithm Complexity (AC), Selfdescriptiveness (SD), Modularity (MO), Structuredness (ST), Consistency (CO), Packaging (PA), Simplicity (SI), Value (VA), Clonicity (CL), Maturity (MA).
Metrics shown: SMI, CLON, LC, NOD, LOC, H, NAC, RFC, FAN-OUT, FAN-IN, LCOM, MCC, WMC, CBO, NDC, I, A, D, m, NOO, OO-D, IF, NOM.
The questions attached to each goal also appear in the XML form of the model in Appendix C.]

Appendix C.
Quality Model in XML format

<?xml version="1.0" encoding="UTF-8"?>
<Model>
  <Goal name="Maintainability">
    <Question text="How easy is to maintain the SW?">
      <Goal name="Analyzability">
        <Question text="How easy is to comprehend the SW?">
          <Goal name="Algorithm Complexity">
            <Question text="How complex is intra-modular algorithm complexity?">
              <Metric name="Weigthed Method Complexity" shortname="WMC" />
              <Metric name="" shortname="MCC" />
            </Question>
          </Goal>
          <Goal name="Selfdescriptiveness">
            <Question text="Does the code have sufficient comments?">
              <Metric name="" shortname="CR" />
            </Question>
          </Goal>
          <Goal name="Modularity">
            <Question text="How is the code divided in parts?"></Question>
            <Question text="Should developer understand large chunks of code at a time?">
              <Question text="Is the System in all levels good splitted in parts?">
                <Metric name="" shortname="LOC" />
                <Metric name="" shortname="LOCm" />
              </Question>
            </Question>
            <Question text="Should developer analyze not relevant code?">
              <Question text="Is cohesion high enough?">
                <Metric name="" shortname="LCOMm" />
              </Question>
            </Question>
          </Goal>
          <Goal name="Structuredness">
            <Goal name="Consistency">
              <Question text="How many developers have touched an object?">
                <Metric name="" shortname="NOD" />
              </Question>
            </Goal>
            <Question text="How affect parts on each other?"></Question>
            <Question text="Should developer understand other parts of the system?">
              <Question text="How many other objects have to understand developer in order to completely comprehend given object?">
                <Metric name="" shortname="NAC" />
                <Metric name="" shortname="RFC" />
                <Metric name="" shortname="FAN-OUT" />
              </Question>
            </Question>
            <Question text="How often does developer face to new unknown object?">
              <Metric name="" shortname="CDEm" />
            </Question>
          </Goal>
        </Question>
      </Goal>
      <Goal name="Changeability">
        <Question text="How easy is to change the SW?">
          <Goal name="Structuredness">
            <Question text="Should developer change other objects? Or check either these have to be corrected?">
              <Question text="What is the number of modules changed per change cause?"></Question>
              <Question text="Is coupling low enough?">
                <Question text="How many relation are between objects?">
                  <Question text="How many global variables there are?"></Question>
                  <Metric name="" shortname="NDC" />
                  <Metric name="" shortname="CBO" />
                  <Metric name="" shortname="FAN-IN" />
                </Question>
              </Question>
              <Question text="Should developer analyze not relevant code?">
                <Metric name="" shortname="LCOM" />
              </Question>
              <Question text="How high is degree of conformance of system to principles of maximal cohesion and minimal coupling?">
                <Metric name="" shortname="m" />
              </Question>
            </Question>
          </Goal>
          <Goal name="Modularity">
            <Question text="Should developer change large chinks of code at a time?">
              <Question text="Is the code sufficient divided in parts?">
                <Metric name="" shortname="LOC" />
              </Question>
            </Question>
          </Goal>
          <Goal name="Packaging">
            <Question text="Are reusable elements isolated from non-reusable elements?">
              <Metric name="" shortname="I" />
            </Question>
            <Question text="How full are abstract classes used?">
              <Metric name="" shortname="A" />
            </Question>
            <Metric name="" shortname="D" />
          </Goal>
        </Question>
      </Goal>
      <Goal name="Testability">
        <Question text="How easy is to test the SW?">
          <Goal name="Value">
            <Question text="How many test-cases should be changed/proved?">
              <Metric name="" shortname="LOC" />
              <Metric name="" shortname="WMC" />
              <Metric name="" shortname="MCC" />
            </Question>
          </Goal>
          <Goal name="Simplicity">
            <Question text="How easy are test-cases to maintain?">
              <Metric name="" shortname="FAN-OUT" />
              <Metric name="" shortname="LOC" />
              <Metric name="" shortname="RFC" />
            </Question>
          </Goal>
        </Question>
      </Goal>
      <Goal name="Clonicity">
        <Question text="Does the system have clones?">
          <Metric name="" shortname="CLON" />
        </Question>
      </Goal>
      <Goal name="Maturity">
        <Question text="Should the SW be compared with original release because of
changes?">
          <Question text="How significant was the system changed?">
            <Question text="How many new, changed, deleted objects does the system have?">
              <Metric name="" shortname="SMI" />
            </Question>
          </Question>
        </Question>
      </Goal>
    </Question>
  </Goal>
  <Additional>
    <Metric name="" shortname="TotalLOC" />
    <Metric name="" shortname="TotalNOC" />
  </Additional>
</Model>

Appendix D. Metric-based quality report for Mobile Client 7.1

The list of complex methods (LOC > 80)

LOC  Method name
121  com.sap.tc.mobile.cfs.console.MgmtConsole.run()
101  com.sap.tc.mobile.cfs.utils.MutableString.compareTo()
 83  com.sap.tc.mobile.cfs.utils.AbstractSorter.sortIndexed()
106  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importRelations()
112  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importNodes()
123  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importAttrs()
131  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importDDIC()
129  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.getISOLanguage()
119  com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.flushPersistentObjects()
153  com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.deletePersistent()
 88  com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.makePersistent()
 92  com.sap.tc.mobile.cfs.meta.ddic.DdicJcoImporterStruct.onlineImportDdic()
 81  com.sap.tc.mobile.cfs.meta.ddic.DdicJcoImporterStruct.linkSimpleWithValueEnum()
 89  com.sap.tc.mobile.cfs.meta.ddic.DdicModelDataProviderImpl.main()
274  com.sap.tc.mobile.cfs.pers.spi.query.QueryParser.parseQueryInternal()
 94  com.sap.tc.mobile.cfs.pers.spi.query.QueryParser.parseConditionPrim()
116  com.sap.tc.mobile.cfs.meta.mi25io.FieldContentHandler.createAttributeDescriptor()
115  com.sap.tc.mobile.cfs.meta.io.TypeContentProducer.writeType()
 91  com.sap.tc.mobile.cfs.utils.io.SplitMessageFile.SplitMessageFile()
131  com.sap.tc.mobile.logging.impl.FileLogger.log()
111  MIXMLParser.ElementParser.parse()
 98  com.sap.tc.mobile.cfs.xml.api.MIXMLParser.parseEntity()

Number of complex methods: 22
Total number of methods: 3396

The list of classes with DIT > 3
DIT - Depth in Inheritance Tree

DIT  Class name
  4  com.sap.tc.mobile.cfs.utils.ReadOnlyProperties
  4  com.sap.tc.mobile.cfs.utils.ChainedException
  6  com.sap.tc.mobile.cfs.pers.spi.DuplicateKeyException
  6  com.sap.tc.mobile.cfs.pers.spi.ObjectNotFoundException
  5  com.sap.tc.mobile.cfs.pers.spi.DBException
  6  com.sap.tc.mobile.cfs.pers.spi.DBFatalException
  6  com.sap.tc.mobile.cfs.pers.spi.ObjectDirtyException
  6  com.sap.tc.mobile.exception.standard.SAPNumberFormatException
  5  com.sap.tc.mobile.exception.standard.SAPUnsupportedOperationException
  4  com.sap.tc.mobile.exception.standard.SAPIllegalAccessException
  5  com.sap.tc.mobile.exception.standard.SAPIllegalStateException
  5  com.sap.tc.mobile.exception.standard.SAPNullPointerException
  5  com.sap.tc.mobile.exception.standard.SAPIllegalArgumentException
  4  com.sap.tc.mobile.exception.standard.SAPIOException
  4  com.sap.tc.mobile.cfs.pers.cache.JavaHashSet
  4  com.sap.tc.mobile.cfs.pers.cache.DefaultCacheHandle
  4  com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager
  4  com.sap.tc.mobile.cfs.meta.mi25io.TopStructureContentHandler
  4  com.sap.tc.mobile.cfs.meta.mi25io.ChildStructureContentHandler
  6  com.sap.tc.mobile.cfs.pers.impl.spi.util.DBChainedException
  5  com.sap.tc.mobile.cfs.xml.api.MIParseException
  5  com.sap.tc.mobile.session.SessionChainedException
  4  com.sap.tc.mobile.exception.BaseRuntimeException
  5  com.sap.tc.mobile.cfs.utils.config.ConfigException

Number of classes with DIT > 3: 24
Total number of classes: 390

The list of complex stand-alone classes (WMC > 50)
WMC - Weighted Methods of Class

WMC   LOC  Class name
 95   490  com.sap.tc.mobile.cfs.utils.FastStringBuffer
 55   562  com.sap.tc.mobile.cfs.pers.query.QueryLocalCandidateImpl
206  1330  com.sap.tc.mobile.cfs.pers.spi.query.QueryParser
 62  1024  com.sap.tc.mobile.cfs.xml.api.MIXMLParser

Total number of classes: 390

The list of classes with NOC > 10
NOC - Number Of direct Children in inheritance tree

NOC  Class name
 15  com.sap.tc.mobile.cfs.meta.api.StorageTypeDescriptor
 34  com.sap.tc.mobile.cfs.meta.api.AbstractDescriptor
 15  ClassDescriptorSPI.Instantiator
 23  com.sap.tc.mobile.cfs.meta.spi.AbstractDescriptorSPI
 19  com.sap.tc.mobile.cfs.utils.FastObjectHashEntry
 16  com.sap.tc.mobile.cfs.pers.spi.Persistable
 13  com.sap.tc.mobile.cfs.pers.spi.PersistableSPI
 14  com.sap.tc.mobile.cfs.type.spi.GenericAccessCapableSPI
 13  com.sap.tc.mobile.cfs.meta.mi25io.AbstractContentHandler
 12  com.sap.tc.mobile.cfs.meta.mi25io.BaseContentHandler
 18  com.sap.tc.mobile.cfs.type.api.GenericAccessCapable
 11  com.sap.tc.mobile.cfs.pers.impl.spi.cache.AbstractPersistable
 12  com.sap.tc.mobile.cfs.pers.impl.spi.cache.PersistableImpl
 27  com.sap.tc.mobile.cfs.xml.api.MIContentHandler
 25  com.sap.tc.mobile.cfs.xml.api.AbstractMIContentHandler
 17  com.sap.tc.mobile.exception.IBaseException

Number of classes with NOC > 10: 16
Total number of classes: 390

The list of large classes (LOC > 500)

WMC   LOC  Class name
 51   533  com.sap.tc.mobile.cfs.console.MgmtConsole
 23   630  com.sap.tc.mobile.cfs.pers.spi.PersistenceManager
 34   725  com.sap.tc.mobile.cfs.pers.query.QueryResultClassImpl
 55   562  com.sap.tc.mobile.cfs.pers.query.QueryLocalCandidateImpl
 96   847  com.sap.tc.mobile.cfs.pers.query.QueryImpl
113  1178  com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl
114   843  com.sap.tc.mobile.cfs.pers.cache.BLOBImpl
352  2650  com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager
 70   654  com.sap.tc.mobile.cfs.pers.cache.PersistentList
216  1522  com.sap.tc.mobile.cfs.meta.ClassDescriptorImpl
 83   721  com.sap.tc.mobile.cfs.meta.ModelDescriptorImpl
121   863  com.sap.tc.mobile.cfs.meta.TypeDescriptorImpl
 76   557  com.sap.tc.mobile.cfs.meta.AttributeDescriptorImpl
206  1330  com.sap.tc.mobile.cfs.pers.spi.query.QueryParser
 31   516  com.sap.tc.mobile.cfs.meta.io.ModelContentHandler
 62  1024  com.sap.tc.mobile.cfs.xml.api.MIXMLParser
 62   714  com.sap.tc.mobile.logging.LogController
134   806  com.sap.tc.mobile.cfs.PersMessages
 78   510  com.sap.tc.mobile.logging.msg.LogMessages

Total number of classes: 390

List of exact clones for Mobile Client 7.1

Clone Set  Length  Number of instances  LOC in all instances
No 1           26                    2                    52
No 2           27                    2                    54
No 3           30                    2                    60
No 4           70                    2                   140
No 5          145                    2                   290
No 6          156                    7                  1092
No 7           24                    2                    48
No 8           57                    2                   114

Clone Set No 1
  From line 476 to 501, com\sap\tc\mobile\cfs\pers\cache\BLOBImpl.java
  From line 530 to 555, com\sap\tc\mobile\cfs\pers\cache\BLOBImpl.java
Clone Set No 2
  From line 114 to 140, com\sap\tc\mobile\cfs\pers\spi\DBException.java
  From line 158 to 184, com\sap\tc\mobile\cfs\utils\ChainedException.java
Clone Set No 3
  From line 345 to 374, com\sap\tc\mobile\cfs\utils\FastStringBuffer.java
  From line 391 to 420, com\sap\tc\mobile\cfs\utils\FastStringBuffer.java
Clone Set No 4
  From line 278 to 347, com\sap\tc\mobile\cfs\utils\FastLongHash.java
  From line 288 to 357, com\sap\tc\mobile\cfs\utils\FastObjectHash.java
Clone Set No 5
  From line 45 to 189, com\sap\tc\mobile\exception\BaseException.java
  From line 45 to 189, com\sap\tc\mobile\exception\BaseRuntimeException.java
Clone Set No 6
  From line 57 to 212, com\sap\tc\mobile\exception\standard\SAPIOException.java
  From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPIllegalAccessException.java
  From line 133 to 342, com\sap\tc\mobile\exception\standard\SAPIllegalArgumentException.java
  From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPIllegalStateException.java
  From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPNullPointerException.java
  From line 103 to 312, com\sap\tc\mobile\exception\standard\SAPNumberFormatException.java
  From line 64 to 270, com\sap\tc\mobile\exception\standard\SAPUnsupportedOperationException.java
Clone Set No 7
  From line 176 to 199, com\sap\tc\mobile\cfs\pers\query\QueryTypeCheck.java
  From line 221 to 244, com\sap\tc\mobile\cfs\pers\query\QueryTypeCheck.java
Clone Set No 8
  From line 23 to 79, com\sap\tc\mobile\logging\spi\CategorySPI.java
  From line 23 to 79, com\sap\tc\mobile\logging\spi\LocationSPI.java

Total lines of code in clones: 1850
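
The total in the clone report can be cross-checked from the per-set figures: each clone set contributes its length multiplied by its number of instances. The following is a minimal sketch in Python, assuming the eight (length, instances) pairs of the report. The names CLONE_SETS, loc_per_set and total_cloned_loc are illustrative only and do not belong to any tool used in this thesis.

```python
# Hypothetical helper for cross-checking the clone report; not part of the thesis tooling.
# Each tuple is (length in LOC, number of instances) for clone sets No 1..8,
# copied from the report for Mobile Client 7.1.
CLONE_SETS = [(26, 2), (27, 2), (30, 2), (70, 2), (145, 2), (156, 7), (24, 2), (57, 2)]


def loc_per_set(clone_sets):
    """Return the LOC in all instances of each clone set (length * instances)."""
    return [length * instances for length, instances in clone_sets]


def total_cloned_loc(clone_sets):
    """Return the total number of lines of code contained in clones."""
    return sum(loc_per_set(clone_sets))


if __name__ == "__main__":
    print(loc_per_set(CLONE_SETS))       # per-set products
    print(total_cloned_loc(CLONE_SETS))  # overall total
```

Run as a script, this prints the per-set products and their sum, 1850, which agrees with the total stated in the report.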