Generating the Blueprints of the Java Ecosystem
Vassilios Karakoidas, Dimitris Mitropoulos, Panos Louridas*, Georgios Gousios, Diomidis Spinellis
Athens University of Economics and Business, Department of Management Science and Technology
*louridas@aueb.gr

This work presents the dataset obtained by statically analysing 11,365 projects from the Maven Central Repository with three static analysis tools: the Cross-Language Metric Tool (CLMT), the Chidamber and Kemerer Java Metrics tool (CKJM), and JDepend. Together, these tools cover four aspects of a software project: class design, method design, package design, and program size.

The dataset in numbers
Projects: 11,365
Jars: 22,730
LoC: 74,565,772
Metrics: 65
Measurements: 32,844,836

Dataset Construction Process
Maven Repository -> filtered Java projects with source and binary jars -> valid project collection. The diagram reports 446,749 artifacts.
Workers (#1, #2, #3, ... #N): download the jars, then execute CLMT, CKJM, and JDepend.
Analyse exported data: the measurements are analysed and stored in the MySQL database.

Metric Categories
Class: 17 metrics
Program Size: 17 metrics
Method: 3 metrics
Package: 6 metrics
Note: These are the unique metrics per category, since the three tools have several metrics in common.

Database Schema
project (prj_pk: int(11), prj_name: varchar(500))
category (cat_pk: int(11), cat_name: varchar(500))

Detecting Domain-specific Language Usage in Open Source Projects

The poster diagram shows a GitHub Repository box alongside the DSLs examined: XML, SQL, RTF, HTML, regular expressions, XPath, and XSLT.

The detection process was straightforward: the source code was statically analysed and the usage of specific packages was detected, e.g. java.util.regex for regular expressions, and java.sql and javax.sql for SQL.

DSL usage (the most used DSL, #1, is XML with 3094 uses)
XML: 3094
Regex: 1751
SQL: 1035
XSLT: 888
XPath: 190
HTML: 68
RTF: 7

[Figure: How many DSLs are used per project? x-axis: Number of DSLs; y-axis: Number of Projects]

Facts
~35% of the projects use at least one DSL.
547 projects use four DSLs.
8 projects use 7 DSLs!
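The package-based detection described above can be illustrated with a minimal sketch. It assumes detection works by scanning the import statements of a source file, which is a simplification; the poster does not say how the static analysis was implemented. The class name `DslDetector`, the method `detect`, and the two `javax.xml.*` mappings are hypothetical; only java.util.regex, java.sql, and javax.sql are named in the poster.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DslDetector {
    // Package prefix -> DSL label. Only java.util.regex, java.sql and javax.sql
    // are taken from the poster; the javax.xml.* entries are assumptions.
    private static final Map<String, String> PACKAGE_TO_DSL = new LinkedHashMap<>();
    static {
        PACKAGE_TO_DSL.put("java.util.regex", "Regex");
        PACKAGE_TO_DSL.put("java.sql", "SQL");
        PACKAGE_TO_DSL.put("javax.sql", "SQL");
        PACKAGE_TO_DSL.put("javax.xml.xpath", "XPath");     // assumption
        PACKAGE_TO_DSL.put("javax.xml.transform", "XSLT");  // assumption
    }

    // Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.m;".
    private static final Pattern IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?([\\w.]+?)(?:\\.\\*)?\\s*;",
            Pattern.MULTILINE);

    /** Returns the sorted set of DSL labels whose packages are imported. */
    public static Set<String> detect(String javaSource) {
        Set<String> found = new TreeSet<>();
        Matcher m = IMPORT.matcher(javaSource);
        while (m.find()) {
            String imported = m.group(1);
            for (Map.Entry<String, String> e : PACKAGE_TO_DSL.entrySet()) {
                String prefix = e.getKey();
                // Exact package (wildcard import) or a type inside the package.
                if (imported.equals(prefix) || imported.startsWith(prefix + ".")) {
                    found.add(e.getValue());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "package demo;\n"
                + "import java.sql.Connection;\n"
                + "import java.util.regex.*;\n"
                + "import java.util.List;\n";
        System.out.println(detect(source)); // prints [Regex, SQL]
    }
}
```

Scanning imports misses fully qualified references in code and counts unused imports, so a bytecode-level or AST-based analysis would be more precise; the sketch only shows the package-to-DSL mapping idea.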
Popular DSL Combinations
XML, XSLT (475)
Regex, XML (303)
Regex, SQL, XML, XSLT (80)
Regex, SQL, XML (71)
Regex, XML, XSLT (158)
SQL, XML, XSLT (54)
SQL, XML (162)
XML, XPath (50)
...
Regex, SQL (116)

Database Schema (continued)
measurement_type (mt_pk: int(11), mt_name: varchar(500))
measurement (meas_pk: int(11), meas_value: varchar(500), meas_id: int(11), meas_filename: int(11), cat_pk: int(11), prj_pk: int(11), mt_pk: int(11))
identifiers (ident_pk: int(11), ident_name: varchar(500))

Research Opportunities
Researchers can use the dataset to test their models and theories against a large set of empirical data, e.g. to fine-tune software quality models that are based on metrics. Practitioners can test their tools and validate their calculations against CKJM and JDepend.

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thalis - Athens University of Economics and Business - Software Engineering Research Platform.

Contact Information
Vassilios Karakoidas
bkarak@aueb.gr