On the Challenges in Extracting Metrics from Java Bytecode
A/Prof. Jean-Guy Schneider
jschneider@swin.edu.au

Where it all began…
[Figure: Gini Coefficient of Synthetic Fields (y-axis, 0.3 to 0.7) against RSN (x-axis, 0 to 100). Labelled points: 4.6: 0.3484, 4.7: 0.4664, 12.5: 0.5820, 12.6: 0.6443, 12.7: 0.6973]
SCIENCE | TECHNOLOGY | INNOVATION

Overview
- Java Class File
  - Information that can be extracted
  - … and some “misconceptions”
- Nested Class Extraction
  - Preliminaries
  - Problems and Consequences
  - Nested Class Graph
- A few “surprises”
  - Motivating example revisited
- Lessons learnt

Quick Overview of Class File Format…
There are 10 basic sections to the Java Class File structure:
1. Magic Number: 0xCAFEBABE
2. Version of Class File Format: the minor and major versions of the class file
3. Constant Pool: pool of constants for the class
4. Access Flags
5. Name of the class
6. Name of the super class
7. Interfaces: any interfaces the class implements
8. Fields: fields the class defines
9. Methods: methods the class defines
10. Attributes: attributes of the class (e.g., the name of the source file, enclosing method etc.)
☞ Many references to the Constant Pool in 5-10

Analysis of Java Bytecode
What we can extract from Java Bytecode:
- Classes and Interfaces
- Visibility (and other access modifiers)
- Derivations (inheritance, interface implementations)
- Fields, Methods + Method Signatures, Exceptions
- Instructions
- Type Dependencies
- Local Variables
- Nested Classes, Enclosing Classes/Methods

Data Sources used…
- Qualitas Corpus: http://qualitascorpus.com/
  - Ant: 22 versions (1.1 – 1.8.4)
  - Freecol: 31 versions (0.3.0 – 0.10.7)
- Helix Data Set: http://www.ict.swin.edu.au/research/projects/helix/
  - KoLmafia: 101 versions (4.0 – 14.1)
  - Xalan: 13 versions (1.0.0 – 2.7.1)
- … and a few hand-crafted examples

When is a Java Type enclosed in another type?
- No nested types in Java 1.0
- From Java 1.1, the general assumption: ‘$’ indicates nesting of a type
- For example:
  - com/ice/tar/TarInputStream is top-level
  - com/ice/tar/TarInputStream$EntryAdapter is nested
- Hence: a test for ‘$’ in the type name will suffice…
☞ ‘$’ is a valid character for any Java identifier, including class and interface names!

Example 1

    class Foo$Bar {
        private int a = 23;

        public int mFoo$Bar(int x) {
            int $y = x + a;
            int $z = 3;
            return $y + $z;
        }
    }

☞ Perfectly well-formed class…
☞ But: Bytecode analysis indicates that method mFoo$Bar does not have any local variables!

Example 2

    public class Foo {
        private int x = 1;

        protected class Bar {
            int y = 2;

            public int mBar() {
                return new Object() {
                    public int z = x + 3;
                }.z;
            }
        }
    }

☞ Top-level class Foo, nested class Bar with a nested anonymous class that extends Object
☞ Generated Bytecode names: Foo, Foo$Bar, Foo$Bar$1

Example 2 – Bytecode…

    [ public ] Foo$Bar -> (top-level: false)
      super: java/lang/Object
      implements: [ ]
      Enclosed in: Foo
      Member: true   Local: false   Anonymous: false
      #Methods 3 - #Fields 2
      attributes: SourceFile InnerClasses
      InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
      InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]

InnerClasses columns: Inner Name, Outer Name, Simple Name, Access Flags
☞ Mismatch!! (the class itself is flagged public, while its InnerClasses entry says protected)

Example 2 (cont.)

    [ ] Foo$Bar$1 -> (top-level: false)
      super: java/lang/Object
      implements: [ ]
      Enclosed in: Foo$Bar -> mBar:()I
      Member: false   Local: true   Anonymous: true
      #Methods 1 - #Fields 2
      attributes: SourceFile InnerClasses EnclosingMethod
      EnclosingMethodAttribute: Foo$Bar -> mBar:()I
      InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
      InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]

☞ BTW: which Foo$Bar???
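The ‘$’ pitfall from Examples 1 and 2 can be reproduced in a few lines. The following is a minimal sketch, not the tooling used for the study: it compares the name-based test with what reflection reports (`Class.getEnclosingClass()`, `Class.isAnonymousClass()`), which in turn depends on the InnerClasses and EnclosingMethod attributes in the class file. `Outer` and `Inner` are hypothetical stand-ins for `Foo` and `Bar`, since a top-level `Foo$Bar` and a member class `Foo.Bar` would both compile to `Foo$Bar.class`.

```java
// Top-level class whose name merely contains '$' (as in Example 1).
class Foo$Bar {
    int a = 23;
}

// Hypothetical stand-in for Foo/Bar from Example 2.
class Outer {
    class Inner {
        Object anon() {
            return new Object() { };   // compiled as Outer$Inner$1
        }
    }
}

class DollarHeuristicDemo {
    /** The naive test: a '$' in the binary name is taken to mean "nested". */
    static boolean nestedByName(Class<?> c) {
        return c.getName().indexOf('$') >= 0;
    }

    /** What the class file attributes say, as surfaced by reflection. */
    static boolean nestedByAttributes(Class<?> c) {
        return c.getEnclosingClass() != null;
    }

    public static void main(String[] args) {
        Class<?> anon = new Outer().new Inner().anon().getClass();
        for (Class<?> c : new Class<?>[] { Foo$Bar.class, Outer.Inner.class, anon }) {
            System.out.println(c.getName()
                    + " byName=" + nestedByName(c)
                    + " byAttributes=" + nestedByAttributes(c)
                    + " anonymous=" + c.isAnonymousClass());
        }
    }
}
```

The two tests disagree on `Foo$Bar`: the name contains ‘$’, yet no enclosing class is recorded for it.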
Observation… and why it can be wrong…
- A nested class with bytecode name X$..$Y$Z is enclosed in class X$..$Y
- Not always applicable ☹ (up to 6% error rate)
- Counter examples (from Freecol 0.5.0):
  - net/sf/freecol/client/control/InGameController$2 is enclosed in net/sf/freecol/client/control/InGameController$1
  - net/sf/freecol/client/gui/panel/DeclarationDialog$5 is enclosed in net/sf/freecol/client/gui/panel/DeclarationDialog$SignaturePanel
- Another example (from KoLmafia 4.0):
  - net/sourceforge/kolmafia/KoLFrame$2 is enclosed in net/sourceforge/kolmafia/KoLFrame$ItemManagePanel$VerifyButtonPanel

Nested Classes Graph
Starting from a top-level node, go through all inner class structures:
- exclude any 'self-defining' nested classes, that is, ones with Inner Name having the same string value as the current class C
- if the Outer Name of a nested class structure is zero
  -> a method M in the current class C is the defining scope for the nested local class
  -> record Inner Name as one of the directly nested classes of C (i.e. class Inner Name is directly enclosed by C)
- if the Outer Name is equal to the current class' name
  -> record Inner Name as one of the directly nested classes of C, as it is a (static or non-static) member class of C
- proceed recursively with all recorded nested class names

Example 2 – missing Information

    [ ] Foo$Bar$1 -> (top-level: false)
      super: java/lang/Object
      implements: [ ]
      Enclosed in: ??? -> ???
      Member: false   Local: false   Anonymous: true
      #Methods 1 - #Fields 2
      attributes: SourceFile InnerClasses
      InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
      InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]

☞ Without an EnclosingMethod attribute, the defining “context” of a non-member nested class cannot be uniquely determined!
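The Nested Classes Graph traversal can be sketched in Java as follows. This is an illustration under stated assumptions, not the implementation behind the slides: `InnerClassEntry`, `NestedClassGraph` and the `tables` map (class name to the rows of its InnerClasses attribute) are hypothetical names, a `null` outer name models a zero Outer Name index, and the `claimed` set is an addition that stops a class from being recorded twice or visited in a cycle. Treating a zero Outer Name as "defined in a method of C" is exactly the step that becomes a guess when no EnclosingMethod attribute is available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One row of an InnerClasses attribute; outerName == null models a zero Outer Name. */
final class InnerClassEntry {
    final String innerName;
    final String outerName;

    InnerClassEntry(String innerName, String outerName) {
        this.innerName = innerName;
        this.outerName = outerName;
    }
}

final class NestedClassGraph {
    /** Maps each reachable class to its directly nested classes, starting at a top-level class. */
    static Map<String, List<String>> build(String topLevel,
                                           Map<String, List<InnerClassEntry>> tables) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        Set<String> claimed = new HashSet<>();   // assumption: first enclosing class found wins
        claimed.add(topLevel);
        visit(topLevel, tables, graph, claimed);
        return graph;
    }

    private static void visit(String current,
                              Map<String, List<InnerClassEntry>> tables,
                              Map<String, List<String>> graph,
                              Set<String> claimed) {
        List<String> nested = new ArrayList<>();
        graph.put(current, nested);
        for (InnerClassEntry e : tables.getOrDefault(current, List.of())) {
            if (e.innerName.equals(current)) {
                continue;                        // 'self-defining' entry: skip
            }
            boolean definedInMethod = e.outerName == null;      // local or anonymous class
            boolean memberOfCurrent = current.equals(e.outerName);
            if ((definedInMethod || memberOfCurrent) && claimed.add(e.innerName)) {
                nested.add(e.innerName);
            }
        }
        for (String child : nested) {
            visit(child, tables, graph, claimed);               // proceed recursively
        }
    }

    public static void main(String[] args) {
        // InnerClasses rows as shown for Example 2.
        List<InnerClassEntry> barRows = List.of(
                new InnerClassEntry("Foo$Bar", "Foo"),
                new InnerClassEntry("Foo$Bar$1", null));
        Map<String, List<InnerClassEntry>> tables = new HashMap<>();
        tables.put("Foo", List.of(new InnerClassEntry("Foo$Bar", "Foo")));
        tables.put("Foo$Bar", barRows);
        tables.put("Foo$Bar$1", barRows);
        System.out.println(build("Foo", tables));
    }
}
```

The result depends on traversal order whenever a class lists a method-scoped entry that really belongs to a different class, which is one way the heuristic error rates mentioned above can arise.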
☞ In case of missing InnerClasses information, the enclosing class of a “hidden” nested class cannot always be uniquely determined, either…

Java Classes Categorization
What is often published:
- Top-level (package level) classes
- Nested classes
  - Member-level classes
  - “Inner Classes”
  - Local classes (have a simple name)
  - Anonymous classes (no simple name)
☞ Anonymous classes are expressions: they can be used anywhere an expression is allowed.
☞ Are often used to initialize member instances

Java Classes Categorization (cont.)
Corrected classification:
- Top-level (package level) classes
- Nested classes
  - Member-level classes
  - Anonymous classes
  - Local classes: can be either named or anonymous
☞ “Locality” of anonymous classes can only be safely determined if the EnclosingMethod attribute is present!
☞ Classification is not orthogonal!

Let’s have a look at some results…

[Figure: Ant – Gini. x-axis: RSN (0 to 25); y-axis: Gini - Percentage (0 to 0.8); series: GiniSyntheticFields, GiniSyntheticFields(Zero), %TopLevel]

[Figure: Ant – Types of Classes. x-axis: Age (days since release 1), 0 to 4000; y-axis: Frequency (0 to 1500); series: #Classes, #TopLevel, #Nested, #SyntheticNested]

[Figure: Freecol – Types of Classes. x-axis: Age (0 to 3000); y-axis: Frequency (0 to 600); series: #Nested, #NestedMember, #NestedAnonymous, #NestedLocal]

[Figure: KoLmafia – Types of Classes. x-axis: RSN (0 to 100); y-axis: Frequency (0 to 1000); series: #Classes, #TopLevel, #Nested, #SyntheticNested]

Lessons Learnt
- Java Bytecode is a “rich” source of information
  - Needs to be treated with “care”
  - Go back to the specifications to find “correct” information
- Beware of what Java compilers do
  - Missing information
  - Wrongly generated information (not according to specs):
    - Outer Name for anonymous classes
    - Local classes used in multiple methods
- Beware of (incomplete) heuristics
  - E.g., name of enclosing class
- Beware of premature interpretations
  ☞ use “whole” picture

Lessons Learnt (cont.)
- Compiler:
  - If current compiler generates Bytecode in a particular way
    ☞ Do not assume that is the case for all compilers!
- Testing:
  - use the tool as one of the case studies
  - have a LARGE corpus of case studies
    ☞ Special cases may only appear in certain systems!
☞ Many publications in empirical SE do not quantify “error rates” when heuristics are used…

[Figure: KoLmafia – Adding Ratio Nested Classes. x-axis: RSN (0 to 100); y-axis: Gini / Percentage (0 to 0.6); series: GiniSyntheticFields, GiniSyntheticFields(Zero), %TopLevel]
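For readers unfamiliar with the Gini series in the plots, the following is a minimal sketch of how a Gini coefficient over per-class counts could be computed. It uses the standard definition (mean absolute difference divided by twice the mean); the slides do not state which formulation was used, nor how the "(Zero)" series differs, so the class name `Gini` and the choice of input (one synthetic-field count per class) are assumptions.

```java
import java.util.Arrays;

final class Gini {
    /**
     * Gini coefficient of non-negative counts:
     * 0 means perfectly even; values towards 1 mean a few classes hold most of the total.
     */
    static double coefficient(int[] values) {
        int n = values.length;
        if (n == 0) {
            return 0.0;
        }
        int[] sorted = values.clone();
        Arrays.sort(sorted);                       // ascending order
        double total = 0.0;
        double weighted = 0.0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            weighted += (i + 1) * (double) sorted[i];
        }
        if (total == 0.0) {
            return 0.0;                            // all counts are zero
        }
        // Equivalent to sum_i sum_j |x_i - x_j| / (2 * n * total) for sorted input.
        return (2.0 * weighted) / (n * total) - (n + 1.0) / n;
    }

    public static void main(String[] args) {
        // Hypothetical synthetic-field counts for five classes.
        System.out.println(coefficient(new int[] { 0, 0, 1, 2, 7 }));
    }
}
```

Whether classes with a count of zero are included changes the result considerably, which is one reason a misclassified nested class can move such a metric.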