Scalable and Repeatable Extrinsic Evaluation for Pattern Discovery Systems
Mario Boley, Maike Krause-Traudes, Bo Kang, Björn Jacobs
University of Bonn & Fraunhofer IAIS
mario@realKD.org
www.realKD.org
Aug 10 2015 Mario Boley, IDEA 2015

Recently at Q&A time...
Q: This looks interesting, but is this really what users would want?
A: Well, I guess in order to really confirm that, we would need to test this somehow with real users.
Q: Yep, agreed. Thank you.

Extrinsic evaluation can support the ultimate value of contributions
Photograph courtesy Dorothy Fragaszy (voices.nationalgeographic.com)

Extrinsic means: “not depending on theory used for development cycle”

Poll among ECMLPKDD authors: half skipped potentially useful studies
Details at http://www.realkd.org/dm-userstudies/ecmlpkdd-authorpoll-march2015/

High costs are the dominant reason for skipping a “study opportunity”
Summary chart (% of “yes”-respondents):
No added benefit: 5
Unclear how to recruit participants: 55
High costs of conducting study: 98.3
Insecurity of outcome and acceptance: 15
Detailed chart (% of “yes”-respondents):
No added benefit of user study over automatized/formal evaluation: 5
Unclear how to recruit suitable group of participants: 55
Cost of developing study design: 46.7
Cost of embedding contribution in accessible UI: 40
Cost of organizing actual study: 63.3
Cost of evaluating results: 15
Insecurity of outcome and acceptance by peers: 15

Creedo’s major contributions:
• Allows the definition of reusable study designs
• Design elements focus on scalable evaluation in the application context
• Automates the study process

A study is a process for providing evidence in favor of or against a hypothesis
Hypothesis: “Users can solve a certain class of analysis tasks better with a specific target system than with other control systems.”
Example: “Users can discover a set of interesting patterns faster using a FORSIED-based association discovery process than when using a conventional* association discovery process.”
*based on a static interestingness measure that is oblivious to prior and gained knowledge

Data analysis systems are represented by Creedo analytics dashboards

Algorithms can be integrated via the realKD library

Creedo tasks bridge formal abstraction and application context
[Slide graphic: a formula defining q(x) in terms of D, x, p_0 and p_x; a paper excerpt (“1. Introduction. In this paper, we tackle the important problem of discovering interesting patterns from a given input dataset. …”); and a pseudocode fragment (“for each d ∈ D: if x ∈ D then D(x) ← D(x) + 1 …”)]

The user's perspective on a task is a set of natural language instructions

A task also defines elementary attributes of results

All measurements can be aggregated to system performance measures
$a \mapsto \operatorname{avg}\{\, t(x) : c(x) \ge \tau,\ x \in R_a \,\}$

Assignment logic can control biases and balance confidence

Creedo organizes the study process

Yes, we can
mario@realKD.org
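
To make the aggregation formula from the system performance slide concrete, here is a minimal Python sketch of a ↦ avg{t(x) : c(x) ≥ τ, x ∈ R_a}. The slides do not define the symbols, so the reading used here is an assumption: R_a is the set of results produced with system a, t(x) is a time measurement for result x, c(x) is a quality score for x, and τ is a threshold. The `Result` record, the `system_performance` function, and the sample data are all hypothetical and are not part of Creedo or the realKD library.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass(frozen=True)
class Result:
    """One submitted result x (hypothetical record, not a Creedo type)."""
    system: str  # the system a with which the result was produced
    t: float     # assumed reading of t(x): time needed to produce x
    c: float     # assumed reading of c(x): quality score assigned to x


def system_performance(results: Iterable[Result], system: str,
                       tau: float) -> Optional[float]:
    """Average t(x) over the results of `system` whose c(x) is at least tau.

    Returns None when no result of that system reaches the threshold.
    """
    qualifying = [r.t for r in results if r.system == system and r.c >= tau]
    return mean(qualifying) if qualifying else None


# Made-up measurements, for illustration only.
results = [
    Result("target", t=40.0, c=0.9),
    Result("target", t=60.0, c=0.7),
    Result("target", t=10.0, c=0.2),   # excluded below because c < tau
    Result("control", t=90.0, c=0.8),
]

print(system_performance(results, "target", tau=0.5))   # 50.0
print(system_performance(results, "control", tau=0.5))  # 90.0
```

Returning None for an empty selection, rather than 0, keeps a system with no qualifying results from looking like the fastest one in a comparison.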