www.realKD.org
Scalable and Repeatable Extrinsic Evaluation for Pattern Discovery Systems
Mario Boley, Maike Krause-Traudes, Bo Kang, Björn Jacobs
University of Bonn & Fraunhofer IAIS
mario@realKD.org
Aug 10 2015
Mario Boley, IDEA 2015
Recently at Q&A time...
Q: This looks interesting, but is this really what users would want?
A: Well, I guess in order to really confirm that, we would need to test this somehow with real users.
Q: Yep, agreed. Thank you.
Extrinsic evaluation can substantiate the ultimate value of contributions
Photograph courtesy Dorothy Fragaszy (voices.nationalgeographic.com)
Extrinsic means: “not depending on the theory used in the development cycle”
Poll among ECMLPKDD authors: half skipped potentially useful user studies
Details at http://www.realkd.org/dm-userstudies/ecmlpkdd-authorpoll-march2015/
High costs are the dominant reason for skipping a “study opportunity”
Reasons for skipping (% of “yes”-respondents):
• No added benefit: 5%
• Unclear how to recruit participants: 55%
• High costs of conducting study: 98.3%
• Insecurity of outcome and acceptance: 15%
High costs are the dominant reason for skipping a “study opportunity”
Reasons for skipping, broken down (% of “yes”-respondents):
• No added benefit of a user study over automated/formal evaluation: 5%
• Unclear how to recruit a suitable group of participants: 55%
• Cost of developing the study design: 46.7%
• Cost of embedding the contribution in an accessible UI: 40%
• Cost of organizing the actual study: 63.3%
• Cost of evaluating results: 15%
• Insecurity of outcome and acceptance by peers: 15%
Creedo’s major contributions are…
• Allows definition of reusable study designs
• Elements focus on scalable evaluation in the application context
• Automates the study process
A study is a process for providing evidence for or against...
Hypothesis:
“Users can solve a certain class of analysis tasks better with a specific target system than with other control systems.”
Example:
“Users can discover a set of interesting patterns faster using a FORSIED-based association discovery process than when using a conventional* association discovery process.”
*Based on a static interestingness measure that is oblivious to prior and gained knowledge.
Data analysis systems are represented by Creedo analytics dashboards
Algorithms can be integrated via the realKD library
Creedo tasks bridge formal abstraction and application context
$q(x) = \frac{1}{D(x)}\,(p_0 - p_x)^2$
1. Introduction
In this paper, we tackle the important problem of discovering interesting patterns from a given input dataset.
…
for each d ∈ D
    if x ∈ d then
        D(x) ← D(x) + 1
…
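The slide pairs the measure q(x) = (p_0 − p_x)^2 / D(x) with pseudocode that counts D(x), the number of records in which pattern x occurs. A minimal Python sketch of both, under assumptions the slide does not state: records are item sets with a binary target label, x "occurs in" a record d when x ⊆ d, p_0 is the share of positive labels in the whole dataset D, and p_x is that share among the records containing x. The names `Record`, `support_count`, and `q` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumed record format: (items of record d, binary target label of d).
Record = Tuple[FrozenSet[str], bool]


def support_count(x: FrozenSet[str], data: List[Record]) -> int:
    """D(x): counts the records d in D in which pattern x occurs (assumed: x is a subset of d)."""
    count = 0
    for items, _label in data:  # for each d in D
        if x <= items:          # if x occurs in d
            count += 1          # D(x) <- D(x) + 1
    return count


def q(x: FrozenSet[str], data: List[Record]) -> float:
    """q(x) = (p_0 - p_x)^2 / D(x), with p_0 and p_x read as target shares (assumption)."""
    d_x = support_count(x, data)
    if d_x == 0:
        raise ValueError("pattern x does not occur in any record")
    p0 = sum(label for _items, label in data) / len(data)          # share in all of D
    px = sum(label for items, label in data if x <= items) / d_x   # share where x occurs
    return (p0 - px) ** 2 / d_x


if __name__ == "__main__":
    data = [
        (frozenset({"a", "b"}), True),
        (frozenset({"a"}), True),
        (frozenset({"b"}), False),
        (frozenset({"a", "c"}), False),
    ]
    x = frozenset({"a"})
    print(support_count(x, data))  # 3
    print(q(x, data))              # about 0.00926, i.e. (1/2 - 2/3)^2 / 3 = 1/108
```

The counting loop mirrors the pseudocode line by line; the support check is repeated in `q` only to keep each function readable on its own.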
From the user's perspective, tasks are natural-language instructions
A task also defines elementary attributes of results
All measurements can be aggregated into system performance measures
$a \mapsto \operatorname{avg}\{t(x) : c(x) \geq \tau,\ x \in R_a\}$
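One possible reading of this aggregation: for an analysis session a, R_a is the set of results the participant submitted, t(x) and c(x) are elementary result attributes defined by the task, and τ is a quality threshold. The sketch below assumes, beyond what the slide says, that t(x) is the time in seconds until x was submitted and c(x) is a quality score (which fits the "faster" hypothesis in the example), and that a system's performance is the mean over its sessions. `Result`, `session_measure`, and `system_measure` are hypothetical names, not Creedo's API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Result:
    """One result x from an analysis session (hypothetical fields)."""
    t: float  # assumed t(x): seconds from session start until x was submitted
    c: float  # assumed c(x): quality score of x, e.g. an averaged rating


def session_measure(results: List[Result], tau: float) -> Optional[float]:
    """a -> avg{ t(x) : c(x) >= tau, x in R_a }; None if no result reaches tau."""
    values = [x.t for x in results if x.c >= tau]
    return mean(values) if values else None


def system_measure(sessions: Dict[str, List[List[Result]]], tau: float) -> Dict[str, float]:
    """Assumed second step: mean of the defined session measures per system."""
    performance = {}
    for system, runs in sessions.items():
        scores = [m for m in (session_measure(r, tau) for r in runs) if m is not None]
        if scores:  # systems without any qualifying session are left out
            performance[system] = mean(scores)
    return performance


if __name__ == "__main__":
    sessions = {
        "target": [[Result(120.0, 0.9)], [Result(60.0, 0.8), Result(90.0, 0.2)]],
        "control": [[Result(300.0, 0.9), Result(200.0, 0.4)]],
    }
    print(system_measure(sessions, tau=0.5))  # {'target': 90.0, 'control': 300.0}
```

Under this reading, a lower value means that results of sufficient quality were found faster.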
Assignment logic can control biases and balance confidence
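In this sketch, assignment logic decides which participant works with which system on which task. The code shows one standard option, counterbalancing by rotation (a Latin-square scheme), assuming a within-subject design in which each participant uses every system once, each time on a different task. When the number of participants is a multiple of the number of systems, every system then appears equally often with every task and in every position. This controls order and task biases and gives each system the same number of sessions. It is an illustrative technique, not Creedo's own assignment logic; `rotated_assignment` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def rotated_assignment(
    participants: List[str], systems: List[str], tasks: List[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Latin-square style plan: participant i works on the first len(systems) tasks
    in fixed order, using the systems in an order shifted by i positions."""
    k = len(systems)
    if len(tasks) < k:
        raise ValueError("need at least one task per system")
    plan = {}
    for i, p in enumerate(participants):
        shift = i % k
        order = systems[shift:] + systems[:shift]  # rotate system order per participant
        plan[p] = [(order[j], tasks[j]) for j in range(k)]  # (system, task) sessions
    return plan


if __name__ == "__main__":
    plan = rotated_assignment(["p1", "p2"], ["target", "control"], ["T1", "T2"])
    for participant, sessions in plan.items():
        print(participant, sessions)
    # p1 [('target', 'T1'), ('control', 'T2')]
    # p2 [('control', 'T1'), ('target', 'T2')]
```

A randomized variant would shuffle the participant list before rotating, which keeps the balance while removing any bias from the order in which participants are enrolled.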
Creedo organizes the study process
Yes, we can
mario@realKD.org