Semantic Challenges in Getting Work Done
Transcription
Semantic Challenges in Getting Work Done
Semantic Challenges in Getting Work Done" Yolanda Gil Information Sciences Institute and Department of Computer Science University of Southern California http://www.isi.edu/~gil @yolandagil gil@isi.edu Keynote at the International Semantic Web Conference (ISWC) October 21, 2014 USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 1! Outline! 1. Managing work • 2. Knowledge rich tasks in science • 3. Semantic workflows Collaborative tasks in science • 4. Personal to do lists Organic data science Closing thoughts USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 2! Outline! 1. Managing work • 2. Knowledge rich tasks in science • 3. Semantic workflows Collaborative tasks in science • 4. 2 semantic challenges Personal to do lists Organic data science 2 semantic challenges 2 semantic challenges Closing thoughts USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 3! To Dos! Email requests Daily to-dos USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 4! To Dos! Monthly travel and deadlines By annual timeline (e.g. SIGAI) By project (next 6 months) CFPs, BAAs USC Information Sciences Institute This week Yolanda Gil ISWC-2014 gil@isi.edu 5! To Do Lists ■ To do lists are pervasive [Kirsh 01; Norman 91] • • ■ Prior research focused on user studies • ■ Used by more than 60% of people for personal information [Jones & Thomas 97] Used more than calendars, contact lists, etc. [Bellotti et al 04; Dey et al 00] Opportunity for assistance • Major potential impact on productivity USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 6! To Do List Management: Opportunities for Interpretation-based Assistance To Do List Manager Automate through agents Anticipate missing entries & sub-tasks Group and organize Get advice from others Get information from Web USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 7! What Are To Do Items Like: FB app! DIET!!! Shoes? Debate… need sneakers Be a true Christian Mafia blog update Think about more Facebook Money Making Ideas! Ending of reproductive abilities Get off my lazy arse and start achieving some stuff Renew registration for car Hotel reservation Pay bills Apply for financial aid Print plane tickets Renew BOFA card before you leave for summer Mettre les images des captures sur Facebook Spinatch and Bashamel Cupcakes Skriva kod till Simons webbsida Buy Air Blades Mk2 (Imperial) Watch ‘arry pottaaa! Buy Ablative Shell (Imperial) Ruff Racing Hyperblack 278 19’’ 275/35 & ■ ~1500 items collected from ~325 people 245/40 wrapped in NITTO 555R’s ■ Many are not amenable Order more AA Eneloop batteries to automation Return Fan to Westside via UPS ■ Many could be automated Return ugly jacket fully or in part USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 8! What Are To Do Items Like: Office! ■ Unusual structure • • • ■ Many ways to refer to the same task • ■ “Schedule meeting with John” Ambiguous references out of context • • ■ “Meet with John about paper”, “Discuss paper with John”, … Incomplete task specifications • ■ No verb: “quarterly report to Joe” Abbreviations (also typos): “Sched wed 15 ISI” Questions: “How to extract data for Steve” “Meet about paper” “Meet with Raytheon folks” Personal items • ■ ■ “Walk the dog” ■ ■ USC Information Sciences Institute Yolanda Gil Corpus of 2400 to-do entries from users of CALO office assistant 77% lack a verb 56% missing at least one argument 14% could be automated by agents ISWC-2014 gil@isi.edu 9! Opportunities To Do List Manager Automate through agents Agents Anticipate missing entries & sub-tasks Colleagues Group and organize On-Line Resources Get advice from others Get information from Web USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 10! Agents: Beamer for CALO and Radar [Gil & Ratnakar AAAI 2008]! ■ Match agent capabilities to user’s to dos Calendar Agent To Do ? • SchMtg <person> <topic> <time> <loc> ? • Set up discussion with Bill on ISWC paper ? USC Information Sciences Institute Yolanda Gil ISWC-2014 • • A2 Action2 <x1> <x2> A3 Action3 <z1> gil@isi.edu 11! Agents: Beamer for CALO and Radar [Gil & Ratnakar AAAI 2008]! ■ Use paraphrase patterns of agent capabilities to match them to user’s to dos Calendar Agent • SchMtg <person> <topic> <time> <loc> To Do • Set up discussion with Bill on ISWC paper USC Information Sciences Institute Match Yolanda Gil PARAPHRASES: • Set up discussion with <person> on <topic> • Meet about <topic> with <person> • … ISWC-2014 A2 Action2 <x1> <x2> • • A3 Action3 <z1> gil@isi.edu 12! Agents: Beamer for CALO and Radar [Gil & Ratnakar AAAI 2008]! ■ Use paraphrase patterns of agent capabilities to match them to user’s to dos Calendar Agent • SchMtg <person> <topic> <time> <loc> To Do • Set up discussion with Bill on ISWC paper ■ Match PARAPHRASES: • Set up discussion with <person> on <topic> • Meet about <topic> with <person> • … Evaluation with CALO office assistant corpus • • 86.7% accuracy in detecting relevance to agents only 0.2 to 0.4 edits needed to set up task parameters USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 13! The Need for Semantics Knowledge To Do List Manager Automate through agents Anticipate missing entries & sub-tasks Group and organize Get advice from others Get information from Web Yolanda Gil ISWC-2014 USC Information Sciences Institute gil@isi.edu 14! Paraphrase Game [Chklovski 2005]! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 15! Common (Sense) Knowledge [Chklovski and Gil, K-CAP 2005, AAAI 2005]! Learner2 700,000+ statements collected from over 3,000 users USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 16! VerbOcean [Chkovski and Pantel, IJCNLP 2005]! hips between taks (a) Relationships betwee (a) Relationships between taks “Lunch” – likely duration 1hr USC Information Sciences Institute “Presentation” – likely duration 10mins Yolanda Gil ISWC-2014 gil@isi.edu 17! Managing To Dos through Colleagues: Social Task Networks [Groth et al 2010]! ■ To Do app for FB Social task networks • • • ■ Person People linked to their to-dos To-dos linked to their subtasks Tasks are linked to URIs which link to web resources To-do Web resource Shared to-do Shared technique Person | Person To-do | Resource To-do | Technique Task | Subtask Open Task Repository using Linked Data Principles USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 18! Managing To Dos through On-Line Resources [Vrandecic and Gil IUI 2011]! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 19! http://www.isi.edu/~gil/publications.php Some Readings! ■ ■ Yolanda Gil, Varun Ratnakar, Timothy Chklovski, Paul T. Groth, Denny Vrandecic: “Capturing Common Knowledge about Tasks: Intelligent Assistance for To Do Lists.” ACM Transactions on Interactive Intelligent Systems, 2(3). 2012. Hans Chalupsky, Yolanda Gil, Craig A. Knoblock, Kristina Lerman, Jean Oh, David V. Pynadath, Thomas A. Russ, Milind Tambe: “Electric Elves: Agent Technology for Supporting Human Organizations.” AI Magazine 23(2): 11-24 (2002) USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 20! A Semantic Challenge: Managing Personal To Dos! To-Do list interfaces Agents/services, other people, advice web sites To Do List Manager USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 21! A Semantic Challenge: Coordinating To Dos of Different People! To Do List Manager To Do List Manager USC Information Sciences Institute Yolanda Gil To Do List Manager ISWC-2014 To Do List Manager gil@isi.edu 22! Semantic Challenges in Getting Work Done! ■ To dos • • Managing personal to dos Managing coordinated to dos ■ Knowledge rich tasks in science ■ Open science USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 23! Data-Intensive Computing in Science! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 24! The Bottleneck is the Process, Not the Data!! ■ Today: significant human bottleneck in the scientific process What is the state of the art? What is a good problem to work on? What is a good experiment to design? What data should be collected? What is the best way to analyze the data? What are the implications of the experiments? What are appropriate revisions of current models? ■ Need to help machines understand the scientific research process in order to assist scientists • Semantics can be a game changer USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 25! Text Extraction in Hanalyzer (L. Hunter, U. Colorado)! Text extraction from publications Generation of interesting new hypotheses Semantic integration of biomedical databases USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 26! Robot Scientist [King et al 2009]! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 27! Intelligent Science Assistants! What is the state of the art? What is a good problem to work on? What is a good experiment to design? What data should be collected? What is the best way to analyze the data? What are the implications of the experiments? What are appropriate revisions of current models? USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 28! Timely Analysis of Environmental Data [Gil et al ISWC 2011]! With Tom Harmon (UC Merced), Craig Knoblock and Pedro Szekely (ISI) California’s Central Valley: • Farming, pesticides, waste • Water releases • Restoration efforts USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 29! A Semantic Workflow! DailySensorData isa Hydrolab_Sensor_Data siteLong rdf:datatype=“float” siteLaHtude rdf:datatype=“float” dateStart rdf:datatype=“date” forSite rdf:datatype=”string” numberOfDayNights rdf:datatype=“int” avgDepth rdf:datatype=”float” avgFlow rdf:datatype=“float” Owens-Gibbs Model O’Connor-Dobbins Model Churchill Model USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 30! Semantic Workflows in Wings [Gil et al 10][Gil et al 09][Kim & Gil et al 08][Kim et al 06]! ■ Workflows are augmented with semantic constraints • Each workflow constituent has a variable associated with it – Workflow components, arguments, datasets • • Constraints are used to restrict workflow variables Can define abstract classes of components – Concrete components model exec. codes ■ ■ ■ Workflow reasoners propagate and use semantic constraints Uses semantic web standards: OWL/ RDF, SPARQL Compilation of workflows to scalable execution infrastructure USC Information Sciences Institute Yolanda Gil www.wings-workflows.org ISWC-2014 gil@isi.edu 9 31! Semantic Components in WINGS [Gil iEMSs 2014]! Classes of models/ components I/O Data constraints Use constraints USC Information Sciences Institute Yolanda Gil ;; Depth must be over .6m [ CMInvalidity1: (?c rdf:type pcdom:ReaerationCMClass) (?c pc:hasInput ?idv) (?idv pc:hasArgumentID 'InputParameters') (?idv dcdom:depth ?depth) le(?depth '0.61’) -> (?c pc:isInvalid 'true’)] gil@isi.edu ISWC-2014 32! WINGS Specializes Workflow Based on Characteristics of Daily Data! <dcdom:Hydrolab_Sensor_Data rdf:ID=“Hydrolab-‐CDEC-‐04272011"> <dcdom:siteLong rdf:datatype=“float">-‐120.931</dcdom:siteLongitude> <dcdom:siteLaHtude rdf:datatype=“float">37.371</dcdom:siteLaHtude> <dcdom:dateStart rdf:datatype=“date">2011-‐04-‐27</dcdom:dateStart> <dcdom:forSite rdf:datatype=”string">MST</dcdom:forSite> <dcdom:numberOfDayNights rdf:datatype=“int">1</dcdom:numberOfDayNights> <dcdom:avgDepth rdf:datatype=”float">4.523957</dcdom:avgDepth> <dcdom:avgFlow rdf:datatype=“float">2399</dcdom:avgFlow> </dcdom:Hydrolab_Sensor_Data> 2) Choice of models 1) Parameter se+ngs Owens-Gibbs Model O’Connor-Dobbins Model Churchill Model 3) Metadata of new results <dcdom:Metabolism_Results rdf:ID=“Metabolism_Results-‐CDEC-‐04272011"> <dcdom:siteLong rdf:datatype=“float">-‐120.931</dcdom:siteLongitude> <dcdom:siteLaHtude rdf:datatype=“float">37.371</dcdom:siteLaHtude> <dcdom:dateStart rdf:datatype=“date">2011-‐04-‐27</dcdom:dateStart> <dcdom:forSite rdf:datatype=”string">MST</dcdom:forSite> <dcdom:numberOfDayNights rdf:datatype=“int">1</dcdom:numberOfDayNights> <dcdom:avgDepth rdf:datatype=”float">4.523957</dcdom:avgDepth> <dcdom:avgFlow rdf:datatype=“float">2399</dcdom:avgFlow> </dcdom: Metabolism_Results> USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 33! WINGS Dynamically Selects Appropriate Model Based on Daily Sensor Readings! Churchill model USC Information Sciences Institute O’Connor-Dobbins model Yolanda Gil ISWC-2014 Owens-Gibbs model gil@isi.edu 34! WINGS Workflow Reasoners! Input data for decision tree modelers (eg ID3) must be discrete ?Dataset4 dcdom:isDiscrete true ■ ■ Key idea: Skeletal planning, where constraints for each component are propagated through a fixed workflow structure (the skeleton) Phase 1: Goal Regression • • ■ Phase 2: Forward Projection • • USC Information Sciences Institute Yolanda Gil Starting from final products, traverse workflow backwards For each node, query for constraints on inputs Starting from input datasets, traverse workflow forwards For each node, query for constraints on ISWC-2014 outputs 35! gil@isi.edu Example (Step 1 of 5)! Rule in Component Catalog: [modelerSpecialCase2: (?c rdf:type pcdom:ID3ModelerClass) (?c pc:hasInput ?idv) (?idv pc:hasArgumentID "trainingData”) ?Dataset4 dcdom:isDiscrete true -> (?idv dcdom:isDiscrete "true"^^xsd:boolean)] Model5! USC Information Sciences Institute Yolanda Gil Model6! ISWC-2014 Model7! gil@isi.edu 36! Example (Step 2 of 5)! ?Dataset3 dcdom:isDiscrete true Rule in Component Catalog: [samplerTransfer: (?c rdf:type pcdom:RandomSampleNClass) (?c pc:hasOutput ?odv) (?odv pc:hasArgumentID "randomSampleNOutputData") (?c pc:hasInput ?idv) (?idv pc:hasArgumentID "randomSampleNInputData”) (?odv ?p ?val) (?p rdfs:subPropertyOf dc:hasMetrics) ?Dataset4 dcdom:isDiscrete true Model5! Model6! Model7! -> (?idv ?p ?val)] USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 37! Example (Step 3 of 5)! ?TrainingData dcdom:isDiscrete true ?Dataset3 dcdom:isDiscrete true Rule in Component Catalog: [normalizerTransfer: (?c rdf:type pcdom:NormalizeClass) (?c pc:hasOutput ?odv) (?odv pc:hasArgumentID "normalizeOutputData") (?c pc:hasInput ?idv) (?idv pc:hasArgumentID "normalizeInputData") (?odv ?p ?val) (?p rdfs:subPropertyOf dc:hasMetrics ?Dataset4 dcdom:isDiscrete true Model5! Model6! Model7! -> (?idv ?p ?val)] USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 38! Example (Step 4 of 5)! ?TrainingData dcdom:isDiscrete true ?Dataset3 dcdom:isDiscrete true Rule in Component Catalog: [modelerTransferFwdData: (?c rdf:type pcdom:ModelerClass) (?c pc:hasOutput ?odv) (?odv pc:hasArgumentID "outputModel”) (?c pc:hasInput ?idv) (?idv pc:hasArgumentID "trainingData") (?idv ?p ?val) (?p rdfs:subPropertyOf dc:hasDataMetrics) notEqual(?p dcdom:isSampled) ?Dataset4 dcdom:isDiscrete true Model5! Model6! ?Model5 dcdom:isDiscrete true ?Model6 dcdom:isDiscrete true ?Model7 dcdom:isDiscrete true -> (?odv ?p ?val)] USC Information Sciences Institute Model7! Yolanda Gil ISWC-2014 gil@isi.edu 39! Example (Step 5 of 5)! ?TrainingData dcdom:isDiscrete true Rule in Component Catalog: ?Dataset3 dcdom:isDiscrete true [voteClassifierTransferDataFwd10: (?c rdf:type pcdom:VoteClassifierClass) (?c pc:hasInput ?idvmodel1) (?idvmodel1 pc:hasArgumentID "voteInput1") (?c pc:hasInput ?idvmodel2) (?idvmodel2 pc:hasArgumentID "voteInput2") (?c pc:hasInput ?idvmodel3) (?idvmodel3 pc:hasArgumentID "voteInput3") (?c pc:hasInput ?idvdata) (?idvdata pc:hasArgumentID "voteInputData") (?idvmodel1 dcdom:isDiscrete ?val1) (?idvmodel2 dcdom:isDiscrete ?val2) Model5! (?idvmodel3 dcdom:isDiscrete ?val3) equal(?val1, ?val2), equal(?val2, ?val3) ?TestData dcdom:isDiscrete -> (?idvdata dcdom:isDiscrete ?va1l)] true USC Information Sciences Institute Yolanda Gil ?Dataset4 dcdom:isDiscrete true Model6! Model7! ?Model5 dcdom:isDiscrete true ?Model6 dcdom:isDiscrete true ?Model7 dcdom:isDiscrete true ISWC-2014 gil@isi.edu 40! WINGS Workflow Reasoners: Result! ?TrainingData dcdom:isDiscrete true ?Dataset3 dcdom:isDiscrete true ?Dataset4 dcdom:isDiscrete true ?Dataset4 dcdom:isDiscrete true Model5! Model6! ?Model5 dcdom:isDiscrete true ?Model6 dcdom:isDiscrete true ?Model7 dcdom:isDiscrete true ?TestData dcdom:isDiscrete true USC Information Sciences Institute Yolanda Gil Model7! ISWC-2014 gil@isi.edu 41! unified well-formed req. WINGS Automatic Workflow Generation Algorithm [Gil et al JETAI 2011]! Seed workflow from request seeded workflows Find input data requirements Work with P. Gonzalez (UCM) and Jihie Kim (ISI) Workflows with S. McWeeney & C. Zhang (OHSU) binding-ready workflows Data source selection “Pay-asyou-go” semantics bound workflows Parameter selection configured workflows Workflow instantiation workflow instances Workflow grounding ground workflows Workflow ranking top-k workflows Workflow mapping USC Information Sciences Institute Yolanda Gil ISWC-2014 42! gil@isi.edu executable workflows Workflows! ■ ■ Workflow systems ■ [Goble et al 2007] ■ [Ludaescher et al 2007] ■ [Freire et al 2008] ■ [Mattmann et al 2007] ■ [Mesirov et al 2009] ■ [Dinov et al 2009] Workflow representations ■ [Moreau et al 2010] ■ [IBM/MSR 2002] USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 43! Semantic Process Models! ■ Composition from first principles • • • • • • • ■ [McIlraith & Son KR 2002] [Sohrabi et al ISWC 2006] [Sohrabi & McIlraith ISWC 2009] [Sohrabi & McIlraith ISWC 2010] [McDermott AIPS 2002] [Kuter et al ISWC 2004] [Sirin et al JWS 2005] [Kuter et al JWS 2005] [Lin et al ESWC 2008] [Lecue ISWC 2009] [Calvanese et al IEEE 2008] [Bertolli et al ICAPS 2009] [Li et al ISSC 2011] Representations • • • [Burstein et al ISWC 2002] [Martin et al ISWC 2007] [Domingue & Fensel IEEE IS 2008] [Dietze et al IJWSR 2011] [Dietze et al ESWC 2009] [Fensel et al 2011] [Vitvar et al ESWC 2008] [Roman et al AO 2005] USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 44! Semantic Descriptions of Software Components in Geosciences Work with C. Duffy (PSU), S. Peckham (CU), C. Mattmann (JPL), J. Howison (UT) ! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 45! CSDMS Standard Names [Peckham iEMSs 2014] http://csdms.colorado.edu/! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 46! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 47! Benefits of Semantic Workflows: 1) Automatic Workflow Elaboration [Gil et al WORKS’13]! Workflowsdeveloped with Y. Liu (USC) and C. Mattmann (JPL) LDA Online LDA Parallel LDA USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 48! Benefits of Semantic Workflows: 2) Access to Data Analytics Expertise! Science, Dec 2011 USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 49! Capturing Expertise through Workflows [Hauder et al e-Science 2011]! Workflows for text analytics, joint work with Yan Liu (USC) and Mattheus Hauder (TUM) Naïve Approach Expert Approach Feature selection USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 50! Capturing Expertise [Gil et al 2012]! Work with Christopher Mason (Cornell University) Workflows for population genomics Association Tests Variant Discovery from Resequencing USC Information Sciences Institute Yolanda Gil CNV Detection ISWC-2014 Transmission Disequilibrium Test gil@isi.edu 51! Benefits of Semantic Workflows: 3) Saving Time Through Reuse [Sethi et al MM’13]! Work with Ricky Sethi and Hyujoon Jo of USC USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 52! Saving Time through Reuse [Garijo et al FGCS’13]! Work with D. Garijo and O. Corcho (UPM), P. Alper, K. Belhajjame, and C. Goble (UM) Result • “Scientists and engineers spend more than 60% of their time just preparing the data for model input or data-model comparison” (NASA A40) USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 53! Su Measuring Time Savings with “Reproducibility Maps” [Garijo et al PLOS CB12]! Work with D. Garijo of UPM and P. Bourne of UCSD ■ ■ 2 months of effort in reproducing published method (in PLoS’10) Authors expertise was required Comparison of ligand binding sites Comparison of dissimilar protein structures Graph network genera?on Molecular Docking USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 54! Benefits of Semantic Workflows: 4) Interoperability in a Workflow Ecosystem [Garijo et al 2014]! Work with D. Garijo and O. Corcho of UPM WT'(.pipe)' Workflow'Execu,on' LONI&Pipeline& Repository' (.pipe)' Workflow'Genera,on' Wings& WME'to' Wings' DAXEC' to'Wings' WI'(PEPlan)' DAXEC'to' OPMW'&'PROV' WE+WT' Repository' (PROV,'PEPLAN,' OPMW)' WE'(OPMW' '&'PROV)' WE'(WME)' WT(PEPlan)' WE,''WT'' (OPMW'&PROV)' Workflow'Browsing' Wexp& WE'(PROV)' to'RO' WE,''WT'' (OPMW'&PROV)' PEPLAN'to' CASEPGE' OPMW'to' DEPROV' Workflow' Execu,on' Apache&OODT& Workflow'Documenta,on' Organic&data&science&wiki& WT'(OPMW)' OPMW' PEPLAN' to'DAX' Workflow'Mapping' and'Execu,on' Pegasus/Condor& Wings'to' OPMW'&' PEPlan' WE'(DAXEC)' WI'(PEPlan)' Workflow'Mining' FragFlow' WT'(PEPlan)' WT'(PEPlan)' WI'(Wings)' Wings'to' PEplan' LONI'to'' PEPlan' Provenance'visualiza,on' (e.g.,'Prov7o7viz)' RO'Model'applica,on' DEPROV'applica,on' Ecosystem'Tool' Working'converter' WME'to'OPMW' &'PROV' USC Information Sciences Institute Yolanda Gil Planned'converter' Repositories' Current'dataflow' ISWC-2014 Planned'dataflow' Other'workflow'tool' SPARQL'construct' converter' gil@isi.edu 55! Benefits of Semantic Workflows: 4) Interoperability in a Workflow Ecosystem [Garijo et al 2014]! Work with D. Garijo and O. Corcho of UPM WT'(.pipe)' Workflow'Execu,on' LONI&Pipeline& Repository' (.pipe)' LONI'to'' PEPlan' Workflow'Mining' FragFlow' WT'(PEPlan)' Workflows are: WT(PEPlan)' Workflow'Genera,on' - Wings& Described with semantic metadata Workflow'Documenta,on' - Published as Web objects (linked open data) Organic&data&science&wiki& Repository' (PROV,'PEPLAN,' - Imported by systems with diverse functions: OPMW)' Workflow'Browsing' (eg editing, execution, provenance browsing, Wexp& workflow mining, etc) Provenance'visualiza,on' WT'(PEPlan)' WI'(Wings)' Wings'to' PEplan' WI'(PEPlan)' WME'to' Wings' DAXEC' to'Wings' WT'(OPMW)' WE'(OPMW' '&'PROV)' WE'(WME)' WE'(PROV)' OPMW' to'RO' PEPLAN' to'DAX' DAXEC'to' OPMW'&'PROV' WE+WT' WE'(DAXEC)' WI'(PEPlan)' Workflow'Mapping' and'Execu,on' Pegasus/Condor& WE,''WT'' (OPMW'&PROV)' Wings'to' OPMW'&' PEPlan' WE,''WT'' (OPMW'&PROV)' PEPLAN'to' CASEPGE' OPMW'to' DEPROV' Workflow' Execu,on' Apache&OODT& (e.g.,'Prov7o7viz)' RO'Model'applica,on' DEPROV'applica,on' Ecosystem'Tool' Working'converter' WME'to'OPMW' &'PROV' USC Information Sciences Institute Yolanda Gil Planned'converter' Repositories' Current'dataflow' ISWC-2014 Planned'dataflow' Other'workflow'tool' SPARQL'construct' converter' gil@isi.edu 56! http://www.isi.edu/~gil/publications.php Some Readings! ■ ■ ■ Yolanda Gil: “Intelligent Workflow Systems and Provenance-Aware Software.” Proceedings of the Seventh International Congress on Environmental Modeling and Software (iEMSs), San Diego, CA, 2014. Yolanda Gil: “From Data to Knowledge to Discoveries: Artificial Intelligence and Scientific Workflows.” Scientific Programming 17(3), 2009. Ewa Deelman, Chris Duffy, Yolanda Gil, Suresh Marru, Marlon Pierce, and Gerry Wiener: “EarthCube Report on a Workflows Roadmap for the Geosciences.” National Science Foundation, Arlington, VA. 2012. USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 57! A Semantic Challenge: Automatic Paper Generator! ■ Capture knowledge about analytic methods • • USC Information Sciences Institute Yolanda Gil ISWC-2014 Run workflows in existing data repositories Report new findings gil@isi.edu 58! A Semantic Challenge: A Web of Semantic Workflows/Processes! Assist people to: ■ Share ■ Copy ■ Reuse ■ Adapt ■ Remix ■ Update ■ Certify ■ Review ■ … USC Information Sciences Institute “Pay-as-you-go” semantics Yolanda Gil ISWC-2014 gil@isi.edu 59! Semantic Challenges in Getting Work Done! ■ To dos • • ■ Knowledge rich tasks in science • • ■ Managing personal to dos Managing coordinated to dos Automatic paper generator A Web of semantic workflows/processes Open science USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 60! Collaboration to Develop Workflows [Gil et al 2007]! Slide from T. Jordan of USC and SCEC Seismicity ! Paleoseismology! Local site effects! Faults! Geologic structure! High-Level Workflow Seismic Hazard Model Stress! transfer! Crustal! motion! USC Information Sciences Institute Crustal! deformation! Yolanda Gil Seismic velocity! structure! ISWC-2014 Rupture! dynamics! ! gil@isi.edu 61! Understanding the “Age of Water”! Work with P. Hanson (UWisc), C. Duffy (PSU), and J. Read (USGS) Research Soil Survey Lidar-derived numerical mesh Linking Catchment Model-Data Assets Supported by NSF GEO-CZO High Resolution Vegetation Mapping Mapping Bedrock GP Radar Linking Lake Model-Data Assets Supported by NSF BIO-GLEON, USGS CIDA High-resolution sensor network data Models of lake hydrodynamics and water quality USC Information Sciences Institute Read JS, et al. 2014. Ecological Modelling. 291C: 142-150. doi:10.1016/j.ecolmodel.2014.07.029 Yolanda Gil ISWC-2014 gil@isi.edu 62! A New Kind of Collaborative Platform! ■ Taxonomy of Science Communities [Bos et al 2007]! Shared'Instruments Community'Data'Systems Open'Community'Contribution'Systems Virtual'Communities'of'Practice Virtual'Learning'Communities Distributed'Research'Centers Community'Infrastructure'Projects ! ■ NEON PDB Zooniverse GLEON VIVO ENCODE CSDMS Need a platform to support science collaborations that require:! • • • Significant organization and coordination Maintaining a community over the longer term Growing the community based on unanticipated needs USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 63! Organic Data Science! Work with F. Michel and M. Hauder of TUM ■ Organic data science is a novel approach to on-line scientific collaboration that supports: • Self-organization of communities by enabling any user to specify and decompose tasks • On-line community support by incorporating social sciences principles and best practices • An open science process by capturing new kinds of metadata about the collaboration that give necessary context to newcomers USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 64! Self-Organization through Task Decomposition! ■ ■ ■ Many tasks involved Necessary data resides in different repositories Different people understand different kinds of data • • ■ USC Information Sciences Institute Where it is How to use it If other data needed, unclear who has it Yolanda Gil ISWC-2014 gil@isi.edu 65! Social Principles for Online Communities! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 66! Social Principles: Some Examples! ■ Starting communities, e.g.: • • ■ Encouraging contributions, e.g.: • • ■ Simple tasks with challenging goals are easier to comply with Publicize that others have complied with requests Encouraging commitment, e.g.: • ■ Organize content, people, and activities into subspaces Inactive tasks should have “expected active times” Interdependent tasks increase commitments and reduce conflict Dealing with newcomers, e.g.: • • Design common learning experiences for newcomers Provide sandboxes while they are learning USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 67! Opening Science: Polymath [Nielsen, Gowers 09]! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 68! Organic Data Science! => Task-oriented self-organizing on-line communities for open collaboration in science ■ Organic data science is a novel approach to on-line scientific collaboration that supports: • Self-organization of communities by enabling any user to specify and decompose tasks • On-line community support by incorporating social sciences principles and best practices • An open science process by capturing new kinds of metadata about the collaboration that give necessary context to newcomers USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 69! Self-Organization through Dynamic Task Decomposition! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 70! ! Organic Data Science: Contributors! USC Information Sciences Institute Yolanda Gil ! ISWC-2014 gil@isi.edu 71! Data! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 72! Models! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 73! Workflows! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 74! Training Newcomers! Reader Participant Owner USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 75! What Features Are Used to Manage Tasks?! ! USC Information Sciences Institute Yolanda Gil ISWC-2014 ! gil@isi.edu 76! How Do Users Find Relevant Tasks?! ! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 77! Are Users Collaborating?! A 52% of tasks are viewed by more than one person B 72% of tasks have more than one person signed up C 19% of tasks have more than one person editing metadata than one person editing content D 11% of tasks have more Yolanda Gil ISWC-2014 USC Information Sciences Institute gil@isi.edu 78! What Does the Social Network of Collaborators Look Like?! Network of users (nodes) linked by shared tasks ■ Links across all users ■ Two distinct subgroups ■ 1. 2. Water Software ! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 79! A Semantic Challenge: Email-less Coordination for Projects! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 80! A Semantic Challenge: Open Science Processes! Datasets&linked&to& download&sites& From: http://www.ncdc.noaa.gov/paleo/metadata/noaa-coral-1865.html {{ #ask: [[Is a::dataset]] | ?Domain=geochemistry | ?Archive | ?MeasurementMaterial | ?MeasurementStandard | ?MeasurementUnits}} www Metadata& proper*e s&created& on&the&fly& Datasets& linked&to& loca*ons& Projects& linked&to& datasets& En**es&linked&to&&linked& data& Metadata& added&by& different& volunteers& Credits Informa*on& sources&are& documented& People&linked&to& projects& USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 81! Semantic Challenges in Getting Work Done! ■ To dos • • ■ Knowledge rich tasks in science • • ■ Managing personal to dos Managing coordinated to dos Automatic paper generator A Web of semantic workflows/processes Open science • • Email-less coordination of projects Open science processes http://www.isi.edu/~gil http://www.wings-workflows.org http://www.organicdatascience.org http://discoveryinformaticsinitiative.org USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 82! “We need bigger glasses and more hands in the water” – J. Tarter, SETI Institute! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 83! ILLUSTRATION: P. HUEY/ INSIGHTS | the PERSPECT I V E S entists in many fields based are augmenting entists in many fields are augmenting group of cognitive systems, largely on 1 powerinofneural searchnetworks by usingand machine-reada power of search by Institute, using machine-readable advances neurologiInformation Sciences University of Southern biological imaging ( 7), species preservation scientists’ workload and false-positive rates ment 2 California, and Marina Semantic del Rey, CA 90292, USA. Pacifi c Northwest ontologies and Semantic Web technol ontologies Web technology cally inspired computation, is beginning to (8), and quantum chemistry (9). in identifying supernovae (14). techn 3 National Richland, WA 99354, USA. Information These fourjust systems are representative of distri ( 4) to tag not scientific articles (4) to tagLaboratory, not just scientific articles but show promise in the analysis of nontextual DATA. AI techniques have acthe ways that more advanced AI can serve Howe Technology and Web Science, RensselaerDIGESTING Polytechnic celerated the pace and quality of analysis of scientific ends. They are based on explicit limit also figures and videos, blogs, data s also Institute, figures and videos, blogs, data sets, processing, especially of online images and Troy, NY 12203, USA. 4Cornell University, Ithaca, NY the huge quantities of data that can stream representations of science processes, and the c USA. *E-mail: hendler@cs.rpi.edu and computational services, which all and 14850, computational services, which allow video, across a wide of areas including from modern laboratory equipment. To dethey reasonrange about these to automate protion Discovery Informatics: Knowledge-Rich Science Infrastructure! Published by AAAS ARTIFICIAL INTELLIGENCE Amplify scientific discovery with artificial intelligence Many human activities are a bottleneck in progress By Yolanda Gil,1 Mark Greaves,2 James Hendler,3* Haym Hirsh4 T echnological innovations are penetrating all areas of science, making predominantly human activities a principal bottleneck in scientific progress while also making scientific advancement more subject to error and harder to reproduce. This is an area where a new generation of artificial intelligence (AI) systems can radically transform the practice of scientific discovery. Such systems are showing an increasing ability to POLICY automate scientific data analysis and discovery processes, can searchInformation systematicallySciences and correctly through USC Institute hypothesis spaces to ensure best results, can autonomously discover complex patterns in data, and can reliably apply small-scale sci- “AI-based systems that can represent hypotheses … can reduce the error-prone human bottleneck in … discovery.” trans navig 17enthu 1 On to me are e mach wher defin sured such for th field. vance gies i the im An of th mark non-A datab atten ers a fictio rathe logica of he be a and b Th lenge gies. brain the b emag.org on October 9, 2014 SCIENCE sciencemag.org rive scientific insight from data at this scale, cesses and assist the human scientist. Destandard methods include applying dimenvelopment of the explicit representations of sionality-reduction techniques and scientific which they are based 10 feature OCTOBER 2014processes • VOL on 346 ISSUE 6206 extractors to create high-speed classifiers is complex. When successful, the computer based on machine-learning approaches, such can become a real (although junior) parby AAAS beyond current as Bayesian networks or support-vector ma- information-finding ticipant inPublished the science process, doingsearch what limitations. chines. Because the phenomena under study it does best: applying algorithmic methods Webringing can project a not-so-distant future often exist in nonstationary environments and knowledge to bear in a consiswhere “intelligent science assistant” or in contexts with only small quantities of tent, systematic, and complete manner. prolabeled data that can be used for training— grams identify and summarize relevant described across the worldwide complex, unsupervised, and reinforcement research A VIRTUOUS CIRCLE. Developing systems spectrum blogs, preprint armachine-learning techniques are critical for multilingual like these is not just anofexercise in AI applichives, and discussion forums; find or gendata analysis. These types of approaches are cation—it affects the direction of AI research. new hypotheses that might confirm or being used in recent projects in data-rich ar- erate Addressing real challenges of science pushes ongoing work; areas, and even rerun eas as diverse as chemical structure predic- conflict the AI with envelope in many including increased the numbers of interested particiold analyses when a new computational tion, pathway analysis and identification in knowledge representation, automatic inferpants; steady ofexponential becomes available. Aided by such systemsMoore’s biology,law the and processing large-scale method ence, process reasoning, hypothesis generaincreases in computing power; and expoa system, the scientist will focus on more geophysics data, and others. tion, natural language processing, machine nential increases and broad the creative aspects interaction, of research, and withina Another, more in, ambitious classavailability of intelli- of learning, collaborative of, relevant data in volumes never previously larger fraction of the routine work left to the gent systems is being developed under the telligent user interfaces. This interaction seen. scientificScience efforts that have leverartificially intelligent assistant. rubricThose of Discovery or, increasingly, aged AI advances have largely harnessed soNew types of intelligent systems that can Discovery Informatics (10). These systems phisticated techniques to enhance scientific efforts in this manner are enhance themachine-learning intelligent assistants described create correlative predictions from large sets earlier with the capability to attack scientific transitioning from academic and industrial of “bigthat data.” Such work aligns wellincreasing with the research laboratories. A term gathering poptasks combine rote work with current needs of petaand exascale science. amounts of adaptivity and freedom. These ularity for systems that intelligently process However, AI encoded has far broader capacity to aconline information beyond search is “cognisystems use knowledge of scientific domains and processes in order to assist with tasks that previously required human knowledge and reasoning. In fact, several sciences have significant investments in the creates a virtuous circle where advances in 84! Yolanda Gil ISWC-2014 gil@isi.edu representation of vast amounts of scientific science go hand in hand with advances in AI. knowledge and are poised to explore new inThis virtuous circle can only work well if baltelligent systems that exploit that knowledge anced and well oiled. http://www.discoveryinformaticsinitiative.org" " PSB Workshop (January 2015) KDD Workshop (August 2014): http://ailab.ist.psu.edu/idkdd14/ AAAI Workshop (July 2014): http://discoveryinformaticsinitiative/diw2014 AAAI Fall Symposium (Nov 2013): http://discoveryinformaticsinitiative/dis2013 AAAI Fall Symposium (Nov 2012): http://discoveryinformaticsinitiative/dis2012 Microsoft eScience Summit (Aug 2012) Workshop on Web Observatories for Discovery Informatics PSB Workshop (Jan 2013): on Computational Challenges of Mass Phenotyping USC Information Sciences Institute NSF Workshop (Feb 2012): http://discoveryinformaticsinitiative/diw2012 Yolanda Gil ISWC-2014 gil@isi.edu 85! A View from Biomedical Research: The NIH Big Data To Knowledge (BD2K) Initiative! “Discovery informatics is in its infancy. Search engines are grappling with the need for deep search, but it is doubtful they will fulfill the needs of the biomedical research community when it comes to finding and analyzing the appropriate datasets. Let me cast the vision in a use case. As a research group winds down for the day algorithms take over, deciphering from the days on-line raw data, lab notes, grant drafts etc. underlying themes that are being explored by the laboratory (the lab’s digital assets). Those themes are the seeds of deep search to discover what is relevant to the lab that has appeared since a search was last conducted in published papers, public data sets, blogs, open reviews etc. Next morning the results of the deep search are presented to each member as a personalized view for further post processing. We have a long way to go here, but programs that incite groups of computer, domain and social scientists to work on these needs move us forward.” 86! Yolanda Gil ISWC-2014 USC Information Scienceswill Institute gil@isi.edu A View from Geoscieces: The NSF EarthCube Initiative! Data Workflows Semantics Governance hAp://www.earthcube.org/ USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 87! What Might the Future Look Like?! YOU: What are you working on? OTHER PERSON: I am really busy, working on… USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 88! In the Future! YOU: What are you working on? OTHER PERSON: I am really busy, working on… YOU: Yes, but aren’t you glad that we can get our work done faster? USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 89! Thank you!! http://www.isi.edu/~gil http://www.wings-workflows.org http://www.organicdatascience.org http://discoveryinformaticsinitiative.org ■ ■ ■ ■ ■ ■ Wings contributors: Varun Ratnakar, Ricky Sethi, Hyunjoon Jo, Jihie Kim, Yan Liu, Dave Kale (USC), Ralph Bergmann (U Trier), William Cheung (HKBU), Daniel Garijo (UPM), Pedro Gonzalez & Gonzalo Castro (UCM), Paul Groth (VUA) Wings collaborators: Chris Mattmann (JPL), Paul Ramirez (JPL), Dan Crichton (JPL), Rishi Verma (JPL), Ewa Deelman & Gaurang Mehta & Karan Vahi (USC), Sofus Macskassy (ISI), Natalia Villanueva & Ari Kassin (UTEP) Organic Data Science: Felix Michel and Matheus Hauder (TUM), Varun Ratnakar (ISI), Chris Duffy (PSU), Paul Hanson, Hilary Dugan, Craig Snortheim (U Wisconsin), Jordan Read (USGS) Biomedical workflows: Phil Bourne & Sarah Kinnings (UCSD), Chris Mason (Cornell), Joel Saltz & Tahsin Kurk (Emory U.), Jill Mesirov & Michael Reich (Broad), Randall Wetzel (CHLA), Shannon McWeeney & Christina Zhang (OHSU) Geosciences workflows: Chris Duffy (PSU), Paul Hanson (U Wisconsin), Tom Harmon & Sandra Villamizar (U Merced), Tom Jordan & Phil Maechlin (USC), Kim Olsen (SDSU) And many others! USC Information Sciences Institute Yolanda Gil ISWC-2014 gil@isi.edu 90!