Event Narrative Module, version 3 Deliverable D5.1.3
Version FINAL

Authors: Piek Vossen (1), Tommaso Caselli (1), Agata Cybulska (1), Antske Fokkens (1), Filip Ilievski (1), Anne-Lyse Minard (2), Paramita Mirza (2), Itziar Aldabe (3), Egoitz Laparra (3), German Rigau (3)
Affiliation: (1) VUA, (2) FBK, (3) UPV/EHU

Building structured event indexes of large volumes of financial and economic data for decision making
ICT 316404

Grant Agreement No.: 316404
Project Acronym: NEWSREADER
Project Full Title: Building structured event indexes of large volumes of financial and economic data for decision making.
Funding Scheme: FP7-ICT-2011-8
Project Website: http://www.newsreader-project.eu/
Project Coordinator: Prof. dr. Piek T.J.M. Vossen, VU University Amsterdam, Tel. +31 (0) 20 5986466, Fax. +31 (0) 20 5986500, Email: piek.vossen@vu.nl
Document Number: Deliverable D5.1.3
Status & Version: FINAL
Contractual Date of Delivery: October 2015
Actual Date of Delivery: January 2016
Type: Report
Security (distribution level): Public
Number of Pages: 148
WP Contributing to the Deliverable: WP05
WP Responsible: VUA
EC Project Officer: Susan Fraser

Keywords: Event detection, event-coreference, event components, NAF, RDF, SEM, GAF, event relations, timelines, storylines, attribution and perspective, crosslingual event extraction

Abstract: This deliverable describes the final version of the modules that convert the Natural Language Processing output in NAF to the Semantic Web interpretation in SEM-RDF. We describe the way we represent instances in SEM-RDF, relations between these instances and the GAF pointers to the mentions in the text. Since instances can have many different mentions in different sources, we define identity criteria for mentions.
The resulting SEM-RDF representations are loaded into the KnowledgeStore as triples that can be queried through SPARQL. In addition to the data on individual events, we also extract causal and temporal relations between events. Time anchoring of events and the relations are used to create timelines of events. Timelines are then used to derive storylines. In addition to the event data, we also derive the perspective of the sources of information with respect to the data. This is represented in a separate RDF structure that models provenance and attribution. Finally, we describe how the system extracts semantic data across different languages.

NewsReader: ICT-316404 February 1, 2016

Table of Revisions

Version  Date             Description and reason           By                       Affected sections
1.1      1 October 2015   First structure                  Piek Vossen, VUA         all
1.2      5 October 2015   Streaming architecture           Filip Ilievski, VUA      2
1.3      11 October 2015  Event relations                  Anne-Lyse Minard, FBK    4
1.4      13 October 2015  Perspectives                     Antske Fokkens, VUA      6
1.5      13 October 2015  Storylines                       Tommaso Caselli, VUA     5
1.6      21 October 2015  RDF evaluation and clean up      Piek Vossen, VUA         2
1.7      21 October 2015  Review                           Marieke van Erp, VUA     all
1.8      5 October 2015   Streaming architecture revision  Filip Ilievski, VUA      2
1.9      21 October 2015  Revision all                     Piek Vossen, VUA         all
2.0      January 2015     Event coreference                Piek Vossen, VUA         3
2.1      January 2015     Event coreference                Agata Cybulska, VUA      3
2.2      January 2016     Cross-lingual extraction         Piek Vossen, VUA         7
2.2      January 2016     Reviewed by                      Egoitz Laparra, EHU      all
2.3      January 2016     Revised after review             Piek Vossen, VUA         all
2.4      29 January 2016  Check by coordinator             VUA                      -

Executive Summary

This deliverable describes the final version of the modules that convert the Natural Language Processing output in NAF to the Semantic Web
interpretation in SEM-RDF. We describe the way we represent instances in SEM-RDF, relations between these instances and the GAF pointers to the mentions in the text. Since instances can have many different mentions in different sources, we define identity criteria for mentions. The resulting SEM-RDF representations are loaded into the KnowledgeStore as triples that can be queried through SPARQL. Two different approaches have been defined for processing NAF files: 1) assuming an empty KnowledgeStore and a batch of NAF files, all NAF files are processed and compared, after which the KnowledgeStore is populated with the RDF, and 2) assuming a streaming set-up in which each NAF file is processed one-by-one and the result is compared with the data that is already in the KnowledgeStore. The data is event-centric, which means we provide only the data relevant for the events detected. Every event is assumed to be anchored to a date or period in time and to have at least one participant. In addition to the data on individual events, we also extract causal and temporal relations between events. Time anchoring of events and the relations are used to create timelines of events. Timelines are then used to derive storylines. We also derive the perspective of the sources of information with respect to the data. This is represented in a separate RDF structure that models provenance and attribution. Finally, since the RDF representation is agnostic to the language in which information is expressed and our NLP modules for English, Spanish, Dutch and Italian are interoperable through NAF, the NAF2SEM module can extract the same RDF representation across different languages. This provides a unique opportunity for comparing the semantic processing of text across languages. We describe the cross-lingual processing and the comparison across the language-system output. In comparison with the previous deliverable D5.1.2, almost all sections changed.
Minor changes were made to sections 4, 5 and 6. Other sections were changed drastically.

Contents

Table of Revisions
1 Introduction
2 Interpreting NAF-XML as SEM-RDF
  2.1 Extracting instances from NAF layers
    2.1.1 Entities and non-entities
    2.1.2 Events
    2.1.3 Participant and event relations
    2.1.4 Temporal anchoring
  2.2 Identity across events
    2.2.1 Event comparison in batch mode
    2.2.2 Event comparison in streaming mode
  2.3 Evaluation
3 Event Coreference
  3.1 Bag of Events Approach
    3.1.1 The Overall Approach
    3.1.2 Two-step Bag of Events Approach
    3.1.3 Step 1: Clustering Documents Using Bag of Events Features
    3.1.4 Step 2: Clustering Sentence Templates
    3.1.5 One-step Bag of Events Approach
    3.1.6 Corpus
    3.1.7 Experimental Set Up
    3.1.8 Baseline
    3.1.9 Results
    3.1.10 Conclusion
  3.2 Evaluation of the NewsReader pipeline
    3.2.1 NewsReader output
    3.2.2 Maximizing the event detection
    3.2.3 Conclusion NewsReader cross-document event coreference
4 Event Relations
  4.1 Temporal Relations
    4.1.1 Annotation Schema
    4.1.2 Temporal Relation Extraction
  4.2 Causal Relation
    4.2.1 Annotation Scheme
    4.2.2 Causal Relation Extraction
  4.3 Predicate Time Anchors
    4.3.1 Annotation Scheme
    4.3.2 Predicate Time Anchor Relation Extraction
5 From TimeLines to StoryLines
  5.1 TimeLine extraction
    5.1.1 TimeLines: task description
    5.1.2 System Description and Evaluation
    5.1.3 Document level time-anchoring for TimeLine extraction
  5.2 Storylines
    5.2.1 StoryLines aggregated from entity-centered TimeLines
    5.2.2 Storylines aggregated from climax events
    5.2.3 Workshop on Computing News Storylines
6 Perspectives
  6.1 Basic Perspective Module
  6.2 Factuality module
    6.2.1 Event factuality
    6.2.2 Identifying factualities
    6.2.3 Factuality module
    6.2.4 Future work
  6.3 A perspective model
7 Cross-lingual extraction
  7.1 Crosslingual extraction of entities
  7.2 Crosslingual extraction of events
  7.3 Crosslingual extraction of relations
  7.4 Conclusions
8 Conclusions
9 Appendix

List of Tables

1 Cross-document event coreference arguments for stream processing
2 Quality triple evaluation of SEM-RDF extracted from Wikinews.
3 Detailed quality triple evaluation of SEM-RDF extracted from Wikinews with and without taking event-coreference into account.
4 Sentence template ECB topic 1, text 7, sentence 1
5 Sentence template ECB topic 1, text 7, sentence 2
6 Document template ECB topic 1, text 7, sentences 1-2
7 ECB+ statistics
8 Features grouped into four categories: L-Lemma based, A-Action similarity, D-location within Discourse, E-Entity coreference and S-Synset based.
9 Bag of events approach to event coreference resolution, evaluated on the ECB+ in MUC, B3, mention-based CEAF, BLANC and CoNLL F measures.
10 Baseline results on the ECB+: singleton baseline and lemma match of event triggers evaluated in MUC, B3, mention-based CEAF, BLANC and CoNLL F.
11 Best scoring two-step bag of events approach, evaluated in MUC, B3, entity-based CEAF, BLANC and CoNLL F in comparison with related studies. Note that the BOE approach uses gold while related studies use system mentions.
12 BLANC reference results macro averaged over ECB+ topics in terms of recall (R), precision (P) and F1 (F) for NewsReader output with different proportions of WordNet synsets to match: S=only synset matches, SL=synset matches if synsets and lemma matches if no synsets associated, L=lemmas only. Different columns represent proportions in steps of 10% from 1% to 100%.
13 BLANC reference results macro averaged over ECB+ topics in terms of recall (R-SL30), precision (P-SL30) and F1 (F-SL30). AR is stable across the results, meaning that a single participant in any role needs to match. We varied the hypernyms (H) and lowest-common-subsumer (L) for action matches and the time constraints: no time constraint (NT), year (Y), month (M) and day (D)
14 BLANC reference results macro averaged over ECB+ topics in terms of recall (R-SL30), precision (P-SL30) and F1 (F-SL30). The hypernyms (H), lowest-common-subsumer (L) and time constraint month (M) are kept stable. We varied the role-participant constraints: NR=no constraint, A0 role participant should match, A1 should match, A2 should match, A0 and A1 should match, A0 and A2 should match, A1 and A2 should match
15 Macro averaged Mention identification for ECB+ topics. NWR=NewsReader pipeline v3.0 without adaptation, EDg(old)=NWR augmented with EventDetection trained with gold data, EDg(old)EC=same as EDg(old) but skipping predicates with an Event class, EDs(ilver)=NWR augmented with EventDetection trained with silver data, EDs(ilver)EC=same as EDs but skipping predicates with an Event class.
16 Predicates missed more than once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+
17 Predicates missed once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+
18 Predicates invented and occurring more than once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+
19 Predicates invented and occurring only once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+
20 Reference results macro averaged over ECB+ topics with different options for event detection. kNWR=NewsReader event detection without invented mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without adaptation, EDg=NWR augmented with EventDetection trained with gold data, EDgEC=same as EDg but skipping predicates with an Event class, EDs=NWR augmented with EventDetection trained with silver data, EDsEC=same as EDs but skipping predicates with an Event class. ARM=standard setting one participant in any role (AR), time month match and action concept and phrase match 30%, mR=maximizing recall by no constraints on participant match and time, action concept and phrase match 1%, mP=maximizing precision by participant roles A0A1, time day match and action concept and phrase match is set to 100%.
21 Reference results macro averaged over ECB+ corpus with different options for event detection. kNWR=NewsReader event detection without invented mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without adaptation, EDg=NWR augmented with EventDetection trained with gold data, EDgEC=same as EDg but skipping predicates with an Event class, EDs=NWR augmented with EventDetection trained with silver data, EDsEC=same as EDs but skipping predicates with an Event class. ARM=standard setting one participant in any role (AR), time month match and action concept and phrase match 30%, mR=maximizing recall by no constraints on participant match and time, action concept and phrase match 1%, mP=maximizing precision by participant roles A0A1, time day match and action concept and phrase match is set to 100%.
22 Reference results macro averaged over ECB+ corpus as reported by Yang et al. (2015) for state-of-the-art machine learning systems
23 Distribution of tell, kill and election over all text and annotated text per mention, document and topic in ECB+
24 Temporal relations in TimeML annotation
25 Tempeval-3 evaluation on temporal relation classification
26 CLINK extraction system's performance.
27 System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Main
28 System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Subtask
29 Results on the SemEval-2015 task
30 Figures of the StoryLine gold dataset.
31 Results of the StoryLine extraction process.
32 Certainty, polarity and tense values
33 DBpedia entities extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage. I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions
34 DBpedia entities in the Wikinews Airbus corpus most frequent in English with Spanish, Italian and Dutch frequencies
35 DBpedia entities in the Wikinews Apple corpus most frequent in English with Spanish, Italian and Dutch frequencies
36 DBpedia entities in the Wikinews GM, Chrysler, Ford corpus most frequent in English with Spanish, Italian and Dutch frequencies
37 DBpedia entities in the Wikinews stock market corpus most frequent in English with Spanish, Italian and Dutch frequencies
38 ILI-based events extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage. I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions
39 ILI-based events in the Wikinews Airbus corpus most frequent in English with Spanish, Italian and Dutch frequencies
40 ILI-based events in the Wikinews Apple corpus most frequent in English with Spanish, Italian and Dutch frequencies
41 ILI-based events in the Wikinews GM, Chrysler, Ford corpus most frequent in English with Spanish, Italian and Dutch frequencies
42 ILI-based events in the Wikinews stock market corpus most frequent in English with Spanish and Dutch frequencies
43 Triple predicates that are most frequent in the English Wikinews corpus with coverage in Spanish, Italian and Dutch
44 ILI-based Triples extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage. I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions
45 FrameNet frames for contextualEvents
46 FrameNet frames for sourceEvents
47 FrameNet frames for grammaticalEvents

1 Introduction

The goal of the NewsReader project [1] is to automatically process massive streams of daily news in 4 different languages to reconstruct longer term story lines of events. For this purpose, we extract events mentioned in news articles, the place and date of their occurrence and who is involved, using a pipeline of Natural Language Processing modules (for the details see Agerri et al. (2015)) that process each news article and store the interpretation in the form of the Natural Language Processing Annotation Format (NAF (Fokkens et al., 2014)). Representations in NAF are mention-based. Mentions of an event, entity, place or time are not unique. The same instance (event, entity, place or date) can be mentioned several times in a single text or in different texts, and NAF accordingly represents each mention separately. Consider the following short fragments of two news articles published on the same day:

http://www.telegraph.co.uk/finance/newsbysector/industry/engineering/10125280/Porsche-family-buys-back10pc-stake-from-Qatar.html
17 Jun 2013
Porsche family buys back 10pc stake from Qatar
Descendants of the German car pioneer Ferdinand Porsche have bought back a 10pc stake in the company that bears the family name from Qatar Holding, the investment arm of the Gulf State’s sovereign wealth fund.
----------------------------------------
http://english.alarabiya.net/en/business/banking-and-finance/2013/06/17/Qatar-Holding-sells-10-stake-in-Porscheto-family-shareholders.html
Monday, 17 June 2013
Qatar Holding sells 10% stake in Porsche to founding families
Qatar Holding, the investment arm of the Gulf state’s sovereign wealth fund, has sold its 10 percent stake in Porsche SE to the luxury carmaker’s family shareholders, four years after it first invested in the firm.

Both fragments describe the same event but do this very differently. The first fragment talks about a buy event of a 10pc stake in the Porsche company from Qatar Holding by the Porsche family. The second fragment frames this as a sell event of the same stake by Qatar Holding to Porsche. Both articles make reference to the Porsche company, the Porsche family and Qatar Holding in various ways, e.g. Descendants of the German car pioneer Ferdinand Porsche and the investment arm of the Gulf state’s sovereign wealth fund. However, there is no difference in the content across the texts. The NewsReader pipeline for processing text deals with this by detecting DBpedia URIs for each entity and resolving coreference relations in the text to connect other expressions to these entities. The same is done for events mentioned in the text: buys back and bought back are represented through a unique URI as a single event, and so are sells and sold in the second fragment. Obviously, these events are not in DBpedia and therefore unique blank URIs are created. In NewsReader, we ultimately generate RDF representations for the unique instances identified by these URIs across all these mentions, and represent the relations between them as RDF triples according to the Simple Event Model (SEM, van Hage et al. (2011)).
Within SEM, data objects are defined for events, actors, places and time with relations between them. Sources that contain the same events, entities and relations thus should result in the same structure in SEM. A SEM representation for the above two fragments would look as in Figure 1.

    :event#23
        a              sem:Event , fn:Commerce_sell , fn:Commerce_buy ;
        rdfs:label     "buy" , "sell" ;
        fn:Buyer       dbp:resource/Porsche ;
        fn:Seller      dbp:resource/Qatar_Investment_Authority ;
        fn:Goods       :non-entities/10pc_stake ;
        sem:hasAtTime  :20150120 .

Figure 1: SEM event instance

    :event#23
        gaf:denotedBy  <http://www.telegraph.co.uk#char=15,19> ,
                       <http://english.alarabiya.net#char=13,19> .
    dbp:resource/Porsche
        gaf:denotedBy  <http://www.telegraph.co.uk#char=0,7> .
    :non-entities/10pc_stake
        gaf:denotedBy  <http://www.telegraph.co.uk#char=25,35> .
    dbp:resource/Qatar_Investment_Authority
        gaf:denotedBy  <http://english.alarabiya.net#char=0,13> .

Figure 2: SEM event instance with mentions

Footnote 1: FP7-ICT-316404 Building structured event indexes of large volumes of financial and economic data for decision making, www.newsreader-project.eu/

For the event, we created the so-called blank URI event#23, which is an instance of the ontological types sem:Event, fn:Commerce_sell and fn:Commerce_buy. The latter two types come from FrameNet (Baker et al., 1998). Furthermore, we see the labels buy and sell that are aggregated from the two sources.
Furthermore, there are triples that relate the event to the entities in the text through FrameNet relations, while the event is anchored to a date through a sem:hasAtTime relation. From both texts, only a single event representation is derived with the same set of triples. Since we do not want to lose the connection to the textual mentions of the information, we developed the Grounded Annotation Framework (GAF, Fokkens et al. (2013)), which formally distinguishes between mentions of events and entities in NAF and instances of events and entities in SEM, connected through denotedBy links between representations. For each object in SEM, we therefore provide the pointers to the character offsets in the text where the information is mentioned as additional triples, as is shown in Figure 2.

In this deliverable, we describe the modules that read NLP interpretations from NAF files and create the RDF representations according to SEM and GAF. The RDF representations are eventually stored in the KnowledgeStore (Corcoglioniti et al., 2013). Figure 3 shows the position of the modules in the overall process. We describe the final implementation of the modules that interpret NAF data as SEM/GAF.

The document is structured as follows. In section 2, we describe the implementation of the NAF2SEM module that carries out the conversion. There are two implementations: one for batch processing and one for stream processing. The latter implementation is part of the streaming end-to-end architecture through which a text file is completely processed through the NLP pipeline, interpreted as SEM-RDF and integrated into the KnowledgeStore without intermediate storage.
In section 3, we report on the progress made on event coreference, which is the basis for establishing instance representation, detailing the current implementation of the NAF2SEM module. In section 4, we describe two modules for detecting temporal and causal relations. These relations are used in section 5.2, where we report on the modules that extract timelines and storylines. Section 6 discusses the implementation of the perspective and attribution module. Events are divided into contextual events and source events. The former describe the statements about the changes in the world, while the latter indicate the relations between sources and these statements. We developed a separate module that derives the perspective and attribution values from these two event types, incorporating the output of the opinion layer and the attribution layer in NAF. Finally, in section 7, we provide our results on crosslingual event extraction (i.e. combining the event information extracted from documents in different languages). The English Wikinews corpus was translated to Dutch, Italian and Spanish. By processing the translations with the Spanish, Italian and Dutch pipelines, we were able to create a SEM representation from each translated data set. These SEM representations should entail the same instance information. We report on the results of this comparison. Conclusions are given in section 8.

Figure 3: Input-output schema for Work Packages in NewsReader

2 Interpreting NAF-XML as SEM-RDF

Where NAF contains the semantic interpretations of mentions in text, SEM represents instances of entities, events and relations linked to these mentions through GAF.
Following GAF, we create SEM instances from NAF layers by combining different mentions of the same event or entity into a unique URI representation with gaf:denotedBy links to their mentions. Identity across mentions is based on various measures, among which are identical DBpedia URIs, overlap with spans of tokens or terms in coreference sets, normalisation of dates, and similarity in WordNet. Next we show the output for processing just the titles in the two Qatar-Porsche examples as separate NAF sources. The RDF-TRiG has 3 parts:

  nwr:instances: a named graph of all the instances detected: events, participants and time instances or intervals. See Figure 4.
  event relations: a set of named graphs, each representing a relation with an event instance. See Figure 5.
  nwr:provenance: a named graph with gaf:denotedBy relations between the named graphs of the event relations and the offset positions in the sources. See Figure 6.

The TRiG example shows a graph that includes all the instances. We see here 4 types of instances: events, entities, non-entities and time descriptions. Instances are based on coreference of mentions. Entity mention coreference is established by the nominal coreference module but also indirectly by the URI assigned to the entities. Since URIs are unique and most URIs are based on DBpedia, mentions for which we create the same URI are automatically merged and all statements will apply to this unique representation across different NAF files. In this example we see that the entity <dbp:Porsche> has two different mentions originating from the two different sources. In other cases such as Qatar and Qatar Holding, different URIs have been chosen by the Named Entity Disambiguation (NED) module and consequently the entities are represented as distinct instances.
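The URI-based merging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NAF2SEM implementation: the `Mention` structure, the `merge_mentions` function and the hand-written input list are hypothetical, and only the identical-URI criterion is modelled (coreference sets, date normalisation and WordNet similarity are left out). The URIs and character offsets are taken from the Qatar-Porsche titles.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """One mention of an entity in a source text (hypothetical structure)."""
    instance_uri: str  # URI assigned to the mention, e.g. by the NED module
    label: str         # surface form of the mention
    source: str        # URI of the source document
    start: int         # character offset where the mention starts
    end: int           # character offset where the mention ends


def merge_mentions(mentions):
    """Group mentions by URI: identical URIs collapse into one instance."""
    instances = defaultdict(lambda: {"labels": set(), "denotedBy": []})
    for m in mentions:
        inst = instances[m.instance_uri]
        inst["labels"].add(m.label)
        # GAF-style pointer to the character offsets of the mention
        inst["denotedBy"].append(f"{m.source}#char={m.start},{m.end}")
    return dict(instances)


mentions = [
    Mention("dbp:Porsche", "Porsche", "http://www.telegraph.co.uk", 0, 7),
    Mention("dbp:Porsche", "Porsche", "http://english.alarabiya.net", 33, 40),
    Mention("dbp:Qatar", "Qatar", "http://www.telegraph.co.uk", 41, 46),
    Mention("dbp:Qatar_Investment_Authority", "Qatar Holding",
            "http://english.alarabiya.net", 0, 13),
]

instances = merge_mentions(mentions)
for uri, data in instances.items():
    print(uri, sorted(data["labels"]), data["denotedBy"])
```

In this sketch the two Porsche mentions end up under a single instance with two denotedBy pointers, while Qatar and Qatar Holding stay separate because they carry different URIs.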
There are also event components that are not considered as entities, such as:

    <nwr:data/porsche/non-entities/10pc+stake>

To identify the concept, we create a URI based on the phrase; this URI is unique only across data sets, so all references to 10pc stake within the same database become coreferential and point to the same instance representation. Obviously, this is not a proper representation, since each 10 percent can be a different one. We also see that the events sell and buy are kept separate here by the software. This is because the similarity between these words is not high enough to merge them. Finally, the two time expressions in the two documents represent the day on which the document was processed, which resolves to the same time:Instant instance. This is because the meta data was lacking and there is no other time description in the title.

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix gaf: <http://groundedannotationframework.org/gaf#> .
    @prefix wn: <http://www.newsreader-project.eu/ontologies/pwn3.0/> .
    @prefix nwrontology: <http://www.newsreader-project.eu/ontologies/> .
    @prefix ili: <http://globalwordnet.org/ili/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix time: <http://www.w3.org/TR/owl-time#> .
    @prefix eso: <http://www.newsreader-project.eu/domain-ontology#> .
    @prefix pb: <http://www.newsreader-project.eu/ontologies/propbank/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix nwr: <http://www.newsreader-project.eu/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix sem: <http://semanticweb.cs.vu.nl/2009/11/sem/> .
    @prefix fn: <http://www.newsreader-project.eu/ontologies/framenet/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix nwrdata: <http://www.newsreader-project.eu/data/> .

    nwr:instances {
    #entities
    <dbp:Porsche>
        rdfs:label "Porsche" ;
        gaf:denotedBy <http://www.telegraph.co.uk#char=0,7> ,
                      <http://english.alarabiya.net#char=33,40> .

    <dbp:Qatar>
        rdfs:label "Qatar" ;
        gaf:denotedBy <http://www.telegraph.co.uk#char=41,46> .

    <dbp:Qatar_Investment_Authority>
        rdfs:label "Qatar Holding" ;
        gaf:denotedBy <http://english.alarabiya.net#char=0,13> .

    <nwr:data/porsche/non-entities/10pc+stake>
        rdfs:label "10pc stake" ;
        gaf:denotedBy <http://www.telegraph.co.uk#char=25,35> .

    #non-entities
    <nwr:data/porsche/non-entities/10+%25+stake+in+porsche>
        rdfs:label "10 % stake in Porsche" ;
        gaf:denotedBy <http://english.alarabiya.net#char=20,40> .

    <nwr:data/porsche/non-entities/to+founding+families>
        rdfs:label "to founding family" ;
        gaf:denotedBy <http://english.alarabiya.net#char=41,61> .

    #events
    <http://english.alarabiya.net#ev1>
        a sem:Event , nwrontology:contextualEvent , fn:Commerce_sell , eso:Selling , ili:i32963 , ili:i32953 ;
        rdfs:label "sell" ;
        gaf:denotedBy <http://english.alarabiya.net#char=14,19> .

    <http://www.telegraph.co.uk#ev2>
        a sem:Event , nwrontology:contextualEvent , fn:Commerce_buy , eso:Buying , ili:i32788 , ili:i34901 ;
        rdfs:label "buy" ;
        gaf:denotedBy <http://www.telegraph.co.uk#char=15,19> .

    #time
    <nwr:time/20150324>
        a time:DateTimeDescription ;
        time:day "---24"^^<http://www.w3.org/2001/XMLSchema#gDay> ;
        time:month "--03"^^<http://www.w3.org/2001/XMLSchema#gMonth> ;
        time:unitType time:unitDay ;
        time:year "2015"^^<http://www.w3.org/2001/XMLSchema#gYear> .

    <http://www.telegraph.co.uk#tmx0>
        a time:Instant ;
        time:inDateTime <nwr:time/20150324> .

    <http://english.alarabiya.net#tmx0>
        a time:Instant ;
        time:inDateTime <nwr:time/20150324> .
    }

Figure 4: Instances of events, entities, non-entities and time

    <http://english.alarabiya.net#pr1,rl1> {
        <http://english.alarabiya.net#ev1>
            sem:hasActor <dbp:Qatar_Investment_Authority> ;
            eso:possession-owner_1 <dbp:Qatar_Investment_Authority> ;
            <nwr:ontologies/framenet/Commerce_sell@Seller> <dbp:Qatar_Investment_Authority> ;
            pb:A0 <dbp:Qatar_Investment_Authority> .
    }

    <http://english.alarabiya.net#pr1,rl2> {
        <http://english.alarabiya.net#ev1>
            sem:hasActor <nwr:data/porsche/non-entities/10+%25+stake+in+porsche> ;
            <nwr:ontologies/framenet/Commerce_sell@Goods> <nwr:data/porsche/non-entities/10+%25+stake+in+porsche> ;
            pb:A1 <nwr:data/porsche/non-entities/10+%25+stake+in+porsche> .
    }

    <http://english.alarabiya.net#pr1,rl3> {
        <http://english.alarabiya.net#ev1>
            sem:hasActor <nwr:data/porsche/non-entities/to+founding+families> ;
            eso:possession-owner_2 <nwr:data/porsche/non-entities/to+founding+families> ;
            <nwr:ontologies/framenet/Commerce_sell@Buyer> <nwr:data/porsche/non-entities/to+founding+families> ;
            pb:A2 <nwr:data/porsche/non-entities/to+founding+families> .
    }

    <http://www.telegraph.co.uk#pr2,rl2> {
        <http://www.telegraph.co.uk#ev2>
            sem:hasActor <dbp:Porsche> ;
            eso:possession-owner_2 <dbp:Porsche> ;
            <nwr:ontologies/framenet/Commerce_buy@Buyer> <dbp:Porsche> ;
            pb:A0 <dbp:Porsche> .
    }

    <http://www.telegraph.co.uk#pr2,rl4> {
        <http://www.telegraph.co.uk#ev2>
            sem:hasActor <nwr:data/porsche/non-entities/10pc+stake> ;
            <nwr:ontologies/framenet/Commerce_buy@Goods> <nwr:data/porsche/non-entities/10pc+stake> ;
            pb:A1 <nwr:data/porsche/non-entities/10pc+stake> .
    }

    <http://www.telegraph.co.uk#pr2,rl5> {
        <http://www.telegraph.co.uk#ev2>
            sem:hasActor <dbp:Qatar> ;
            <nwr:ontologies/framenet/Commerce_buy@Means> <dbp:Qatar> ;
            pb:A2 <dbp:Qatar> .
    }

    <http://english.alarabiya.net#tr1> {
        <http://english.alarabiya.net#ev1>
            sem:hasAtTime <http://english.alarabiya.net#tmx0> ;
            sem:hasTime <http://english.alarabiya.net#tmx0> .
    }

    <http://www.telegraph.co.uk#tr2> {
        <http://www.telegraph.co.uk#ev2>
            sem:hasAtTime <http://www.telegraph.co.uk#tmx0> ;
            sem:hasTime <http://www.telegraph.co.uk#tmx0> .
    }

Figure 5: SEM triples embedded in named graphs

    nwr:provenance {
        <http://english.alarabiya.net#pr1,rl1> gaf:denotedBy <http://english.alarabiya.net#char=0,19> .
        <http://english.alarabiya.net#pr1,rl2> gaf:denotedBy <http://english.alarabiya.net#char=14,40> .
        <http://english.alarabiya.net#pr1,rl3> gaf:denotedBy <http://english.alarabiya.net#char=14,61> .
        <http://www.telegraph.co.uk#pr2,rl2> gaf:denotedBy <http://www.telegraph.co.uk#char=0,19> .
        <http://www.telegraph.co.uk#pr2,rl4> gaf:denotedBy <http://www.telegraph.co.uk#char=15,35> .
        <http://www.telegraph.co.uk#pr2,rl5> gaf:denotedBy <http://www.telegraph.co.uk#char=15,46> .
    }

Figure 6: Provenance information indicating the mentions of relations

These examples show that the interpretation of mentions as instances is a delicate process. It is on the one hand the result of creating identical URIs for the interpretation of a mention (e.g. DBpedia references or normalised dates) and on the other hand the result of specific strategies to match the semantics. The module NAF2SEM (see footnote 2) defines this interpretation. The core function to interpret the mentions in NAF is the class GetSemFromNaf.java. It creates the SEM objects and relations in memory. At that point there are two API options: either the information is stored on disk as a binary object file, or the SEM objects are serialized to an RDF file or stream. The former is used for the batch processing described in Section 2.2.1, whereas the latter is used for the streaming architecture described in Section 2.2.2. In the next subsections, we first describe the representation of the instances and their relations in more detail and next the general way in which we establish identity across events.

2.1 Extracting instances from NAF layers

The NAF2SEM module combines information from various NAF layers to define the SEM objects (internal data structures defined in Java) for entities, events, time and event relations. We discuss each type of object in more detail below.

2.1.1 Entities and non-entities

Genuine entities are represented in the entity layer in NAF, have a DBpedia URI that identifies them and are a participant in an event that is extracted.
However, there are many important objects that do not fit all 3 of these requirements. There are entities without a reference to DBpedia, there are entities that do not play a role in an event that is represented, and sometimes important participants in events are not detected as entities. As a result, we find different kinds of entities and entity-like objects in the RDF representations.

Regular entities that have been detected by the Named Entity Recognizer (NERC) and received an external reference to a URI from the spotlight program are represented through their URI, as discussed before. The NAF representation for the Porsche instance in the TRiG example above is shown in Figure 7.

Footnote 2: https://github.com/cltl/EventCoreference

    <entity id="e1" type="ORGANIZATION">
      <references>
        <!--Porsche-->
        <span>
          <target id="t1" />
        </span>
      </references>
      <externalReferences>
        <externalRef resource="spotlight_v1" reference="dbp:Porsche" confidence="0.99993217" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_Design_Group" confidence="3.976152E-5" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_911" confidence="1.5359397E-5" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_914" confidence="7.171358E-6" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Ferdinand_Porsche" confidence="5.5225637E-6" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_family" confidence="2.4939756E-10" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_in_motorsport" confidence="2.9570808E-14" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_550" confidence="8.517165E-17" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_3512" confidence="2.5046746E-17" reftype="en" />
        <externalRef resource="spotlight_v1" reference="dbp:Porsche_RS_Spyder" confidence="1.9399452E-17" reftype="en" />
      </externalReferences>
    </entity>

Figure 7: Organisation Entity in NAF with URIs

There are however many phrases detected as entities by the NERC that did not receive an external reference to DBpedia from spotlight, as is shown in Figure 8.

    <entity id="e45" type="LOCATION">
      <references>
        <!--Northeast China's-->
        <span><target id="t697"/><target id="t698"/><target id="t699"/></span>
      </references>
    </entity>
    <entity id="e46" type="LOCATION">

Figure 8: Location Entity in NAF without URI

For these so-called dark entities we create a blank URI within the set of entities in a project. The entity type assigned by the NERC is used to create a subclass relation. This is shown in Figure 9.
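The choice between an external URI and a project-local URI for a dark entity can be sketched in Python as follows. This is only an illustration: picking the highest-confidence reference, the function name, and the way the local name is derived from the label (dropping everything except letters and digits) are assumptions, not behaviour confirmed for NAF2SEM.

```python
import re

def entity_uri(label, external_refs, dataset="cars"):
    """Return an external URI if one exists, else a project-local URI.

    external_refs is a list of (reference, confidence) pairs.
    """
    if external_refs:
        # Assumption: the reference with the highest confidence wins.
        reference, _confidence = max(external_refs, key=lambda ref: ref[1])
        return reference
    # Dark entity: hypothetical normalisation of the label into a local name.
    local_name = re.sub(r"[^0-9A-Za-z]", "", label)
    return f"nwr:data/{dataset}/entities/{local_name}"

porsche = entity_uri(
    "Porsche",
    [("dbp:Porsche", 0.99993217), ("dbp:Porsche_911", 1.5359397e-5)],
)
dark = entity_uri("Northeast China's", [])
```

In this sketch the dark entity is only unique within one data set, since the data set name is part of the URI.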
When the same URI was recovered for different mentions of an entity, this results in a single representation of that entity with gaf:denotedBy links to each mention. The spans of these mentions can however overlap with the spans in an entity coreference set in a NAF layer. In these cases, we can extend the mentions of an entity with the mentions of the coreference set, but in some cases distinct entities also get tied to the same instance through the coreference set.

    <nwr:data/cars/entities/NortheastChinas>
        a nwrontology:LOCATION ;
        rdfs:label "Northeast China's" ;
        gaf:denotedBy <nwr:data/cars/57DD-HR81-JB4B-V3T6.xml#char=3582,3599> .

Figure 9: Entity in SEM with a blank URI

    <entity id="e5" type="PERSON">
      <references>
        <!--Didier Drogba-->
        <span>
          <target id="t68"/>
          <target id="t69"/>
        </span>
      </references>
      <externalReferences>
        <externalRef resource="spotlight_v1" reference="dbp:Didier_Drogba" confidence="1.0" reftype="en" source="en"/>
        <externalRef resource="dbp" reference="dbp:Didier_Drogba" confidence="1.0" source="POCUS"/>
      </externalReferences>
    </entity>
    <entity id="e2" type="PERSON">
      <references>
        <!--Didier Yves Drogba Tebily-->
        <span>
          <target id="t2"/>
          <target id="t3"/>
          <target id="t4"/>
          <target id="t5"/>
        </span>
      </references>
    </entity>

Figure 10: Drogba as an entity in NAF
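Extending an entity's mentions with an overlapping coreference set can be sketched as below. The overlap criterion used here (at least one shared token identifier) and all names are assumptions for illustration; the token identifiers are borrowed from the Drogba example, except for the unrelated set, which is made up.

```python
def extend_with_coreference(entity_spans, coref_sets):
    """Add all spans of every coreference set that shares a token with the entity."""
    entity_tokens = {token for span in entity_spans for token in span}
    merged = list(entity_spans)
    for coref in coref_sets:
        # Assumption: one shared token id is enough to count as overlap.
        if any(entity_tokens & set(span) for span in coref):
            for span in coref:
                if span not in merged:
                    merged.append(span)
    return merged

entity = [("t68", "t69")]  # mention "Didier Drogba"
coref_sets = [
    [("t129",), ("t68", "t69"), ("t2", "t3", "t4", "t5")],
    [("t900",)],  # made-up set that does not overlap with the entity
]
merged = extend_with_coreference(entity, coref_sets)
```

The sketch also shows the risk mentioned above: any entity whose span appears in the same coreference set ends up attached to the same instance, whether or not the coreference decision was correct.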
Consider the example of Didier Drogba and Didier Yves Drogba Tébily in Figure 10. Both were detected as entities, but only the former is mapped to DBpedia. Within the same NAF file we also find the coreference set shown in Figure 11. The spans in the coreference set overlap with the spans of the two entities. As a result, we can merge all these representations into a single entity and list all of the mentions from the entity layer and the coreference set. Consequently, the rdfs:label predicate is also extended with all the different labels used to make reference. This is shown in the SEM representation in Figure 12.

Although the NAF2SEM module reads all entities from a NAF file, we currently only output entities that play a role in events. To determine whether an entity plays a role in an event, we need to match the mentions of an entity with the spans of the roles of events. We discuss this matching in Section 2.1.3.

In addition, there are phrases that play a role but are not detected as entities. We have seen two examples in the TRiG example shown at the beginning of this section. To limit the number of entities and triples, we only consider roles that have a FrameNet role element assigned. We consider these roles essential for understanding what the event is about. Whenever such a role cannot be assigned to an entity, we represent the concept as a so-called non-entity. To find these non-entities, we match the span of the roles with all the spans of the entities extracted before. We assign the type NONENTITY to these instances, as shown in Figure 13.

Spotlight is not only applied to entities in the text but also to all other content phrases. This is represented in the NAF layer with the markables, see Figure 14. We use these markables to find potential DBpedia references that somehow relate to the non-entity. Again, the overlap in the span elements is used to determine relevance.
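A non-entity URI and the lookup of overlapping markables can be sketched as follows. The URI pattern mirrors the examples in the figures (lower-cased phrase, spaces encoded as +), but the exact encoding used by the module is an assumption, as are the function names and the token identifier w600.

```python
from urllib.parse import quote_plus

def non_entity_uri(phrase, dataset):
    """Build a phrase-based URI; the encoding scheme is an assumption."""
    return f"nwr:data/{dataset}/non-entities/{quote_plus(phrase.lower())}"

def related_markables(role_span, markables):
    """Return references of markables that share a token with the role span."""
    role_tokens = set(role_span)
    return [reference for span, reference in markables if role_tokens & set(span)]

uri = non_entity_uri("a lawsuit", "cars")
markables = [(("w23",), "dbp:Magazine"), (("w601",), "dbp:Lawsuit")]
related = related_markables(("w600", "w601"), markables)
```

Since the URI depends only on the phrase and the data set, two roles with the same wording map to the same non-entity, which is the limitation noted for the 10pc stake example.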
Since we do not know the precise semantic relation, we use the skos:relatedMatch predicate to relate the non-entity to the DBpedia entry. Through skos:relatedMatch it is possible to query the non-entities through the DBpedia entries and classes rather than just on the string values.

    <coref id="co2">
      <!--His-->
      <span><target id="t129"/></span>
      <!--Didier Drogba-->
      <span><target id="t531"/><target id="t532"/></span>
      <!--Drogba-->
      <span><target id="t429"/></span>
      <!--he-->
      <span><target id="t74"/></span>
      <!--his-->
      <span><target id="t9"/></span>
      <!--Drogba-->
      <span><target id="t333"/></span>
      <!--Drogba-->
      <span><target id="t353"/></span>
      <!--Drogba-->
      <span><target id="t202"/></span>
      <!--Didier Drogba-->
      <span><target id="t626"/><target id="t627"/></span>
      <!--Drogba-->
      <span><target id="t766"/></span>
      <!--Didier Drogba-->
      <span><target id="t68"/><target id="t69"/></span>
      <!--His-->
      <span><target id="t108"/></span>
      <!--his-->
      <span><target id="t84"/></span>
      <!--Didier Yves Drogba Tebily-->
      <span><target id="t2"/><target id="t3"/><target id="t4"/><target id="t5"/></span>
    </coref>

Figure 11: Coreference set with Drogba in NAF

    dbp:Didier_Drogba
        rdfs:label "Drogba" , "Didier Yves Drogba Tebily" , "his" , "Didier Drogba" , "Didier Drogba's" ;
        gaf:denotedBy
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=11,36> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=694,697> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=2795,2808> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=2274,2280> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=395,397> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=53,56> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=1752,1758> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=1858,1864> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=1052,1058> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=3333,3346> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=4120,4126> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=357,370> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=580,583> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=443,446> ,
            <nwr:data/cars/57R0-J5K1-JC86-C1N7.xml#char=4344,4350> ,
            <nwr:data/cars/59JJ-0761-DY2M-33MF.xml#char=4277,4292> ,
            <nwr:data/cars/59JJ-0761-DY2M-33MF.xml#char=4693,4699> .

Figure 12: Drogba as an instance in SEM

    <nwr:data/cars/non-entities/from+a+newspaper+or+magazine>
        a nwrontology:NONENTITY ;
        rdfs:label "from a newspaper or magazine" ;
        gaf:denotedBy <nwr:data/2004/03/26/4C16-2HY0-01JV-13BX.xml#char=4099,4127> ;
        skos:relatedMatch dbp:Magazine .

    <nwr:data/cars/non-entities/a+lawsuit>
        a nwrontology:NONENTITY ;
        rdfs:label "a lawsuit" ;
        gaf:denotedBy <nwr:data/cars/5629-K3P1-F190-V098.xml#char=3239,3248> ;
        skos:relatedMatch dbp:Lawsuit .

Figure 13: Non-entity instance in SEM

    <!--magazine-->
    <mark id="m4" source="DBpedia" lemma="magazine">
      <span>
        <target id="w23"/>
      </span>
      <externalReferences>
        <externalRef resource="spotlight" reference="dbp:Magazine" confidence="1.0" reftype="en" source="en"/>
      </externalReferences>
    </mark>
    <!--lawsuit-->
    <mark id="m138" source="DBpedia" lemma="lawsuit">
      <span>
        <target id="w601"/>
      </span>
      <externalReferences>
        <externalRef resource="spotlight" reference="dbp:Lawsuit" confidence="1.0" reftype="en" source="en"/>
      </externalReferences>
    </mark>

Figure 14: Markables in NAF

2.1.2 Events

We cannot recover a URI from DBpedia to represent an event instance. We therefore create a blank URI using the meta data of the document and an event counter. Furthermore, we give the type information, the labels and the mentions for each event object. The event coreference sets in NAF are the basis for defining the event instances in a single document. In Figure 15, we show several examples of event coreference sets. Event coreference sets are derived from the predicates in the Semantic Role Layer (SRL). Every predicate is represented in a coreference set. Predicates with the same lemma are represented in the same set, as are predicates that are semantically similar. As a result, we get both singleton sets (no coreference) and multiform sets (coreference).
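The first step, grouping predicates by lemma and minting a document-based event URI, can be sketched in Python. The function names and the (lemma, token id) input layout are assumptions; the lemma win for winning and won presupposes a lemmatiser, and the URI pattern follows the #ev identifiers used in the examples.

```python
from collections import defaultdict

def lemma_coref_sets(predicates):
    """Group (lemma, token_id) pairs into one candidate set per lemma."""
    sets = defaultdict(list)
    for lemma, token_id in predicates:
        sets[lemma].append(token_id)
    return dict(sets)

def event_uri(document_uri, counter):
    """Combine the document identifier with an event counter."""
    return f"{document_uri}#ev{counter}"

predicates = [("win", "t284"), ("win", "t291"), ("win", "t405"), ("chase", "t371")]
sets = lemma_coref_sets(predicates)
uri = event_uri("nwr:data/cars/55XK-XGX1-JBKJ-C3CF.xml", 72)
```

Here win yields a multiform set with three mentions and chase a singleton set; merging sets with different lemmas requires the similarity step described next.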
For each mention, we determine the highest scoring WordNet synsets, and across the different mentions we select the most dominant senses from these highest scoring senses. Each coreference set thus gets external references to its most dominant senses with the averaged word-sense-disambiguation score. Whenever the similarity across these dominant senses is above the threshold, we merge the coreference sets and add the lowest common subsumer to the merged set. This is shown for the last example in Figure 15, where shot and injured are merged into a single set with the lowest common subsumer synset eng-30-00069879-v (injure:1, wound:1) and a similarity score of 2.6390574, using the method described by Leacock and Chodorow (1998). Each coreference set becomes a potential event instance, where we use the mentions for the labels and the references to the WordNet synsets as a subclass relation (see footnote 3). Furthermore, we collect a subset of the ontological labels assigned to each predicate in the SRL. For example, for chased we find the typing in the SRL layer shown in Figure 16, from which we copy the FrameNet and ESO classes into the instance representation.
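The Leacock and Chodorow measure is commonly written as -log(p / 2D), where p is the length of the shortest path between two synsets and D is the maximum depth of the taxonomy. The sketch below implements that formula and a threshold test. The threshold value 2.0 and the inputs 2 and 14 are illustrative assumptions: they happen to give a score close to the 2.639 reported above, but the source does not state which path length, depth or threshold was used.

```python
import math

def lch_similarity(path_length, max_depth):
    """Leacock-Chodorow similarity: -log(path_length / (2 * max_depth))."""
    return -math.log(path_length / (2.0 * max_depth))

def should_merge(similarity, threshold):
    """Merge two coreference sets only if the similarity is above the threshold."""
    return similarity > threshold

close = lch_similarity(2, 14)  # about 2.639 (the natural log of 14)
far = lch_similarity(4, 14)    # about 1.946 (the natural log of 7)
merge_close = should_merge(close, threshold=2.0)  # hypothetical threshold
merge_far = should_merge(far, threshold=2.0)
```

Shorter paths give higher scores, so with a fixed threshold only synsets that lie close together in the hierarchy cause two sets to merge.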
Footnote 3: We convert all references to WordNet synsets to InterLingualIndex concepts to allow cross-lingual matching.

    <coref id="coevent72" type="event">
      <!--chased-->
      <span>
        <target id="t371"/>
      </span>
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="ili-30-02001858-v" confidence="1.0" source="dominant_sense"/>
      </externalReferences>
    </coref>
    <coref id="coevent49" type="event">
      <!--giving-->
      <span>
        <target id="t150"/>
      </span>
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="ili-30-02200686-v" confidence="0.86538196" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02339171-v" confidence="1.0" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02199590-v" confidence="0.913981" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-01629403-v" confidence="0.9730293" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02316868-v" confidence="0.8576549" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02235842-v" confidence="0.8072351" source="dominant_sense"/>
      </externalReferences>
    </coref>
    <coref id="coevent57" type="event">
      <!--winning-->
      <span><target id="t284"/></span>
      <!--won-->
      <span><target id="t291"/></span>
      <!--won-->
      <span><target id="t405"/></span>
      <externalReferences>
        <externalRef resource="WordNet-3.0" reference="ili-30-02288295-v" confidence="0.97545713" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-01100145-v" confidence="1.0" source="dominant_sense"/>
      </externalReferences>
    </coref>
    <coref id="coevent61" type="event">
      <!--shot-->
      <span><target id="t345"/></span>
      <!--shot-->
      <span><target id="t414"/></span>
      <!--injured-->
      <span><target id="t684"/></span>
      <externalReferences>
        <externalRef resource="Princeton WordNet 3.0" reference="eng-30-00069879-v" confidence="2.6390574" source="lowest_common_subsumer"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02055267-v" confidence="0.81226486" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-01134781-v" confidence="0.9517792" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-01137138-v" confidence="0.9246807" source="dominant_sense"/>
        <externalRef resource="WordNet-3.0" reference="ili-30-02484570-v" confidence="0.93014646" source="dominant_sense"/>
      </externalReferences>
    </coref>

Figure 15: Event coreference set in NAF

    <!--t371 chased : A1[t367 England] AM-DIR[t372 down]-->
    <predicate id="pr71">
      <!--chased-->
      <span>
        <target id="t371"/>
      </span>
      <externalReferences>
        <externalRef resource="PropBank" reference="chase.01"/>
        <externalRef resource="VerbNet" reference="chase-51.6"/>
        <externalRef resource="FrameNet" reference="Cotheme"/>
        <externalRef resource="PropBank" reference="chase.01"/>
        <externalRef resource="ESO" reference="Translocation"/>
        <externalRef resource="EventType" reference="contextual"/>
        <externalRef resource="WordNet" reference="ili-30-02001858-v"/>
        <externalRef resource="WordNet" reference="ili-30-02535093-v"/>
      </externalReferences>

    <nwr:data/cars/55XK-XGX1-JBKJ-C3CF.xml#ev72>
        a sem:Event , nwrontology:contextualEvent , eso:Translocation , fn:Cotheme , ili:i31747 ;
        rdfs:label "chase" ;
        gaf:denotedBy <nwr:data/cars/55XK-XGX1-JBKJ-C3CF.xml#char=2074,2080> .

Figure 16: SRL in NAF with event types for the predicate chased

In the case of the coreference set of injured and shot, we derive no classes for injured but various FrameNet classes from the predicate shot, as seen in Figure 17. In addition to the specific semantic classes, every event is of the type sem:Event and of one of the 3 main event classes in NewsReader:

sourceEvent: events that introduce a source of information as the semantic subject, such as speech acts (say, claim, declare) and cognitive verbs (think, believe, hope, fear).
grammaticalEvent: auxiliary verbs (be, have, should, will) and aspectual verbs (stop, begin, take place, happen) that do not introduce other participants in an event, do not define a different time interval for events and express properties of other events.
contextualEvent: all other events, which take place in some world and do not introduce a source of events or express a property of an event.

On the basis of an analysis of the car data set, we created lists of FrameNet frames classified as contextual, source or grammatical. In Appendix 9, we give a complete list of the FrameNet frames that distinguish between these three main event types.

2.1.3 Participant and event relations

Once the instances for entities, non-entities and events have been established, we determine the relations between them. We first extract from the SRL layer all the roles with a valid PropBank role.
The following role labels are considered valid:

PRIMEPARTICIPANT: a0, arg0, a-0, arg-0
NONPRIMEPARTICIPANT: a1, a2, a3, a4, arg1, arg2, arg3, a-1, a-2, a-3, a-4, arg-1, arg-2, arg-3, arg-4, am-dir, argm-dir
LOCATION: am-loc, argm-loc, am-dir

    <!--t579 injured : A0[t577 he] AM-MNR[t580 so] A1[t581 that]-->
    <predicate id="pr107">
      <!--injured-->
      <span>
        <target id="t579"/>
      </span>
      <externalReferences>
        <externalRef resource="PropBank" reference="injure.01"/>
      </externalReferences>

    <predicate id="pr83">
      <!--shot-->
      <span>
        <target id="t500"/>
      </span>
      <externalReferences>
        <externalRef resource="PropBank" reference="shoot.02"/>
        <externalRef resource="VerbNet" reference="poison-42.2"/>
        <externalRef resource="FrameNet" reference="Hit_target"/>
        <externalRef resource="FrameNet" reference="Shoot_projectiles"/>
        <externalRef resource="FrameNet" reference="Use_firearm"/>
        <externalRef resource="PropBank" reference="shoot.02"/>
        <externalRef resource="EventType" reference="contextual"/>
        <externalRef resource="WordNet" reference="ili-30-02484570-v"/>
      </externalReferences>

    <nwr:data/cars/59JB-GV01-JBSN-30SP.xml#ev84>
        a sem:Event , nwrontology:contextualEvent , fn:Hit_target , ili:i27293 , ili:i27278 , ili:i34141 , fn:Shoot_projectiles , ili:i22125 , fn:Use_firearm ;
        rdfs:label "shoot" , "injure" ;
        gaf:denotedBy <nwr:data/cars/59JB-GV01-JBSN-30SP.xml#char=2215,2219> ,
                      <nwr:data/cars/59JB-GV01-JBSN-30SP.xml#char=2588,2595> .

Figure 17: SRL in NAF for predicate injured

Next, we intersect the span of the role with the span of any mention of the entities and non-entities that were extracted before. Since the spans are established in very different ways across the NLP modules that create the NAF layers, we implemented a loose matching principle: the number of matching content words across the role and the entity needs to exceed 75% of each. This prevents excessively long spans from matching short spans. Content words are terms whose part-of-speech starts with R (proper noun), N (common noun), V (verb), A (adverb) or G (adjective). Note that non-entities always exactly match at least one role, since they are derived from the roles.

If there is a match between an entity and a role, we create a SEM relation between the event instance and the entity instance and copy the semRole value and the external references from the role as predicates for the relation. In Figure 18 we show the SRL structure for the chased predicate given before, followed by the SEM relations that are extracted from it. The SEM relations are combined in named graphs for which we create blank URIs based on the identifiers of the predicate and the role, e.g. pr71,rl175. Since the role is considered to be an actor role, the sem:hasActor predicate is added. Furthermore, only the PropBank, ESO and FrameNet roles are kept. Note that the NAF role down the ball is not represented in RDF.
This is because there is no matching entity for this role and there is no FrameNet role to promote it as a non-entity. We thus constrain the representation of the events to components that are somehow modeled and crystallized.

    <!--t371 chased: A1[t367 England] AM-DIR[t372 down]-->
    <predicate id="pr71">
      <!--chased-->
      <span>
        <target id="t371"/>
      </span>
      <externalReferences>
        <externalRef resource="PropBank" reference="chase.01"/>
        <externalRef resource="VerbNet" reference="chase-51.6"/>
        <externalRef resource="FrameNet" reference="Cotheme"/>
        <externalRef resource="PropBank" reference="chase.01"/>
        <externalRef resource="ESO" reference="Translocation"/>
        <externalRef resource="EventType" reference="contextual"/>
        <externalRef resource="WordNet" reference="ili-30-02001858-v"/>
        <externalRef resource="WordNet" reference="ili-30-02535093-v"/>
      </externalReferences>
      <role id="rl175" semRole="A1">
        <!--England defender John Terry-->
        <span>
          <target id="t367"/>
          <target id="t368" head="yes"/>
          <target id="t369"/>
          <target id="t370"/>
        </span>
        <externalReferences>
          <externalRef resource="VerbNet" reference="chase-51.6@Theme"/>
          <externalRef resource="FrameNet" reference="Cotheme@Cotheme"/>
          <externalRef resource="PropBank" reference="chase.01@1"/>
          <externalRef resource="ESO" reference="Translocation@translocation-theme"/>
        </externalReferences>
      </role>
      <role id="rl176" semRole="AM-DIR">
        <!--down the ball-->
        <span>
          <target id="t372" head="yes"/>
          <target id="t373"/>
          <target id="t374"/>
        </span>
      </role>
    </predicate>

    <nwr:data/cars/55XK-XGX1-JBKJ-C3CF.xml#pr71,rl175> {
        <nwr:data/cars/55XK-XGX1-JBKJ-C3CF.xml#ev72>
            sem:hasActor             dbp:John_Terry ;
            fn:Cotheme@Cotheme       dbp:John_Terry ;
            eso:translocation-theme  dbp:John_Terry ;
            pb:A1                    dbp:John_Terry .
    }

Figure 18: SRL in NAF with roles and corresponding SEM triples for actor relations

2.1.4 Temporal anchoring

Time objects and temporal relations play an important role in NewsReader. Without proper time anchoring, we cannot compare one event to another. The same type of event with the same participants on a different day is by definition not the same event: compare telling the same story today and next week. Just as disjoint spatial boundaries define distinct objects, disjoint temporal boundaries define distinct events.

Time objects for delineating events are derived from the timex3 layer in NAF. There are two types of time expressions: DATE and DURATION. Both have span elements pointing to the words that express them and can have value attributes. DATE expressions have a value attribute that usually points to a specific normalised ISO date. DURATION expressions usually have ISO periods as values but can have optional beginPoint and endPoint attributes whose values are DATE expressions represented elsewhere. A special timex3 element is the document creation time (if known).
This is a DATE expression with the attribute functionInDocument="CREATION_TIME" and usually the identifier tmx0. The document creation time is derived from the metadata of a document and has no span element pointing to its expression in the document. Some examples of each type, taken from different NAF files, are given in Figure 19.

    <timex3 id="tmx0" type="DATE" functionInDocument="CREATION_TIME" value="2007-01-10T00:00:00"/>
    <timex3 id="tmx1" type="DATE" value="2007-01-10">
      <!--January 10, 2007-->
      <span><target id="w7"/><target id="w8"/><target id="w9"/><target id="w10"/></span>
    </timex3>

    <timex3 id="tmx1" type="DURATION" beginPoint="tmx19" endPoint="tmx0" value="P10Y">
      <!--10 years-->
      <span><target id="w70"/><target id="w71"/></span>
    </timex3>
    <timex3 id="tmx2" type="DATE" value="PRESENT_REF">
      <!--now-->
      <span><target id="w73"/></span>
    </timex3>

    <timex3 id="tmx2" type="DATE" value="2008-09">
      <!--September 2008-->
      <span><target id="w76"/><target id="w77"/></span>
    </timex3>
    <timex3 id="tmx3" type="DATE" value="2015-03">
      <!--March-->
      <span><target id="w141"/></span>
    </timex3>
    <timex3 id="tmx4" type="DATE" value="2015-06">
      <!--June-->
      <span><target id="w168"/></span>
    </timex3>
    <timex3 id="tmx5" type="DURATION" endPoint="tmx0" value="PXM">
      <!--recent months-->
      <span><target id="w202"/><target id="w203"/></span>
    </timex3>

    <timex3 id="tmx4" type="DURATION" beginPoint="tmx7" endPoint="tmx0" value="P8Y">
      <!--eight years-->
      <span><target id="w90"/><target id="w91"/></span>
    </timex3>
    <timex3 id="tmx5" type="DURATION" value="PXY">
      <!--the years-->
      <span><target id="w176"/><target id="w177"/></span>
    </timex3>
    <timex3 id="tmx6" type="DURATION" beginPoint="tmx8" endPoint="tmx0" value="P21Y">
      <!--21 years-->
      <span><target id="w198"/><target id="w199"/></span>
    </timex3>
    <timex3 id="tmx7" type="DATE" value="2003-10-20"/>
    <timex3 id="tmx8" type="DATE" value="1990-10-20"/>

Figure 19: Timex3 elements in NAF

From the timex3 elements, we derive two types of instances: time:Instant and time:Interval. We can only do this if we can obtain at least the year from the normalised value (the month and day can remain underspecified). If the values are relative and the year is not explicit, we cannot interpret the expression as a time object and we have to ignore it. The examples in Figure 20 show the representation of time objects in RDF-TRiG derived from time expressions such as the ones shown above. Time expressions as well as the document creation time are represented as instances with unique URIs. Instances of the type time:Instant have a time:inDateTime relation to a date object, whereas instances of time:Interval have a time:hasBeginning and/or time:hasEnd relation to a date. Dates are represented as separate instances of the type time:DateTimeDescription with values for the year, month and/or day according to owl-time (http://www.w3.org/TR/owl-time).

    # document creation time, which has no mentions
    <nwr:wsj1013.xml#tmx0>
        a time:Instant ;
        rdfs:label "nwr:time/19891026" ;
        time:inDateTime nwr:time/19891026 .

    # DURATION with begin and end point
    <nwr:4M1J-3MC0-TWKJ-V1W8.xml#tmx2>
        a time:Interval ;
        rdfs:label "week" ;
        gaf:denotedBy
            nwr:4M1J-3MC0-TWKJ-V1W8.xml#char=822,825 ,
            nwr:4M1J-3MC0-TWKJ-V1W8.xml#char=826,830 ,
            nwr:4M1J-3MC0-TWKJ-V1W8.xml#char=831,833 ,
            nwr:4M1J-3MC0-TWKJ-V1W8.xml#char=834,839 ;
        time:hasBeginning nwr:time/20051003 ;
        time:hasEnd nwr:time/20061002 .

    # Quarter interpreted as Interval with begin and end point
    <nwr:wsj1013.xml#tmx3>
        a time:Interval ;
        rdfs:label "quarter" ;
        gaf:denotedBy
            nwr:wsj1013.xml#char=363,366 ,
            nwr:wsj1013.xml#char=367,373 ,
            nwr:wsj1013.xml#char=374,381 ;
        time:hasBeginning nwr:time/19890701 ;
        time:hasEnd nwr:time/19890930 .

    # DATE interpreted as an Instant
    <nwr:wsj1013.xml#tmx4>
        a time:Instant ;
        rdfs:label "earlier" ;
        gaf:denotedBy
            nwr:wsj1013.xml#char=401,402 ,
            nwr:wsj1013.xml#char=403,407 ,
            nwr:wsj1013.xml#char=408,415 ;
        time:inDateTime <nwr:time/1988> .

    <nwr:time/19890701>
        a time:DateTimeDescription ;
        time:day "---01"^^<http://www.w3.org/2001/XMLSchema#gDay> ;
        time:month "--07"^^<http://www.w3.org/2001/XMLSchema#gMonth> ;
        time:unitType time:unitDay ;
        time:year "1989"^^<http://www.w3.org/2001/XMLSchema#gYear> .

    <nwr:time/19890930>
        a time:DateTimeDescription ;
        time:day "---30"^^<http://www.w3.org/2001/XMLSchema#gDay> ;
        time:month "--09"^^<http://www.w3.org/2001/XMLSchema#gMonth> ;
        time:unitType time:unitDay ;
        time:year "1989"^^<http://www.w3.org/2001/XMLSchema#gYear> .

    <nwr:time/1988>
        a time:DateTimeDescription ;
        time:unitType time:unitDay ;
        time:year "1988"^^<http://www.w3.org/2001/XMLSchema#gYear> .

Figure 20: SEM representations for time instants and time intervals

Each event in SEM-RDF needs to be anchored to at least one time expression that resolves to a time instance. Events without time anchoring are ignored in the output. Anchoring relations are expressed using a sem:hasTime relation between an event URI and a time expression URI. The triples are embedded inside a named graph, just like the participant-event relations we have seen before:

    <http://www.telegraph.co.uk#tr2> {
        <http://www.telegraph.co.uk#ev2> sem:hasTime <http://www.telegraph.co.uk#tmx0> .
    }

We use the following heuristic to detect relations between events and time instances in a NAF file:

1. The NAF layer temporalRelations provides explicit anchor relations between predicates and time expressions;
2. If there is no anchor relation through the temporalRelations layer, check the sentence in which the event is mentioned for time expressions; otherwise the preceding and following sentence, and finally two sentences before the event mention;
3. If there is still no anchor relation, attach the event to the document creation time;
4. If there is also no document creation time, anchor the event to the year zero: 0000-12-25.

Figure 21 shows how predicates are anchored to time expressions in the temporalRelations layer in NAF, where the predicate identifier is given as the span and the time expression as the value of the anchorTime, beginPoint or endPoint attribute.

    <predicateAnchor id="an1" anchorTime="tmx0"><span><target id="pr1"/></span></predicateAnchor>
    <predicateAnchor id="an5" anchorTime="tmx2"><span><target id="pr6"/></span></predicateAnchor>
    <predicateAnchor id="an8" beginPoint="tmx2"><span><target id="pr9"/></span></predicateAnchor>
    <predicateAnchor id="an8" endPoint="tmx0"><span><target id="pr43"/></span></predicateAnchor>

Figure 21: Time anchoring of predicates in NAF

To deal with the complexity of the temporal relations, we had to extend SEM with more specific time relations. In addition to the generic sem:hasTime, events are linked through any of the following relations:

sem:hasAtTime: we assume the event took place at this time Instant or during this Interval;
sem:hasFutureTime: we assume the event took place in the future relative to this time Instant object;
sem:hasEarliestBeginTime: we assume the event began at this time Instant object;
sem:hasEarliestEndTime: we assume the event ended at this time Instant object.

The procedure is as follows. We first check if the event is explicitly anchored in NAF to a time expression, where:

1. an anchorTime attribute results in a sem:hasAtTime relation;
2. a value for beginPoint results in a sem:hasEarliestBeginTime relation;
3. a value for endPoint results in a sem:hasEarliestEndTime relation.

If there is no such relation, we check if the factuality module located the event in the future. If that is the case, we create a sem:hasFutureTime relation relative to the document creation time (footnote 5). In all other cases, that is, when the event is related to a time expression in the same or a nearby sentence or simply to the document creation time, we relate the event through the sem:hasAtTime relation.

It is important to realise that intervals are represented in two different ways. An event can have a sem:hasAtTime relation to an interval object, or it can have a sem:hasEarliestBeginTime and/or sem:hasEarliestEndTime relation to two different time instance objects. The former applies to cases where the interval itself is explicitly referred to in the text, e.g. I worked for a week, whereas the latter applies to cases where the interval is not mentioned directly in the text but the boundaries of the event are, e.g. I slept from Monday till Friday. We represent these cases as shown in Figure 22, assuming that the expressions have been normalised to ISO dates.

    :event#Worked
        sem:hasTime :tmxWeek .

    :event#Slept
        sem:hasEarliestBeginTime :tmxMonday ;
        sem:hasEarliestEndTime   :tmxFriday .

    :tmxWeek
        a time:Interval ;
        rdfs:label "week" ;
        time:hasBeginning nwr:time/19890701 ;
        time:hasEnd nwr:time/19890707 .

    :tmxMonday
        a time:Instant ;
        rdfs:label "Monday" ;
        time:inDateTime nwr:time/19890701 .

    :tmxFriday
        a time:Instant ;
        rdfs:label "Friday" ;
        time:inDateTime nwr:time/19890705 .

Figure 22: SEM relations between events and time expressions

The begin and end points of intervals are therefore found either at the instance level or at the SEM relation level. Although this complicates querying the data for events at points in time, we believe it better expresses the way time intervals are represented in language. To ease the retrieval of the events, we always provide a sem:hasTime relation in addition to any of these specific relations.
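The selection of time relations described in this section can be sketched as follows. This is a minimal illustration rather than the NAF2SEM implementation: the PredicateAnchor holder, the time_relations function and its parameters are hypothetical names, the search over neighbouring sentences is assumed to have been done already (nearby_timex), and the behaviour for a future event without a document creation time is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

YEAR_ZERO = "0000-12-25"  # fallback date named in step 4 of the heuristic


@dataclass
class PredicateAnchor:
    """Hypothetical holder for one predicateAnchor element (cf. Figure 21)."""
    anchor_time: Optional[str] = None
    begin_point: Optional[str] = None
    end_point: Optional[str] = None


def time_relations(anchors: List[PredicateAnchor],
                   in_future: bool = False,
                   nearby_timex: Optional[str] = None,
                   creation_time: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (predicate, time id) pairs for a single event."""
    specific: List[Tuple[str, str]] = []
    # Explicit anchors from the temporalRelations layer take priority.
    for anchor in anchors:
        if anchor.anchor_time:
            specific.append(("sem:hasAtTime", anchor.anchor_time))
        if anchor.begin_point:
            specific.append(("sem:hasEarliestBeginTime", anchor.begin_point))
        if anchor.end_point:
            specific.append(("sem:hasEarliestEndTime", anchor.end_point))
    if not specific:
        if in_future and creation_time:
            # Future events are placed relative to the document creation time.
            specific.append(("sem:hasFutureTime", creation_time))
        else:
            # Assumption: without a creation time, future events fall through
            # to the same fallbacks as all other events.
            target = nearby_timex or creation_time or YEAR_ZERO
            specific.append(("sem:hasAtTime", target))
    # Add the generic relation once for every distinct time object.
    times = dict.fromkeys(time_id for _, time_id in specific)
    return specific + [("sem:hasTime", time_id) for time_id in times]
```

For an event with a beginPoint and an endPoint anchor, the function returns the two specific relations followed by one generic sem:hasTime relation per time object, so every event carries the generic sem:hasTime relation whatever specific relation was chosen.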
All events related to a specific point in time can be retrieved through this relation, regardless of the specific time relation.

Footnote 5: Note that events can be located in the future in two ways: either we created an explicit sem:hasFutureTime relation with respect to the document creation time as described here, or there was an explicit anchoring to a time that is in the future with respect to the document creation time. In the latter case the time is known, whereas in the former case we only know that the event may happen after the document creation time.

2.2 Identity across events

So far, we discussed the way data in a single NAF file is interpreted as SEM instances and represented in RDF-TRiG. However, the GAF framework is designed to capture identity of mentions across sources. This means that we need to compare SEM objects across different sources to establish identity. For entities, non-entities and normalised time objects this comes naturally through the way the URIs are defined. For these instances, we assume that the URI defines the identity and no further measures are taken.

In the case of events, this is more complicated. The blank URIs are meaningless across documents and we need to define identity in terms of the triples defined for each event. We follow Quine (1985) here, who assumes that time and place are defining criteria. Without time and place, actions are just denotations of abstract classes of concepts. They need to be anchored in time and space to become instantiated. Furthermore, we assume that the type of action and the participants play a role in establishing identity (Cybulska and Vossen, 2013a). We therefore defined a so-called Composite Event structure that combines all the information relevant for an event. It contains the following SEM components:

1. the event instance
2. the entity instances that participate in the event (both actors and locations)
3. time instants and intervals that apply to the event
4. the relations between the event and the other elements

Since the instance representations generalise over the mentions, relations between event components are aggregated from different mentions. For example, the time of an event can be mentioned in one sentence whereas a participant can be mentioned in another, coreferential sentence. Likewise, the instance representation is the aggregation of all the information that directly relates to the events expressed over the complete document.

The Composite Event structures are compared across different NAF files to establish cross-document coreference. For this comparison, again all the information related to the instances is used. In case of identity, Composite Event structures are merged; if not, they are kept separate. Figure 23 shows an overview of the two-step approach. The NAF files represent mentions of events (labeled as e), entities that participate in these events (labeled as p), time expressions linked to the events (labeled as t) and locations (labeled as l). Mentions that have identity relations are coreferential within a single NAF file. NAF2SEM first creates SEM instances from these mentions in a single NAF file and then compares the Composite Events across NAF files, based on the SEM representation, to establish cross-document identity of events on the basis of the similarity of the action, the participants and the time. This approach is described in more detail below.

We developed two different implementations of the above general approach: one for batch processing, in which an empty KnowledgeStore is filled with the result of processing a fixed set of NAF documents, and another for a continuous stream of NAF documents, in which the interpretation of incoming NAF files is compared to data already stored in the KnowledgeStore.
In the former case, a file structure is used to define which Composite Events need to be compared. In the latter case, each Composite Event extracted from a NAF file is compared to the data in the KnowledgeStore and the result is directly stored. We describe both processes in more detail in the next subsections.

Figure 23: Identity and event relations between NAF mentions and SEM instances using Composite Event structures

2.2.1 Event comparison in batch mode

The main steps for batch processing are:

1. ClusterEventObjects
   (a) Read each NAF file one by one;
   (b) Create Composite Event objects from single NAF files;
   (c) Store the Composite Events into temporal buckets as NAF objects;
2. MatchEventObjects
   (a) Read all Composite Events from the NAF objects in the same temporal bucket;
   (b) Compare the events to decide on cross-document event coreference and merge data if necessary;
   (c) Output the Composite Event data to a SEM.trig file for each temporal bucket;
3. Populate the KnowledgeStore with the SEM.trig files.

In Figure 24, we show an overview of the batch processing architecture. From single NAF files, we extract Composite Event structures, which are stored in so-called temporal buckets for the different main types of events: contextual, source and grammatical events (see below for more details). After processing all NAF files, the binary object files in each bucket are loaded in memory and compared for identity. Identical events are merged. The result is stored in SEM RDF-TRiG files. When the process has finished, the SEM RDF-TRiG files are used to populate the KnowledgeStore.

Figure 24: NAF2SEM batch processing overview

After creating the time-event relations, we use the time relations to create temporal buckets, applying the following rules:

1. Events without a temporal relation or with too many temporal relations are ignored. The threshold for the maximum number of temporal relations is currently set to 5. If more than 5 time expressions are associated with the event, we assume the temporal relation is too complex to interpret.
2. If there is a single time relation to a time:Instant object, we use the date to create a folder in which to store the event data, e.g. e-1989-10-26. If the time object is of the type time:Interval, we create a period using the begin and end points, e.g. e-2005-10-03-2006-10-02.
3. If there is more than one time object related to an event, we check if the relations are sem:hasEarliestBeginTime and sem:hasEarliestEndTime. These relations implicitly define a period without the use of an interval expression, i.e. there is no linguistic expression of the interval but only of the begin and end of the event. If so, we create a period bucket by first taking the begin points and then taking the end points, e.g. e-1989-10-12-1989-10-26. Note that events that have only a begin or an end point still get a simple event bucket, similar to events with a single time relation.
4. If the multiple time objects are not begin or end points, we create a separate time bucket for each time object and the event is duplicated to each bucket. This means that we assume that the event can be matched with different events at different points in time (footnote 6).

The above time buckets are created within the basic event type distinction between contextual, source and grammatical events, except for those events that are positioned in the future (see below). For each event, we check the associated frames against the lists of contextual, source and grammatical frames to decide on the main event type. If no frame is associated, the event is considered contextual. Events with the sem:hasFutureTime relation are stored in a separate folder without distinguishing the basic event types.
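The bucketing rules and the choice of the main event type can be sketched as follows. This is a hedged illustration, not the ClusterEventObjects code: the function names, the tuple encoding of intervals, the use of the first begin and end point in rule 3, and the precedence of source over grammatical frames are all assumptions made for the example, and the input is assumed to contain only the specific time relations (not the duplicated generic sem:hasTime).

```python
from typing import List, Set, Tuple, Union

Instant = str                 # ISO date such as "1989-10-26"
Interval = Tuple[str, str]    # (begin, end) as ISO dates
TimeObject = Union[Instant, Interval]
MAX_TIME_RELATIONS = 5        # threshold stated in rule 1


def bucket_name(time_obj: TimeObject) -> str:
    """Folder name for one time object (rule 2)."""
    if isinstance(time_obj, tuple):
        return f"e-{time_obj[0]}-{time_obj[1]}"
    return f"e-{time_obj}"


def temporal_buckets(relations: List[Tuple[str, TimeObject]]) -> List[str]:
    """Map the (predicate, time object) pairs of one event to bucket names."""
    # Rule 1: no anchoring, or too many relations to interpret.
    if not relations or len(relations) > MAX_TIME_RELATIONS:
        return []
    # Rule 2: a single instant or interval gives a single bucket.
    if len(relations) == 1:
        return [bucket_name(relations[0][1])]
    begins = [t for p, t in relations if p == "sem:hasEarliestBeginTime"]
    ends = [t for p, t in relations if p == "sem:hasEarliestEndTime"]
    # Rule 3: begin and end points (assumed to be instants) form one period.
    if begins and ends and len(begins) + len(ends) == len(relations):
        return [f"e-{begins[0]}-{ends[0]}"]
    # Rule 4: otherwise the event is duplicated to one bucket per time object.
    return list(dict.fromkeys(bucket_name(t) for _, t in relations))


def main_event_type(frames: Set[str], source_frames: Set[str],
                    grammatical_frames: Set[str], is_future: bool = False) -> str:
    """Pick the top-level folder for an event from its FrameNet frames."""
    if is_future:
        return "futureEvent"
    if frames & source_frames:
        return "sourceEvent"
    if frames & grammatical_frames:
        return "grammaticalEvent"
    return "contextualEvent"  # default, also when no frame is associated
```

An event with a begin point 1989-10-12 and an end point 1989-10-26 ends up in the single bucket e-1989-10-12-1989-10-26, whereas an event with two unrelated dates is returned with two bucket names and would be stored in both.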
Figure 25 shows the structure that is created for storing Composite Event data in batch processing mode. The events folder is subdivided into 4 subfolders: contextualEvent, sourceEvent, grammaticalEvent and futureEvent. Within each subfolder, binary object files (.obj) are stored in temporal buckets. For each NAF file, the object file holds the events anchored to that time, with all the data relevant for each event.

Figure 25: Example of an event type and temporal bucket structure created by the NAF2SEM module. First the events are divided into contextual, source and grammatical events, and within these into temporal buckets. In each temporal bucket, an object file is created for the events from each NAF file that are associated with the time. Across the different NAF object files, a single sem.trig file is created with all the event data.

Footnote 6: There is a positive and a negative side effect to this strategy. If the events get different instance URIs, they are automatically split. If the URIs remain the same, they are eventually merged back into a single instance. Whether this happens depends on the order of the comparison, since the event instance URI is based on the first URI used in the comparison. A future fix for this arbitrary effect is to have the merging of events force the creation of a new unique URI. A side effect of this fix is that split events will never get merged and are treated as distinct events.

The purpose of the division into event types and temporal buckets is to compare events for cross-document event coreference, after which the SEM RDF-TRiG files are created. We create a sem.trig file for each bucket. We apply different strategies for the different event types and for future events.
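As a preview of these strategies, which the remainder of this subsection details, the comparison of two Composite Events can be sketched as three checks: same temporal bucket, sufficient overlap in WordNet synsets (or in lemmas when synsets are missing), and shared participants for the required roles. The sketch is illustrative only. The CompositeEvent fields, the overlap measure (shared items relative to the smaller set) and the rule that each required role must share at least one participant are assumptions. The parameter names concept_match and phrase_match mirror the flags shown in Figure 26, on the assumption that those flags set these thresholds.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class CompositeEvent:
    """Hypothetical, reduced view of a Composite Event."""
    bucket: str                                   # temporal bucket name
    synsets: Set[str] = field(default_factory=set)
    lemmas: Set[str] = field(default_factory=set)
    participants: Dict[str, Set[str]] = field(default_factory=dict)  # role -> URIs


def overlap(a: Set[str], b: Set[str]) -> float:
    """Percentage of shared items relative to the smaller set (assumed measure)."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def same_event(e1: CompositeEvent, e2: CompositeEvent, roles: Iterable[str],
               concept_match: float = 50.0, phrase_match: float = 50.0) -> bool:
    # 1. Same time anchoring: only events in the same bucket are compared.
    if e1.bucket != e2.bucket:
        return False
    # 2. Same action: synset overlap, or lemma overlap if either side has no synsets.
    if e1.synsets and e2.synsets:
        if overlap(e1.synsets, e2.synsets) < concept_match:
            return False
    elif overlap(e1.lemmas, e2.lemmas) < phrase_match:
        return False
    # 3. Participants: every required role must share at least one participant.
    for role in roles:
        shared = e1.participants.get(role, set()) & e2.participants.get(role, set())
        if not shared:
            return False
    return True
```

Calling same_event(a, b, roles=["a1"]) corresponds to the contextual-event setting in which the A1 role must match. A stricter setting would pass more roles or higher thresholds.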
Events are considered identical if and only if all of the following hold:

the events have the same time anchoring;
the events share the same dominant senses or the same lemmas;
the events share sufficient participants, where what counts as sufficient varies with the type of event and with whether the events are labeled as future or not.

The temporal anchoring follows from the comparison of events within the same temporal bucket. The events with the same time anchor are first compared for the type of action expressed. Events are derived from the coreference sets in the NAF file, and each Composite Event can be based on more than one mention of the event. These mentions can have the same lemma and share WordNet synsets. Across the Composite Events, we first check if they have sufficient overlap in the WordNet synsets (defined by a threshold that can be adapted; the default is 50%). This prevents different meanings of the same word from being matched for the wrong reasons. If there are no WordNet synsets associated with either of the Composite Events, we check if the lemmas sufficiently overlap across the coreference set (this can also be set through a proportional threshold, default 50%) (footnote 7).

Finally, events need to share participants. Which participants and how many participants need to be matched can be specified through the API by specifying the role labels that need to be matched. For the different types of events, we follow these principles:

1. source events must share the PropBank A0 participant to be coreferential. This corresponds to the source of the event. The target of the source event is more difficult to compare because the descriptions are usually longer and can vary across mentions (footnote 8).
2. grammatical events typically have PropBank A1 and A2 arguments that need to match, since these arguments usually refer to the main event, e.g. They stopped the meeting on Monday, The meeting was stopped on Monday. We also assume that the lemmas need to match instead of the synsets, since the meanings of these expressions are abstract.
3. contextual events typically can vary with respect to the participants expressed. We typically assume that the A1 role is most informative and needs to match. Depending on the domain, other roles can be chosen.
4. future events need to have identical participants. Since there is no specific time anchoring, we can only assume identity if all other information matches.

Footnote 7: We may also have the synset of the Lowest-Common-Subsumer (LCS) from NAF coreference sets that have different lemmas. The LCS represents the hypernym synset that mention pairs share. Currently, this is not used to match events.

Footnote 8: A future extension could do a word vector comparison of the target roles of these events to constrain coreference further.

In Figure 26 we show the current settings that are used to call the function for creating the binary object files for NAF files (ClusterEventObjects) and the functions for processing the temporal buckets in each type of event (MatchEventObjects).

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.ClusterEventObjects \
        --naf-folder "../test" --event-folder "../test" --extension ".xml" --project cars \
        --source-frames "../resources/source.txt" \
        --grammatical-frames "../resources/grammatical.txt" \
        --contextual-frames "../resources/contextual.txt" \
        --non-entities

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.MatchEventObjects \
        --event-folder "../test/events/contextualEvent" \
        --match-type ililemma --roles "anyrole" \
        --concept-match 30 --phrase-match 10 \
        --ili ../resources/ili.ttl --perspective

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.MatchEventObjects \
        --event-folder "../test/events/sourceEvent" \
        --match-type ililemma --roles "a0,a1" \
        --concept-match 50 --phrase-match 50 \
        --ili ../resources/ili.ttl --perspective

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.MatchEventObjects \
        --event-folder "../test/events/grammaticalEvent" \
        --match-type lemma --roles "a1,a2" \
        --phrase-match 50 \
        --ili ../resources/ili.ttl --perspective

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.MatchEventObjects \
        --event-folder "../test/events/futureEvent" \
        --match-type lemma --roles "all" \
        --phrase-match 80 \
        --ili ../resources/ili.ttl --perspective

Figure 26: NAF2SEM calls for clustering and matching with different parameters

Alternatively, one can choose not to cluster events, using the class eu.newsreader.eventcoreference.naf.NoClusterEventObjects, and in addition use the temporal anchoring of the events as a matching constraint, as shown in Figure 27. In the latter case, you can set the granularity of the temporal dimension to year, month or day. Optionally, you can use the --topics parameter to subdivide into topics within the main folder. Note that the --topics option also works for the above event-type and temporal clustering option.

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.NoClusterEventObjects \
        --naf-folder "../test" --event-folder "../test" --extension ".xml" --project cars \
        --source-frames "../resources/source.txt" \
        --grammatical-frames "../resources/grammatical.txt" \
        --contextual-frames "../resources/contextual.txt" \
        --non-entities --topics

    java -Xmx2000m -cp ../lib/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
        eu.newsreader.eventcoreference.naf.MatchEventObjects \
        --event-folder "../test/events/all" \
        --match-type ililemma --roles "anyrole" --time "month" \
        --concept-match 30 --phrase-match 10 \
        --ili ../resources/ili.ttl --perspective

Figure 27: NAF2SEM calls without clustering and matching all events

2.2.2 Event comparison in streaming mode

The NAF2SEM stream architecture has been introduced to enable real-time, incremental processing of NAF documents.
The batch processing mode works well for transforming very large sets of news articles from NAF to SEM. However, in realistic scenarios news articles arrive incrementally, minute by minute, day by day. The batch architecture cannot handle this situation, as it requires the whole batch to be reprocessed each time new NAF files need to be processed. The streaming architecture is a promising solution for this challenge. Once the initial set of NAF files has been processed and inserted using the batch mode, follow-up NAF files are processed through the streaming architecture. Unlike the batch architecture, which mainly pays off once we have a significant amount of NAF files, the stream architecture is meant to work with a small set of NAF files at a time (a maximum of 1000).

Figure 31 presents an overview of the streaming architecture. When a new NAF file (or set of NAF files) arrives, the NAF2SEM module is first triggered to convert each NAF file into an RDF-TRiG representation. In the streaming architecture, NAF2SEM simply creates a TRiG set of events by looking at the current file. As in the batch mode, the events are extracted together with their attached actors, relations and time expressions. Unlike the batch mode, the extracted events are not matched with events coming from other news articles. An example snippet of a TRiG created by NAF2SEM is given in Figure 28.

    nwr:instances {
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev13>
            a              sem:Event , nwrontology:contextualEvent , wn:eng-14422488-n ,
                           wn:eng-13456567-n , wn:eng-13457378-n , fn:Change_position_on_a_scale ,
                           eso:QuantityChange , wn:eng-00203866-v ;
            rdfs:label     "decline" ;
            gaf:denotedBy  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#char=33,40> .

        <nwr:data/cars/non-entities/30+%25>
            rdfs:label     "30 %" ;
            gaf:denotedBy  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#char=29,32> .

        <nwr:time/20140611>
            a              time:DateTimeDescription ;
            time:day       "---11"^^<http://www.w3.org/2001/XMLSchema#gDay> ;
            time:month     "--06"^^<http://www.w3.org/2001/XMLSchema#gMonth> ;
            time:unitType  time:unitDay ;
            time:year      "2014"^^<http://www.w3.org/2001/XMLSchema#gYear> .

        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#tmx0>
            a                time:Instant ;
            time:inDateTime  <nwr:time/20140611> .
        ...
    }

    <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#tr3> {
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev13>
            sem:hasTime  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#tmx0> .
    }

    nwr:provenance {
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#pr3,rl4>
            gaf:denotedBy  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#char=29,40> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#tr10>
            gaf:denotedBy  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#pr15,rl34>
            gaf:denotedBy  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#char=296,330> .
        ...
    }

Figure 28: Example snippet of the TRiG stream generated by NAF2SEM
The output from the NAF2SEM module is then used as input for the streaming cross-document event coreference module. Apart from the SEM-TRiG input, the user can also specify a rich set of matching arguments, such as the match type and a list of required roles for each event type. The full list of input arguments for event coreference is given in Table 1.

Table 1: Cross-document event coreference arguments for stream processing

    Argument                          Comment
    --concept-match INT               Threshold for conceptual matches of events, default is 50.
    --phrase-match INT                Threshold for phrase matches of events, default is 50.
    --contextual-match-type PATH      Indicates what is used to match events across resources. Default value is ILILEMMA.
    --contextual-lcs                  Use lowest-common-subsumers. Default value is ON.
    --contextual-roles PATH           String with roles for which there must be a match, e.g. "pb:A1, sem:hasActor"
    --source-match-type PATH          Indicates what is used to match events across resources. Default value is ILILEMMA.
    --source-lcs                      Use lowest-common-subsumers. Default value is OFF.
    --source-roles PATH               String with roles for which there must be a match, e.g. "pb:A1, sem:hasActor"
    --grammatical-match-type PATH     Indicates what is used to match events across resources. Default value is LEMMA.
    --grammatical-lcs                 Use lowest-common-subsumers. Default value is OFF.
    --grammatical-roles PATH          String with roles for which there must be a match, e.g. "pb:A1, sem:hasActor"
    --future-match-type PATH          Indicates what is used to match events across resources. Default value is "LEMMA"
    --future-lcs                      Use lowest-common-subsumers. Default value is OFF.
    --recent-span INT                 Amount of past days which are still considered recent and are treated differently.

The cross-document event coreference module then processes the piped TRiG stream based on the argument configuration specified by the user.
This module essentially builds a SPARQL SELECT query for each of the events from the TRiG, based on its components: role participants, relations and time expressions (see Figure 32 for an example query). The SPARQL query incorporates all criteria for matching the current event against the KnowledgeStore. Once it has been composed, the query is sent to the KnowledgeStore. Every result returned from the KnowledgeStore (if any) is an event that is coreferential with the event in question. For each of these events, we create an owl:sameAs relation to the current event, and append all the sameAs identity relations to the input TRiG in a separate graph (nwr:identity). The nwr:identity graph is shown in Figure 29.

    nwr:identity {
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev40>
            owl:sameAs  <nwr:data/cars/52F6-KBJ1-DYWJ-P42R.xml#ev113> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev48>
            owl:sameAs  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev47> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev50>
            owl:sameAs  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev45> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev51>
            owl:sameAs  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev50> .
        <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev50 tmx0>
            owl:sameAs  <nwr:data/2014/06/11/5CDF-W191-F15C-G01S.xml#ev45> .
    }

Figure 29: Examples of owl:sameAs relations generated in streaming mode

Now the TRiG contains all the events from the current news article and their identity relations to already processed events in the KnowledgeStore. It can be the case that the current event has no coreferential event yet in the KnowledgeStore: in this case, the event has no entry in the graph with identity relations. Nevertheless, at this point the TRiG is ready to be inserted into the KnowledgeStore. The module hence finishes with an INSERT request to the KnowledgeStore to store the resulting TRiG. Once this is done, the new events still need to be merged with their coreferential events (if any) in the KnowledgeStore. The remaining events (with no coreferential counterpart) are directly inserted in the KnowledgeStore (KS).

    #!/bin/bash
    # Convert each NAF file to SEM-TRiG and pipe it into the streaming
    # cross-document event coreference module.
    FILES="../nafs_to_process/*.xml"
    for f in $FILES ; do
        # basename keeps the output inside ../trigs even though $f contains a path
        cat "$f" \
        | java -Xmx2000m -cp ../target/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
            eu.newsreader.eventcoreference.naf.GetSemFromNafStream --project cars \
            --source-frames "../resources/source.txt" \
            --grammatical-frames "../resources/grammatical.txt" \
            --contextual-frames "../resources/contextual.txt" --non-entities --timex-max 5 \
        | java -Xmx2000m -cp ../target/EventCoreference-1.0-SNAPSHOT-jar-with-dependencies.jar \
            eu.newsreader.eventcoreference.naf.ProcessEventObjectsStream \
            --contextual-match-type "ILILEMMA" --contextual-lcs \
            --source-match-type "ILILEMMA" --source-roles "pb:A0" \
            --grammatical-match-type "LEMMA" --grammatical-roles "pb:A1" \
            --concept-match 25 --phrase-match 25 > "../trigs/$(basename "$f").trig"
    done

Figure 30: Script for NAF2SEM in streaming architecture

An example call to the streaming architecture is shown in Figure 30. The streaming architecture is an end-to-end architecture spreading from NAF to the KnowledgeStore.
It turns NewsReader into a real-time incremental system, ranging from a news article to the KnowledgeStore, in the following manner:

1. A fresh news article appears.
2. This file is processed by the NewsReader pipeline.
3. The pipeline creates a NAF file, which is fed to NAF2SEM.
4. NAF2SEM extracts RDF-TRiG data from NAF and pipes this to the cross-document event coreference module.
5. This module communicates with the KnowledgeStore and decides on the final set of events and their coreferential relations to the KS events. This is represented in RDF-TRiG format.
6. KS finally “digests” the new RDF, by merging the coreferential events and by simply inserting the events which do not have a coreferential event in KS.

Figure 31: Overview of the NAF2SEM and event coreference stream architecture.

Figure 32: Example SPARQL query used to match events from code in the KnowledgeStore.

From a scientific perspective, this architecture introduces the flexibility to adjust the manner of event matching and facilitates experimentation with different parameters. Through the list of configuration arguments, we have an easy way to experiment with various matching options concerning roles, matching thresholds, matching types, etc. This list of arguments is furthermore not fixed and may easily expand or shrink as we go along. We are planning to experiment with using lowest-common-subsumers (LCS) in the matching of event relations. We also plan to work towards using WordNet relations for coreference.

Additionally, time is an important factor to be further researched. We may want to introduce a recency relevance, which would first try to match an event to the most recent set of events (from the last few days). This idea comes from the fact that the news articles about an event often appear within a rather small interval of time.
Another time aspect which we may look into is the matching of future events.

As mentioned above, this architectural solution requires certain caution with respect to the amount of files being fed to the system. Components of this architecture may easily become bottlenecks; for instance, the KnowledgeStore may become slow if the frequency and number of event merges is too high.

2.3 Evaluation

In this section, we present an evaluation of the quality of the SEM-RDF built with our approach (forthcoming: Rospocher et al. (2016)). Due to the lack of a proper gold standard to compare with, we relied on human judgment of the triples describing some randomly sampled events of the graph. A similar approach was applied to evaluate YAGO2 (Hoffart et al. (2013)), a large knowledge graph automatically extracted from Wikipedia pages.

We conducted an evaluation of the SEM-RDF triples extracted from the NewsReader Wikinews corpus, consisting of 120 news documents. We sampled 100 events from the resulting RDF, splitting them into 4 groups of 25 events each. For each event, we retrieved all its triples, obtaining 4 subgraphs (labeled S1, S2, S3, S4) of approx. 260 triples each. Each subgraph was submitted to a pair of human raters, who independently evaluated each triple of their subgraph. The triples of each subgraph were presented to the raters grouped by event, and for each event the links to its corresponding mentions in the text were provided, so that raters were able to inspect the original text to assess the correctness of the extracted triples. In total, 8 human raters evaluated 1,043 triples of the RDF-SEM data, with each triple independently evaluated by two raters.

Raters were given precise criteria to follow for evaluating their subgraph.
For instance, in the case of an event extracted from many event mentions, raters were instructed to first assess whether all its mentions actually refer to the same event: if at least one of these mentions refers to a different event than the others, all triples of the resulting instance have to be considered incorrect.9 This is a quite strict and potentially highly penalizing criterion when considered in absolute terms from a triple perspective: one “wrong” mention out of many coreferring mentions, potentially contributing only a few triples to the event, may invalidate all the triples of the corresponding event. There were, for example, several instances in which 4 mentions were identified by the pipeline as referring to the same event instance, of which 3 were indeed referring to the same instance. Due to our strict evaluation method, all four mentions were considered incorrect. Performing a pairwise evaluation would have been less strict, but as our goal is to accurately extract knowledge graphs from text, and in particular to obtain correctly structured descriptions of events, we believe this criterion is appropriate.

9 A similar criterion was adopted for cases where something was wrongly identified as an event.

Table 2 presents the resulting triple accuracy on the whole evaluation dataset, as well as the accuracy on each subgraph composing it, obtained as the average of the assessments of each rater pair. For each subgraph, the agreement between the rater pair is also reported, computed according to Cohen’s kappa coefficient (κ). The results show an overall accuracy of 0.551, varying between 0.525 and 0.607 on each subgraph.

Table 2: Quality triple evaluation of SEM-RDF extracted from Wikinews.

                S1      S2      S3      S4      All
    Triples     267     256     261     259     1043
    Accuracy    0.607   0.525   0.552   0.548   0.551
    κ           0.623   0.570   0.690   0.751
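The agreement score used in this evaluation, Cohen's kappa, corrects the observed agreement between two raters for the agreement expected by chance. The following is a minimal sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e); the function name and the toy judgments are illustrative assumptions and are not taken from the evaluation data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who judged the same items.

    ratings_a and ratings_b are equally long sequences of labels
    (for example 1 = triple correct, 0 = triple incorrect).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(ratings_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' label frequencies, summed over labels.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical judgments on four triples).
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])

# One rater accepts every triple, the other rejects a single one:
# raw agreement is 0.95, yet kappa is 0 because chance agreement is also 0.95.
skewed_kappa = cohens_kappa([1] * 20, [1] * 19 + [0])
```

The second toy case shows why a high raw agreement can still go together with a kappa of zero when almost all judgments fall into one category.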
The Cohen’s kappa values, ranging from 0.570 to 0.751, show moderate to substantial agreement between the raters of each pair. Breaking these numbers down by the type of triples considered, namely typing triples (rdf:type), annotation triples (rdfs:label) and participation triples (properties modelling event roles according to PropBank, FrameNet, and ESO), the accuracy on annotation triples is higher (0.772 on a total of 101 triples), while it is lower for typing triples (0.522 on 496 triples) and participation triples (0.534 on 446 triples). Drilling further down on participation triples, the accuracy is higher for PropBank roles (0.559) and lower for FrameNet (0.438) and ESO roles (0.407), which reflects the fact that the SRL tool used is trained on PropBank, while FrameNet and ESO triples are obtained via mapping.

Looking at the event candidates in the evaluation dataset, 69 of them (out of 100) were confirmed as proper events by both raters. Of the 17 candidate coreferring events (i.e. those having multiple mentions), only 4 were marked as correct by both raters (i.e. both raters stated that all mentions were actually referring to the same event), while in a couple of cases an event was marked as incorrect because of one wrong mention out of 4, thus causing all the triples of the event to be marked as incorrect. To illustrate the effect of the strict evaluation criterion, we note that when all coreferring events (and their corresponding triples) in the evaluation dataset are ignored, the triple accuracy rises to 0.697 on a total of 782 triples. Table 3 shows the details for both the full evaluation and the evaluation that ignores the event-coreference effects.

Finally, these numbers need to be interpreted in relation to the task-specific evaluation of event detection and event coreference on benchmark corpora. In Wikinews, recall of event mention detection is about 77% against the gold annotation of events, while within-document event coreference has a precision of 40.7%.
This is a different type of evaluation, since the annotators only labelled the important events whereas systems tend to recognise all possible events in a text, which lowers the precision. We cannot expect coreference results within a document and cross-document to exceed the recall of event mentions. In the next section, we describe the evaluation for cross-document event coreference.

Table 3: Detailed quality triple evaluation of SEM-RDF extracted from Wikinews with and without taking event-coreference into account.

                    Group 1              Group 2              Group 3              Group 4            Overall
                Trp   Avg    κ       Trp   Avg    κ       Trp   Avg    κ       Trp   Avg    κ       Trp    Avg
    ALL         267   0.607  0.623   256   0.525  0.57    261   0.552  0.69    259   0.548  0.751   1043   0.551
    TYPES       122   0.594  0.649   115   0.539  0.585   137   0.504  0.65    122   0.578  0.748   496    0.522
    LABELS      28    0.768  0.7     26    0.788  0.661   24    0.729  0.684   23    0.804  0.862   101    0.772
    ROLES       117   0.581  0.591   115   0.452  0.509   100   0.575  0.735   114   0.465  0.718   446    0.534
    PROPBANK    39    0.628  0.629   39    0.423  0.633   30    0.583  0.796   28    0.554  0.928   136    0.559
    FRAMENET    77    0.461  0.559   73    0.438  0.445   90    0.489  0.645   105   0.424  0.63    345    0.438
    ESO         26    0.5    0.692   25    0.4    0.5     38    0.368  0.435   29    0.345  0.545   118    0.407

    Without Event Coreference:
    ALL         207   0.732  0.447   188   0.601  0.491   184   0.731  0.683   203   0.7    0.625   782    0.697
    TYPES       91    0.731  0.483   91    0.566  0.578   96    0.672  0.649   93    0.758  0.561   371    0.663
    LABELS      23    0.87   0.617   20    0.925  0       17    0.912  0.638   19    0.974  0       79     0.937
    ROLES       93    0.699  0.419   77    0.558  0.37    71    0.768  0.724   91    0.582  0.639   332    0.678
    PROPBANK    32    0.734  0.472   25    0.56   0.516   20    0.825  0.828   23    0.674  0.901   100    0.72
    FRAMENET    56    0.58   0.387   58    0.457  0.41    70    0.607  0.615   79    0.563  0.512   263    0.548
    ESO         17    0.706  0.433   20    0.375  0.468   21    0.571  0.438   21    0.476  0.432   79     0.557

3 Event Coreference

Event coreference resolution is the task of determining whether two event mentions refer to the same event instance.
In this section, we describe a new, robust “bag of events” approach to cross-textual event coreference resolution on news articles. We discuss two variations of the approach, a one-step and a two-step implementation. We first delineate the approach and then describe some experiments with the new method on the ECB+ data set.10 In the following section, we describe the current implementation within the NewsReader pipeline.

3.1 Bag of Events Approach

3.1.1 The Overall Approach

It is common practice to use information coming from event arguments for event coreference resolution (Humphreys et al. (1997), Chen and Ji (2009a), Chen and Ji (2009b), Chen et al. (2011), Bejan and Harabagiu (2010a), Lee et al. (2012), Cybulska and Vossen (2013b), Liu et al. (2014) amongst others). The research community seems to agree that event context information regarding the time and place of an event, as well as information about other participants, plays an important role in the resolution of coreference between event mentions.

Using entities for event coreference resolution is complicated by the fact that event descriptions within a sentence often lack pieces of information. As pointed out by Humphreys et al. (1997), however, a missing piece of information might be available elsewhere within discourse borders. News articles, which are the focus of the NewsReader project, can be seen as a form of public discourse (van Dijk (1988)). As such, the news follows the Gricean Maxim of quantity (Grice (1975)): authors do not make their contribution more informative than necessary. This means that information previously communicated within a unit of discourse will not be mentioned again unless required. This is a challenge for models comparing separate event mentions with one another on the sentence level.
To be able to make full use of information coming from event arguments, instead of looking only at event information available within the same sentence, we propose to take a broader look at event descriptions surrounding the event mention in question within a unit of discourse. For the purpose of this study, we consider a document (a news article) to be our unit of discourse.

We experimented with an “event template” approach which employs the structure of event descriptions for event coreference resolution. In the proposed heuristic, event mentions are looked at through the perspective of five slots, as annotated in the ECB+ dataset created within the NewsReader project (Cybulska and Vossen (2014b)). The five slots correspond to different elements of event information: the action slot (or event trigger, following the ACE (LDC (2005)) terminology) and four kinds of event arguments, namely time, location, human and non-human participant slots (see Cybulska and Vossen (2014)). The ECB+ corpus is used in the experiments described here. Our approach determines coreference between descriptions of events through compatibility of the slots of the five-slot template.

10 The experiments with the two-step bag of events approach that are reported in this section are described in Cybulska and Vossen (2015).

The next quote shows an excerpt from topic one, text number seven of the ECB corpus (Bejan and Harabagiu (2010b)).

    The “American Pie” actress has entered Promises for undisclosed reasons. The actress, 33, reportedly headed to a Malibu treatment facility on Tuesday.

Consider two event templates presenting the distribution of event information over the five event slots in the two example sentences (tables 4 and 5). An event template can be created on different levels of information, such as a sentence, a paragraph or an entire document.
We propose a novel “bag of events” approach to event coreference that explicitly employs event and discourse structure to account for the implications of the Gricean Maxim of quantity. The approach fills in two event templates: a sentence template and a document template. A “sentence template” collects event information from the sentence of an active action mention (tables 4 and 5). By filling in a “document template”, one creates a “bag of events” for a document, which can be seen as a kind of document “summary” (table 6).

The bag of events heuristic employs clues coming from discourse structure, namely those implied by discourse borders. Descriptions of different event mentions occurring within a discourse unit, whether coreferent or related in some other way, tend to share elements of their context unless stated otherwise. In our example text fragment, the first sentence reveals that an actress has entered a rehab facility. From the second sentence the reader finds out where the facility is located (Malibu) and when the “American Pie” actress headed to the treatment center. It is clear to the reader of the example fragment that both events, described in sentences one and two, happened on Tuesday. Both sentences also mention the same rehab center in Malibu. These observations are crucial for the “bag of events” approach proposed here.

The bag of events method can be implemented as a one-step or a two-step classification. In a two-step approach, bag of events (document) features are used for preliminary document clustering. Then, per document cluster, coreference is solved between action mentions in a pairwise model, based on information available in the sentence. In a one-step implementation, bag of events features are added to sentence-based feature vectors generated per action mention. Coreference is solved by a classifier in a pairwise model.
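As an illustration of the two template levels, the sketch below fills sentence templates for the two example sentences and merges them into a document template (the “bag of events”). It is a minimal sketch under stated assumptions: the slot identifiers, the helper names make_template and document_template, and the use of surface strings rather than lemmas are choices made here for illustration only.

```python
# The five event slots of the template (identifier names are assumptions).
SLOTS = ("action", "time", "location", "human_participant", "non_human_participant")


def make_template(**filled):
    """Return an event template: one set of slot fillers per slot."""
    unknown = set(filled) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return {slot: set(filled.get(slot, ())) for slot in SLOTS}


def document_template(sentence_templates):
    """Merge sentence templates into one bag of events.

    No record is kept of which sentence or mention a filler came from;
    each slot simply accumulates the unique fillers seen in the document.
    """
    doc = make_template()
    for template in sentence_templates:
        for slot in SLOTS:
            doc[slot] |= template[slot]
    return doc


# Sentence templates for the two example sentences of the excerpt.
sentence_1 = make_template(
    action={"entered"},
    location={"Promises"},
    human_participant={"actress"},
)
sentence_2 = make_template(
    action={"headed"},
    time={"on Tuesday"},
    location={"to a Malibu treatment facility"},
    human_participant={"actress"},
)

# Document template covering sentences 1-2.
bag_of_events = document_template([sentence_1, sentence_2])
```

Empty sets play the role of the N/A cells: a slot that no sentence fills stays empty in the document template as well.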
3.1.2 Two-step Bag of Events Approach

As the first step of the approach, a document template is filled, accumulating instances of the five event slot mentions from a document, as exemplified in table 6. Pairs of document templates are clustered by means of supervised classification. In the second step of the approach, coreference is solved between event mentions within the document clusters created in step 1. For this task an event template is filled again, but this time it is a “sentence template”, which gathers information from the sentence per event mention. A supervised classifier solves coreference between pairs of event mentions, and finally pairs sharing common mentions are chained into coreference clusters. Figure 33 depicts the implications of the approach for the training data. Figure 34 presents how the test set is processed.

Table 4: Sentence template ECB topic 1, text 7, sentence 1

    Action                    entered
    Time                      N/A
    Location                  Promises
    Human Participant         actress
    Non-Human Participant     N/A

Table 5: Sentence template ECB topic 1, text 7, sentence 2

    Action                    headed
    Time                      on Tuesday
    Location                  to a Malibu treatment facility
    Human Participant         actress
    Non-Human Participant     N/A

Table 6: Document template ECB topic 1, text 7, sentences 1-2

    Action                    entered, headed
    Time                      on Tuesday
    Location                  Promises, to a Malibu treatment facility
    Human Participant         actress
    Non-Human Participant     N/A

3.1.3 Step 1: Clustering Documents Using Bag of Events Features

The first step in this approach is filling in an event template per document. We create a document template by collecting mentions of the five event slots (actions, locations, times, human and non-human participants) from a single document. In a document template no distinction is made between pieces of event information coming from different sentences of a document, and no information is kept about elements being part of different mentions.
A document template can be seen as a bag of events and event arguments. The template stores unique lemmas, to be precise a set of unique lemmas per event template slot.

Table 7: ECB+ statistics

    ECB+                                 #
    Topics                               43
    Texts                                982
    Action mentions                      6833
    Location mentions                    1173
    Time mentions                        1093
    Human participant mentions           4615
    Non-human participant mentions       1408
    Coreference chains                   1958

Figure 33: Bag of events approach - training set processing

Figure 34: Bag of events approach - test set processing

On the training set of the data, we train a pairwise binary classifier determining whether two document templates share coreferring event mentions. This is a supervised learning task in which we determine “compatibility” of two document templates if any two mentions from those templates were annotated in the corpus as coreferent.

Let $m$ be an event mention, and $doc_j$ a collection of mentions from a single document template such that $\{m_i : 1 \le i \le doc_j\}$, where $i$ is the index of a mention and $j$ indexes document templates; $doc_j : 1 \le j \le DOC$, where $DOC$ are all document templates from the corpus. Let $m_a$ and $m_b$ be mentions from different document templates. “Compatibility” of a pair of document templates $(doc_j, doc_{j+1})$ is determined based on coreference of any mentions $(m_{ai}, m_{bi})$ from a pair of document templates such that:

$$coreference(\exists m_{ai} \in doc_j, \exists m_{bi} \in doc_{j+1}) \implies compatibility(doc_j, doc_{j+1}).$$

On the training set we train a binary decision tree classifier (hereafter DT) to find pairs of document templates containing coreferring event mentions. After all unique pairs of document templates from the test set have been classified by means of the DT document template classifier, “compatible” pairs are merged into document clusters based on pair overlap.
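Merging “compatible” pairs into clusters based on pair overlap can be sketched as follows. The sketch assumes that overlap is applied transitively, so that pairs sharing a document end up in the same cluster (connected components, computed here with a small union-find); the function name and the document identifiers are hypothetical.

```python
def merge_compatible_pairs(items, compatible_pairs):
    """Group items into clusters by chaining pairs that share a member.

    items: all document identifiers (documents without any compatible
           pair end up as singleton clusters).
    compatible_pairs: iterable of (a, b) pairs judged compatible by a classifier.
    Returns a sorted list of sorted clusters.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in compatible_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Toy example: d1-d2 and d2-d3 overlap in d2, so all three are merged.
documents = ["d1", "d2", "d3", "d4", "d5"]
pairs = [("d1", "d2"), ("d2", "d3")]
document_clusters = merge_compatible_pairs(documents, pairs)
```

The same chaining could be applied to pairs of event mentions when coreference pairs are grouped into equivalence classes.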
3.1.4 Step 2: Clustering Sentence Templates

The aim of the second step is to solve coreference between event mentions from the document clusters which are the output of the classification task from step 1. We again experiment with a supervised decision tree classifier, but this time pairs of sentence templates are considered in the classification task. A sentence template is created for every action mention annotated in the data set (see examples of sentence templates in tables 4 and 5).

All possible unique pairs of event mentions (and their sentence templates) are generated within clusters of document templates sharing coreferring event mentions in the training set. Pairs of sentence templates, translated into features indicating compatibility across the five template slots, are used to train a DT sentence template classifier.

On the test set, after the output clusters of the DT document template classifier from step 1 are turned into mention pairs (all unique pairs within a document cluster), pairs of sentence templates are classified by means of the DT sentence template classifier. To identify the final equivalence classes of coreferring event mentions, event mentions are grouped within each document cluster based on coreferring pair overlap.

3.1.5 One-step Bag of Events Approach

In the one-step implementation of the approach, all possible unique pairs of action mentions from the corpus are used as the starting point for classification. No initial document clustering is performed. For every action mention a sentence template is filled (see examples in tables 4 and 5). Also, for every corpus document a document template is filled. Five bag of events features, indicating the degree of overlap between the documents from which the two active mentions come, are used for classification. In the one-step approach, document features are used by a classifier together with sentence-based features, all in one go. One DT classifier is trained to determine event coreference.
Pairs of mentions are classified based on a mix of information from a sentence and from a document. Coreferring pairs with overlap are merged into equivalence classes. The one-step classification is simpler to implement but computationally much more expensive: ultimately, every action mention has to be compared with every other action mention. This is a drawback of the one-step method. On the other hand, it could be an advantage to have different types of information (sentence- and document-based) available simultaneously to determine event mention coreference.

3.1.6 Corpus

For the experiments we used true mentions from the ECB+ corpus (Cybulska and Vossen (2014b)), which is an extended and re-annotated version of the ECB corpus (Bejan and Harabagiu (2010b)). ECB+ is particularly interesting for this experiment because we extended the ECB topics with texts about different event instances of the same event type (see Cybulska and Vossen (2014)). For example, in addition to the earlier mentioned topic of a celebrity checking into rehab, we added descriptions of another event involving a different celebrity checking into another rehab facility. In this way, we increased the referential ambiguity for the event mentions. Since the events are similar, we expect that the only way to resolve this ambiguity is through analysis of the event slots. Figure 35 shows some examples of the seminal events represented in ECB+ with different event instances.

Figure 35: Overview of seminal events in ECB and ECB+, topics 1-10

For the experiments on event coreference we used a subset of ECB+ annotations (based on a list of 1840 selected sentences) that were additionally reviewed with a focus on coreference relations. Table 7 presents information about the data set used for the experiments. We divided the corpus into a training set (topics 1-35) and a test set (topics 36-45).
3.1.7 Experimental Set Up

The ECB+ texts are available in XML format. The texts are already tokenized, so neither sentence segmentation nor tokenization was needed. We POS-tagged the corpus sentences (for the purpose of proper verb lemmatization) and lemmatized them. For the experiments we used tools from the Natural Language Toolkit (Bird et al. (2009), NLTK version 2.0.4): the NLTK's default POS tagger, the WordNet lemmatizer (footnote 11) and WordNet synset assignment by the NLTK (footnote 12). For the machine learning experiments we used scikit-learn (Pedregosa et al. (2011)).

Table 8: Features grouped into five categories: L (lemma based), A (action similarity), D (location within discourse), E (entity coreference) and S (synset based).

Event slot              Mentions                Feature kind               Explanation
Action                  Active mentions         Lemma overlap (L)          Numeric: overlap %.
                                                Synset overlap (S)         Numeric: overlap %.
                                                Action similarity (A)      Numeric: Leacock and Chodorow.
                                                Discourse location (D)     Binary:
                                                  - document                 - the same document or not.
                                                  - sentence                 - the same sentence or not.
Action                  Sent. or doc. mentions  Lemma overlap (L)          Numeric: overlap %.
                                                Synset overlap (S)         Numeric: overlap %.
Location                Sent. or doc. mentions  Lemma overlap (L)          Numeric: overlap %.
                                                Entity coreference (E)     Numeric: cosine similarity.
                                                Synset overlap (S)         Numeric: overlap %.
Time                    Sent. or doc. mentions  Lemma overlap (L)          Numeric: overlap %.
                                                Entity coreference (E)     Numeric: cosine similarity.
                                                Synset overlap (S)         Numeric: overlap %.
Human participant       Sent. or doc. mentions  Lemma overlap (L)          Numeric: overlap %.
                                                Entity coreference (E)     Numeric: cosine similarity.
                                                Synset overlap (S)         Numeric: overlap %.
Non-human participant   Sent. or doc. mentions  Lemma overlap (L)          Numeric: overlap %.
                                                Entity coreference (E)     Numeric: cosine similarity.
                                                Synset overlap (S)         Numeric: overlap %.

In the experiments different features were assigned values per event slot (see Table 8).
The lemma overlap feature (L) expresses the percentage of overlapping lemmas between two instances of an event slot, if instantiated in the sentence or in the document (excluding stop words). Frequently, one ends up with multiple entity mentions from the same sentence for an action mention (the relation between an action and the involved entities is not annotated in ECB+). All entity mentions from the sentence (or from the document in the case of bag of events features) are considered. There are two features indicating the location of action mentions within the discourse (D), specifying whether the two active mentions come from the same sentence and from the same document. Action similarity (A) was calculated for a pair of active action mentions using the Leacock and Chodorow (1998) measure. Per entity slot (location, time, human and non-human participant) we checked whether there is coreference between entity mentions from the sentences of the two compared actions; we used cosine similarity to express this feature (E). For all five slots a percentage of synset overlap is calculated (S). In the case of document templates, features referring to active action mentions were disregarded; instead, all action mentions from a document were considered. All feature values were rounded to the first decimal place.

Footnote 11: www.nltk.org/modules/nltk/stem/wordnet.html
Footnote 12: http://nltk.org/modules/nltk/corpus/reader/wordnet.html

We experimented with a few feature sets, considering per event slot either lemma features only (L) or lemma features combined with the other features described in Table 8. Before being fed to a classifier, missing values were imputed (no normalization was needed for the scikit-learn DT algorithm). All classifiers were trained on an unbalanced number of pairs of examples from the training set.
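A lemma overlap feature of this kind could be computed along the following lines. This is a hedged sketch: the text does not give the exact formula, so the ratio used here (shared lemmas divided by the union of both lemma sets, returned as a fraction), the tiny stop word list and the function name are assumptions. Rounding to one decimal and returning a missing value for an empty slot follow the description above.

```python
# Illustrative stop word list; the list actually used is not given in the text.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and"}


def lemma_overlap(lemmas_a, lemmas_b, stop_words=STOP_WORDS):
    """Overlap between the lemma sets of one event slot in two templates.

    Assumption: overlap = |A & B| / |A | B| as a fraction between 0 and 1.
    Returns None when a slot is empty, so the value can be imputed later.
    """
    a = {lemma.lower() for lemma in lemmas_a} - stop_words
    b = {lemma.lower() for lemma in lemmas_b} - stop_words
    if not a or not b:
        return None
    return round(len(a & b) / len(a | b), 1)


full = lemma_overlap(["Promises", "center"], ["promises", "center"])
half = lemma_overlap(["check", "rehab"], ["check"])
missing = lemma_overlap(["the"], ["rehab"])
```

The same function applies to sentence templates and document templates alike, since both store a set of unique lemmas per slot.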
We used grid search with ten-fold cross-validation to optimize the hyper-parameters of the decision tree algorithm (maximum depth, criterion, minimum samples per leaf and per split).

3.1.8 Baseline

We will look at two baselines: a singleton baseline and a rule-based lemma match baseline. The singleton baseline gives the event coreference evaluation scores obtained when all action mentions are treated as singletons. In the singleton baseline response there are no "coreference chains" of more than one element. The rule-based lemma baseline generates event coreference clusters based on full overlap between the lemma or lemmas of the compared event triggers (action slot) from the test set. Table 10 presents the baseline results in terms of recall (R), precision (P) and F-score (F), using the coreference resolution evaluation metrics MUC (Vilain et al. (1995)), B3 (Bagga and Baldwin (1998)), CEAF (Luo (2005)), BLANC (Recasens and Hovy (2011)) and CoNLL F1 (Pradhan et al. (2011)).

When discussing event coreference scores, it must be noted that some of the commonly used metrics depend on the evaluation data set, with scores going up or down with the number of singleton items in the data (Recasens and Hovy (2011)). Our singleton baseline obtains zero scores in MUC, which is understandable because the MUC measure promotes longer chains. B3, on the other hand, seems to give additional points to responses with more singletons, hence the remarkably high scores achieved by the baseline in B3. CEAF and BLANC as well as the CoNLL measure (the latter being an average of MUC, B3 and entity CEAF) give more realistic results. The lemma baseline reaches 62% CoNLL F1. A baseline that only considers event triggers allows for an interesting comparison with our event template approach, which employs event argument features.
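The observation that B3 rewards singleton responses can be reproduced with a small sketch of the B3 computation. The code below is an illustration written for this purpose, not the official scorer; the function name and the toy mention ids are hypothetical, and it assumes that key and response cover exactly the same mentions (as with gold mentions).

```python
def b_cubed(key, response):
    """B3 recall, precision and F1 for two partitions of the same mentions.

    key, response: lists of sets of mention ids.
    """
    key_of = {m: chain for chain in key for m in chain}
    resp_of = {m: chain for chain in response for m in chain}
    mentions = list(key_of)
    # Per-mention precision and recall, averaged over all mentions.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m])
                 for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Hypothetical gold chains: one chain of three mentions and one singleton.
key = [{"a", "b", "c"}, {"d"}]
# Singleton baseline: every mention is its own chain.
singletons = [{m} for chain in key for m in chain]

recall, precision, f1 = b_cubed(key, singletons)
```

Every singleton response chain is trivially pure, so precision is perfect, and every mention recovers at least itself, so recall stays well above zero. MUC, by contrast, only counts coreference links, and a singleton response contains none.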
3.1.9 Results

Table 9 evaluates the final clusters of coreferring event mentions produced in the experiments by means of the DT algorithm when employing different features. When considering bag of events classifiers using exclusively lemma features L (rows two and three), the two-step approach reached a 1% higher CoNLL F-score than the one-step approach with document-based lemma features (docL). In BLANC, the one-step method achieved a 2% better precision but a 2% lower recall. This is understandable: in a two-step implementation some precision is lost when the document clusters are created, whereas in a one-step classification specific sentence information is always available to the classifier, hence the slightly higher precision scores (also in other metrics). The best coreference evaluation scores, with the highest CoNLL F-score of 73% and a BLANC F of 72%, were reached by the two-step bag of events approach combining the DT document classifier using feature set L (document-based, hence docL) across five event slots with the DT sentence classifier employing features LDES (see Table 8 for a description of the features). Adding action similarity (A) on top of the LDES features in step two does not make any difference for decision tree classifiers with a maximum depth of 5 using five-slot templates. Our best CoNLL F-score of 73% is an 11% improvement over the strong rule-based event trigger lemma baseline, and a 34% increase over the singleton baseline.

Table 9: Bag of events approach to event coreference resolution, evaluated on the ECB+ in MUC, B3, mention-based CEAF, BLANC and CoNLL F measures.
Step 1                Step 2                   MUC          B3           CEAF  BLANC        CoNLL
Alg  Slots  Features  Alg  Slots  Features     R   P   F    R   P   F    F     R   P   F    F
-    -      -         DT   5      L            61  76  68   66  79  72   61    67  69  68   70
-    -      -         DT   5      L+docL       65  80  71   68  83  75   64    69  73  71   72
DT   5      docL      DT   5      L            71  75  73   71  77  74   64    71  71  71   73
DT   5      docL      DT   5      LDES         71  75  73   71  78  74   64    72  71  72   73
DT   2      docL      DT   2      LDES         76  70  73   74  68  71   61    74  68  70   70
DT   5      docL      DT   5      LADES        71  75  73   71  78  74   64    72  71  72   73

Table 10: Baseline results on the ECB+: singleton baseline and lemma match of event triggers evaluated in MUC, B3, mention-based CEAF, BLANC and CoNLL F.

                         MUC          B3            CEAF   BLANC        CoNLL
Baseline                 R   P   F    R   P    F    R/P/F  R   P   F    F
Singleton Baseline       0   0   0    45  100  62   45     50  50  50   39
Action Lemma Baseline    71  60  65   68  58   63   51     65  62  63   62

To quantify the contribution of document features, we contrast the results of classifiers using bag of events features with the scores achieved when disregarding document features. The results reached with sentence template classification only (without any document features, row one in Table 9) give us some insight into the impact of the document features on our experiment. Note that one-step classification without preliminary document template clustering is computationally much more expensive than a two-step approach, which ultimately takes into account far fewer item pairs thanks to the initial document template clustering. The DT sentence template classifier trained on an unbalanced training set reaches 70% CoNLL F. This is 8% better than the strong baseline disregarding event arguments, but only 3% less than the two-step bag of events approach and 2% less than the one-step classification with document features. The relatively small contribution of the document features may be due to the fact that in the ECB+ corpus not many sentences are annotated per text: 1840 sentences are annotated in 982 corpus texts, i.e. 1.87 sentences per text.
We expect that the impact of document features would be bigger if more event descriptions from a discourse unit were taken into account than only the ground truth mentions. We ran an additional experiment with the two-step approach in which the four entity types were bundled into one entity slot. Locations, times, human and non-human participants were combined into a cumulative entity slot, resulting in a simplified two-slot template. When using two-slot templates for both document and sentence classification, a CoNLL F score of 70% was reached on the ECB+. This is 3% less than with five-slot templates.

Table 11: Best scoring two-step bag of events approach, evaluated in MUC, B3, entity-based CEAF, BLANC and CoNLL F in comparison with related studies. Note that the BOE approach uses gold mentions while related studies use system mentions.

                                         MUC          B3           CEAF  BLANC        CoNLL
Approach  Data                   Model   R   P   F    R   P   F    F     R   P   F    F
B&H       ECB B&H 2010           HDp     52  90  66   69  96  80   71    NA  NA  NA   NA
LEE       ECB Lee et al. 2012    LR      63  63  63   63  74  68   34    68  79  72   55
BOE-2     ECB annot. ECB+        DT+DT   65  59  62   77  75  76   72    66  70  67   70
BOE-5     ECB annot. ECB+        DT+DT   64  52  57   76  68  72   68    65  66  65   66
BOE-2     ECB+                   DT+DT   76  70  73   74  68  71   67    74  68  70   70
BOE-5     ECB+                   DT+DT   71  75  73   71  78  74   71    72  71  72   73

To the best of our knowledge, the only related study using clues from discourse structure for event coreference resolution is that of Humphreys et al. (1997), who perform coreference merging between event template structures. Both approaches determine event compatibility within a discourse representation, but we achieve this in a different way, with a much more restricted template (five slots only), which in our two-step approach facilitates merging of all event and entity mentions from a text as the starting point. Humphreys et al. consider discourse events and entities for event coreference resolution while operating on the level of mentions, which is more similar to our one-step approach. They did not report any event coreference evaluation scores.
Some of the metrics used to score event coreference resolution depend on the number of singleton events in the evaluation data set (Recasens and Hovy, 2011). Hence, for the sake of a meaningful comparison, it is important to consider similar data sets. The ECB and ECB+ are the only available resources annotated with both within- and cross-document event coreference. To the best of our knowledge, no baseline has been set yet for event coreference resolution on the ECB+ corpus. In Table 11 we therefore also look at results achieved on the ECB corpus, which is a subset of ECB+ and so the closest to the data set used in our experiments, although it captures less ambiguity of the annotated event types (Cybulska and Vossen, 2014b). We focus on the CoNLL F measure, which was used for the comparison of competing coreference resolution systems in the CoNLL 2011 shared task.

The best result of 73% CoNLL F was achieved on the ECB+ by the two-step bag of events approach using five-slot event templates (BOE-5 in Table 11). When using two-slot templates we get 3% less CoNLL F on ECB+. For the sake of comparison, we ran an additional experiment on the ECB part of the corpus (annotation by Cybulska and Vossen (2014b)). The ECB was used in related work, although with different versions of the annotation, so the results are not entirely comparable. We ran two tests, one with the simplified templates considering two slots only, an action and an entity slot (as annotated in the ECB by Lee et al. (2012)), and one with five-slot templates. The two-slot bag of events (BOE-2) on the ECB part of the corpus reached results comparable to related work, 70% CoNLL F, while the five-slot template experiment (BOE-5) resulted in 66% CoNLL F. The approach of Lee et al. (2012) (LEE) using linear regression (LR) reached 55% CoNLL F, although on a much more difficult task that includes event extraction as well.
The component similarity method of Cybulska and Vossen (2013b) resulted in 70% CoNLL F, but on a simpler within-topic task (not considered in Table 11). B&H in the table refers to the approach of Bejan and Harabagiu (2010) using a hierarchical Dirichlet process (referred to in the table as HDp); for this study no CoNLL F was reported. In the BOE experiments reported in Table 11 we used the two-step approach. During step 1 only (document-based) lemma features (docL) were used, and for sentence template classification (step 2) LDES features were employed. In the tests with the bag of events approach, ground truth mentions were used.

3.1.10 Conclusion

In this section we experimented with two variations of a new bag of events approach to event coreference resolution: a one-step method and a higher scoring two-step heuristic. Instead of performing topic classification before resolving coreference between event mentions, as is done in most studies, the two-step bag of events approach first compares document templates created per discourse unit and only then compares single event mentions and their arguments. In contrast to a heuristic using a topic classifier, which might have problems distinguishing between different instances of the same event type, the bag of events approach facilitates context disambiguation between event mentions from different discourse units. Grouping events by the compatibility of event context (time, place and participants) on the discourse level allows one to take advantage of event context information that is mentioned only once per unit of discourse and consequently is not always available on the sentence level. From the perspective of performance, the robust two-step bag of events approach, using a very small feature set, also significantly restricts the number of compared items. Therefore, it has much lower memory requirements than a pairwise approach operating on the mention level.
Given that this approach does not consider any syntactic features and that the evaluation data set is only annotated with 1.8 sentences per text, the evaluation results are highly encouraging.

3.2 Evaluation of the NewsReader pipeline

The 'bag-of-events' approach described in the previous sections uses the annotations of the event components. By taking the gold annotation, we can evaluate the impact of the approach in isolation, without the influence of other processing steps that may introduce errors. In this section, we describe the performance of the NewsReader pipeline as is on the same ECB+ data set, starting from the text. The same tokenized text was processed with the NewsReader pipeline version 3.0 (Agerri et al. (2015)). There were 4 files out of 982 for which the pipeline gave no output. This resulted in 20 events that were not recovered.

The starting point for the cross-document coreference is the intra-document coreference layer in NAF. Event coreference sets are generated using the EventCoreference module, which makes a distinction between the event types contextualEvent, sourceEvent and grammaticalEvent. For the latter two, no coreference relations are generated within the same document. For contextualEvents, we first group all lemmas into a candidate coreference set, next decide on the dominant sense of the lemma and finally measure the similarity across candidate coreference sets. If the dominant senses of lemmas are sufficiently similar, the candidate sets are merged. Dominant senses are derived from all occurrences of a lemma in a document by cumulating the WSD score of each occurrence. We take those senses with the 80% highest cumulated WSD scores as the dominant senses (program setting: –wsd 0.8).
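The selection of dominant senses can be illustrated with a short sketch. It is not the EventCoreference module's code: the function name and the toy sense ids are hypothetical, and the reading of the 0.8 setting as "keep every sense whose cumulated score is at least 80% of the best cumulated score" is an assumption, since the text does not spell out the exact rule.

```python
from collections import defaultdict


def dominant_senses(occurrences, wsd_threshold=0.8):
    """Pick the dominant senses of a lemma within one document.

    occurrences: one dict per occurrence of the lemma, mapping a sense id
    to the WSD score assigned to it for that occurrence.
    Assumed rule: keep senses whose cumulated score reaches
    wsd_threshold times the highest cumulated score.
    """
    totals = defaultdict(float)
    for sense_scores in occurrences:
        for sense, score in sense_scores.items():
            totals[sense] += score  # cumulate over all occurrences
    if not totals:
        return []
    cutoff = wsd_threshold * max(totals.values())
    return sorted(sense for sense, total in totals.items() if total >= cutoff)


# Hypothetical WSD output for three occurrences of one lemma.
occurrences = [
    {"sense-1": 0.6, "sense-2": 0.3},
    {"sense-1": 0.5, "sense-2": 0.6},
    {"sense-3": 0.1},
]
senses = dominant_senses(occurrences)
```

With these toy scores the two strongest senses survive and the marginal third sense is dropped, so a later similarity comparison is run only over senses that the document as a whole supports.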
When we compare different lemma-based coreference sets, we use these dominant senses to measure similarity in WordNet according to the Leacock-Chodorow method (Leacock and Chodorow (1998)). We use the hypernym relations and the cross-part-of-speech event relations from WordNet to establish similarity (program setting: –wn-lmf wneng30.lmf.xml.xpos). The threshold for similarity was set to 2.0 (program setting: –sim 2.0). If different lemmas are considered coreferential, we store the lowest-common-subsumer synset that established the similarity as an external reference in the coreference set. Once all the coreference sets are established (both singletons and multiforms), we add all the hypernym synsets for the dominant senses as external references.

For the cross-document evaluation, we merged all the ECB+ CoNLL files from each topic into a single key file, effectively mixing ECB and ECB+ files into a single task. Since ECB+ reflects a systematic referential ambiguity for seminal topic events, we thus create a task that reflects this ambiguity as well. Since ECB has 43 different topics, 43 unique key files were created. Below, we show a fragment from such a key file for topic 1, in which Tara Reid is checking into rehab in the ECB file number 10, whereas Lindsay Lohan is checking into rehab in the ECB+ file number 10. The identifiers in the key file are created using the CROMER tool across the different documents, each annotated in NAF, as explained in Cybulska and Vossen (2014b) (footnote 13).

Footnote 13: Note that we reduced multiword phrases such as checked into to the first token only, since the NewsReader system does not mark multiwords as events. Hence in the example below, we removed the identifier (132016236402809085484) from token 1_10ecb 0 8 into.

#begin document (1);
1_10ecb 0 1 Perennial -
1_10ecb 0 2 party -
1_10ecb 0 3 girl -
1_10ecb 0 4 Tara -
1_10ecb 0 5 Reid -
1_10ecb 0 6 checked (132016236402809085484)
1_10ecb 0 7 herself -
1_10ecb 0 8 into -
1_10ecb 0 9 Promises -
1_10ecb 0 10 Treatment -
1_10ecb 0 11 Center -
1_10ecb 0 12 , -
1_10ecb 0 13 her -
1_10ecb 0 14 rep -
1_10ecb 0 15 told (132016235311629112331)
1_10ecb 0 16 People -
1_10ecb 0 17 . -

1_10ecb 3 67 A -
1_10ecb 3 68 friend -
1_10ecb 3 69 of -
1_10ecb 3 70 the -
1_10ecb 3 71 actress -
1_10ecb 3 72 told (110088372)
1_10ecb 3 73 People -
1_10ecb 3 74 she -
1_10ecb 3 75 went (132016236402809085484)
1_10ecb 3 76 to -
1_10ecb 3 77 Promises -
1_10ecb 3 78 on -
1_10ecb 3 79 Tuesday -
1_10ecb 3 80 and -
1_10ecb 3 81 that -
1_10ecb 3 82 her -
1_10ecb 3 83 friends -
1_10ecb 3 84 and -
1_10ecb 3 85 family -
1_10ecb 3 86 supported (110088386)
1_10ecb 3 87 her -
1_10ecb 3 88 decision (132016236402809085484)
1_10ecb 3 89 . -

1_10ecbplus 1 30 Lindsay -
1_10ecbplus 1 31 Lohan -
1_10ecbplus 1 32 checks (132015738091639707092)
1_10ecbplus 1 33 into -
1_10ecbplus 1 34 Betty -
1_10ecbplus 1 35 Ford -
1_10ecbplus 1 36 Center -

1_10ecbplus 3 41 After -
1_10ecbplus 3 42 skipping -
1_10ecbplus 3 43 out -
1_10ecbplus 3 44 on -
1_10ecbplus 3 45 entering (132015832182464413376)
1_10ecbplus 3 46 a -
1_10ecbplus 3 47 Newport -
1_10ecbplus 3 48 Beach -
1_10ecbplus 3 49 rehabilitation -
1_10ecbplus 3 50 facility -
1_10ecbplus 3 51 and -
1_10ecbplus 3 52 facing (132015992713150306962)
1_10ecbplus 3 53 the -
1_10ecbplus 3 54 prospect -
1_10ecbplus 3 55 of -
1_10ecbplus 3 56 arrest -
1_10ecbplus 3 57 for -
1_10ecbplus 3 58 violating (132015992916565253818)
1_10ecbplus 3 59 her -
1_10ecbplus 3 60 probation (132015993252785693471)
1_10ecbplus 3 61 , -
1_10ecbplus 3 62 Lindsay -
1_10ecbplus 3 63 Lohan -
1_10ecbplus 3 64 has -
1_10ecbplus 3 65 checked (132015738091639707092)
1_10ecbplus 3 66 into -
1_10ecbplus 3 67 the -
1_10ecbplus 3 68 Betty -
1_10ecbplus 3 69 Ford -
1_10ecbplus 3 70 Center -
1_10ecbplus 3 71 to -
1_10ecbplus 3 72 begin (132015992988761097172)
1_10ecbplus 3 73 a -
1_10ecbplus 3 74 90 -
1_10ecbplus 3 75 - -
1_10ecbplus 3 76 day -
1_10ecbplus 3 77 court -
1_10ecbplus 3 78 - -
1_10ecbplus 3 79 mandated -
1_10ecbplus 3 80 stay (132015736700251185985)
1_10ecbplus 3 81 in -
1_10ecbplus 3 82 her -
1_10ecbplus 3 83 reckless -
1_10ecbplus 3 84 driving (132015732766174435569)
1_10ecbplus 3 85 conviction (132015993650409892802)
1_10ecbplus 3 86 . -

The NAF2SEM module from NewsReader was used to generate SEM-RDF files for each topic by processing all the NAF files within that topic. From the SEM-RDF files, we extract unique numerical identifiers from the event instances and insert them in a CoNLL response file for the tokens that form the mentions of the event. Below we show the same fragment for topic 1 with the output from the NewsReader system added to the tokens:

#begin document (1);
1_10ecb 0 1 Perennial -
1_10ecb 0 2 party -
1_10ecb 0 3 girl -
1_10ecb 0 4 Tara -
1_10ecb 0 5 Reid -
1_10ecb 0 6 checked (139)
1_10ecb 0 7 herself -
1_10ecb 0 8 into -
1_10ecb 0 9 Promises -
1_10ecb 0 10 Treatment -
1_10ecb 0 11 Center -
1_10ecb 0 12 , -
1_10ecb 0 13 her -
1_10ecb 0 14 rep -
1_10ecb 0 15 told (139)
1_10ecb 0 16 People -
1_10ecb 0 17 . -

1_10ecb 3 67 A -
1_10ecb 3 68 friend -
1_10ecb 3 69 of -
1_10ecb 3 70 the -
1_10ecb 3 71 actress -
1_10ecb 3 72 told (139)
1_10ecb 3 73 People -
1_10ecb 3 74 she -
1_10ecb 3 75 went -
1_10ecb 3 76 to -
1_10ecb 3 77 Promises (445)
1_10ecb 3 78 on -
1_10ecb 3 79 Tuesday -
1_10ecb 3 80 and -
1_10ecb 3 81 that -
1_10ecb 3 82 her -
1_10ecb 3 83 friends -
1_10ecb 3 84 and -
1_10ecb 3 85 family -
1_10ecb 3 86 supported (239)
1_10ecb 3 87 her -
1_10ecb 3 88 decision (18)
1_10ecb 3 89 . -

1_10ecbplus 1 30 Lindsay -
1_10ecbplus 1 31 Lohan -
1_10ecbplus 1 32 checks (153)
1_10ecbplus 1 33 into -
1_10ecbplus 1 34 Betty -
1_10ecbplus 1 35 Ford -
1_10ecbplus 1 36 Center -

1_10ecbplus 3 41 After -
1_10ecbplus 3 42 skipping (399)
1_10ecbplus 3 43 out -
1_10ecbplus 3 44 on -
1_10ecbplus 3 45 entering (173)
1_10ecbplus 3 46 a -
1_10ecbplus 3 47 Newport -
1_10ecbplus 3 48 Beach -
1_10ecbplus 3 49 rehabilitation -
1_10ecbplus 3 50 facility -
1_10ecbplus 3 51 and -
1_10ecbplus 3 52 facing (499)
1_10ecbplus 3 53 the -
1_10ecbplus 3 54 prospect (265)
1_10ecbplus 3 55 of -
1_10ecbplus 3 56 arrest (42)
1_10ecbplus 3 57 for -
1_10ecbplus 3 58 violating (138)
1_10ecbplus 3 59 her -
1_10ecbplus 3 60 probation (367)
1_10ecbplus 3 61 , -
1_10ecbplus 3 62 Lindsay -
1_10ecbplus 3 63 Lohan -
1_10ecbplus 3 64 has -
1_10ecbplus 3 65 checked (153)
1_10ecbplus 3 66 into -
1_10ecbplus 3 67 the -
1_10ecbplus 3 68 Betty -
1_10ecbplus 3 69 Ford -
1_10ecbplus 3 70 Center -
1_10ecbplus 3 71 to -
1_10ecbplus 3 72 begin (464)
1_10ecbplus 3 73 a -
1_10ecbplus 3 74 90 -
1_10ecbplus 3 75 - -
1_10ecbplus 3 76 day -
1_10ecbplus 3 77 court -
1_10ecbplus 3 78 - -
1_10ecbplus 3 79 mandated -
1_10ecbplus 3 80 stay (407)
1_10ecbplus 3 81 in -
1_10ecbplus 3 82 her -
1_10ecbplus 3 83 reckless -
1_10ecbplus 3 84 driving -
1_10ecbplus 3 85 conviction (10)
1_10ecbplus 3 86 . -

We used the latest version (v8.01) of the official CoNLL scorer (Luo et al., 2014) to compare the response output with the key data. We generate the BLANC score for each topic and then macro-average the results across all topics. In the next subsection, we describe the results for the NewsReader pipeline without any adaptation. We vary the parameters in the NAF2SEM module to see the impact on the performance. In subsection 3.2.2, we measure the result of augmenting the event detection in the NewsReader pipeline output.

3.2.1 NewsReader output

We first applied the NAF2SEM system on the NewsReader pipeline output as is.
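To make the key and response format concrete, the sketch below groups the event tokens of such a file by their identifier. It is an illustration only: it assumes one token per row with five whitespace-separated columns (file, sentence number, token number, word, and either "-" or an identifier in parentheses), as in the fragments above, and the function name is hypothetical. It does not replace the official scorer.

```python
from collections import defaultdict


def read_chains(lines):
    """Group event mentions by identifier.

    Returns a dict mapping an identifier to a list of
    (file, sentence, token, word) tuples. Rows that do not have exactly
    five columns (e.g. the '#begin document' header) are skipped.
    """
    chains = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        doc, sentence, token, word, tag = fields
        if tag.startswith("(") and tag.endswith(")"):
            chains[tag[1:-1]].append((doc, int(sentence), int(token), word))
    return dict(chains)


# A few rows taken from the response fragment for topic 1.
response_rows = [
    "#begin document (1);",
    "1_10ecb 0 6 checked (139)",
    "1_10ecb 0 7 herself -",
    "1_10ecb 0 15 told (139)",
    "1_10ecbplus 1 32 checks (153)",
    "1_10ecbplus 3 65 checked (153)",
]
chains = read_chains(response_rows)
```

In this fragment the response places checked and told from the ECB file in the same chain (139), whereas the key assigns them different identifiers; the two Lohan mentions share chain 153 in the response just as they share one identifier in the key.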
Since ECB+ is an experimental data set with systematic but limited ambiguity (each seminal event mention within a topic can have two referential values), we ran the NAF2SEM system by comparing all CompositeEvents within each topic, using temporal anchoring as an additional parameter. The CompositeEvent RDF data consist of:

• Event action data
  – the WordNet synsets and ontology types associated with all event mentions based on the intra-document event coreference sets
  – the labels used to make reference to the event action and the most frequent label as the preferred label
  – all the mentions in the documents
• Participant data
  – the URI of the participant, either derived from DBpedia or created from the phrase as an entity or non-entity
  – the labels used to make reference to the entity and the most frequent label as the preferred label
  – all the mentions in the documents
• Time data
  – the URI of the time expression
  – the labels used to make reference to the time
  – all the mentions in the documents
  – the owl-time object that represents the normalised ISO value for the time expression, with a specification of at least the year and possibly the month and day
• SEM relations
  – any specific PropBank, FrameNet or ESO role between event actions and participants
  – any sem:hasTime relations between event actions and time expressions

When comparing the CompositeEvents across documents, we can use any of the above features to compare events.
We experimented with a number of parameters to measure their impact on the performance of the system:

Event matching: Event mentions across documents need to match proportionally in terms of associated synsets, lemmas or combinations of these.
Temporal anchoring: No temporal constraint, same year, same month or same day.
Participant match: No participant constraint, or at least one participant needs to match, either through the URI or through the preferred label.

For Event matching, we can apply different strategies: compare the WordNet synsets (program parameter: –match-type ili), the lemmas of mentions (program parameter: –match-type lemma) or a combination of synsets and lemmas (program parameter: –match-type ililemma). When using synsets associated with the coreference sets, we can choose the dominant senses, the lowest-common-subsumers (program parameter: –lcs) and the hypernyms of the dominant senses (program parameter: –hypers).

For the Temporal anchoring, we can set the granularity to years, months and days. This means that, depending on the amount of detail of the time anchoring, events need to have matching times according to these settings. When we leave out the –time value, the time anchoring is not considered.

Finally, for participants, we can define the precise roles to be matched (PropBank, VerbNet, FrameNet, ESO), any role to be matched, or none. To match, the program considers the role label and the role object separately. If the label is specified (e.g. program parameter: –roles a0,a1,a2), we need to match at least one participant object with that label. If no label is specified (program parameter: –roles anyrole), at least one participant needs to match regardless of the role label. Matching the objects is done on the basis of the URI. If the URIs do not match, we check if the preferred label of a participant is among the labels of the other participant.
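The temporal and participant constraints described above can be sketched as two small predicates. This is a simplified illustration and not the NAF2SEM code: the data layout (tuples for dates, dicts for participants), the function names, the example URIs and dates, and the treatment of an unknown month or day as a non-match are all assumptions.

```python
def time_matches(time_a, time_b, granularity):
    """time_a, time_b: (year, month, day) tuples, None for unknown parts.
    granularity: None (no time constraint), 'year', 'month' or 'day'.
    Assumption: an unknown part never matches."""
    if granularity is None:
        return True
    depth = {"year": 1, "month": 2, "day": 3}[granularity]
    return all(x is not None and x == y
               for x, y in zip(time_a[:depth], time_b[:depth]))


def participant_matches(a, b):
    """Match on URI first, then on a preferred label found among the
    labels of the other participant."""
    if a["uri"] == b["uri"]:
        return True
    return a["pref_label"] in b["labels"] or b["pref_label"] in a["labels"]


def participants_ok(parts_a, parts_b, roles):
    """parts_*: lists of (role_label, participant) pairs.
    roles: None (no constraint), 'anyrole', or a set such as {'a0', 'a1'}."""
    if roles is None:
        return True
    for role_a, pa in parts_a:
        for role_b, pb in parts_b:
            if roles != "anyrole" and not (role_a == role_b and role_a in roles):
                continue
            if participant_matches(pa, pb):
                return True
    return False


# Hypothetical participants and dates, for illustration only.
lohan = {"uri": "dbp:Lindsay_Lohan", "pref_label": "Lindsay Lohan",
         "labels": {"Lindsay Lohan", "Lohan"}}
reid = {"uri": "dbp:Tara_Reid", "pref_label": "Tara Reid",
        "labels": {"Tara Reid", "Reid"}}

same_month = time_matches((2013, 5, 2), (2013, 5, 3), "month")
same_day = time_matches((2013, 5, 2), (2013, 5, 3), "day")
any_role = participants_ok([("a0", lohan)], [("a1", lohan)], "anyrole")
a0_only = participants_ok([("a0", lohan)], [("a1", lohan)], {"a0"})
different = participants_ok([("a0", lohan)], [("a0", reid)], {"a0"})
```

Keeping the time test and the participant test as separate predicates mirrors the way the parameters are varied independently in the experiments.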
Event coreference depends on the detection of the events. The coreference scorer (Luo et al., 2014) also gives the scores for the detection of the mentions. Without adaptation, the event mention recall for the NewsReader output is 72.66% and the precision is 66.29% (F1 69.01%). For the results below, we cannot expect the coreference to perform above the results for the event mention detection: the event mention detection defines the upper bound.

We first experimented with the proportion of WordNet synsets (program parameters: –match-type ili, –hypers, –lcs) that need to match across two CompositeEvents. Using these settings, events only match if they share some of the meanings as scored by the WSD system, or through direct hypernyms or lowest-common-subsumers. Events are not matched through their lemma. We set the time constraint to match the year and month, and required the participants to have at least one match regardless of the role. We varied the proportion of WordNet synsets to match in steps of 10 between 0% and 100%. The threshold defines the minimal proportion of synsets that needs to be shared for both compared events in order for them to be merged. As a consequence, events with many synsets cannot absorb events with few synsets. The results are shown in Table 12 and in Figure 36.

Table 12: BLANC reference results macro averaged over ECB+ topics in terms of recall (R), precision (P) and F1 (F) for NewsReader output with different proportions of WordNet synsets to match: S = only synset matches, SL = synset matches if synsets are associated and lemma matches if no synsets are associated, L = lemmas only. Different columns represent proportions in steps of 10% from 1% to 100%.
Overlap  R-SL    R-S     R-L     P-SL    P-S     P-L     F-SL    F-S     F-L
1        36.07%  36.04%  27.60%  30.57%  30.95%  46.12%  30.42%  30.78%  24.77%
10       36.14%  35.97%  -       37.60%  38.58%  -       34.22%  34.31%  -
20       35.98%  35.77%  -       39.40%  39.96%  -       34.50%  34.49%  -
30       35.74%  35.68%  27.52%  41.25%  41.35%  45.98%  34.79%  34.72%  24.63%
40       35.59%  35.40%  -       41.64%  41.72%  -       34.71%  34.56%  -
50       35.42%  35.17%  -       42.13%  42.07%  -       34.72%  34.47%  -
60       35.12%  35.09%  -       42.13%  42.18%  -       34.43%  34.40%  -
70       34.84%  34.82%  -       42.37%  42.44%  -       34.24%  34.22%  -
80       34.76%  34.70%  -       42.54%  42.54%  -       34.19%  34.13%  -
90       34.53%  34.41%  -       42.58%  42.55%  -       34.03%  33.88%  -
100      34.48%  34.33%  27.49%  42.83%  42.81%  45.61%  33.99%  33.82%  24.57%

The highest recall is obtained using synsets first and lemmas in addition, with 10% overlap (R-SL=36.14%). Recall drops when more overlap is required. We see the opposite for precision, but the highest precision is obtained using solely the lemmas, where 1% overlap is sufficient (P-L=46.12%).14 The highest f-measure is obtained using synsets first and lemmas in addition, with 30% overlap (F-SL=34.79%). Differences are, however, very small. We used the latter settings in the further experiments described below.

14 We did not test all proportions of overlap for lemmas because there are 1.7 lemmas per coreference set on average. It makes little difference to have 1% or 100% overlap.

Figure 36: Impact of more strict WordNet synset matching on the macro average BLANC for recall (R), precision (P) and F1 for NewsReader output

Next, we varied the participant constraints and their roles, whether or not hypernyms and lowest-common-subsumers can be used to match actions, and the temporal constraints. We used the following combinations of properties, where we kept the proportion of WordNet synsets and lemmas to match at 30% (which gave the best f-measure so far):

AR-H-L-M      AR = a single participant match and role is not considered, H = hypernyms, L = lowest-common-subsumer, M = month
AR---M        AR = a single participant match and role is not considered, hypernyms and lowest-common-subsumer are not considered, M = month
AR--L-M       AR = a single participant match and role is not considered, hypernyms are not considered, L = lowest-common-subsumer, M = month
AR-H--M       AR = a single participant match and role is not considered, H = hypernyms, lowest-common-subsumer is not considered, M = month
AR-H-L-       AR = a single participant match and role is not considered, H = hypernyms, L = lowest-common-subsumer, time is not considered (listed as AR-H-L-NT in Table 13)
AR-H-L-Y      AR = a single participant match and role is not considered, H = hypernyms, L = lowest-common-subsumer, Y = year
AR-H-L-D      AR = a single participant match and role is not considered, H = hypernyms, L = lowest-common-subsumer, D = day
-H-L-M        participants are not considered, H = hypernyms, L = lowest-common-subsumer, M = month (listed as NR-H-L-M in Table 14)
A0-H-L-M      A0 = a single participant match and role should be A0, H = hypernyms, L = lowest-common-subsumer, M = month
A1-H-L-M      A1 = a single participant match and role should be A1, H = hypernyms, L = lowest-common-subsumer, M = month
A2-H-L-M      A2 = a single participant match and role should be A2, H = hypernyms, L = lowest-common-subsumer, M = month
A0A1-H-L-M    A0A1 = a participant and role match for A0 and A1, H = hypernyms, L = lowest-common-subsumer, M = month
A0A2-H-L-M    A0A2 = a participant and role match for A0 and A2, H = hypernyms, L = lowest-common-subsumer, M = month
A1A2-H-L-M    A1A2 = a participant and role match for A1 and A2, H = hypernyms, L = lowest-common-subsumer, M = month
A0A1A2-H-L-M  A0A1A2 = a participant and role match for A0, A1 and A2, H = hypernyms, L = lowest-common-subsumer, M = month

In Table 13, we first show the impact of using hypernyms and lowest-common-subsumers for matching event actions. We kept the participant constraint stable, requiring a single participant to match regardless of the role. The time is first set to month matching. In the second part of the table, we maintained the participant constraint and also fixed the use of hypernyms and lowest-common-subsumers. In this case, we varied the time constraint: no time constraint, year, month and day. In all cases, we use WordNet synsets and lemmas for event action matching with a proportion of 30% (SL30). We first of all observe that the differences are small. The differences in recall and f-measure are not significant. Most notably, adding the day as time constraint gives the highest precision (45.12%) but also the lowest recall and f-measure.

In Table 14, we kept the standard setting for hypernyms and lowest-common-subsumer with the time constraint set to month matching, but now varied the specification of the roles for which there should be a participant match. In principle, we can test roles from PropBank, FrameNet and ESO. However, since the PropBank roles are more general and are always given, we restricted the testing to the most important PropBank roles A0, A1 and A2. We also tested combinations of roles: A0A1, A0A2 and A1A2. The first row (NR) represents no role restriction.

Table 13: BLANC reference results macro averaged over ECB+ topics in terms of recall (R-SL30), precision (P-SL30) and F1 (F-SL30). AR is stable across the results, meaning that a single participant in any role needs to match.
We varied the hypernyms (H) and lowest-common-subsumer (L) for action matches and the time constraints: no time constraint (NT), year (Y), month (M) and day (D).

Setting     R-SL30  P-SL30  F-SL30
AR---M      35.77%  40.78%  34.68%
AR--L-M     35.79%  41.00%  34.74%
AR-H--M     35.72%  40.98%  34.71%
AR-H-L-NT   35.78%  41.26%  34.84%
AR-H-L-Y    35.74%  41.25%  34.80%
AR-H-L-M    35.74%  41.25%  34.79%
AR-H-L-D    30.40%  45.12%  29.35%

Again in all cases, we use WordNet synsets and lemmas for event action matching with a proportion of 30% (SL30).

Table 14: BLANC reference results macro averaged over ECB+ topics in terms of recall (R-SL30), precision (P-SL30) and F1 (F-SL30). The hypernyms (H), lowest-common-subsumer (L) and time constraint month (M) are kept stable. We varied the role-participant constraints: NR = no constraint, A0 role participant should match, A1 should match, A2 should match, A0 and A1 should match, A0 and A2 should match, A1 and A2 should match.

Setting      R-SL30  P-SL30  F-SL30
NR-H-L-M     41.54%  36.21%  37.58%
A0-H-L-M     30.58%  42.57%  29.20%
A1-H-L-M     31.53%  46.33%  30.76%
A2-H-L-M     28.48%  43.62%  26.06%
A0A1-H-L-M   28.66%  48.74%  26.60%
A0A2-H-L-M   27.48%  46.09%  24.56%
A1A2-H-L-M   27.48%  46.09%  24.56%

Using no constraints on the participants and their roles gives the highest recall (41.54%) and f-measure (37.58%) so far. The results are even higher than for a single participant in any role (recall 35.74% and f-measure 34.79%), although the precision is lower. We can observe that adding more specific role constraints lowers the recall and increases the precision, with both prime participants required (A0A1) giving the highest precision so far: 48.74%. Note that such a constraint can only be applied to semantic role structures where both participants have been detected in the sentence. There are many cases where either the A0 or A1 is not expressed or not recovered.

Concluding. In general, we can conclude that ECB+ is probably not rich enough to see any impact of constraints at a very specific level.
Since the referential ambiguity is restricted to two seminal events, differentiating between them can be done using more global features, such as any participant or just the year, rather than the precise role and day. We expect that in realistic news data sets, with thousands of sources reporting on similar events, the details are needed to make a more fine-grained comparison. Nevertheless, we remain dependent on the quality of the NLP software to detect all these details correctly and sufficiently. We can also see that adding constraints increases the precision but that the low recall remains a problem. Finally, it is important to realise that over 95% of all event mentions are not coreferential in ECB+. Detecting coreference relations, even in an artificial data set such as ECB+, is a very delicate task. Since BLANC averages the results of noreference (singleton event mentions without a coreference link to any other event mention) and coreference relations, any drastic approach to establishing coreference relations will be penalised by the noreference results.

3.2.2 Maximizing the event detection

To improve the performance of the event detection in terms of both precision and recall, we developed a Conditional Random Fields (CRF) classifier, the Event Detection system, inspired by the TIPSem system (Llorens et al., 2010), a state-of-the-art system from the SemEval-2010 TempEval task (Verhagen et al., 2010). We implemented the classifier using the SemEval 2013 - TempEval 3 data (UzZaman et al., 2013b), on which it performs with F1 scores of 82.6% and 85.9% using gold and silver training data respectively. We used this re-implementation to either confirm or disqualify predicates that were detected by the NewsReader SRL. The classifier also adds new events to the SRL output that were not detected by NewsReader. Note that for the latter, we only obtain the predicates and not the roles.
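The confirm, disqualify and add steps described above can be sketched as follows. This is a minimal illustration and not the NewsReader implementation: the function name `merge_predicates`, the token-id sets and the dictionary fields (`source`, `status`, `has_roles`) are hypothetical names chosen for the example.

```python
def merge_predicates(srl_predicates, classifier_events):
    """Combine SRL predicates with the decisions of an event classifier.

    Both arguments are sets of token ids (a hypothetical representation).
    Returns a dict: token id -> annotation record.
    """
    merged = {}
    for token in sorted(srl_predicates):
        # An SRL predicate is confirmed if the classifier also labels it as
        # an event, and disqualified otherwise.
        confirmed = token in classifier_events
        merged[token] = {
            "source": "srl",
            "status": "true" if confirmed else "false",
            "has_roles": True,
        }
    for token in sorted(classifier_events - srl_predicates):
        # Events found only by the classifier are added without roles.
        merged[token] = {"source": "classifier", "status": "true", "has_roles": False}
    return merged


merged = merge_predicates({"t3", "t7"}, {"t7", "t12"})
usable = [t for t, rec in merged.items() if rec["status"] == "true"]
```

In this toy example, `t3` is disqualified, `t7` is confirmed, and `t12` is added as a new event without roles.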
When creating the intra-document coreference sets, we skip predicates from the SRL that were disqualified (status="false"). We applied two versions of the Event Detection system to the NAF files to augment the SRL layer. One was trained with the gold data (EDg(old)) and one with the silver data (EDs(ilver)). In addition, we restricted the augmentation to those predicates that do not have an event class from VerbNet, FrameNet or ESO. We assumed that correct events are likely to have some typing from these resources through the PredicateMatrix, whereas wrong events are expected to have no typing. The Event Detection systems that skip events with event classes are called EDg(old)EC and EDs(ilver)EC respectively. Finally, we report on event mention detection when considering only those tokens that were annotated as events in the key data (NWR-key in the second column). In this case, we can assume maximum precision of the predicates detected in relation to the real recall. In Table 15, the first column (NWR) shows the results for event mention detection using the NewsReader system. The other settings are ili30, hypers, lcs, month and anyrole.

Table 15: Macro averaged mention identification for ECB+ topics. NWR = NewsReader pipeline v3.0 without adaptation, NWR-key = NWR restricted to tokens annotated as events in the key data, EDg(old) = NWR augmented with EventDetection trained with gold data, EDg(old)EC = same as EDg(old) but skipping predicates with an Event class, EDs(ilver) = NWR augmented with EventDetection trained with silver data, EDs(ilver)EC = same as EDs but skipping predicates with an Event class.

           NWR     NWR-key  EDg     EDgEC   EDs     EDsEC
recall     72.66%  72.66%   61.59%  72.64%  53.34%  72.43%
precision  66.29%  99.91%   88.23%  76.25%  92.10%  76.37%
f1         69.01%  83.80%   72.15%  74.01%  67.12%  73.94%

First of all, we see in Table 15 that the Event Detection variants have very high precision (EDg(old) 88.23% and EDs(ilver) 92.10%) compared to NewsReader.
Compared to the NWR-key, which has a maximum precision of 99.91%, EDg(old) scores about 12 points lower and EDs(ilver) about 8 points lower. In terms of recall, however, both variants score considerably lower than NewsReader: 61.59% and 53.34% respectively. The F1 scores are consequently not very different from NewsReader and considerably lower than for the SemEval-2013 task. The latter is not so surprising, since the classifier was trained on a data set from that same task, which was thus annotated in a similar way. We see that not disqualifying predicates that have an Event class almost fully recovers the loss in recall, while precision drops by about 12 to 16 points. Maintaining predicates with event classes thus provides the highest F-measures (74.01% and 73.94% respectively), only about 10 points lower than when considering the key event tokens (83.80%).

At the end of this section, we provide tables with the most frequent missed and invented predicates according to EDs(ilver)EC and also the full list of hapaxes, i.e. events missed or invented only once. The most frequently missed events (Table 16) are missed mostly due to their part-of-speech (nouns, adjectives, prepositions and names): dead, earthquake, Watergate, magnitude, guilty, according. Some tokenization errors are also frequent: Shooting and Checks were not down-cased, and 1 is in all 22 cases actually 6.1, which was split into separate tokens and was annotated by the annotators as an event indicating the magnitude of the earthquake. Especially Table 17, which lists predicates missed once as an event, makes clear that down-casing and lemmatising may solve many cases. Dealing more extensively with parts-of-speech other than verbs and applying more proper tokenization will solve the majority of the missed events. Tables 18 and 19 show the predicates for the invented events occurring more than once and only once respectively. In this case, the solution is less clear.
Some of the more frequent predicates, such as murder, patch and driving, all seem correct events but have not been annotated for some reason. Others, such as mother, official, home and police, are clearly unlikely to be events regardless of the context. Finally, store, camp and administration are ambiguous. Inspecting cases such as mother and police shows that they appear to have eventive readings that were falsely assigned to these mentions. A better filtering of nominal predicates that do not have any eventive meaning (the second group) seems beneficial.

The detection of event mentions defines a natural upper bound for event coreference. In Table 20, we give the reference results when maximizing the event detection, compared to the standard NewsReader output and using different settings to maximize the recall, the precision and the f-measure. Results are reported in all the different measures: MUC, BCUB, CEAFe and BLANC, while the F measures of the first three are averaged as CoNLL F. The table is divided into 3 sets of rows with different settings for running the system:

ARM  at least one participant should match regardless of the role (any role), time anchors should match at the granularity of the month, and the phrases and concepts of the actions should overlap by 30%.
mR   maximizes recall: no matching of participants and time is required, and the action match on phrases and concepts is set to 1%.
mP   maximizes precision: two participants should match with PropBank roles A0 and A1, time anchors should match at the granularity of the day, and actions should match 100% in terms of the concepts and phrases associated.
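The three settings above can be read as a configuration of one comparison function. The sketch below is only an illustration under stated assumptions: the `Event` and `Settings` classes, their field names and the `same_event` function are hypothetical and do not reflect the actual NAF2SEM code or its parameters. The action overlap is checked against both events, following the earlier remark that the proportion must hold for both compared events.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    roles: Tuple[str, ...]   # () = no participant constraint, ("ANY",) = any role
    time: Optional[str]      # None, "year", "month" or "day"
    action_overlap: float    # minimal proportion of shared concepts/phrases


ARM = Settings(("ANY",), "month", 0.30)
MR = Settings((), None, 0.01)
MP = Settings(("A0", "A1"), "day", 1.00)


@dataclass(frozen=True)
class Event:
    concepts: FrozenSet[str]                 # synsets or lemmas of the action
    participants: Dict[str, FrozenSet[str]]  # role -> entity identifiers
    date: Tuple[int, int, int]               # (year, month, day)


def action_match(a: Event, b: Event, threshold: float) -> bool:
    shared = a.concepts & b.concepts
    if not shared:
        return False
    # The proportion must hold for both events.
    return all(len(shared) / len(e.concepts) >= threshold for e in (a, b))


def time_match(a: Event, b: Event, granularity: Optional[str]) -> bool:
    if granularity is None:
        return True
    depth = {"year": 1, "month": 2, "day": 3}[granularity]
    return a.date[:depth] == b.date[:depth]


def participant_match(a: Event, b: Event, roles: Tuple[str, ...]) -> bool:
    if not roles:
        return True
    if roles == ("ANY",):
        all_a = set().union(*a.participants.values())
        all_b = set().union(*b.participants.values())
        return bool(all_a & all_b)
    return all(
        a.participants.get(r, frozenset()) & b.participants.get(r, frozenset())
        for r in roles
    )


def same_event(a: Event, b: Event, s: Settings) -> bool:
    return (action_match(a, b, s.action_overlap)
            and time_match(a, b, s.time)
            and participant_match(a, b, s.roles))


e1 = Event(frozenset({"shoot", "kill"}),
           {"A0": frozenset({"gunman"}), "A1": frozenset({"officer"})},
           (2013, 3, 5))
e2 = Event(frozenset({"shoot"}), {"A0": frozenset({"gunman"})}, (2013, 3, 9))
results = {name: same_event(e1, e2, s)
           for name, s in (("ARM", ARM), ("mR", MR), ("mP", MP))}
```

For this toy pair, the two mentions share one of two concepts, one participant and the month but not the day, so they are merged under ARM and mR and kept apart under mP.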
Table 16: Predicates missed more than once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+ (frequency: predicates)

42: dead
37: earthquake
35: according
30: Watergate
28: fire
27: magnitude
23: guilty, injured
22: 1
21: Oscars
20: quake
18: security
17: deal, DUI, party
16: heart, shooting
14: playoff, Shooting
13: degree, it, murder, playoffs, pregnant, record
12: death, first, job, sexual
11: Checks, scoring
10: arson, emergency
9: drunken, game, piracy, riots
8: career
7: business, downtime, drunk, market, outage, problem, role, Run, tsunami, WWDC
6: According, Charged, double, first-degree, IR, merger, natural, problems, rehab, went, win
5: acquisition, be, design, Injured, It, Killed, list, Macworld, MVC, operation, Oscar, polygamy, second-degree, sequel, stint, suicide, traffic, Trial, trial, Valley
4: accident, basis, blaze, campaign, Convicted, custody, damage, Dead, earthquakes, fatally, Fire, going, Murder, news, operations, Playoff, prison, refresh, rounds, Sequel, swimming, temblor, top, treatment, triggered, which
3: ”spiritual, Acquisition, affected, air, arraignment, bank, Charges, Clinch, Conference, congestive, crash, Cut, cut, damaging, data, dies, DOUBLE, Earthquake, event, go, gone, Guilty, health, hit, Industry, issues, Magnitude, manslaughter, matters, Missouri, mortar, move, new, NFL, Nominee, one, Pregnant, privacy, quarterfinals, Rehab, safe, Science, Seacom, season, senior, SET, striking, technology, that, touchdown, TRIAL, Win, worth
2: 7.5, 7.6, abortion, Accident, AFC, announcement, ARRESTED, Arson, attacks, Attorney, attorney, Availability, availability, basket, battery, Beat, bigamy, Bombing, Bombs, bombs, Burns, Business, California, changes, chase, clashes, clinch, coma, communications, Consulting, consulting, Convicts, crimes, critical, crossfire, cuts, Damaged, deaths, definitive, democracy, Direct, done, door, DWI, Endangered, energy, engineering, Fall, fatal, Financial, financial, Found, Francisco, furious, games, GUILTY, Heist, heist, hijack, history, Hit, Host, Hurt, incident, Indonesia, injures, injury, interception, Internet, journalism, Killing, landslides, lawless, lawyer, leaner, learning, Leaves, MAKE, more, musicals, offer, Overturn, panic, Placed, policy, Polygamy, position, pricing, program, public, purchase, quakes, Raid, reasons, rehires, reigning, remote, repairs, Reports, reports, rescue, Returns, Riot, rioting, San, second, Services, sex, Sexual, sexually, Shot, smash, specializes, speculation, Stolen, strategy, suspicion, takeover, telecommunications, tensions, unpatched, unrest, Unveils, Victory, vigil, violence, violent, what

Table 17: Predicates missed once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+

Riot ’This ’murder” #Oscars ’Knows ’bombs 2004 ACCIDENT ACCUSED ARRESTS Accuses Acquire Acquiring Affects Aftershocks Alzheimer Anger Arrest Arrested Attacks Awards BEAT BLAZE Begins Bombed Bumped CHARGES CLINCH Call Caught Celebrates Cloud Congrats Consultations Continue Cooling County Cruises DEAD DIRECT DNA Deal Death Defensive Dies Drunk ESCAPE Emergency Engineers Enraged Escape Expand Expo FALLOUT FIRE FIRED Fatal Feigned Fired Flagship Follow Football Game Goes Handcuffs Hire I.R Insanity Instated Interdicts Investigation Investment Jeopardy Jolts Lake Launches Lead Levels MIA MURDER MacWorld Make Marriage Musical NEGOTIATIONS Named Negotiation Negotiations New News Nominates Nomination OFFER Operation Oscars PLAYOFF PLAYOFFS POLYGAMY Pending Picked Plus Powder Pre Preorders Protection Protections Protests Pulls Quake REACH REMAINS Rampage Reasons Record Recorded Releases Rescue Restored Riots Rumors Rushes SAYSSEPTEMBER SEQUEL STATEMENT Sacked Semis Server Shelter Shoots Six Spree Strategy Strike Strikes Suspicious TOP Takes Testimony That Undisclosed Vigil Voting Vying WWDC12 War Winter Wounded ablaze access accidents achievement activities additional adultery affair aftershocks aid alert anger announcements any are armed arrest arrival artery assault attack attempted attractiveness available balloting banking barefoot basketball battleground behind betrayal bid bids blow bout brak breadth break built-in bust cable-news casting cause ceasefire celebration chance chaotic cheaper checking chip chip-making circumstances code cold communication compared complete computer computing conference connectivity coronary count counts coverup coveted crazy credit credits culture dangerous daunting debut delighted die die’ disappointing disaster disguised dismissal disrupts do domestic double-team down drama drinkdrinking drive driving due effectiveness efficient eighth elections emotion equal era escalate escort experiences extinction failures fall famed feedback felony fighting finale fir fixes flagship flurry foils footing free frustrating fuels fun game-winning gaminggeothermal gettingready good graphics great green guide gunplay hacked hacking harbors has have health-care healthcare help herself home homicide homicides hostage hundreds hungry hurt impending implications incarnation indecency infected inferno ink insanity integration intoxicated investigation is isolation jail jumper keyboard keynote large largest latest lay-offs lead leading leaves lies life little-used living loan long longer lose lunacy made making manhunt mark markets matter measuring media-oriented menace mental microservers misdemeanor mixed modeling money murders musicals” needs negotiation negotiations neutrality next nice nine nominee normal office official opportunity outperform outraged overdue overturns payoffs pending percent pick-six pirate place playoffs playing pneumonia polygamous postpower practice pre prediction preorder presidency press profile protections published raids rare re-arrest reason recession recorded repair repercussions reserve resolve return returned returns review riot ripening robbery running sacks saga scenario scheduled school scientific screen secret sector seismic semifinals series service share shelter shocking shot show show’s significant situation sixth slain slow snafus snowballs sobriety sobs spate spending spiritual spiritual’ star stardom status stillness stopgap straight-talking stretch subject supported supremacy surgery-enhanced swoop system systems tackled talks task tech telecom telecoms teleconference telephone temblors terrestrial thwarts time time-travelling tour trade tradition transaction tremor turned underage underway underwent undisclosed update upgrade upping use vacant verdict vetting vigils war way weather weed wellness wet wheel whereabouts willing winter woes word worst years

Table 18: Predicates invented and occurring more than once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+ (frequency: predicates)

28: mother
26: murder
23: patch
20: store
19: camp
17: administration
16: driving, official
15: home
14: police, star
13: had, pirates, running, season
12: including, run
11: have, host, update
10: left, officials, services
9: assault, attack, head, player, team, time
8: building, say, source, suspicion
7: ’s, coast, director, following, life, sequel, show, sources, support, users
6: branch, causes, center, climber, figure, made, portfolio, receiver, said, seed, sentences, shot, tackle
5: age, cables, came, coach, death, details, employee, estimated, fix, forces, has, part, place, Promises, route, shooting, spot, workers
4: aged, agent, became, belts, berth, bombs, choice, come, contract, denied, deputy, disclosed, end, endangered, failure, group, hit, knee, maker, management, manager, operator, products, report, reports, robbers, seems, strike, times, tournament, Catching
3: affect, aid, appears, authorities, believed, border, boss, Breaking, candidate, case, caused, centers, committed, connection, convicted, crimes, date, defenders, execution, factory, film, get, got, groups, help, history, image, judge, law, leading, lineup, looks, manufacturer, name, pass, planned, point, processors, refused, reporter, residents, rule, seeded, service, statement, suspect, term, Voters, warship, weapons, sentence
2: ACCUSED, actor, agreement, analyst, assaulting, attackers, attempt, back, backed, began, boats, brazen, carried, cause, chance, charges, couple, crown, deal, display, Employees, employees, evidence, face, failed, feed, Fix, found, generation, government, held, helped, hired, homes, house, information, leaving, link, marriages, meaning, measuring, minister, model, nickname, nominee, number, offender, offerings, opening, operators, parked, partners, parts, pick, pin, players, presumed, processor, projects, quake, questions, received, record, referee, registration, rest, resulting, reveal, rival, routes, rules, saw, says, section, seen, standings, start, statements, steps, stuck, style, suffered, surrounding, suspected, suspects, take, target, teams, telecast, terms, territory, toll, trial, trustee, type, unit, used, want, watches, waters, wave, where, wholesaler, worker

Table 19: Predicates invented and occurring only once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+

‘undeserved accepted accused address advantage advocates aim aimed alive alleged analysts answer appeared armed arrest asked assurances attempted audience audiences backs band Based based battle become belt BERTH blown bomb born box breach break BREAKING bug call camps Case centred ceremony challenger change charge climbers close Coach coaching combination coming comment commented commit communications competition complications computing condition conditions conference confirmed consulting contain contention continuing control convention conversations convict Counts counts crews criminals critics cure daring data decision declined declining defender defending defense deficiency delays denying devastated did didn’t disclose disputed documents dodging duty earnings edge efforts employed employers ending entrance eWEEK excess exercise expected exploit extended fact fading father favorite feat feeding feuds finds fire flag followed followers Following force gave given going grab growing guard guards guilt hand happened having he hold hole holiday husband impact indicated inmate input intended investigation issued killer kittens last launched leave leg let level live located look loss lost loved making managed managing manufacturers margin mark matter mean meant meetings memory merged mermaid moving murderer named names need network numbers occurred offense offer Officials onslaught operations opinion opposition Order order ordered organizers owner page park parking participants Parts passes past patrolled perform photographs pioneer pipeline plagued planning play portables portions poses present price priority prisoner prisoners producers product proposed prosecution prospect protected protection protesters provider providers put quarter range ranging reach reaction recommendation refugee related relations remain remained remaining repaired representative required researcher reserve resident resigning result Results results return revamped revenue review revolution revolving reward rewards risks rivals Robbers Ruling Running salesman saying scoring searching secret security see semifinal serve serves set setting shock shots sign signals signed skipping slide sorts sounds spasm specify speculate spoke spree stabbing stages stand standard standards stars started steering stock stockholders stop stores students success supplier survivors Take tapes tell tenant test tests thieves threatened throw tip total touched tourist touting transcripts transition treatment tribute tried try trying turned types user value valued vessel vessels view viewers violence visiting voters Wanted warehouse warming warning wasn’t way ways weapon weighing when whips winner wiretapped worry

Within each set of rows, the first row (kNWR) shows the results when we only use the key event annotations as markables for the system response. That means that we maximize the precision but not the recall. We thus expect higher results due to more precision. We can compare the real results against these maximized results, where NWR represents the NewsReader results without any adaptation. In the case of EDg(old) and EDs(ilver), events are detected by the CRF Event Detection system trained with gold data and silver data respectively. In the case of EDg(old)EC and EDs(ilver)EC, the system only disqualifies predicates as events that do not have an event ontology type. Each of these is then combined with the above settings ARM, mR and mP. Best scores of the true systems per metric are in bold.

Table 20: Reference results macro averaged over ECB+ topics with different options for event detection.
kNWR = NewsReader event detection without invented mentions, maximizing precision, NWR = NewsReader pipeline v3.0 without adaptation, EDg = NWR augmented with EventDetection trained with gold data, EDgEC = same as EDg but skipping predicates with an Event class, EDs = NWR augmented with EventDetection trained with silver data, EDsEC = same as EDs but skipping predicates with an Event class. ARM = standard setting with one participant in any role (AR), time month match and action concept and phrase match 30%, mR = maximizing recall by no constraints on participant match and time, action concept and phrase match 1%, mP = maximizing precision by participant roles A0A1, time day match and action concept and phrase match set to 100%.

            MUC                  BCUB                 CEAFe                BLANC                CoNLL
System      R      P      F      R      P      F      R      P      F      R      P      F      F
kNWR-ARM    33.55  74.63  45.54  41.10  85.21  54.83  60.59  52.09  55.12  34.75  77.69  45.40  51.83
NWR-ARM     33.55  53.78  40.64  41.10  55.68  46.73  60.23  32.73  41.82  34.75  42.64  34.05  43.06
EDg-ARM     23.54  69.78  34.45  28.49  78.50  41.25  43.61  43.69  42.78  19.90  68.97  28.96  39.49
EDgEC-ARM   33.10  59.78  41.92  39.98  63.94  48.42  58.14  36.64  44.33  34.24  51.31  37.21  44.89
EDs-ARM     23.26  70.54  34.12  28.63  79.44  41.55  43.94  45.56  43.87  19.26  70.22  28.40  39.85
EDsEC-ARM   33.03  59.91  41.91  40.15  64.10  48.61  58.38  37.02  44.66  34.14  51.85  37.27  45.06
kNWR-mR     53.77  68.75  59.73  52.05  62.70  56.14  45.62  69.41  54.26  41.44  67.59  49.30  56.71
NWR-mR      53.77  48.59  50.54  52.05  39.68  44.55  44.41  40.27  41.51  41.44  34.96  36.89  45.53
EDg-mR      38.94  63.47  47.58  36.26  58.55  44.11  33.53  57.25  41.57  25.24  59.73  33.77  44.42
EDgEC-mR    54.31  54.60  53.84  51.83  45.27  47.62  42.71  47.29  44.19  41.95  43.22  41.20  48.55
EDs-mR      38.16  64.15  47.10  36.01  59.31  44.18  34.01  60.02  42.70  24.28  60.62  33.05  44.66
EDsEC-mR    53.98  54.58  53.65  51.75  45.35  47.62  42.90  47.66  44.44  41.58  43.23  40.98  48.57
kNWR-mP     11.09  70.31  18.77  31.59  95.93  46.97  62.71  39.20  47.47  27.73  81.31  35.64  37.73
NWR-mP      11.09  54.42  18.02  31.59  63.18  41.44  62.54  25.31  35.52  27.73  47.35  25.03  31.66
EDg-mP       8.15  62.77  14.12  22.34  87.09  35.19  44.47  34.29  37.83  15.58  69.13  22.42  29.04
EDgEC-mP    11.35  57.45  18.54  30.99  72.52  42.75  60.60  28.60  38.31  27.72  55.17  28.58  33.20
EDs-mP       7.96  63.02  13.82  22.45  88.11  35.40  44.96  35.58  38.91  15.00  70.39  21.95  29.38
EDsEC-mP    11.15  57.97  18.31  31.09  72.65  42.85  60.85  28.79  38.53  27.56  55.56  28.53  33.23

We first discuss the ARM output, which is supposed to give the most balanced results for precision and recall and thus the highest F measure. EDs(ilver) and EDs(ilver)EC in most cases have the highest score. The EDg(old) and EDg(old)EC results are slightly lower. This is in line with the difference in event mention detection observed earlier. The best CoNLL F score of 45.06% is obtained by EDs(ilver)EC, which is about 7 points less than the kNWR version (51.83%), with maximum precision. BLANC F scores are lower than CoNLL. This is mainly due to the low recall. The best precision of BLANC is 70.22% (EDs(ilver)), while the kNWR precision is 77.69%. These scores are high and comparable to the state-of-the-art. This shows that most improvement can be expected from improving the recall and especially the recall in the event detection.

If we look at the results for maximizing recall (mR), we see that in most cases the recall is higher and the precision is lower, while when maximizing the precision (mP), the precision is significantly higher and the recall is lower. The exception is CEAFe, where recall and precision are exactly reversed. Ignoring the CEAFe results, we see that the maximum recall is 54.31% MUC for EDg(old)EC-mR and the maximum precision is 88.11% BCUB for EDs(ilver). As for ARM, most results are several points below the key results kNWR. Highest F measures are obtained with maximized recall for EDs(ilver)EC (40.98% F-BLANC and 48.57% F-CoNLL).
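As a small worked check, the CoNLL F column can be recomputed as the unweighted mean of the MUC, BCUB and CEAFe F scores. The sketch below uses three rows copied from Table 20; the function name `conll_f` and the dictionary layout are illustrative assumptions, not part of the scorer.

```python
def conll_f(muc_f: float, bcub_f: float, ceafe_f: float) -> float:
    """Unweighted mean of the MUC, BCUB and CEAFe F scores."""
    return (muc_f + bcub_f + ceafe_f) / 3.0


# (MUC F, BCUB F, CEAFe F) as reported in Table 20.
rows = {
    "kNWR-ARM": (45.54, 54.83, 55.12),
    "EDsEC-ARM": (41.91, 48.61, 44.66),
    "EDsEC-mR": (53.65, 47.62, 44.44),
}

conll = {name: round(conll_f(*scores), 2) for name, scores in rows.items()}
gap_arm = round(conll["kNWR-ARM"] - conll["EDsEC-ARM"], 2)
```

The recomputed values agree with the CoNLL F column of the table for these rows, and `gap_arm` gives the distance between the best real system and the key-based upper bound under the ARM setting.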
In the state-of-the-art literature, cross-document coreference is not only tested across documents within the same topic but also across the whole data set. To compare our results with the state-of-the-art, we abandoned the topic structure and ran the NAF2SEM program on all the 982 ECB+ files processed by NewsReader. This results in a single RDF file after comparing all events in the data set with each other. We compare our results with Yang et al. (2015), who report the best results on ECB+ and compare their results to other systems that have so far only been tested on ECB and not on ECB+. Yang et al. use a distance-dependent Chinese Restaurant Process (DDCRP; Blei and Frazier, 2011), which is an infinite clustering model that can account for data dependencies. They define a hierarchical variant (HDDCRP) in which they first cluster event mentions and data within a document and next cluster the within-document clusters across documents. Their hierarchical strategy is similar to our CompositeEvent approach, in the sense that event data can be scattered over multiple sentences in a document and needs to be gathered first. Our approach differs in that we use a semantic representation to capture all event properties and do a logical comparison, while Yang et al. and all the other methods they report on are based on machine learning methods (both unsupervised clustering and supervised mention-based comparison). Yang et al. test their system on topics 23-43, while they used topics 1-20 as training data and topics 21-23 as development set. They do not report on topics 44 and 45. To compare our results with theirs, we also used topics 23-43 for testing. Since our system is fully unsupervised for the task itself, the training and development sets are irrelevant. In Table 21, we give the NewsReader results using the ARM settings that give the best F-measure. Table 22 is an exact copy of the results as reported by Yang et al. (2015).
They follow a machine-learning approach to event coreference, exploiting both clustering techniques and supervised techniques with rich feature sets. They also implemented a number of other state-of-the-art systems that use variations on clustering or supervised learning and applied them to the same data set within ECB+:

• LEMMA: a heuristic method that groups all event mentions, either within or across documents, which have the same lemmatized head word.
• AGGLOMERATIVE: a supervised clustering method for within-document event coreference following Chen and Ji (2009b).
• HDP-LEX: an unsupervised Bayesian clustering model for within- and cross-document event coreference (Bejan and Harabagiu, 2010a). It is a hierarchical Dirichlet process (HDP) model with the likelihood of all the lemmatized words observed in the event mentions.
• DDCRP: a Distance-dependent Chinese Restaurant Process model that ignores document boundaries.
• HDDCRP*: a variant of the proposed HDDCRP that only incorporates the within-document dependencies but not the cross-document dependencies.
• HDDCRP: their preferred HDDCRP system that also uses cross-document dependencies.

Table 21: Reference results macro averaged over ECB+ corpus with different options for event detection. kNWR = NewsReader event detection without invented mentions, maximizing precision; NWR = NewsReader pipeline v3.0 without adaptation; EDg = NWR augmented with EventDetection trained with gold data; EDgEC = same as EDg but skipping predicates with an Event class; EDs = NWR augmented with EventDetection trained with silver data; EDsEC = same as EDs but skipping predicates with an Event class. ARM = standard setting: one participant in any role (AR), time month match, and action concept and phrase match 30%; mR = maximizing recall by no constraints on participant match and time, and action concept and phrase match 1%; mP = maximizing precision by participant roles A0A1, time day match, and action concept and phrase match set to 100%.

             kNWR-ARM   NWR-ARM   EDg       EDgEC     EDs       EDsEC
MUC     R    33.21%     33.21%    23.56%    32.73%    24.73%    32.63%
MUC     P    73.38%     50.97%    69.09%    57.49%    69.35%    57.52%
MUC     F    45.73%     40.22%    35.14%    41.71%    36.46%    41.64%
BCUB    R    40.05%     40.05%    27.05%    39.00%    28.49%    39.04%
BCUB    P    83.49%     53.28%    76.77%    61.98%    77.43%    62.00%
BCUB    F    54.13%     45.73%    40.00%    47.88%    41.66%    47.91%
CEAFe   R    55.12%     54.76%    37.78%    52.74%    39.34%    52.54%
CEAFe   P    53.92%     33.30%    43.71%    37.99%    46.96%    38.48%
CEAFe   F    54.52%     41.42%    40.53%    44.17%    42.81%    44.43%
CoNLL   F    51.46%     42.46%    38.56%    44.59%    40.31%    44.66%

Table 22: Reference results macro averaged over ECB+ corpus as reported by Yang et al. (2015) for state-of-the-art machine learning systems

             LEMMA     HDP-LEX   AGGLOMERATIVE   DDCRP     HDDCRP*   HDDCRP
MUC     R    55.40%    63.50%    59.20%          58.20%    66.40%    67.10%
MUC     P    75.10%    75.50%    78.30%          79.60%    77.50%    80.30%
MUC     F    63.80%    69.00%    67.40%          67.10%    71.50%    73.10%
BCUB    R    39.60%    43.70%    40.20%          39.60%    48.10%    40.60%
BCUB    P    71.70%    65.60%    73.20%          78.10%    69.00%    73.10%
BCUB    F    51.00%    52.50%    51.90%          52.60%    56.70%    53.50%
CEAFe   R    61.10%    60.20%    65.60%          69.40%    63.00%    68.90%
CEAFe   P    36.20%    34.80%    30.20%          31.80%    38.20%    38.60%
CEAFe   F    45.50%    44.10%    41.40%          43.60%    47.60%    49.50%
CoNLL   F    53.40%    55.20%    53.60%          54.40%    58.60%    58.70%

In line with Yang et al., we averaged the different F-measures to obtain a CoNLL-F value. BLANC results are not reported by Yang et al. We can see that the best NewsReader CoNLL-F is about 14 points lower (44.66% against 58.70%), while the NewsReader key version (kNWR) scores about 7 points lower (51.46%). Looking more closely at the recall and precision scores, we can see that NewsReader scores significantly lower in recall but often equal and in some cases higher in precision.
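The CoNLL-F values in Tables 21 and 22 are the unweighted mean of the MUC, BCUB and CEAFe F-measures. The following minimal sketch recomputes three of them from the F rows of the two tables; the function name conll_f and the dictionary layout are illustrative choices and not part of the NewsReader software.

```python
def conll_f(muc_f, bcub_f, ceafe_f):
    """Unweighted mean of the MUC, BCUB and CEAFe F-measures (in percent)."""
    return (muc_f + bcub_f + ceafe_f) / 3.0


# F-measures copied from Table 21 (NewsReader) and Table 22 (Yang et al., 2015).
scores = {
    "kNWR-ARM": (45.73, 54.13, 54.52),
    "EDsEC": (41.64, 47.91, 44.43),
    "HDDCRP": (73.10, 53.50, 49.50),
}

conll = {system: round(conll_f(*f), 2) for system, f in scores.items()}

# Distance between the best system of Yang et al. and the best NewsReader run.
gap = round(conll["HDDCRP"] - conll["EDsEC"], 2)

for system, value in conll.items():
    print(f"{system:10s} CoNLL-F = {value:.2f}%")
print(f"HDDCRP - EDsEC = {gap:.2f} points")
```

The recomputed values agree with the CoNLL F rows of both tables, which confirms that the same averaging is used for the NewsReader results and for the systems reported by Yang et al.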
In the case of EDs(ilver), NewsReader scores 77.43% for BCUB precision, while HDDCRP scores 73.1% and the best precision is 78.1% by DDCRP. In the case of CEAFe, EDs(ilver) even has a precision of 46.96%, while the best score by Yang et al. is 38.6% for HDDCRP. Yang et al. noticed that the event detection of a standard SRL system performs poorly (56% recall) and therefore trained a separate CRF system for event detection using the ECB+ training documents. [15] Their CRF classifier obtains 93.5%, 89.0%, 95.0%, and 72.8% recall of the annotated event mentions, participants, time and locations on the test data set. The NewsReader system scores at most 72.66% recall on event detection. This shows that there is a big potential for NewsReader to improve with respect to the above results when it is specifically trained on detecting events. Increasing the recall of events by 20 points would have a big impact on the recall for event coreference as well. On the other hand, we can assume that the NewsReader results are more realistic as an indication of the performance on any data set, since the system has not been trained on a specific data set. The results reported by Yang et al. are likely to drop when moving from ECB+ to another data set unless separate training data is provided. For comparison, our own CRF Event Detection system trained on SemEval 2013 TempEval-3 data performed with F-scores above 80% but performed much lower (more than 10 points) when applied to ECB+.

Given the nature of the ECB+ data set, it makes sense to consider the actual distribution of predicates over documents, topics and the complete data set to measure the complexity of within-topic and across-topic comparison.
The extent to which predicates occur across topics can be seen as an indication of the referential ambiguity, since the main referential events are separated by topics with a systematic ambiguity between two seminal events within a single topic. We thus expect that contextual events tend to occur only within a single topic, while for example source events and grammatical events are not restricted to a specific seminal event. Table 23 shows the division of mentions of three predicates across documents and across topics. We first give the distribution on the basis of the full news articles and next the distribution in the annotated part of the article, where on average 1.8 sentences have been annotated per article.

Table 23: Distribution of tell, kill and election over all text and annotated text per mention, document and topic in ECB+

                              tell      kill      election
Full text        mentions     397       420       77
                 documents    235       207       29
                 ment/doc     1.69      2.03      2.66
                 topics       39        22        6
                 ment/top     10.18     19.09     12.83
Annotated text   mentions     23        141       17
                 documents    21        129       10
                 ment/doc     1.10      1.09      1.70
                 topics       8         14        1
                 ment/top     2.88      10.07     17.00

We can see in Table 23 that a source event such as tell occurs even less often than a contextual event such as kill when considering the full article. Nevertheless, tell occurs in more documents and more topics than kill. We can see that a specific event such as election has a low frequency, the highest average number of mentions per document and the lowest document frequency. These full text distributions confirm that a source event has a high dispersion compared to contextual events. However, when we consider the annotated text, we see that there are hardly any mentions of tell left (5.7%) in comparison to kill (33.6%) and election (22.1%).
Average mentions per document are lower and more equal for all three predicates, since on average 1.8 sentences per document are annotated. However, the average number of mentions per topic is much higher for election, while it dropped drastically for tell and substantially for kill. Given the distribution of the annotation in ECB+, we can thus see that source events are only marginally annotated and therefore play a minor role, while some contextual events such as kill have a high dispersion across topics but others such as election do not. The seminal nature of the topics and the small overlap in events across topics explain why the results for within-topic and across-topic comparison are relatively close. It also suggests that in real-life contexts, when dealing with large volumes of news, it makes sense to apply some form of prior topic clustering to avoid excessive ambiguity, especially for source events (and grammatical events) that are found in many texts regardless of the context of the event.

[15] In fact they also noted this for the detection of participants (76%), time expressions (65%) and locations (13%).

3.2.3 Conclusion NewsReader cross-document event coreference

We have seen that NewsReader event coreference can be tuned towards either high recall (up to 62% at topic level and 41% at corpus level) or high precision (up to 81% at topic level and 72% at corpus level). We have seen that recall is still limited and that this is mainly due to the event detection. This is encouraging because it shows the validity of the approach, and event detection is relatively easy to improve in comparison to the more complex event coreference process. We thus expect that further improving the event detection will also directly boost the quality of the event coreference.
It is important to note that our state-of-the-art results are obtained with generic technology based on logical comparison and processing, without any domain adaptation and without any machine learning on the specific data set that has been used for testing. This is an important feature of the system, since machine learning based systems tend to have lower results when applied to data sets other than the ones they were trained on.

We compared the performance of NewsReader against the latest state-of-the-art system by Yang et al. (2015). Although NewsReader performs lower for CoNLL-F and recall than the reported systems, it tends to have higher precision scores. The state-of-the-art systems implemented by Yang et al. benefit from training on ECB+ data, whereas NewsReader is not adapted to the annotations and the data set. We can thus expect that the NewsReader performance is more representative for other data sets than ECB+, whereas the methods reported by Yang et al. can be expected to perform much lower on other data sets. Furthermore, Yang et al. boosted the event mention detection from 56% to 95% (as well as the participants, locations and time detection) by training a separate classifier on ECB+, whereas the recall of the event detection of NewsReader is 72%. What events are annotated and what events are not often depends on the style of annotation and thus differs from data set to data set. We can assume that boosting the NewsReader event detection on a specific data set by training on annotated events will also lead to a significant boost in the event detection and consequently in the event coreference results.

Finally, the ECB+ data set is an artificially created data set that does not represent a natural stream of news. Within natural daily news streams, there may not be two seminal events that compete for interpretation on the same day (e.g. two attacks in Paris) and there will be many more topics than the 42 topics in ECB+.
The best settings for ECB+ therefore may not be the best settings for dealing with daily news streams. Evaluating the best setup and the best settings for event coreference in a daily news stream is complex and labour-intensive. Within this project, we did not have the resources to carry out such an evaluation. Another problem is that there is no freely sharable data set that can be used. Such a data set needs to contain the news for a certain period (say one month) from many different sources, so that we could follow the daily accumulation.

4 Event Relations

Event relation extraction, in particular temporal relation extraction, is a crucial step to anchor an event in time, to build event timelines and to reconstruct the plot of a story. In this section, we describe the task of detecting temporal relations, causal relations and predicate time anchor relations. The description of each task begins with the presentation of the annotation schema we have followed.

4.1 Temporal Relations

4.1.1 Annotation Schema

The annotation schema for temporal relations is based on the TimeML specification language (Pustejovsky et al., 2005). In the TimeML annotation, temporal links are used to i) establish the temporal order of two events (event-event pair); ii) anchor an event to a time expression (event-timex pair); and iii) establish the temporal order of two time expressions (timex-timex pair).

In TimeML, temporal links are annotated with the <TLINK> tag. The full set of temporal relations specified in TimeML version 1.2.1 (Saurí et al., 2006) contains 14 types of relations, as illustrated in Table 24. Among them there are six paired relations (i.e. with one relation being the inverse of the paired one).
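The pairing of relations with their inverses can be written down as a small lookup table. The sketch below is only an illustration: the relation names use the upper-case, underscore spelling of TimeML relType values, which is an assumption about the spelling in the annotation files, and the helper invert is hypothetical.

```python
# The six paired relations of TimeML 1.2.1; each relation is the inverse
# of the other member of its pair. The spelling of the names is an assumption.
PAIRED = [
    ("BEFORE", "AFTER"),
    ("IBEFORE", "IAFTER"),
    ("BEGINS", "BEGUN_BY"),
    ("ENDS", "ENDED_BY"),
    ("DURING", "DURING_INV"),
    ("INCLUDES", "IS_INCLUDED"),
]
# The two remaining relations are their own inverse.
SYMMETRIC = ["SIMULTANEOUS", "IDENTITY"]

INVERSE = {}
for rel, inv in PAIRED:
    INVERSE[rel] = inv
    INVERSE[inv] = rel
for rel in SYMMETRIC:
    INVERSE[rel] = rel


def invert(source, target, rel_type):
    """Return the same temporal link read in the opposite direction."""
    return target, source, INVERSE[rel_type]


print(len(INVERSE))                    # number of relation types covered
print(invert("pr1", "pr2", "BEFORE"))  # the link seen from pr2
```

Six pairs plus two symmetric relations give the 14 relation types mentioned above.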
These relations map one-to-one to 12 of Allen's 13 basic relations. [16]

a is before b / b is after a
    a |-----|
    b           |-----|
a is ibefore b / b is iafter a
    a |-----|
    b       |-----|
a begins b / b is begun by a
    a |---|
    b |---------|
a ends b / b is ended by a
    a       |---|
    b |---------|
a is during b / b is during inv a
    a    |---|
    b |---------|
a includes b / b is included in a
    a |---------|
    b    |---|
a is simultaneous with b; a is identity with b
    a |-----|
    b |-----|

Table 24: Temporal relations in TimeML annotation

According to the TimeML 1.2.1 annotation guidelines (Saurí et al., 2006), the difference between during and is included (and their inverses) is that the during relation is specified when an event persists throughout a temporal duration (e.g. John drove for 5 hours), while the is included relation is specified when an event happens within a temporal expression (e.g. John arrived on Tuesday).

[16] Allen's overlaps relation is not represented in TimeML.

In the NewsReader annotation guidelines (Tonelli et al., 2014), we have simplified the set of relations by not considering the relation types during and identity, the latter being a coreferential relation. A new relation has been added with respect to TimeML: the measure relation. It is used to connect an event and a timex of type duration which provides information on the duration of the related event.

Example: The first A380 superjumbo, made_pr1 by Airbus, was delivered_pr2 today_tmx2 to Singapore Airlines (SIA) 18 months_tmx3 behind schedule. After the plane was delivered_pr4 in Singapore, it was flown_pr3 to Toulouse, France for the ceremony_pr5 of about 500 guests.
(DCT_tmx1: 2007-10-15)

The NAF representation of a part of the temporal relations extracted from the sentences is as follows:

    <temporalRelations>
      <!--IS_INCLUDED(tmx1, tmx2)-->
      <tlink id="tlink6" from="tmx1" to="tmx2" fromType="timex" toType="timex" relType="SIMULTANEOUS"/>
      <!--BEFORE(pr1, pr2)-->
      <tlink id="tlink22" from="pr1" to="pr2" fromType="event" toType="event" relType="BEFORE"/>
      <!--BEFORE(pr4, pr3)-->
      <tlink id="tlink23" from="pr4" to="pr3" fromType="event" toType="event" relType="BEFORE"/>
      <!--BEFORE(pr4, pr5)-->
      <tlink id="tlink31" from="pr4" to="pr5" fromType="event" toType="event" relType="BEFORE"/>
      <!--IS_INCLUDED(pr2, tmx2)-->
      <tlink id="tlink58" from="pr2" to="tmx2" fromType="event" toType="timex" relType="IS_INCLUDED"/>
    </temporalRelations>

4.1.2 Temporal Relation Extraction

The temporal relation extraction module extracts temporal relations holding between two events, between an event and a time expression, or between two time expressions. Two methods are used to extract temporal relations: a machine learning method based on SVM for classifying relations between two events or between an event and a time expression, and a rule-based method for ordering two time expressions.

Extraction of relations between two events or between an event and a timex.

We consider all combinations of event/event and event/timex pairs within the same sentence (in a forward manner) as candidate temporal links. For example, if we have a sentence with entity order such as "...ev1 ...ev2 ...tmx1 ...ev3 ...", the candidate pairs are (ev1, ev2), (ev1, tmx1), (ev1, ev3), (ev2, tmx1), (ev2, ev3) and (ev3, tmx1). We remove event pairs if the two events are part of the same verbal phrase.
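The forward pairing within a sentence can be sketched as follows. This is an illustration under stated assumptions and not the code of the NewsReader module: the function candidate_pairs, the (id, kind) tuples and the same_phrase argument are hypothetical, and writing the event first in an event/timex pair simply follows the example above.

```python
from itertools import combinations


def candidate_pairs(entities, same_phrase=frozenset()):
    """Build candidate temporal links for one sentence.

    entities    -- list of (id, kind) in textual order, kind is "event" or "timex"
    same_phrase -- set of frozenset({id1, id2}) for events that share a verbal
                   phrase (hypothetical input; such pairs are removed)
    """
    pairs = []
    for (id1, kind1), (id2, kind2) in combinations(entities, 2):
        if kind1 == "timex" and kind2 == "timex":
            continue  # timex/timex pairs are ordered by rules, not classified
        if kind1 == kind2 == "event" and frozenset((id1, id2)) in same_phrase:
            continue  # events in the same verbal phrase are not paired
        if kind1 == "timex":
            pairs.append((id2, id1))  # assumption: event first, as in the example
        else:
            pairs.append((id1, id2))
    return pairs


sentence = [("ev1", "event"), ("ev2", "event"), ("tmx1", "timex"), ("ev3", "event")]
print(candidate_pairs(sentence))
```

For the entity order ev1, ev2, tmx1, ev3 the sketch yields the six candidate pairs listed in the text.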
We also identify relations between verbal events and document creation times and between main events [17] of two consecutive sentences.

The problem of determining the label (i.e. temporal relation type) of a given temporal link can be regarded as a classification problem. Given an ordered pair of entities (e1, e2), which can be either an event/event or an event/timex pair, the classifier has to assign a certain label, namely one of the 14 TimeML temporal relation types.

[17] Main events correspond to the ROOT element of the parsed sentence.

A classification model is trained for each type of entity pair (event/event and event/timex), as suggested in several previous works (Mani et al., 2006; Chambers, 2013). We build our classification models (Mirza and Tonelli, 2014b) using the Support Vector Machine (SVM) implementation provided by YamCha [18] and train them with the TempEval-3 training corpus (UzZaman et al., 2013a). The feature vectors built for each pair of entities (e1, e2) are as follows:

• String and grammatical features. Tokens, lemmas, PoS tags and flat constituent (noun phrase or verbal phrase) of e1 and e2, along with a binary feature indicating whether e1 and e2 have the same PoS tags (only for event/event pairs).
• Textual context. Pair order (only for event/timex pairs, i.e. event/timex or timex/event), textual order (i.e. the appearance order of e1 and e2 in the text) and entity distance (i.e. the number of entities occurring between e1 and e2).
• Entity attributes. Event attributes (class, tense, aspect and polarity) and the timex type attribute [19] of e1 and e2 as specified in the TimeML annotation. Four binary features are used to represent whether e1 and e2 have the same event attributes or not (only for event/event pairs).
• Dependency information. Dependency relation type existing between e1 and e2, dependency order (i.e. governor-dependent or dependent-governor), and binary features indicating whether e1/e2 is the root of the sentence.
• Temporal signals. We take the list of temporal signals extracted from the TimeBank 1.2 corpus into account. We found that the system performance benefits from distinguishing between event-related signals and timex-related signals; therefore we manually split the signals into two separate lists. Signals such as when, as and then are commonly used to temporally connect events, while signals such as at, for and within are more likely to occur with time expressions. There are also signals that are used in both cases, such as before, after and until; such signals are added to both lists. Tokens of temporal signals occurring around e1 and e2 and their positions with respect to e1 and e2 (i.e. between e1 and e2, before e1, or at the beginning of the sentence) are used as features.
• Temporal discourse connectives. Consider the following sentences: i) "John has been taking that driving course since the accident that took place last week." and ii) "John has been taking that driving course since he wants to drive better." In order to label the temporal link holding between two events, it is important to know whether there are temporal connectives in the surrounding context, because they may contribute to identifying the relation type. For instance, it may be relevant to distinguish whether since is used as a temporal or a causal cue (examples i) and ii) respectively). This information about discourse connectives is acquired using the addDiscourse tool (Pitler and Nenkova, 2009), which identifies connectives and assigns them to one of four semantic classes in the framework of the Penn Discourse Treebank (The PDTB Research Group, 2008): Temporal, Expansion, Contingency and Comparison. We include as a feature whether a discourse connective belonging to the Temporal class occurs in the textual context of e1 and e2. Similar to temporal signals, we also include in the feature set the position of the discourse connective with respect to the events.

[18] http://chasen.org/~taku/software/yamcha/
[19] The value attribute tends to decrease the classifier performance as shown in Mirza and Tonelli (2014b), and therefore it is excluded from the feature set.

The machine learning based module is available on github [20] and technical details about it can be found in Deliverable 4.2.2 (Section 3.12).

The result for relation classification (identification of the relation type given the relations) on the TempEval-3 test corpus (UzZaman et al., 2013a) is: 58.8% precision, 58.2% recall and 58.5% F1-measure. We compare the performance of tempRelPro to the other systems participating in the TempEval-3 task in Table 25. According to the figures reported in UzZaman et al. (2013a), tempRelPro is the best performing system both in terms of precision and of recall.

System         F1        Precision   Recall
tempRelPro     58.48%    58.80%      58.17%
UTTime-1, 4    56.45%    55.58%      57.35%
UTTime-3, 5    54.70%    53.85%      55.58%
UTTime-2       54.26%    53.20%      55.36%
NavyTime-1     46.83%    46.59%      47.07%
NavyTime-2     43.92%    43.65%      44.20%
JU-CSE         34.77%    35.07%      34.48%

Table 25: TempEval-3 evaluation on temporal relation classification

Our complete system (relation identification and classification) attempts to extract a document's entire temporal graph, i.e. it extracts a high number of relations in a text. In the evaluation (see Deliverable 4.2.3) this leads to good performance in terms of recall but low performance in terms of precision, due to the incompleteness of the manually annotated corpora used as gold standard.
Indeed, annotating a corpus with all temporal relations between events and time expressions is a difficult and time-consuming task. Consequently, in most of the available corpora only small portions of the temporal graph are annotated. For example, in the NewsReader annotation guidelines five subtasks were defined to help annotators annotate the most important temporal relations, but many relations are not considered, such as relations between nominal events and document creation times.

Cassidy et al. (2014) propose a new annotated corpus called TimeBank-Dense, which is composed of files from the TimeBank corpus annotated with about ten times more temporal relations than the original annotation. Currently, we are not able to evaluate our system on the TimeBank-Dense corpus, because the set of relations annotated is slightly different from the one annotated by our system and because TimeBank is part of our training corpus.

[20] https://github.com/paramitamirza/TempCauseRelPro

Relation extraction between two time expressions.

The second step of the extraction of temporal relations in a document is the detection of timex/timex relations for all dates and times. This step is performed using rules that depend on the normalized form of the value of the time expressions. If the two time expressions are dates or times, we first compare the years, then the months, weeks, days, etc. In the case of fuzzy expressions with one of the values PRESENT_REF, PAST_REF or FUTURE_REF, we use the relation between the second time expression of the pair and the Document Creation Time to order them.

This step enables us to make the temporal relations between time expressions explicit. If the normalization of time expressions is correct, then the right order between them is extracted. Wrong relations are identified only if the normalization fails.
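The rule-based ordering can be illustrated with a short sketch. It is a simplification under explicit assumptions and not the module's actual rule set: it only handles calendar values of the form YYYY, YYYY-MM or YYYY-MM-DD, it treats PRESENT_REF as the Document Creation Time, and it leaves PAST_REF, FUTURE_REF, week values and values of different granularity undecided. All function names are hypothetical.

```python
FUZZY = {"PRESENT_REF", "PAST_REF", "FUTURE_REF"}


def parse(value):
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a tuple of integers."""
    return tuple(int(part) for part in value.split("-"))


def compare(a, b):
    """Compare year first, then month, then day, over the shared fields."""
    n = min(len(a), len(b))
    if a[:n] < b[:n]:
        return "BEFORE"
    if a[:n] > b[:n]:
        return "AFTER"
    # Equal on the shared fields: only decide when the granularity is the same.
    return "SIMULTANEOUS" if len(a) == len(b) else None


def order(value1, value2, dct):
    """Relation of value1 with respect to value2; dct is the normalized DCT."""
    if value1 in FUZZY and value2 in FUZZY:
        return None
    if value1 == "PRESENT_REF":
        # Assumption: 'now' is ordered like the Document Creation Time.
        return compare(parse(dct), parse(value2))
    if value2 == "PRESENT_REF":
        return compare(parse(value1), parse(dct))
    if value1 in FUZZY or value2 in FUZZY:
        return None  # PAST_REF and FUTURE_REF are not handled in this sketch
    return compare(parse(value1), parse(value2))


print(order("2005-10-04", "2005-10-12", dct="2005-10-04"))
print(order("PRESENT_REF", "2000-02-10", dct="2000-01-07"))
```

Because the comparison works on normalized values only, a wrong normalization directly produces a wrong relation, which is the failure mode described above.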
Examples of timex/timex relations:

Apple Computer announced today_tmx1 another special event to be held on October 12_tmx2. (DCT_tmx0: 2005-10-04)
• Normalization: tmx1: 2005-10-04; tmx2: 2005-10-12
• Relations: tmx1 before tmx2; tmx0 simultaneous tmx1; tmx0 before tmx2

He will be repatriated to Cuba between now_tmx3 and Feb. 10_tmx4. (DCT_tmx5: 2000-01-07)
• Normalization: tmx3: PRESENT_REF; tmx4: 2000-02-10
• Relations: tmx3 before tmx4; tmx5 simultaneous tmx3; tmx5 before tmx4

4.2 Causal Relation

4.2.1 Annotation Scheme

The annotation scheme has been newly defined for the NewsReader project (see the NewsReader guidelines (Tonelli et al., 2014; Mirza et al., 2014)). Similar to the <TLINK> tag in TimeML for temporal relations, we introduce the <CLINK> tag to mark a causal relation between two events. Both TLINKs and CLINKs mark directional relations, i.e. they involve a source and a target event. However, while a list of relation types is part of the attributes for TLINKs (e.g. before, after, includes, etc.), for CLINKs only one relation type is foreseen, going from a source (the cause, indicated with the subscript s in the examples) to a target (the effect, indicated with the subscript t).

We also introduce the notion of causal signals through the <C-SIGNAL> tag. CSIGNALs are used to mark up textual elements signalling the presence of causal relations, which include all causal uses of prepositions (e.g. because of, as a result of, due to), conjunctions (e.g. because, since, so that), adverbial connectors (e.g. so, therefore, thus) and clause-integrated expressions (e.g. the reason why, the result is, that is why).

Wolff (2007) claims that causation covers three main types of causal concepts, i.e. CAUSE, ENABLE and PREVENT. These causal concepts are lexicalized through three types of verbs listed in Wolff and Song (2003): i) CAUSE-type verbs, e.g. cause, prompt, force; ii) ENABLE-type verbs, e.g.
allow, enable, help; and iii) PREVENT-type verbs, e.g. block, prevent, restrain. These categories of causation are taken into account as an attribute of CLINKs.

Given two annotated events, a CLINK is annotated if there is an explicit causal construction linking them. Such a construction can be expressed in one of the following ways:

1. Expressions containing affect verbs (affect, influence, determine, change, etc.), e.g. Ogun ACN crisis_s influences the launch_t of the All Progressive Congress.
2. Expressions containing link verbs (link, lead, depend on, etc.), e.g. An earthquake_t in North America was linked to a tsunami_s in Japan.
3. Basic constructions involving causative verbs of CAUSE, ENABLE and PREVENT type, e.g. The purchase_s caused the creation_t of the current building.
4. Periphrastic constructions involving causative verbs of CAUSE, ENABLE and PREVENT type, e.g. The blast_s caused the boat to heel_t violently. With "periphrastic" we mean constructions where a causative verb (caused) takes an embedded clause or predicate as a complement expressing a particular result (heel).
5. Expressions containing CSIGNALs, e.g. Its shipments declined_t as a result of a reduction_s in inventories by service centers.

Example: The departure from France of the new Airbus A380 superjumbo airliner on a tour of Asia and Australia has been delayed_pr4, leading to a rearrangement_pr8 of its public appearances.

The NAF representation of the causal relation holding between delayed and rearrangement is as follows:

    <causalRelations>
      <!--(pr4, pr8)-->
      <clink id="clink5" from="pr4" to="pr8"/>
    </causalRelations>

4.2.2 Causal Relation Extraction

We start with an assumption that causality may only occur between events in the same sentence and between events in two consecutive sentences.
Therefore, every possible combination of events in the same sentence and in two consecutive sentences, in a forward manner, is considered as a candidate event pair.

The problem of detecting causal relations (CLINKs) between events is treated as a supervised classification task. Given an ordered pair of events (e1, e2), the classifier has to decide whether there is a causal relation or not. However, since causality is a directional relation between a cause (source) and an effect (target), the classifier has to assign one of three possible labels: (i) clink (where e1 is the source and e2 is the target), (ii) clink-r (with the reverse order of source and target), and (iii) o for no relation.

The classification model is built with YamCha [21] (Kudo and Matsumoto, 2003), which implements the Support Vector Machines (SVMs) algorithm. We employ a one-vs-one strategy for multi-class classification and use the polynomial kernel. The overall approach is inspired by existing work on identifying causal relations between events (Mirza and Tonelli, 2014a), with some differences in the feature set. The implemented features are explained in the following sections. The module is available on github: https://github.com/paramitamirza/TempCauseRelPro.

Event features

We implement some morphological, syntactic and textual context features of e1 and e2, such as:

• lemma and part-of-speech (PoS) tags;
• sentence distance (e.g. 0 if e1 and e2 are in the same sentence, 1 if they are in adjacent sentences);
• entity distance (i.e. the number of entities occurring between e1 and e2, which is only measured if e1 and e2 are in the same sentence);
• dependency path existing between e1 and e2;
• binary features indicating whether e1/e2 is the root of the sentence;
• event attributes of e1/e2, including tense, aspect and polarity; and
• a binary feature indicating whether e1 and e2 co-refer. [22]

Causal marker features

We consider three types of causal markers that can cue a causal relation between events:

1. Causal signals. We extracted a list of causal signals from the annotated C-SIGNALs in the Causal-TimeBank corpus. [23]
2. Causal connectives, i.e. the discourse connectives under the Contingency class according to the output of the addDiscourse tool (Pitler and Nenkova, 2009).
3. Causal verbs. The three types of verbs lexicalizing causal concepts as listed in Wolff and Song (2003): i) CAUSE-type verbs, e.g. cause, prompt, force; ii) ENABLE-type verbs, e.g. allow, enable, help; and iii) PREVENT-type verbs, e.g. block, prevent, restrain.

[21] http://chasen.org/~taku/software/yamcha/
[22] When two events co-refer, there is almost no chance that they hold a causal relation.
[23] For some causal signals that can have some other tokens in between, e.g. due (mostly) to, we instead include their regular expression patterns, e.g. /due .*to/, in the list.

We further enriched the list of causal signals and causal verbs with the Paraphrase Database (PPDB, Ganitkevitch et al., 2013), using the initial list of signals and verbs as seeds.

Based on the existence of causal markers around e1 and e2, in exactly that priority order [24], we include as features:

• causal marker string;
• causal marker position, i.e. between e1 and e2, before e1, or at the beginning of the sentence that contains e1/e2; and
• dependency path between the causal marker and e1/e2.
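The priority order of the three marker types can be sketched as a simple cascade. The lists below are tiny illustrative samples taken from the examples in the text, not the lists used by the module; the connectives are passed in by the caller as a stand-in for the addDiscourse output, and the verb lookup is a crude prefix match instead of real lemmatization. The function name find_causal_marker is hypothetical.

```python
import re

# Illustrative samples only. The real lists come from Causal-TimeBank and
# Wolff and Song (2003), enriched with PPDB.
CAUSAL_SIGNALS = [r"because of", r"as a result of", r"due .*to"]
CAUSAL_VERBS = ["cause", "prompt", "force", "allow", "enable", "help",
                "block", "prevent", "restrain"]


def find_causal_marker(text, contingency_connectives=()):
    """Return (marker type, matched string) or None, trying signals first,
    then connectives, then verbs."""
    lowered = text.lower()
    for pattern in CAUSAL_SIGNALS:
        match = re.search(r"\b" + pattern + r"\b", lowered)
        if match:
            return "signal", match.group(0)
    # Stand-in for connectives that addDiscourse labelled as Contingency.
    for connective in contingency_connectives:
        match = re.search(r"\b" + re.escape(connective.lower()) + r"\b", lowered)
        if match:
            return "connective", match.group(0)
    for lemma in CAUSAL_VERBS:
        # Crude prefix match (e.g. "caused"); a lemmatizer would be more exact.
        match = re.search(r"\b" + lemma + r"\w*", lowered)
        if match:
            return "verb", match.group(0)
    return None


print(find_causal_marker(
    "Its shipments declined as a result of a reduction in inventories."))
print(find_causal_marker("The blast caused the boat to heel violently."))
```

The matched string corresponds to the causal marker string feature; the position and dependency path features would additionally require token offsets and a parse, which are left out of this sketch.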
TLINKs

Mirza and Tonelli (2014a) showed that even though only 32% of the gold annotated causal links have an underlying temporal relation, the temporal relation type of an event pair (e1, e2), if any, contributes to determining the direction of the causal relation (clink vs clink-r), if any. Therefore, we include the information on temporal relation types in the feature set.

In building the causal relation extraction system, we use Causal-TimeBank [25] (Mirza et al., 2014), annotated with the previously explained annotation scheme, as our development dataset. Causal-TimeBank is the TimeBank corpus [26] taken from the TempEval-3 evaluation campaign, completed with causal information. The corpus contains 183 documents and 318 causal links (CLINKs), which is only around 6.2% of the total number of temporal links (TLINKs) found in the corpus. The developed causal relation (CLINK) extraction system is then evaluated in a five-fold cross-validation setting. Table 26 shows the performance of the system, compared with the system of Mirza and Tonelli (2014a) as a baseline.

Given the limited amount of data annotated with causality, the supervised systems still do not yield satisfactory results. Mirza and Tonelli (2014a) report issues with data sparseness and suggest that other training data could be derived, for instance, from the Penn Discourse Treebank (Prasad et al., 2008). We adopt a different approach by combining the small data set available with an unlabelled dataset in a semi-supervised setting, specifically with the self-training method.

We exploit the remainder of the TempEval-3 corpus besides TimeBank, i.e. AQUAINT and TE3-platinum (the TempEval-3 evaluation corpus), with gold events and TLINKs. The reason for using a corpus with gold standard events and TLINKs for semi-supervised learning is that the event attributes (tense, aspect and polarity) and the TLINKs are quite important as features. There are 90 additional documents in total.
Self-training is run for 9 iterations, with 10 documents per iteration. Two different self-training schemes are explored: (1) adding all extracted CLINKs as new training data (with an imbalanced number of positive and negative examples) and (2) adding a balanced number of positive and negative CLINKs. Self-training with scheme (1) improves precision but lowers recall. Self-training with scheme (2) reduces precision but improves recall, and increases the overall performance in terms of F1-score.

System                      P       R       F1
Mirza and Tonelli (2014a)   0.6729  0.2264  0.3388
CLINK extraction            0.6917  0.2921  0.4107
self-training (1)           0.7167  0.2730  0.3954
self-training (2)           0.6382  0.3079  0.4154

Table 26: CLINK extraction system’s performance.

[24] We first look for causal signals. If we do not find any, then we continue looking for causal connectives. And so on.
[25] http://hlt.fbk.eu/technologies/causal-timebank
[26] A dataset annotated with temporal entities such as temporal expressions, events and temporal relations in the TimeML annotation framework.

4.3 Predicate Time Anchors

The number of temporal relations extracted by the previously described Temporal Relation Extraction modules grows with the number of annotated events and temporal expressions. Some events are linked to a time expression with a relation of type simultaneous or is included, but some are only linked with relations of type after or before, either to a time expression or to another event. With the main goal of structuring timelines from events in texts, we propose to use these relations and other textual information to build a relation “PredicateTimeAnchor” between all events that can be anchored in time and time expressions.

4.3.1 Annotation Scheme

A narrative container is defined by Styler IV et al.
(2014) as a temporal expression or an event explicitly mentioned in the text into which other events temporally fall. For the TimeLine shared task at SemEval 2015 (Minard et al. (2015)) we proposed the notion of temporal anchoring of an event, which is a specific type of temporal relation that links an event to the temporal expression to which it is anchored. The anchoring in time of an event can be realized in two ways: either the event is anchored through a time expression, which can be a DATE or a DURATION, or the event is anchored to an interval through a begin point and an end point. Time expressions can be text-consuming or not, depending on whether they are explicitly expressed in the text or derived from another time expression.

Examples:[27]
[27] In order to make the examples more readable, they all contain the time expression and the event in the same sentence. The module also anchors events to time expressions that are in different sentences.

Stock markets around the world have fallen (pr1) dramatically today (tmx1).
PredicateTimeAnchor (pr1): time anchor: 2008-09-17 (tmx1)

The U.S. dollar rose against the yen after six straight days (tmx2) of losses (pr2).
PredicateTimeAnchor (pr2): time anchor: P6D (tmx2, begin point: 2008-10-21, end point: 2008-10-27)

The Japanese economy contracted (pr3) by 0.9% between April (tmx3) and June (tmx4).
PredicateTimeAnchor (pr3): begin point: 2008-04 (tmx3); end point: 2008-06 (tmx4)

The Russian government has continued to hold all stock markets closed (pr4) until Friday (tmx5).
PredicateTimeAnchor (pr4): end point: 2008-09-19 (tmx5)

In NAF, the PredicateTimeAnchor relation is described through three attributes, each of which has as its value a reference link to a time expression:
anchorTime: indicates the point in time when the event occurred
beginPoint: indicates the beginning of the interval in which the event occurred
endPoint: indicates the end of the interval in which the event occurred

The NAF representation is as follows:

    <temporalRelations>
      <predicateAnchor id="an2" anchorTime="tmx1">
        <span>
          <target id="pr1"/>
        </span>
      </predicateAnchor>
      <predicateAnchor id="an3" beginPoint="tmx3" endPoint="tmx4">
        <span>
          <target id="pr3"/>
        </span>
      </predicateAnchor>
    </temporalRelations>

4.3.2 Predicate Time Anchor Relation Extraction

Following the definition of time anchoring of events in the TimeLine shared task at SemEval 2015 (Minard et al. (2015)), we have developed a system to extract “anchoring” relations between an event and a time expression in the text. The system is rule-based and performs a kind of reasoning over the temporal information. It uses the previously extracted temporal relations, verb tenses, dependency trees and temporal signals.

5 From TimeLines to StoryLines

In the previous section, we discussed the detection of temporal expressions, temporal relations and causal relations within a single document. In this section, we describe larger structures that go beyond the document level, such as TimeLines of events for entities across documents, and StoryLines. Stories are seen as the most natural representation of changes over time that also provide explanatory information for these changes. Not all changes make a story. Repetitive changes without further impact, e.g. the rising and setting of the sun, do not provide a story.
We expect that news is typically focused on those changes that have a certain impact. We further assume that the news tries to explain these events (How did it get so far? Who is responsible?) and describe the consequences of the event. Our starting point is therefore a key concept from narrative studies, namely that of plot structure (Ryan (1991); Herman et al. (2010)). A plot structure is a representational framework which underlies the way a narrative is presented to a receiver. Figure 37 shows a graphical representation of a plot structure.

Figure 37: General structure of a plot building up to a climax point

We therefore seek to create structures of events, selected from all extracted events, that approximate such an abstract plot structure. Contrary to what can be done in narrative studies, where the documents themselves normally provide a linear development of the plot structure, we aim at obtaining the plot structure from collections of news articles on the same topic and spanning a period of time. We aim at identifying first the “climax”, which in our perspective corresponds to the most salient event in a news article. After the most salient event and its participants have been identified, we use event relations to identify the rising actions (i.e. how and why did the most salient event occur?), if any[28], and the falling actions and consequences (i.e. what happened after the climax? what are the speculations linked to the climax event? ...).

[28] Notice that (unforeseeable) natural events, like earthquakes, are to be considered as self-contained climax events.

The first step towards the creation of StoryLines is to establish the temporal ordering of events. In subsection 5.1.2 we report on a TimeLine extraction system. In subsection 5.2.1 we describe two computational models for StoryLines and their implementation, which to our knowledge are unique in their kind. Finally, in subsection 5.2.3, we report on the results and insights
of the first workshop on “Computing News Storylines (CNewsStory 2015)”, organised as a satellite event of the ACL-IJCNLP 2015 conference.

5.1 TimeLine extraction

This section reports on the advancements in the development of TimeLine extraction. TimeLine extraction aims at reconstructing the chronological order of events from (large) collections of news spanning different years. In the following subsections, we describe the task, the benchmark data developed for the SemEval 2015 evaluation exercise Task 4: TimeLine: Cross-Document Event Ordering, and the new version of the TimeLine system.

5.1.1 TimeLines: task description

Task 4: TimeLine: Cross-Document Event Ordering was proposed as a new pilot task for the SemEval 2015 Evaluation Exercise. The task builds on the temporal processing tasks organised in previous SemEval editions (TempEval-1[29], TempEval-2[30] and TempEval-3[31]). The task aimed at advancing temporal processing by tackling the following issues for the first time with a common and public dataset:
• cross-document and cross-temporal event detection and ordering;
• entity-based temporal processing.

Following the task guidelines, a TimeLine can be defined as a set of chronologically anchored and ordered events related to an entity of interest (i.e. a person, a commercial or financial product, an organization, and similar) obtained from a document collection spanning a (large) period of time. Furthermore, not all events are eligible to enter a TimeLine. The task organisers have restricted the event mentions to specific parts of speech and classes, as defined in the task Annotation Guidelines[32]. In particular, an event can enter a TimeLine only if the following conditions apply:
• it is realised by a verb, a noun or a pronoun (anaphoric reference);
• it semantically denotes the happening of something (e.g.
the launch of a new product) or it describes the action of declaring something, narrating an event, or informing about an event;
• it is a factual or certain event, i.e. something which happened in the past or in the present, or for which there is evidence that it will happen in the future.

[29] http://www.timeml.org/tempeval/
[30] http://timeml.org/tempeval2/
[31] http://www.cs.york.ac.uk/semeval-2013/task1/
[32] http://alt.qcri.org/semeval2015/task4/data/uploads/documentation/manualannotationguidelines-task4.pdf

No training data was provided. Only a trial dataset of 30 manually annotated articles from WikiNews, and associated TimeLines for six entities, were provided to the task participants.

Each event in the TimeLine is associated with a time anchor of type “DATE” following the TimeML Annotation Guidelines (Saurı́ et al. (2006)). If an event cannot be associated with a specific time anchor, an underspecified time anchor of the type “XXXX-XX-XX” is provided.

The final TimeLine representation is a tab-separated file containing three fields. The first field (ordering) contains a cardinal number which indicates the position of the event in the TimeLine. Simultaneous, but not coreferential, events are associated with the same ordering number. Events which cannot be reliably ordered, either because of a missing time anchor or because of underspecified temporal relations (e.g. an event which is associated with a generic date with value “PAST_REF”), are put at the beginning of the TimeLine and associated with cardinal number 0. The second field (time anchor) contains the time anchor. The third field (event) consists of one event or a list of coreferential events. Each event must be represented by the file id, the sentence id and the extent of the event mention (i.e. the token).
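The three-field layout described above can be illustrated with a small formatter. This is a sketch under assumptions: the `fileid-sentenceid-token` rendering of a mention, the comma used between coreferential mentions, and the simplification that events sharing a time anchor receive the same ordering number are all choices made for this example, and `format_timeline` is a hypothetical name.

```python
# Anchors that cannot be placed on the time axis; such events get ordering 0.
UNORDERABLE = {"XXXX-XX-XX", "PAST_REF"}


def _row(position, anchor, mentions):
    # Assumed rendering: mentions as fileid-sentenceid-token, comma-separated.
    events = ",".join("-".join(str(part) for part in mention) for mention in mentions)
    return f"{position}\t{anchor}\t{events}"


def format_timeline(entries):
    """entries: list of (anchor, [(file_id, sentence_id, token), ...]).

    Each entry is one event given as a list of coreferential mentions.
    Anchors are assumed to be normalised strings that sort chronologically.
    """
    unordered = [e for e in entries if e[0] in UNORDERABLE]
    ordered = sorted((e for e in entries if e[0] not in UNORDERABLE), key=lambda e: e[0])

    lines = [_row(0, anchor, mentions) for anchor, mentions in unordered]
    position, previous = 0, None
    for anchor, mentions in ordered:
        if anchor != previous:  # a new anchor opens a new position
            position += 1
            previous = anchor
        lines.append(_row(position, anchor, mentions))
    return "\n".join(lines)


timeline_text = format_timeline([
    ("2008-09-17", [("d1", 3, "fallen")]),
    ("XXXX-XX-XX", [("d2", 1, "said")]),
    ("2008-04", [("d1", 5, "contracted"), ("d3", 2, "contraction")]),
    ("2008-09-17", [("d2", 4, "closed")]),
])
```

In the example, the underspecified event is listed first with ordering 0, the two coreferential mentions share one row, and the two events anchored to 2008-09-17 share ordering number 2.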
To clarify the representation of a TimeLine, in Figure 38 we report the output of the SPINOZA VU 1 system for the target entity “Airbus”[33].

Figure 38: Example of timeline output generated by the SPINOZA VU 1 system

Two tracks were proposed. Track A aimed at extracting TimeLines for target entities from raw text. Track B aimed at extracting TimeLines for target entities given manually annotated gold events. Both tracks have a subtrack, Subtrack A and Subtrack B, whose goal is to evaluate only the ordering of events, without taking into account the time anchoring.

[33] The entity “Airbus” was one of the entities provided by the SemEval task organisers.

The test data consisted of three different corpora, each containing 30 articles, and 37 target entities overall (12 for the first corpus, 12 for the second corpus and 13 for the third corpus). The evaluation is based on the TempEval-3 evaluation tool (UzZaman et al. (2012)). All events associated with cardinal number 0 in a TimeLine are excluded from the evaluation. Results and rankings are reported as the micro-averaged F1 score for temporal awareness.

5.1.2 System Description and Evaluation

A detailed description of the first version of the TimeLine extraction system can be found in Rospocher et al. (2015) and Caselli et al. (2015a). Two different versions were developed, called SPINOZA VU 1 and SPINOZA VU 2 respectively. The systems took part only in Track A (both the main track and the subtrack) of SemEval 2015 Task 4. In Table 27 we report the results of both versions of the system for Task A - Main, including the best performing system. Table 28 reports the results of both versions of the system for Task A - Subtask. In Table 28 no other result is reported because only our system participated. F1-scores range from 0 to 100.
System Version   Corpus 1   Corpus 2   Corpus 3   Overall
SPINOZA VU 1     4.07       5.31       0.42       3.15
SPINOZA VU 2     2.67       0.62       0.00       1.05
WHUNLP 1         8.31       6.01       6.86       7.28

Table 27: System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Main

System Version   Corpus 1   Corpus 2   Corpus 3   Overall
SPINOZA VU 1     1.20       1.70       2.08       1.69
SPINOZA VU 2     0.00       0.92       0.00       0.27

Table 28: System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Subtask

Overall, the results are not satisfactory. Out of 37 entity-based TimeLines, we obtained results for only 31. An error analysis showed three main sources of errors which affected both versions of our system: event detection, temporal relations, and semantic role labelling, together with the connections between these three. This means that: i) we may be able to identify the correct event with respect to the target entity, but we lack the temporal relation information for that event and thus fail to put it into the TimeLine; ii) we fail to identify the target entity as an argument of an event. Of these three, the temporal relations of events are the main source of error. A detailed report on the error analysis of the system can be found in Caselli et al. (2015c). Future work is directed towards detecting more temporal relations between events and expressions that are explicit in the text, but also towards using knowledge on the temporal ordering of events that is implicit and not expressed in the text. The latter can be learned from large text corpora. In the next subsection, we describe the first results for resolving implicit relations by exploiting the whole document.

5.1.3 Document level time-anchoring for TimeLine extraction

As seen previously, TimeLine extraction requires a quite complete time anchoring.
We have shown that the temporal relations that explicitly connect events and time expressions are not enough to obtain a full time-anchor annotation and, consequently, produce incomplete TimeLines. For this reason, we propose that for a complete time-anchoring the temporal analysis must be performed at the document level in order to discover implicit temporal relations. We have developed a preliminary module based on other research lines involving the extraction of implicit information (Palmer et al., 1986; Whittemore et al., 1991; Tetreault, 2002). In particular, we are inspired by recent work on Implicit Semantic Role Labelling (ISRL) (Gerber and Chai, 2012) and especially by the work of Blanco and Moldovan (2014), who adapted the ideas of ISRL to focus on modifiers, including arguments of time, instead of core arguments or roles. We have developed a deterministic algorithm of the kind proposed by Laparra and Rigau (2013) for ISRL.

Similarly to the module presented in 5.1.2, we implemented a system that builds TimeLines from events with explicit time-anchors. We defined a three-step process to build TimeLines. Given a set of documents and a target entity, the system first obtains the events in which the entity is involved. Second, it obtains the time-anchors for each of these events. Finally, it sorts the events according to their time-anchors. For steps 1 and 2 we apply the NewsReader pipeline to obtain annotations at different levels. Specifically, we are interested in Named-Entity Recognition (NER) and Disambiguation (NED), Co-reference Resolution (CR), Semantic Role Labelling (SRL), Time Expression Identification (TEI) and Normalization (TEN), and Temporal Relation Extraction (TRE).

Named-Entity Recognition (NER) and Disambiguation (NED): We perform NER using ixa-pipe-nerc, which is part of IXA pipes (Agerri et al., 2014). The module provides very fast models with high performance, obtaining 84.53 in F1 on CoNLL tasks.
Our NED module is based on DBpedia Spotlight (Daiber et al., 2013). We have created a NED client to query the DBpedia Spotlight server for the named entities detected by the ixa-pipe-nerc module. Using the best parameter combination, the best results obtained by this module on the TAC 2011 dataset were 79.77 precision and 60.67 recall. The best performance on the AIDA dataset is 79.67 precision and 76.94 recall.

Coreference Resolution (CR): In this case, we use a coreference module that is loosely based on the Stanford Multi-Pass Sieve system (Lee et al., 2011). The system consists of a number of rule-based sieves that are applied in a deterministic manner. The system scores 56.4 F1 on the CoNLL 2011 task, around 3 points worse than the system by Lee et al. (2011).

Semantic Role Labelling (SRL): SRL is performed using the system included in the MATE-tools (Björkelund et al., 2009). This system reported a labelled semantic F1 of 85.63 for English on the CoNLL 2009 Shared Task.

Time Expression Identification (TEI) and Normalization (TEN): We use the time module from the TextPro suite (Pianta et al., 2008) to capture the tokens corresponding to temporal expressions and to normalize them following the TIDES specification. This module is trained on TempEval-3 data. The average results for English are 83.81% precision, 75.94% recall and 79.61% F1.

Temporal Relation Extraction (TRE): We apply the temporal relation extractor module from TextPro to extract and classify temporal relations between an event and a time expression. This module is trained using the yamcha tool on the TempEval-3 data. The results for relation classification on the TempEval-3 corpus are 58.8% precision, 58.2% recall and 58.5% F1.

Our TimeLine extraction system uses the linguistic information provided by the pipeline.
The process to extract the target entities, the events and the time-anchors can be described as follows:

(1) Target entity identification: The target entities are identified by the NED module. As they can be expressed in several forms, we use the redirect links contained in DBpedia to extend the search for events involving those target entities. For example, if the target entity is Toyota, the system also includes events involving the entities Toyota Motor Company or Toyota Motor Corp. In addition, as the NED module does not always provide a link to DBpedia, we also match the word form of the head of the argument against the head of the target entity.

(2) Event selection: We use the output of the SRL module to extract the events that occur in a document. Given a target entity, we combine the output of the NER, NED, CR and SRL modules to obtain those events that have the target entity as filler of their ARG0 or ARG1. We also set some constraints to select certain events according to the specification of the SemEval task. That is, we only return those events that are not negated and are not accompanied by modal verbs, except will.

(3) Time-anchoring: We extract the time-anchors from the output of the TRE and SRL modules. From the TRE, we extract as time-anchors those relations between events and time expressions identified as SIMULTANEOUS. From the SRL, we extract as time-anchors those ARG-TMP arguments related to time expressions. In both cases we use the time expression returned by the TEI module. The tests performed on the trial data show that the best choice for time-anchoring is to combine both options. For each time anchor we normalize the time expression using the output of the TEN module.

The TimeLine extraction process described above builds TimeLines only for events with explicit time-anchors.
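Steps (2) and (3), followed by the final sort, can be sketched over already-extracted annotations. The dictionary layout (`roles`, `negated`, `modal`, `simultaneous_timex`, `arg_tmp_timex`) and all function names are hypothetical choices for this illustration and do not reflect the NAF layers or the pipeline's actual code; anchors are assumed to be normalised strings that sort chronologically.

```python
def select_events(events, target_names):
    """Keep events whose ARG0 or ARG1 is the target entity (or an alias)
    and that are neither negated nor under a modal other than 'will'."""
    selected = []
    for event in events:
        fillers = {event["roles"].get("ARG0"), event["roles"].get("ARG1")}
        if not fillers & target_names:
            continue
        if event.get("negated"):
            continue
        if event.get("modal") not in (None, "will"):
            continue
        selected.append(event)
    return selected


def explicit_anchor(event):
    """Combine both anchor sources: a SIMULTANEOUS relation from the
    temporal relation extractor, or else a temporal argument from SRL."""
    return event.get("simultaneous_timex") or event.get("arg_tmp_timex")


def build_timeline(events, target_names):
    anchored = [(explicit_anchor(e), e["mention"]) for e in select_events(events, target_names)]
    # Events without an explicit anchor are dropped in this explicit-only setting.
    return sorted((anchor, mention) for anchor, mention in anchored if anchor is not None)


aliases = {"Toyota", "Toyota Motor Corp."}
events = [
    {"mention": "recalled", "roles": {"ARG0": "Toyota"}, "simultaneous_timex": "2010-01-21"},
    {"mention": "sold", "roles": {"ARG0": "Toyota Motor Corp."}, "negated": True, "arg_tmp_timex": "2009-12-01"},
    {"mention": "launch", "roles": {"ARG1": "Toyota"}, "modal": "will", "arg_tmp_timex": "2010-03-01"},
    {"mention": "recover", "roles": {"ARG0": "Toyota"}, "modal": "may", "arg_tmp_timex": "2010-06-01"},
    {"mention": "announced", "roles": {"ARG0": "Honda"}, "simultaneous_timex": "2010-01-01"},
    {"mention": "apologized", "roles": {"ARG0": "Toyota"}},
]
toyota_timeline = build_timeline(events, aliases)
```

The last event in the example shows the limitation of relying on explicit anchors only: it involves the target entity and passes the filters, but it has no time expression attached and therefore never reaches the TimeLine.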
We call this system BTE. It can be seen as a baseline, since we believe that the temporal analysis should be carried out at the document level. The explicit time anchors provided by the NLP tools do not cover the full set of events involving a particular entity. That is, most of the events do not have an explicit time anchor and therefore are not captured as part of the TimeLine of that entity. Thus, we need to recover the time-anchors that appear implicitly in the text. In this preliminary work, we propose a simple strategy that tries to capture implicit time-anchors while maintaining the coherence of the temporal information in the document. This strategy follows previous work on Implicit Semantic Role Labelling.

Figure 39: Example of document-level time-anchoring.

The rationale behind Algorithm 1 is that, by default, the events of an entity that appear in a document tend to occur at the same time as previous events involving the same entity, unless stated otherwise. For example, in Figure 39 all the events involving Steve Jobs, like gave and announced, are anchored to the same time expression, Monday, although this is only stated explicitly for the first event, gave. The example also shows that for other events that occur at different times the time-anchor is also mentioned explicitly, as for the events that involve the entities Tiger and Mac OS X Leopard.

Algorithm 1 starts from the annotation obtained by the tools described above. For a particular entity, a list of events (eventList) is created, sorted by order of occurrence in the text. Then, for each event in this list, the system checks whether that event already has a time-anchor (eAnchor). If this is the case, the time-anchor is stored as the default time-anchor (defaultAnchor) for the following events of the entity with the same verb tense (eTense).
If the event does not have an explicit time-anchor but the system has found a time-anchor for a previous event with the same tense (defaultAnchor[eTense]), this time-anchor is assigned to the current event (eAnchor). If neither of these conditions holds, the algorithm anchors the event to the Document Creation Time (DCT) and sets this time expression as the default time-anchor for the following events with the same tense.

Note that Algorithm 1 strongly depends on the tense of the events. As this information can only be recovered from verbal predicates, this strategy cannot be applied to events described by nominal predicates. For these cases, only explicit time-anchors are taken into account.

Algorithm 1 Implicit Time-anchoring
    eventList = sorted list of events of an entity
    for event in eventList do
        eAnchor = time anchor of event
        eTense = verb tense of event
        if eAnchor not NULL then
            defaultAnchor[eTense] = eAnchor
        else if defaultAnchor[eTense] not NULL then
            eAnchor = defaultAnchor[eTense]
        else
            eAnchor = DCT
            defaultAnchor[eTense] = DCT
        end if
    end for

The TimeLine is built by ordering the events according to the time-anchors obtained both explicitly and implicitly. We call this system DLT.

We evaluated our two TimeLine extractors on the main track of SemEval 2015 Task 4. Two systems participated in this track, WHUNLP and the module explained in 5.1.2, with three runs in total. Their performances in terms of Precision (P), Recall (R) and F1-score (F1) are presented in Table 29. We also present, in italics, additional results for both systems: the results of a corrected run of the WHUNLP system provided by the SemEval organizers, and the results of an out-of-competition version of the SPINOZAVU module. The best run is obtained by the corrected version of WHUNLP 1, with an F1 of 7.85%.
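For reference, Algorithm 1 can be written as a short Python function. This is a sketch of the pseudocode rather than the module's actual code: events are represented as hypothetical dictionaries with `anchor` and `tense` keys, and the separate branch for events without tense (nominal predicates) is one reading of the remark that only explicit anchors are used for them.

```python
def implicit_time_anchoring(event_list, dct):
    """Assign time anchors to the events of one entity.

    event_list: events in textual order, each a dict with
        'anchor' (explicit time anchor or None) and
        'tense'  (verb tense, or None for nominal predicates).
    dct: the Document Creation Time, used as the last-resort anchor.
    """
    default_anchor = {}  # tense -> most recent anchor seen for that tense
    anchored = []
    for event in event_list:
        anchor, tense = event["anchor"], event["tense"]
        if tense is None:
            # Nominal predicate: keep the explicit anchor (possibly None) as is.
            pass
        elif anchor is not None:
            default_anchor[tense] = anchor
        elif tense in default_anchor:
            anchor = default_anchor[tense]
        else:
            anchor = dct
            default_anchor[tense] = dct
        anchored.append({**event, "anchor": anchor})
    return anchored


jobs_events = [
    {"mention": "gave", "anchor": "2005-06-06", "tense": "past"},
    {"mention": "announced", "anchor": None, "tense": "past"},
    {"mention": "ship", "anchor": None, "tense": "future"},
    {"mention": "launch", "anchor": None, "tense": None},
]
anchored_events = implicit_time_anchoring(jobs_events, dct="2005-06-07")
```

In the example, announced inherits the anchor of gave because both are in the past tense, ship falls back to the document creation time because no earlier future-tense event has an anchor, and the nominal event launch stays unanchored.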
The low figures obtained show the intrinsic difficulty of the task, especially in terms of Recall. Table 29 also contains the results obtained by our systems. We present two different runs. On the one hand, we present the results obtained using just the explicit time-anchors provided by BTE. As can be seen, the results obtained by this run are similar to those obtained by WHUNLP 1. On the other hand, the results of the implicit time-anchoring approach (DLT) outperform by far our baseline and all previous systems applied to the task. To check that these results are not biased by the time-relation extractor we use in our pipeline (TimePro), we reproduce the performances of BTE and DLT using another system to obtain the time-relations. For this purpose we used CAEVO by Chambers et al. (2014). The results obtained in this case show that the improvement obtained by our approach is quite similar, regardless of the time-relation extractor chosen.

The figures in Table 29 seem to confirm our hypothesis: in order to obtain a full time-anchoring annotation, the temporal analysis must be carried out at the document level. The TimeLine extractor almost doubles its performance just by including a strategy as straightforward as the one described in this section. As expected, Table 29 shows that this improvement is much more significant in terms of Recall.

System                        P       R       F1
SPINOZAVU-RUN-1               7.95    1.96    3.15
SPINOZAVU-RUN-2               8.16    0.56    1.05
WHUNLP 1                      14.10   4.90    7.28
OC SPINOZA VU                 -       -       7.12
WHUNLP 1 (corrected run)      14.59   5.37    7.85
BTE                           26.42   4.44    7.60
DLT                           20.67   10.95   14.31
BTE caevo                     17.56   4.86    7.61
DLT caevo                     17.02   12.09   14.13

Table 29: Results on the SemEval-2015 task. OC SPINOZA VU and the corrected WHUNLP 1 run are the additional results (italicised in the original layout).

5.2 Storylines

The TimeLines discussed in the previous section form the basis for StoryLines. Stories are a pervasive phenomenon in human life. They are explanatory models of the world and of its happenings (Bruner, 1990).
We refer to the narratology framework of Bal (Bal, 1997) to identify the basic concepts which inform our model. Every story is a mention of a fabula, i.e., a sequence of chronologically ordered and logically connected events involving one or more actors. Actors are the agents of a story, not necessarily human, that perform actions. In Bal’s framework “acting” refers both to performing and to experiencing an event. Events are defined as transitions from one state to another. Furthermore, every story has a focalizer, a special actor from whose point of view the story is told. Under this framework, the term “story” is further defined as the particular way or style in which something is told. A story, thus, does not necessarily follow the chronological order of the events and may contain more than one fabula.

Extending the basic framework and focusing on the internal components of the fabula, a kind of universal grammar can be identified which involves the following elements:
• Exposition: the introduction of the actors and the settings (e.g. the location);
• Predicament: the set of problems or struggles that the actors have to go through. It is composed of three elements: rising action, the event(s) that increase the tension created by the predicament; climax, the event(s) which create the maximal level of tension; and, finally, falling action, the event(s) which resolve the climax and lower the tension;
• Extrication: the “end” of the predicament, which indicates the ending.

The model allows us to focus on each of its components, highlighting different, though connected, aspects: the internal components of the fabula are event-centered; the actors and the focalizer allow access to opinions, sentiments, emotions and world views; and the medium gives access to the specific genres and styles.
We developed two different approaches to create StoryLine structures that focus on different aspects of the fabula. The first approach aggregates stories from separate TimeLines for different actors through co-participation. The second approach aggregates stories from climax events and bridging relations with other events that precede and follow the climax.

Figure 40: Example of a StoryLine merging the TimeLines of the entities Steve Jobs and Iphone 4.

5.2.1 StoryLines aggregated from entity-centered TimeLines

TimeLines as described in the previous section are built for single entities. However, stories usually involve more than one entity. In this section, we present a proposal to create StoryLines by merging the individual TimeLines of two or more different entities, provided that they are co-participants in at least one relevant event.

In general, given a set of related documents, any entity appearing in the corpus is a candidate to take part in a StoryLine. Thus, a TimeLine for every entity should be extracted following the requirements described by the SemEval-2015 task. Then, those TimeLines that share at least one relevant event must be merged. Entities that do not co-participate in any event with other entities are not considered participants of any StoryLine.

The expected StoryLines should include both the events in which the entities interact and the events in which the entities selected for the StoryLines participate individually. The events must be ordered and anchored in time in the same way as in individual TimeLines, but it is also mandatory to include the entities that take part in each event. Figure 40 presents the task idea graphically.
In the example, two TimeLines are extracted using 5 sentences from 3 different documents, one for the entity Steve Jobs and another one for the entity Iphone 4. As these two entities are co-participants in the events introducing and introduced, the TimeLines are merged into a single StoryLine. As a result, the StoryLine contains the events of both entities. The events are represented by the ID of the file, the ID of the sentence, the extent of the event mention and the participants (i.e. entities) of the event.

                                 Apple Inc.   Airbus   GM    Stock   Total
timelines from SemEval           6            13       11    13      43
storylines                       1            2        1     3       7
events                           129          135      97    188     549
events / storyline               129          67.5     97    62.7    78.4
interacting-events               5            12       2     11      30
interacting-events / storyline   5            6        2     3.7     4.3
entities                         4            9        4     9       26
entities / storyline             4            4.5      4     3       3.7

Table 30: Figures of the StoryLine gold dataset.

Dataset

As a proof of concept, we start from the dataset provided in SemEval-2015. It is composed of 120 Wikinews articles grouped in four different corpora about Apple Inc.; Airbus and Boeing; General Motors, Chrysler and Ford; and Stock Market. The Apple Inc. set of 30 documents serves as trial data and the remaining 90 documents as the test set.

We consider each corpus a topic from which to extract StoryLines. Thus, for each corpus, we have merged the interacting individual TimeLines to create a gold standard for StoryLines. As a result of this process, from a total of 43 TimeLines we obtained 7 gold-standard StoryLines spread over 4 topics. Table 30 shows the distribution of the StoryLines and some additional figures about them. The Airbus, GM and Stock corpora are similar in terms of size, but the number of gold StoryLines varies from 1 to 3. We also obtain 1 StoryLine from the Apple Inc. corpus, but in this case the number of TimeLines is lower. The number of events per StoryLine is quite high in every corpus, but the number of interacting events is very low.
Finally, 26 out of 43 target entities in SemEval-2015 belong to a gold StoryLine. Note that in real StoryLines all interacting entities should be annotated, whereas now we only use those already selected by the TimeLines task.

Evaluation

The evaluation methodology proposed in SemEval-2015 is based on the evaluation metric used for TempEval-3 (UzZaman et al., 2013a), which captures the temporal awareness of an annotation (UzZaman and Allen, 2011). For that, they first transform the TimeLines into a set of temporal relations. More specifically, each time anchor is represented as a TIMEX3 so that each event is related to the corresponding TIMEX3 by means of the SIMULTANEOUS relation. In addition, SIMULTANEOUS and BEFORE relation types are used to connect the events. As a result, the TimeLine is represented as a graph and evaluated in terms of recall, precision and F1-score.

As a first approach, the same graph representation can be used to characterize the StoryLines. Thus, for this trial we reuse the same evaluation metric as the one proposed in SemEval-2015. However, we already foresee some issues that need to be addressed for a proper StoryLines evaluation. For example, when evaluating TimeLines, given a set of target entities, the gold standard and the output of the systems are compared based on the micro-averaged F1 scores. In contrast, when evaluating StoryLines, any entity appearing in the corpus is a candidate to take part in a StoryLine, and several StoryLines can be built given a set of related documents. Thus, we cannot compute the micro-average of the individual F1-scores for each StoryLine because the number of StoryLines is not set in advance. In addition, we also consider it necessary to capture the cases in which a system obtains more than one StoryLine for a single gold-standard StoryLine.
This could happen when a system is not able to detect all the entities interacting in events but only some of them. We consider it necessary to offer a metric which takes this type of output into account and also scores partial StoryLines. Obviously, a deeper study of the StoryLines casuistry will lead to a more complete and detailed evaluation metric.

Example of a system-run

In order to show that the proposed dataset and evaluation strategy are ready to be used on StoryLines, we implement an automatic system that follows the same strategy we used to build the gold annotations. This way, we create a simple system which merges automatically extracted TimeLines. To build the TimeLines, we use the system explained in section 5.1.3. For each target entity, we first obtain the corresponding TimeLine. Then, we check which TimeLines share the same events, in other words, which entities are co-participants of the same event, and we build StoryLines from the TimeLines sharing events. This implies that more than two TimeLines can be merged into one single StoryLine.

The system builds 2 StoryLines in the Airbus corpus. One StoryLine is derived from the merging of the TimeLines of 2 target entities and the other one from the merging of 4 TimeLines. In the case of the GM corpus, the system extracts 1 StoryLine in which 2 target entities participate. For the Stock corpus, one StoryLine is built by merging 3 TimeLines. In contrast, in the Apple corpus, the system does not obtain any StoryLine. We evaluated our StoryLine extractor system in the cases where it builds StoryLines. The evaluation results are presented in Table 31.

The results of our strategy vary across corpora. The system is able to create StoryLines which share data with the gold standard in the Airbus corpus, but it fails to create comparable StoryLines in the GM and Stock corpora. Finding the interacting events is crucial for the extraction of the StoryLines.
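The merging step used by this simple system can be sketched as follows. This is only an illustrative sketch, not the NewsReader implementation: the function name `merge_timelines`, the input layout (entity mapped to a list of time anchor and event id pairs) and the toy data are all assumptions made for the example. TimeLines that share at least one event are grouped transitively, so more than two TimeLines can end up in one StoryLine.

```python
from collections import defaultdict


def merge_timelines(timelines):
    """Group entities whose TimeLines share at least one event.

    `timelines` maps an entity to a list of (time_anchor, event_id) pairs
    (a hypothetical layout chosen for this sketch). Returns one dict per
    StoryLine with the merged entities and the time-ordered events, each
    event annotated with its participants.
    """
    # Union-find over entities: two entities are joined when they share an event.
    parent = {entity: entity for entity in timelines}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    participants = defaultdict(set)  # event_id -> entities taking part
    anchors = {}                     # event_id -> time anchor
    for entity, events in timelines.items():
        for anchor, event_id in events:
            participants[event_id].add(entity)
            anchors[event_id] = anchor

    for entities in participants.values():
        first, *rest = sorted(entities)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(set)
    for entity in timelines:
        groups[find(entity)].add(entity)

    storylines = []
    for members in groups.values():
        if len(members) < 2:
            continue  # no co-participation, so no StoryLine for this entity
        events = [e for e, ents in participants.items() if ents & members]
        events.sort(key=lambda e: (anchors[e], e))
        storylines.append({
            "entities": sorted(members),
            "events": [(anchors[e], e, sorted(participants[e])) for e in events],
        })
    return sorted(storylines, key=lambda s: s["entities"])


# Toy input with placeholder entities and events (not taken from the corpus).
toy = {
    "A": [("2010-01", "e1"), ("2010-03", "e3")],
    "B": [("2010-01", "e1"), ("2010-02", "e2")],
    "C": [("2010-05", "e4")],
}
toy_storylines = merge_timelines(toy)
```

In the toy input, A and B co-participate in e1 and are merged, while C shares no event with any other entity and is left out, mirroring the rule that entities without co-participation do not belong to any StoryLine.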
If these events are not detected for all their participant entities, their corresponding TimeLines cannot be merged. For that reason, our dummy system obtains null results for the GM and Stock corpora. However, this is an example of a system capable of creating StoryLines. Of course, more sophisticated approaches, or approaches that do not follow the TimeLine extraction approach, could obtain better results.

Corpus    Precision    Recall    Micro-F
Airbus         6.92     14.29       4.56
GM             0.00      0.00       0.00
Stock          0.00      0.00       0.00

Table 31: Results of the StoryLine extraction process.

5.2.2 Storylines aggregated from climax events

The above method aggregates StoryLines across TimeLines. However, it still does not provide an explanatory notion for the sequences of events. In this section, we present a model that starts from a climax event that motivates the selection of events.

In our model we use the term StoryLine to refer to an abstract structured index of connected events which provides a representation matching the internal components of the fabula (rising action(s), climax, falling action(s) and resolution). On the other hand, we reserve the term Story for the textual expression of such an abstract structure [34]. Our model, thus, does not represent texts but event data from which different textual representations could be generated.

The basic elements of a StoryLine are:
• A definition of events, participants (actors), locations and time-points (settings)
• Anchoring of events to time
• A TimeLine (or basic fabula): a set of events ordered in time (chronological order)
• Bridging relations: a set of relations between events with explanatory and predictive value(s) (rising action, climax and falling action)

StoryLines are built on top of the instance level of representation, as illustrated in 2, and TimeLines.
Given a TimeLine for a specific period of time, we define a StoryLine S as n-tuples $\langle T, E, R \rangle$ such that:

\[ \mathit{Timepoints} = (t_1, t_2, \ldots, t_n) \]
\[ \mathit{Events} = (e_1, e_2, \ldots, e_n) \]
\[ \mathit{Relations} = (r_1, r_2, \ldots, r_n) \]

[34] Note that a StoryLine can be used to generate a textual summary as a story, comparable to (cross-)document text summarization.

T consists of an ordered set of points in time, E is a set of events and R is a set of bridging relations between these events. Each e in E is related to a t in T. Furthermore, for any pair of events $e_i$ and $e_j$, where $e_i$ precedes $e_j$, there holds a bridging relation $[r, e_i, e_j]$ in R.

We assume that for every E there is a set of TimeLines L, consisting of every possible temporally ordered sequence of events. Not every temporal sequence l of events out of L makes a good StoryLine. We want to approximate a StoryLine that people value by defining a function that maximizes the set of bridging relations across different sequences of events l in L. We therefore assume that there is one sequence l that maximizes the values for R and that people will appreciate this sequence as a story. For each l in L, we assume that there is a bridging function B over l that sums the strength of the relations, and that the news StoryLine S is the sequence l with the highest score for B:

\[ S(E) = \mathrm{MAX}(B(l)) \]
\[ B(l) = \sum_{i,j=1}^{n} C(r, e_i, e_j) \]

Our bridging function B sums the connectivity strength C of the bridging relations between all time-ordered pairs of events from the set of temporally ordered events l. The kind of bridging relation r and the calculation of the connectivity strength C can be filled in in many ways: co-participation, expectation, causality, enablement, and entailment, among others. In our model, we leave open what type of bridging relations people value. This needs to be determined empirically in future research.

The set L for E can be very large.
However, narratology models state that stories are structured around climax events. The climax event makes the story worthwhile to tell. Other preceding and following events are grouped around the climax to explain it. It thus makes sense to consider only those sequences l that include at least one salient event as a climax and relate other events to this climax event. Instead of calculating the score B for all l in L, we thus only need to build event sequences around events that are most salient as a climax event and select the other events on the basis of the strength of their bridging relation with that climax or with each other. For any climax event $e_c$, we can therefore define:

\[ \mathrm{MAX}(B(e_c, E)) = \max_{i=1}^{n} C(r, e_i, e_c) \]

The climax value for an event can be defined on the basis of salience features, such as:
• prominent position in a source;
• number of mentions;
• strength of sentiment or opinion;
• salience of the involved actors with respect to the source.

An implementation should thus start from the event with the highest climax score. Next, it can select the preceding event $e_l$ with the strongest value for r. Note that this is not necessarily the event that is closest in time. After that, the event $e_l$ with the strongest connectivity is taken as a new starting point to find any event $e_k$ preceding this event with the highest value for r. This is repeated until there are no preceding events in the TimeLine l. The result is a sequence of events up to $e_c$ with the strongest values for r. The same process is repeated forward in time starting from $e_c$ and adding $e_m$ with the strongest connectivity value for r, followed by $e_n$ with the strongest connectivity score r to $e_m$. The result is a sequence of events with local maxima spreading from $e_c$:

\[ \ldots e_k, r_{max}, e_l, r_{max}, e_c, r_{max}, e_m, r_{max}, e_n \ldots \]

This schema models the optimized StoryLine starting from a climax event.
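The backward and forward expansion from a climax event can be illustrated with a small sketch. It is not the project code: the function name `grow_storyline`, the dictionary of time anchors and the toy connectivity weights are hypothetical, and the sketch stops expanding as soon as no remaining event has a non-zero connectivity with the current one, which is one possible reading of the stopping condition.

```python
def grow_storyline(times, climax, strength):
    """Greedy expansion around a climax event.

    times:    dict mapping event id -> sortable time anchor
    climax:   id of the event with the highest climax score
    strength: function (earlier, later) -> connectivity C, 0 if unrelated
    """
    chain = [climax]
    used = {climax}

    def extend(backward):
        current = climax
        while True:
            if backward:
                scored = [(strength(e, current), e) for e in times
                          if e not in used and times[e] < times[current]]
            else:
                scored = [(strength(current, e), e) for e in times
                          if e not in used and times[e] > times[current]]
            scored = [(s, e) for s, e in scored if s > 0]
            if not scored:
                return
            # The strongest relation wins, not the event nearest in time.
            _, best = max(scored)
            used.add(best)
            if backward:
                chain.insert(0, best)
            else:
                chain.append(best)
            current = best

    extend(backward=True)
    extend(backward=False)
    return chain


# Toy data: event ids, time anchors and weights are invented for illustration.
toy_times = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
toy_weights = {("a", "c"): 0.9, ("b", "c"): 0.2,
               ("c", "d"): 0.5, ("c", "e"): 0.4, ("d", "e"): 0.7}
toy_chain = grow_storyline(toy_times, "c",
                           lambda x, y: toy_weights.get((x, y), 0.0))
```

In the toy data, event b is closer in time to the climax c than event a, but a is selected because its relation to c is stronger; b is then skipped because it does not precede a.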
By also ranking the events according to their climax score, the climax events will occupy the highest position and the preceding and following events the lower positions, approximating the fabula or plot graph shown in Figure 37.

Storyline Extraction System

The StoryLine extraction system is composed of three components: a.) TimeLine extraction; b.) climax event identification; c.) rising and falling actions identification. The TimeLine structures are obtained from the system described in Caselli et al. (2015a). Although all events may enter a TimeLine, including speech-acts such as say, not every sequence of ordered events makes a StoryLine. Within the total set of events in a TimeLine, we compute for each event its prominence on the basis of the sentence number of each mention and the number of mentions in the source documents. We currently sum the inverse sentence number of each mention of an event in the source documents:

\[ P(e) = \sum_{m=1}^{N} \frac{1}{S(e_m)} \]

where $e_m$ is the m-th of the N mentions of event e and $S(e_m)$ is the number of the sentence in which it occurs. This formula combines the number of references made to an event with the position in the text at which it is mentioned: early mentions count more than late mentions, and more mentions make an event more prominent. All event instances are then ranked according to the degree of prominence P.

We implemented a greedy algorithm in which the most prominent event becomes the climax event [35]. Next, we determine the events with the strongest bridging relation preceding and following the climax event in an iterative way until there are no preceding and following events with a bridging relation. Once an event is claimed for a StoryLine, we prevent it from being re-used for another StoryLine. For all remaining events (not connected to the event with the highest climax score), we again select the event with the highest climax score and repeat the above process. Remaining events thus can create parallel StoryLines, although with a lower score.
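The prominence score is simple enough to write down directly. The sketch below assumes 1-based sentence numbers and uses invented event ids; the function names are hypothetical and only serve the illustration.

```python
def prominence(sentence_numbers):
    """P(e): sum of inverse sentence numbers over all mentions of one event.

    Sentence numbers are assumed to be 1-based, so a mention in the first
    sentence contributes 1.0 and later mentions contribute less.
    """
    return sum(1.0 / s for s in sentence_numbers)


def rank_by_prominence(mentions):
    """mentions: dict event id -> list of sentence numbers of its mentions.

    Returns the event ids from most to least prominent; the first one is
    the candidate climax event for the greedy procedure.
    """
    scores = {event: prominence(sents) for event, sents in mentions.items()}
    return sorted(scores, key=lambda event: scores[event], reverse=True)


# Invented example: e2 is mentioned three times, once in the first sentence.
toy_mentions = {"e1": [3], "e2": [1, 2, 4], "e3": [2, 2]}
toy_ranking = rank_by_prominence(toy_mentions)
```

With these toy mentions, e2 scores 1 + 1/2 + 1/4 = 1.75, e3 scores 1.0 and e1 scores about 0.33, so e2 would be picked as the first climax event.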
When descending the climax scores, we are ultimately left with events with a low climax score that are not added to any StoryLine and do not constitute StoryLines themselves.

[35] Future versions of the system can include other properties such as emotion or salience of actors.

For determining the value of the bridging relations we use various features and resources, where we make a distinction between structural and implicit relations:
• Structural relations:
  – co-participation;
  – explicit causal relations;
  – explicit temporal relations;
• Implicit relations:
  – expectation based on corpus co-occurrence data;
  – causal WordNet relation;
  – frame relatedness in FrameNet;
  – proximity of mentions;
  – entailment;
  – enablement.

Our system can in principle use any of the above relations and resources. However, in the current version, we have limited ourselves to co-participation and FrameNet frame relations. Co-participation is the case when two events share at least one participant URI which has a PropBank relation A0, A1 or A2. The participant does not need to have the same relation in the two events. Events are related through FrameNet if there is any relation between their frames in FrameNet up to a distance of 3. Below we show an example of a larger StoryLine extracted from the corpus used in the SemEval 2015 TimeLine task.
:Airbus
29     20040101  ["manufacturer","factory","manufacture"] :Boeing:European_aircraft_manufacturer_Airbus:Airbus
3      20041001  ["charge","level","kill"] :United_States_Department_of_Defense:the_deal
[C]61  20040301  ["purchase"] :People_s_Republic_of_China:Airbus_aircraft
23     20050613  ["win"] :European_aircraft_manufacturer_Airbus:Boeing
6      20050613  ["aid","assistance","assist"] :Airbus:Boeing:for_the_new_aircraft
1      20050613  ["spark"] :Airbus
15     20061005  ["compensate"] :Airbus:of_its_new_superjumbo_A380s
22     20070228  ["cut","have","reduction","make"] :Airbus:the_company
39     20070319  ["supply","provide","resource","supplier","fund","tube"] :European_Aeronautic_Defence_and_Space_Company_EADS_N.V.:Airbus:United_States_Department_of_Defense
21     20070319  ["carry","carrier"] :the_airplane:Airbus_will
12     20070609  ["jet"] :Airbus:Airbus_A320_family
3      20070708  ["write","letter"] :Airbus:Boeing
21     20080201  ["ink","contract","sign"] :Royal_Air_Force:Airbus
13     20090730  ["lead","give","offer"] :France:Airbus
4      20041124  ["personnel","employee"] :Airbus:Former_military_personnel
20     20141213  ["carry","flight","fly"] :The_plane:Airbus

Figure 41: Storyline for Airbus and Boeing from the SemEval 2015 Task 4 dataset.

The StoryLine is created from a climax event ["purchase"] involving Airbus with a score of 61. The climax event is marked with C at the beginning of the line. After connecting the other events, they are sorted according to their time anchor. Each line is a unique event instance (between square brackets) anchored in time, preceded by the climax score and followed by the major actors involved [36]. We can see that all events reflect the commercial struggle between Airbus and Boeing and some role played by governments.

Figure 42: Airbus StoryLines order per climax event

Figure 43: Airbus StoryLine for climax event [61] “purchase”

In Figure 42, we visualise the extracted StoryLines ordered per climax event.
Every row in the visualisation is a StoryLine grouped per climax event, ordered by the climax score. The label and weight of the climax event are reported on the vertical axis together with the label of the first participant with an A1 PropBank role, which is considered to be most informative. Within a single row each dot represents an event in time. The size of the dot represents the climax score. Currently, the bridging relations are not scored. A bridging relation is either present or absent. If there is no bridging relation, the event is not included in the StoryLine.

When clicking on a target StoryLine, a pop-up window opens showing the StoryLine events ordered in time (see Figure 43). Since we present events at the instance level across different mentions, we provide a semantic class grouping these mentions based on WordNet, which is shown on the first line. Thus the climax event “purchase” is represented with the more general label “buy” that represents a hypernym synset. If a StoryLine is well structured, the temporal order and climax weights mimic the fabula internal structure, as in this case. We expect that events close to the climax have larger dots than more distant events in time [37]. Stories can be selected per target entity through the drop-down menu on top of the graph. In Figure 42, all stories concerning Airbus are marked in red. An online version of this visualisation can be found on the project website at http://ic.vupr.nl/timeline. You can upload NAF files from which StoryLines are extracted or JSON files extracted from a collection of NAF files.

Comparing the StoryLine representation with the TimeLine (see Figure 38), some differences can be easily observed. In a StoryLine, events are ordered in time and per climax weight. The selection of events in the StoryLine is motivated by the bridging relations, which exclude non-relevant events, such as say. We used the visualisation to inspect the results.
We observed that some events were missed because of metonymic relations between participants, e.g. Airbus and Airbus 380 are not considered as standing in a co-participation relation by our system because they have different URIs. In other cases, we see more or less the opposite: a StoryLine reporting on journeys by Boeing is interrupted by a plane crash from Airbus due to overgenerated bridging relations. The optimal combination of features still needs to be determined empirically.

[36] We manually cleaned and reduced the actors for reasons of space.
[37] In future work, we will combine prominence with a score for the strength of the bridging.

Storyline Evaluation: Unsolved Issues

At this stage we are not yet able to provide an extensive evaluation of the system. Evaluation methods for StoryLines are not trivial. Most importantly, they cannot be evaluated with respect to standard measures such as Precision and Recall. The value of a story depends a lot on the interest of a user. Evaluation of StoryLines thus should be based on relevance rather than precision and recall. In this section, we describe and propose a set of evaluation methods to be used as a standard reference method for this kind of task.

The evaluation of a StoryLine must be based, at least, on two aspects: informativeness and interest. A good StoryLine is a StoryLine which interests the user, provides all relevant and necessary information with respect to a target entity, and is coherent. We envisage two types of evaluation: direct and indirect.

Direct evaluation necessarily needs human interaction. This can be achieved in two ways: using experts and using crowdsourcing techniques. Experts can evaluate the data provided with the StoryLines with respect to a set of reference documents and check the informativeness and coherence parameters. Following Xu et al.
(2013), two types of questions can be addressed, at the micro-level and at the macro-level of knowledge. Both evaluation types address the quality of the generated StoryLines. The former addresses the efficiency of the StoryLines in retrieving the information, while the latter addresses the quality of the StoryLines with respect to a certain topic (e.g. the commercial “war” between Boeing and Airbus). Concerning metrics, micro-knowledge can be measured by the time the users need to gather the information, while macro-knowledge can be measured as text proportion, i.e. how many sentences of the source documents composing the StoryLine are used to write a short summary.

Crowdsourcing can be used to evaluate the StoryLines by means of simplified tasks. One task can ask the crowd to identify salient events in a corpus and then validate whether the identified events correlate with the climax events of the StoryLines.

Indirect evaluation can be based on cross-document summarization tasks. The ideal situation is the one in which the StoryLine contains the most salient and related events and nothing else. These data sets can be used either to recover the sentences in a collection of documents and generate an extractive summary (story) or to produce an abstractive summary. Summarization measures such as ROUGE can then be used to evaluate the quality of the summaries and, indirectly, of the StoryLines (Nguyen et al., 2014; Huang and Huang, 2013; Erkan and Radev, 2004).

5.2.3 Workshop on Computing News Storylines

The notion of computational StoryLines for streams of news articles is new. We organised a workshop (Caselli et al., 2015b) [38] at ACL in 2015 to discuss this as a new paradigm for research.
The workshop brought together researchers from different communities working on representing and extracting narrative structures in news, a text genre which is widely used in NLP but which has received little attention with respect to narrative structure, representation and analysis. Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic extraction of events from single documents and work towards extracting story structures from multiple documents, while these documents are published over time as news streams. Policy makers, NGOs, information specialists (such as journalists and librarians) and others are increasingly in need of tools that support them in finding salient stories in large amounts of information to more effectively implement policies, monitor actions of "big players" in society and check facts. Their tasks often revolve around reconstructing cases either with respect to specific entities (e.g. persons or organizations) or events (e.g. hurricane Katrina).

We received 12 submissions and accepted 9. Overall, we had 20 participants at the workshop. The two approaches developed in NewsReader were also presented at the workshop. The final discussion raised additional lines of research around questions such as:
• which properties make a sequence of events a story?
• how can we identify the salience of events and differentiate it from more subjective notions such as interestingness and importance?
• is there a “pattern of type of events” which guides the writing of stories in news?
• how can we move from an entity-based approach to StoryLines to coarser-grained representations?
• what is the best granularity of representation of a StoryLine?
• is a gold standard dataset feasible and useful for evaluating StoryLines?

We submitted a follow-up workshop proposal for ACL2016 in Berlin.
[38] https://sites.google.com/site/computingnewsstorylines2015/

6 Perspectives

In this section, we describe the design and first steps towards the implementation of a perspective and attribution module. The implementation so far is preliminary, and will be updated in the coming period. The attribution module described in 6.2 is currently restricted to factuality. Other attribution values will be derived in further updates.

6.1 Basic Perspective Module

Events are divided into contextual events about the domain and source events: speech-acts and cognitive events that relate sources to the statements about the contextual events. In NewsReader, we represent both types of events as instances. They are both events involving participants and bound by time and possibly a place. The perspective layer is generated on top of this initial event representation.

Perspectives are complex: they consist of what someone says (the information they choose to provide) and how he/she says it (choices in expression, opinions and sentiment). A complete representation of perspectives thus includes the basic information provided, who it is provided by (the author or a quoted source), the certainty with which the source makes a statement, whether the source is speculating about the future, confirming or denying something, uses plain language or expressions that carry sentiment, expresses an explicit opinion, etc. Within NewsReader, we focus on identifying the source and establishing attribution values relating to factuality (how certain, future or not, and confirmed or denied) and the sentiment. This information is obtained using information from various layers of the NAF representation.

Perspectives are expressed through mentions in text.
We find both cases where various sources confirm a perspective and other cases where a different perspective is expressed on the same contextual statement: for example, this would be the case if one source denies a statement while another source confirms it. Moreover, the same source can express different perspectives over time. Because we (1) only aim to represent the perspective a source expresses and do not wish to make a claim about the perspective (s)he has and (2) for this reason feel that perspectives are linked to mentions, we also model perspectives at a mention level. In our current setup and model, a perspective consists of:
• The instance representation of the source of a statement, e.g. the CEO of Porsche Wiedeking.
• The statement made by the source in the form of triples representing an instance of a contextual event, e.g. Porsche increasing sales in 2004.
• A mention of the statement (which is linked to the original statement)
• The attribution values that define the relation between the source and the contextual statement: denial/confirmation, positive/negative, future/current, certain/probable/possible, etc.
• A link to the source of the mention which can either be the document it is situated in (which in turn is linked to an author or publisher) or a quoted source.

According to this specification, we abstract from how the source made the statement, e.g. saying, shouting, thinking, which is represented by the source event itself that is the basis for deriving the perspective. However, we do represent implications that can be derived from the way in which the statement is made, e.g. promise to implies future, claim implies certainty, guess implies that the source thinks something is probable, and hope implies that it is possible and therefore positive.
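To make the five components of a perspective and the cue-based implications more concrete, the following sketch models them as a small record. All field names, the `ex:` identifiers and the choice to let explicit attribution values override implied ones are assumptions made for illustration; the actual model is expressed in RDF, not in Python objects.

```python
from dataclasses import dataclass, field

# Implications named in the text: promise -> future, claim -> certain,
# guess -> probable, hope -> possible and positive. The key names
# ("time", "certainty", "sentiment") are hypothetical.
CUE_IMPLICATIONS = {
    "promise": {"time": "future"},
    "claim": {"certainty": "certain"},
    "guess": {"certainty": "probable"},
    "hope": {"certainty": "possible", "sentiment": "positive"},
}


@dataclass
class Perspective:
    source: str          # instance of the source, e.g. a person URI
    statement: tuple     # triple describing the contextual event
    mention: str         # identifier of the mention of the statement
    attributed_to: str   # document URI or quoted source
    attribution: dict = field(default_factory=dict)


def derive_attribution(cue_lemma, explicit=None):
    """Combine values implied by the source event with explicit ones.

    Explicit values (for instance from an attribution layer) take
    precedence here; that precedence is an assumption of this sketch.
    """
    values = dict(CUE_IMPLICATIONS.get(cue_lemma, {}))
    values.update(explicit or {})
    return values


# Placeholder identifiers, loosely following the Porsche example in the text.
example = Perspective(
    source="ex:Wiedeking",
    statement=("ex:increase_event", "ex:hasActor", "ex:Porsche"),
    mention="ex:doc1#mention1",
    attributed_to="ex:Wiedeking",
    attribution=derive_attribution("claim", {"polarity": "confirmation"}),
)
```

A source event such as say, which carries no implication of its own, simply yields an empty set of implied values, so only explicitly annotated attribution values remain.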
To derive the perspective relations between sources and the contextual statement, we implemented a perspective module that takes several layers as input:
• source events with entities that have roles like prop:A0, fn:Statement@Speaker, fn:Expectation@Cognizer, fn:Opinion@Cognizer.
• the contextual events that can be related to the source event through a message or topic role as defined in FrameNet or PropBank.
• the attribution layer in NAF that indicates the future/non-future tense, the confirmation or denial and/or the certainty of the contextual event according to the source.
• the opinion layer in NAF that indicates whether the source has a positive or negative opinion with respect to some target.
• the NAF header which provides information about the magazine, publisher and author (if known).

In order to combine these pieces of information, we need to intersect the layers. In this first version of the module, we use a very basic and simple algorithm to do this:
1. For each source event in the triples, we check if it has a valid relation to an object of the proper semantic type such as person or organization. Valid relations are e.g. pb:A0, fn:Statement@Speaker, fn:Expectation@Cognizer, fn:Opinion@Cognizer. If not, we ignore it.
2. We access the SRL layer in NAF to check if the event mention also has a message role, e.g. fn:Statement@Message or fn:Statement@Topic. If no such role is found, we ignore the source event.
3. We take the span of the message role and we check for any triples in the contextual data to see if they have mentions equal to or embedded within the span of the message. For all these triples, we create a perspective relation between the source and contextual triples.
4. We check the attribution layer of each NAF file from which the source events and selected contextual events originate to see if there are attribution properties assigned to the event mention.
If so, we copy them to the perspective relation.
5. We check the opinion layer in NAF to find opinion-targets. Next, we check if the spans of the above contextual triples also match or are embedded in the span of the opinion-target. If there is a match, we take over the opinion value (positive/negative) as a value for the perspective relation.

The module can be adapted by defining the role constraints in the algorithm. After applying the above algorithm, there will be event mentions with an explicit perspective value assigned to an explicit source and another set of event mentions for which no perspective is represented. The latter mentions will be assigned to the document as a source. We use the document URI as the source URI. If the event mention is within an opinion-target, we represent the sentiment as a value for the attribution relation. If there is a factuality value associated with the mention, we also assign it as a value of the attribution relation. The author, publisher and magazine are meta-properties of the document URI. Finally, all event mentions are either assigned to the document using prov:wasAttributedTo or to an explicit source in the document using gaf:wasAttributedTo.

Let us consider the following example to illustrate the algorithm:

Chrysler expects to sell 5,000 diesel Liberty SUVs, President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York.

There are two perspectives expressed with respect to the selling of 5,000 diesel Liberty SUVs: first, the fact that this is an expectation of Chrysler and, secondly, that this is a statement made by the President Dieter Zetsche. Our text processing generates the following 3 predicate-role structures for this sentence, as shown in Figure 44 starting from lines 1, 26 and 50 respectively, where we abbreviated some of the lists of span elements.
Next, the NAF2SEM module generates two source events (line 1 and line 30) from this data involving the entities Chrysler (line 6) and Zetsche (line 26), as shown in Figure 45. The relations between the events and the entities are expressed starting from line 15 for expect and line 40 for say. Both source events expect and say meet the first constraint that they have an entity of the proper type with a role of the type source: fn:Expectation@Cognizer and fn:Statement@Speaker.

Within the set of contextual triples, we find the event sell and its corresponding triples as shown in Figure 46. Next, we intersect the mentions of the contextual event (line 4) with the role layer to see if they can be connected to the source events in the proper way. The SRL has roles for Expectation@Topic, Statement@Topic and Statement@Message. Their spans are defined as a list of term identifiers that need to be matched with tokens, which in turn can be matched through their offsets. In this case, we can conclude that the offsets for sell, Chrysler and Liberty SUVs match with these roles. Therefore, the software decides that these triples fall within the scope of the perspective.
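The span test behind step 3 of the algorithm can be sketched on the example sentence. The sketch works directly on character offsets computed from the sentence string, whereas the module described above resolves term identifiers to tokens first; the function names and the dictionary layout are hypothetical.

```python
SENTENCE = ("Chrysler expects to sell 5,000 diesel Liberty SUVs, "
            "President Dieter Zetsche says")


def span_of(text):
    """Character offsets (start, end) of the first occurrence of `text`."""
    start = SENTENCE.index(text)
    return (start, start + len(text))


def embedded(mention, span):
    """True when `mention` is equal to or lies inside `span`."""
    return span[0] <= mention[0] and mention[1] <= span[1]


def link_perspectives(source_events, contextual_mentions):
    """Create (source, source event, contextual event) links.

    A link is made when any mention of the contextual event falls within
    the span of the message role of the source event.
    """
    links = []
    for src in source_events:
        for event, mentions in contextual_mentions.items():
            if any(embedded(m, src["message"]) for m in mentions):
                links.append((src["source"], src["event"], event))
    return links


source_events = [
    {"source": "Chrysler", "event": "expect",
     "message": span_of("to sell 5,000 diesel Liberty SUVs")},
    {"source": "Zetsche", "event": "say",
     "message": span_of("Chrysler expects to sell 5,000 diesel Liberty SUVs")},
]
contextual_mentions = {"sell": [span_of("sell")]}
links = link_perspectives(source_events, contextual_mentions)
```

Both message spans contain the mention of sell, so the sketch links the selling event to Chrysler through expect and to Zetsche through say, which corresponds to the two perspectives discussed for this sentence.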
    <!--t36 expects : A0[t35 Chrysler] A1[t37 to]-->
    <predicate id="pr6">
      <!--expects-->
      <span><target id="t36"/></span>
      <externalReferences>
        <externalRef resource="FrameNet" reference="Expectation"/>
        <externalRef resource="FrameNet" reference="Opinion"/>
        <externalRef resource="EventType" reference="cognition"/>
      </externalReferences>
      <role id="rl16" semRole="A0">
        <!--Chrysler-->
        <span><target id="t35" head="yes"/></span>
        <externalReferences>
          <externalRef resource="FrameNet" reference="Expectation@Cognizer"/>
          <externalRef resource="FrameNet" reference="Opinion@Cognizer"/>
        </externalReferences>
      </role>
      <role id="rl17" semRole="A1">
        <!--to sell 5,000 diesel Liberty SUVs-->
        <span><target id="t37" head="yes"/>...<target id="t42"/></span>
        <externalReferences>
          <externalRef resource="FrameNet" reference="Expectation@Phenomenon"/>
          <externalRef resource="FrameNet" reference="Expectation@Topic"/>
          <externalRef resource="FrameNet" reference="Opinion@Topic"/>
        </externalReferences>
      </role>
    </predicate>
    <!--t38 sell : A0[t35 Chrysler] A1[t39 5,000]-->
    <predicate id="pr7">
      <!--sell-->
      <span><target id="t38"/></span>
      <externalReferences>
        <externalRef resource="FrameNet" reference="Commerce_sell"/>
        <externalRef resource="ESO" reference="Selling"/>
      </externalReferences>
      <role id="rl18" semRole="A0">
        <!--Chrysler-->
        <span><target id="t35" head="yes"/></span>
        <externalReferences>
          <externalRef resource="FrameNet" reference="Commerce_sell@Seller"/>
          <externalRef resource="ESO" reference="Selling@possession-owner_1"/>
        </externalReferences>
      </role>
      <role id="rl19" semRole="A1">
        <!--5,000 diesel Liberty SUVs-->
        <span><target id="t39"/>...<target id="t42" head="yes"/></span>
        <externalReferences>
          <externalRef resource="VerbNet" reference="give-13.1@Theme"/>
          <externalRef resource="FrameNet" reference="Commerce_sell@Goods"/>
          <externalRef resource="PropBank" reference="sell.01@1"/>
        </externalReferences>
      </role>
    </predicate>
    <!--t47 says : A1[t35 Chrysler] A0[t44 President] AM-LOC[t48 at]-->
    <predicate id="pr8">
      <!--says-->
      <span><target id="t47"/></span>
      <externalReferences>
        <externalRef resource="FrameNet" reference="Statement"/>
        <externalRef resource="FrameNet" reference="Text_creation"/>
      </externalReferences>
      <role id="rl20" semRole="A1">
        <!--Chrysler expects to sell 5,000 diesel Liberty SUVs-->
        <span><target id="t35"/><target id="t36" head="yes"/>...<target id="t42"/></span>
        <externalReferences>
          <externalRef resource="FrameNet" reference="Statement@Message"/>
          <externalRef resource="FrameNet" reference="Statement@Topic"/>
          <externalRef resource="FrameNet" reference="Text_creation@Text"/>
          <externalRef resource="FrameNet" reference="Choosing@Chosen"/>
        </externalReferences>
      </role>
      <role id="rl21" semRole="A0">
        <!--President Dieter Zetsche-->
        <span><target id="t44"/>...<target id="t46" head="yes"/></span>
        <externalReferences>
          <externalRef resource="FrameNet" reference="Statement@Speaker"/>
          <externalRef resource="FrameNet" reference="Text_creation@Author"/>
          <externalRef resource="FrameNet" reference="Choosing@Cognizer"/>
        </externalReferences>
      </role>
      <role id="rl22" semRole="AM-LOC">
        <!--at a DaimlerChrysler Innovation Symposium in New York-->
        <span><target id="t48" head="yes"/>...<target id="t55"/></span>
      </role>
    </predicate>

Figure 44: Semantic Role elements in NAF for expect, say and sell

     1  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev16Expect
     2      a sem:Event , fn:Expectation , fn:Opinion , nwrontology:SPEECH_COGNITIVE ;
     3      rdfs:label "expect" ;
     4      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=208,215 .
     5
     6  dbp:resource/Chrysler
     7      rdfs:label "Chrysler" , "Chrysler Group" ;
     8      gaf:denotedBy
     9          nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=36,50 ,
    10          nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=740,748 ,
    11          nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=199,207 ,
    12          nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=1114,1122 ,
    13          nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=130,132 .
    14
    15  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev16Expect
    16      fn:Expectation@Cognizer dbp:resource/Chrysler ;
    17      fn:Expectation@Phenomenon nwr:/data/cars/entities/LibertySUVs ;
    18      fn:Expectation@Topic nwr:/data/cars/entities/LibertySUVs ;
    19      fn:Opinion@Topic nwr:/data/cars/entities/LibertySUVs .
    20
    21  nwr:/data/cars/entities/LibertySUVs
    22      a nwrontology:MISC ;
    23      rdfs:label "Liberty suvs" ;
    24      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=237,249 .
    25
    26  dbp:resource/Dieter_Zetsche
    27      rdfs:label "Dieter Zetsche" ;
    28      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=261,275 .
    29
    30  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev11Say
    31      a sem:Event , nwrontology:SPEECH_COGNITIVE , fn:Statement ;
    32      rdfs:label "say" ;
    33      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=276,280 .
    34
    35  nwr:/data/cars/entities/DaimlerChryslerInnovationSymposium
    36      a nwrontology:ORGANIZATION ;
    37      rdfs:label "DaimlerChrysler Innovation Symposium" ;
    38      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=286,322 .
    39
    40  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev11Say
    41      fn:Statement@Speaker dbp:resource/Dieter_Zetsche ;
    42      sem:hasPlace nwr:/data/cars/entities/DaimlerChryslerInnovationSymposium .

Figure 45: SEM-RDF extracted from NAF for expect, say and sell

     1  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev17Sell
     2      a sem:Event , fn:Commerce_sell , eso:Selling ;
     3      rdfs:label "sell" ;
     4      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=219,223 .
     5
     6  nwr:/data/cars/entities/LibertySUVs
     7      rdfs:label "Liberty suvs" ;
     8      gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=237,249 .
     9
    10  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev17Sell
    11      eso:possession-owner_1 dbp:resource/Chrysler ;
    12      fn:Commerce_sell@Seller dbp:resource/Chrysler ;
    13      fn:Commerce_sell@Goods nwr:data/cars/entities/LibertySUVs .

Figure 46: SEM-RDF extracted from NAF for expect, say and sell

    <opinion id="o1">
      <opinion_expression polarity="positive" strength="1">
        <!--President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York .-->
        <span> <target id="t44"/> .... <target id="t56"/> </span>
      </opinion_expression>
    </opinion>

Figure 47: Opinion element in NAF

In a similar way, we check for opinions and attribution values to fill in further details of the perspective relation. For this example, we find the opinion information shown in Figure 47. The span of the opinion expression matches the event say. From this we derive a positive value for the attitude of the source towards the triples in its scope.
6.2 Factuality module

The attribution values between a source and the target events are derived from the opinion layer and the factuality layer. In this section, we describe the factuality module that was developed for NewsReader. The description of the opinion module can be found in Agerri et al. (2015). We first describe how factuality needs to be modeled within the attribution module, then describe the current implementation of the module that identifies these values, and conclude the section with an outline of future work.

6.2.1 Event factuality

Event factuality is a property of the events expressed in a (written or oral) text. We follow Saurí's (2008) conception of event factuality, which is understood as the level of information expressing the commitment of relevant sources towards the factual nature of eventualities in text. That is, it conveys whether eventualities are characterised as corresponding to a fact, to a possibility, or to a situation that does not hold in the world. The term eventualities is used here to refer to events, which can be processes or states. The main characteristics of events are that they have a temporal structure and a set of participants.

Factuality is not an absolute property: it is always relative to a source, since events are always presented from the point of view of someone. The source does not need to be the author of a text; several sources can report on the same event, and the same source can assign different factuality values to an event at different points in time. Additionally, we assume that factuality value assignments are made at a specific point in time.

We also follow Saurí in considering three factuality components: source, time and factuality value. Source refers to the entity that commits to the factuality value of a certain event. Time is relevant because the factuality values of an event can change not only depending on the source but also over time.
Furthermore, we assume that any statement made about the future is speculation to a certain extent. It should be noted that, even though the time of the statement and the source are considered inherent parts of the factuality components, the current implementation only focuses on factuality values and relative tense (i.e. whether the source talked about the future or not). Time expressions and source identification are handled by separate components. The previous section explained how we determine the source of given information.

The factuality values are characterized along three dimensions: polarity, certainty and tense. The certainty dimension measures to what extent the source commits to the correlation of an event with a situation in the world, whereas the polarity dimension encodes whether the source is making a positive or a negative statement about an event happening in the world. The certainty dimension can be described as a continuum ranging from absolutely certain to absolutely uncertain. For the sake of simplicity, we consider it here as a discrete category with three values, certain, probable and possible, following Saurí. Polarity is a discrete category with two values, positive or negative. Additionally, both categories have an underspecified value for cases in which there is not enough information to assign a value. Events are assigned one value per dimension. Finally, our "tense" dimension simply indicates whether a statement is about the future or not.

    Certainty          Polarity          Tense
    certain (CERT)     positive (POS)    non-future (NONFUT)
    probable (PROB)    negative (NEG)    future (FUT)
    possible (POSS)    unknown (U)       unknown (U)
    unknown (U)

Table 32: Certainty, polarity and tense values

6.2.2 Identifying factualities

Factuality is often expressed by multiple elements in an expression, which do not necessarily stand right next to the event the factuality values apply to.
A (highly) summarized overview of potentially relevant information is provided below:

• Tense, aspect
• Lexical markers
  – Polarity markers: no, nobody, never, fail, ... They can act at different structural levels. At the clausal level they scope over the event-referring expression; at the subclausal level they affect one of the arguments of the event; at the lexical level they operate by means of affixes. Polarity markers can negate the predicate expressing the event, the subject, or the direct or indirect object.
  – Modality markers (epistemic or deontic) include verbal auxiliaries, adverbials and adjectives: may, might, perhaps, possible, likely, hopefully, hopeful, ...
  – Commissive and volitional predicates (offered, decided) assign the value underspecified to the subordinated event.
  – Event selecting predicates (ESP) are predicates that select for an argument denoting an event. Syntactically they subcategorise for a that-, gerundive or infinitival clause: claim, suggest, promise, request, manage, finish, decide, offer, want, etc. ESP project factuality information on the event denoted by their argument. Depending on the type of ESP, they project different factuality values. For example, prevent projects a counterfactual value, while manage projects a factual value. Some ESP, such as claim, are special in that they assign a factuality value to the subcategorised event and at the same time express who is the source that commits to that factuality value, without the author of the statement committing to it. ESP can be source introducing predicates (SIP) or non-source introducing predicates (NSIP). SIP such as suspect or know are ESP that contribute an additional source relative to which the factuality of the subcategorised event is assessed.
• Some syntactic constructions can introduce a factuality value.
  – In participial adverbial clauses, the event in the subordinated clause is presupposed as true (e.g. Having won Slovenia's elections, political newcomer Miro Cerar will have to make tough decisions if he is to bring stability to a new government).
  – In purpose clauses, the main event is presented as underspecified (e.g. Government mobilizes public and private powers to solve unemployment in the country).
  – In conditional constructions, the factuality value of the main event in the consequent clause depends on the factuality of the main event in the antecedent clause (e.g. If this sentence is not true then it is true).
  – In temporal clauses, the event is presupposed to be certain in most cases (e.g. While the main building was closed for renovation, the architects completed the Asian Pavilion).

Additionally, some syntactic constructions act as scope islands: the events in such a construction cannot be affected by markers outside it, and the markers inside the construction cannot scope over events outside it.

  – Non-restrictive relative clauses (e.g. The new law might affect the house owners who bought their house before 2002).
  – Cleft sentences (e.g. It could have been the owner who replaced the main entrance door).

An event can be within the scope of one or more lexical markers. To compute its factuality, the values of all the markers that scope over it have to be considered, as well as the ways in which the markers interact with each other. It is also necessary to take into account that some syntactic contexts establish scope boundaries. The next subsection describes our current factuality module and explains how relevant factors are integrated in the model.

6.2.3 Factuality module

The factuality module consists of two main components.
The first component is a machine learning component that establishes the certainty and polarity of the event based on a model trained on FactBank. The second component provides a rule-based interpretation of the tense of the event, indicating whether it is future, past, present or unknown. The rule-based component simply checks the explicit tense marking on the verb or, in the case of a nominal event, the tense marking on the verb that governs it. If no tense values are found this way, the value is set to 'unknown'.

The machine learning component improves on the previously existing module in three ways. FactBank contains several layers of annotation for factuality values: the values assigned by the direct source making the statement and the values assigned by any other source. This means that when someone is quoted in an article, the factuality values from the quoted source are provided as well as the factuality values of the author of the article. Because the author typically does not indicate explicitly whether he or she agrees with the source, the values associated with the author tend to be underspecified for both polarity and certainty. As a result, factuality values at the highest level (those attributed to the author) are almost exclusively certain-positive and underspecified-underspecified. The first difference between the new module and the old one is that we train on the most embedded layer. This provides more variety in factuality values. The second difference is that the values are translated from the joint certainty-polarity values found in FactBank to the individual dimensions used in NewsReader annotations. This allows us to experiment with training each dimension separately. The third difference is that we use a much more elaborate set of features based on the relevant elements outlined above.
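The translation from joint FactBank values to separate dimensions can be illustrated with a small lookup table. This is a sketch under assumptions rather than the module's actual code: the keys are the standard FactBank labels, the value names follow Table 32 and the NAF output, and the function name and the fallback for unrecognised labels are hypothetical.

```python
from typing import Dict, Tuple

# Joint FactBank label -> (certainty, polarity).
# The target value names are assumed from Table 32 and the NAF factuality layer.
FACTBANK_TO_DIMENSIONS: Dict[str, Tuple[str, str]] = {
    "CT+": ("CERTAIN", "POS"),
    "CT-": ("CERTAIN", "NEG"),
    "PR+": ("PROBABLE", "POS"),
    "PR-": ("PROBABLE", "NEG"),
    "PS+": ("POSSIBLE", "POS"),
    "PS-": ("POSSIBLE", "NEG"),
    "CTu": ("CERTAIN", "UNDERSPECIFIED"),
    "Uu": ("UNDERSPECIFIED", "UNDERSPECIFIED"),
}


def split_factbank_value(label: str) -> Tuple[str, str]:
    """Split a joint FactBank label into separate certainty and polarity values.

    Labels outside the table fall back to underspecified on both dimensions
    (an assumption made for this sketch only).
    """
    return FACTBANK_TO_DIMENSIONS.get(label, ("UNDERSPECIFIED", "UNDERSPECIFIED"))


certainty, polarity = split_factbank_value("PR+")
```

Keeping the two dimensions apart in this way is what makes it possible to train one classifier for certainty and another for polarity on the same annotated events.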
The following features are used in the current system:

• lemma and surface form of the event and of the words in its direct context
• lemma of the event's head
• dependency relation
• POS and morphological information about the event and the head word
• dependency chain

    <factuality id="f1">
      <span> <target id="t310"/> </span>
      <factVal resource="nwr:attributionTense" value="UNDERSPECIFIED"/>
      <factVal resource="factbank" value="NONE"/>
      <factVal resource="nwr:attributionCertainty" value="CERTAIN"/>
      <factVal resource="nwr:attributionPolarity" value="POS"/>
    </factuality>
    <factuality id="f2">
      <span> <target id="t214"/> </span>
      <factVal resource="nwr:attributionTense" value="NON_FUTURE"/>
      <factVal resource="factbank" value="CT+"/>
      <factVal resource="nwr:attributionCertainty" value="CERTAIN"/>
      <factVal resource="nwr:attributionPolarity" value="POS"/>
    </factuality>
    <factuality id="f20">
      <span> <target id="t81"/></span>
      <factVal resource="nwr:attributionTense" value="FUTURE"/>
      <factVal resource="factbank" value="CT+"/>
      <factVal resource="nwr:attributionCertainty" value="CERTAIN"/>
      <factVal resource="nwr:attributionPolarity" value="POS"/>
    </factuality>

Figure 48: NAF examples for factuality

It should furthermore be noted that the system was trained on the events that the NewsReader pipeline identified in the FactBank data. Events that are not marked as such in FactBank receive 'NONE' as a value, since the gold values for these are unknown. These events form a significant part of the data, and the 'NONE' value is regularly found by the classifier. We add the interpretation certain and positive to these events, because this combination forms a strong majority class. Our pipeline identifies more nominal event references than are found in FactBank, and the majority class is even stronger for these nominal references, which further justifies this decision. The output still indicates that the original value found by the classifier was 'NONE', so that one can distinguish between these default interpretations and cases where the value was assigned by the classifier. Figure 48 shows some output examples. The span elements need to be matched with the event spans to decide on the events to which the factuality applies.

6.2.4 Future work

The module described above covers all main factuality values we are interested in and takes into account the most relevant information that can influence factuality. It thus forms a solid basis for investigating factuality detection in text. It is, however, only the first version of this new approach. In future work, we will work in two directions to improve the model. First, we will experiment with various forms of training data. This involves leaving out events that are not in FactBank, thus avoiding the 'NONE' class, as well as extending the set so that the machine learner can also identify tense features.[39] Second, we will experiment with a variety of features, aiming for information that is more specifically linked to factuality markers.

Currently, the factuality module only handles English.
In order to adapt the module to other languages, new factuality lexicons will be needed and the syntactic rules that find the scopes of factuality markers will have to be adapted. The lexicons can probably be translated from the English lexicons and manually revised by a linguistics expert to check whether the same factuality behaviour applies. Adapting the syntactic rules might be more costly, though part of the cost can be reduced by preprocessing the documents with parsers that have models for several languages (such as MaltParser). We will make a predevelopment analysis of the cost of adapting the processor to other languages in order to design it in such a way that the cost is reduced as much as possible.

[39] Note that excluding the NONE-class needs to be taken into account when evaluating on FactBank: for events that are identified by the pipeline, but that are not in FactBank, we have no idea how well the classification behaves.

6.3 A perspective model

The previous sections described how we extract the source of events and how we identify factuality attributes of events. In this subsection, we explain how we combine this information to represent perspectives in RDF. As mentioned, we associate perspective information with mentions. After all, people can change their perspective, so that a single source may make incompatible statements about an event. To capture this fact, we represent all information related to perspectives (for now: source, factuality and sentiment) in the mention layer. The triples below illustrate what this looks like for the example from Section 6.1, repeated here for convenience:

    Chrysler expects to sell 5,000 diesel Liberty SUVs, President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York.

Consider the triples in Figure 49 for the events expect, say and sell, which are a reduced representation according to the SEM-RDF format. They represent the labels, the SEM relations and the mentions of the three events.

    nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev16Expect
        rdfs:label "expect" ;
        gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=208,215 ;
        fn:Expectation@Cognizer dbp:resource/Chrysler .

    nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev11Say
        rdfs:label "say" ;
        gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=276,280 ;
        fn:Statement@Speaker dbp:resource/Dieter_Zetsche ;
        sem:hasPlace nwr:/data/cars/entities/DaimlerChryslerInnovationSymposium .

    nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#ev17Sell
        rdfs:label "sell" ;
        gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=219,223 ;
        fn:Commerce_sell@Seller dbp:resource/Chrysler ;
        fn:Commerce_sell@Goods nwr:data/cars/entities/LibertySUVs .

Figure 49: Simplified SEM-RDF triples for events as named-graphs

We know from our perspective interpretation algorithm that the source of the statement about selling SUVs (ev17Sell) is Chrysler. Chrysler introduced the statement through its expectation (ev16Expect). The source of this expectation is Dieter Zetsche, the speaker of ev11Say. The factuality module should tell us that Zetsche's statement (saying) is CERTAIN, POSITIVE and NON-FUTURE according to the article. Zetsche assigns the same factuality values to Chrysler's expecting SUV sales. On the other hand, the selling is PROBABLE, POSITIVE and FUTURE (according to Chrysler).

In principle, each statement comes from a source and has factuality values.[40] In some cases, we have multiple pieces of information that constitute the source (e.g. author, magazine and publisher of an article). In order to avoid massive multiplication of triples, we attribute statements and factuality values that come directly from the article to the article itself. In turn, the article can be attributed to an author, publisher or magazine. This way, information is not repeated for each mention.

     1  # meta data properties on the document: author, magazine, publisher
     2  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml
     3      prov:wasAttributedTo <http://www.newsreader-project.eu/provenance/magazine/autoweek.com> .
     4
     5  # attribution of Dieter Zetsche says
     6  :ev11Say gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=276,280 .
     7
     8  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=276,280
     9      gaf:hasAttribution nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_dAttr1 .
    10
    11  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_dAttr1
    12      rdf:value gaf:CERTAIN_NON-FUTURE_POS ;
    13      prov:wasAttributedTo nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml .
    14
    15  # attribution of Chrysler expects
    16  :ev16Expect gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=208,215 .
    17
    18  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=208,215
    19      gaf:generatedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=276,280 ;
    20      gaf:hasAttribution nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_sAttr2 .
    21
    22  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_sAttr2
    23      rdf:value gaf:CERTAIN_NON-FUTURE_POS ;
    24      gaf:wasAttributedTo dbp:resource/Dieter_Zetsche .
    25
    26  # attribution of sell
    27  :evSell gaf:denotedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=219,223 .
    28
    29  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=219,223
    30      gaf:hasAttribution nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_sAtt3 ;
    31      gaf:generatedBy nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml#char=208,215 .
    32
    33  nwr:/data/cars/2003/01/01/47VH-FG30-010D-Y3YG.xml_sAtt3
    34      rdf:value gaf:PROBABLE_FUTURE_POS , <positive> ;
    35      gaf:wasAttributedTo dbp:resource/Chrysler .

Figure 50: Perspective RDF triples for event mentions

Figure 50 shows the output triples for the provenance and factuality of the statements about Zetsche and Chrysler from our example, attached to the mentions of the events given in Figure 49. Starting with line 1, we first give the document source properties through the prov:wasAttributedTo predicate. The sentence originates from the autoweek.com website, which is given as the magazine of the source text. The source text is represented through its NewsReader URL. Next, we find attribution relations for each event mention. For convenience, we repeat the gaf:denotedBy triple that links the mention to the event URI: lines 6, 16, 27.
Each mention is then associated with specific factuality values. These factuality values also come from the source of the mention. We therefore use an intermediate node related to rdf:value to model factuality and sentiment values and the provenance of the statement. This means that each event mention is linked to its own attribution node. This attribution node is linked to the factuality/sentiment values of the statement and to the source of the mention. Specifically, line 9 shows the attribution triple for say, line 20 for expect and line 30 for sell. The attribution itself is a URI for which we can define any set of properties. For example, we can see in lines 11, 12, 13 that :dAttr1 has rdf:value gaf:CERTAIN_NON-FUTURE_POS and that it is attributed (prov:wasAttributedTo) to the document. The document properties then point to the magazine autoweek.com. We can also see (lines 22, 23, 24) that :sAttr2 has the same rdf:value but is assigned to the source Dieter Zetsche. Finally, :sAtt3 is attributed to the source Chrysler, has factuality values gaf:PROBABLE_FUTURE_POS and the sentiment value positive.

[40] Note that unknown factuality values are also factuality values. For instance, the sentence "I do not know whether Zetsche said that" forms a case where polarity and certainty are 'unknown' to the source.

Note that we distinguish attribution to the source that printed the statement from attribution to a source mentioned in the text: we use prov:wasAttributedTo to indicate the provenance of the article and gaf:wasAttributedTo to indicate the provenance of a quoted source. The attribution relation defined in the grounded annotation framework establishes the same relation between subject and object as the original relation from PROV-O (Moreau et al., 2012).
The difference is that in the case of prov:wasAttributedTo, we are modelling the fact that we pulled the information from a specific source. In the case of gaf:wasAttributedTo, we are modelling the fact that the information is attributed to a specific source by someone else.

The factuality component indicates which factuality value is associated with the event or statement. We use composed values that contain the three factuality elements described above: certainty, polarity and tense. The ontology defines each of these values separately, providing the components of the complex values used in the example above (e.g. CERTAIN_NON-FUTURE_POS has the properties certainty CERTAIN, polarity POSITIVE and tense NON-FUTURE). As the information on perspectives grows, for instance by adding specific emotions, the ontology will be extended to contain not only these new values but also more combined values, so that we can continue using the compact perspective representation presented here. However, the model is flexible enough to also allow separate values for the different perspective dimensions, so that we can separate sentiment and emotion from factuality in our representation.

Finally, the current implementation creates a perspective RDF file in addition to the SEM RDF output. To connect the mentions of events in the perspective RDF with the event instances in the SEM RDF, a query needs to be formulated that matches the perspective mentions with the gaf:denotedBy triples in SEM RDF.

7 Cross-lingual extraction

The processing of text in NewsReader takes place in two steps. First, language-specific processors interpret text in terms of mentions of entities, events, time-expressions and relations between them.
Secondly, we create a language-independent representation of the instances of entities, events, time-expressions and their relations in RDF-TRiG, as shown schematically in Figure 51.

Figure 51: Semantic interoperability across NAF representations

Although the first step is specific to each language, the representation of the output in NAF is the same for all 4 NewsReader languages. Differences are restricted to the tokens that make up the mentions of things. In the case of entities, we normalize these tokens by using DBpedia URIs, where Dutch, Spanish and Italian DBpedia URIs are mapped to English URIs. In the case of time-expressions, such as Thursday and yesterday, we normalize them to ISO dates. Events, however, are still represented by the tokens of the predicates. To make the events interoperable across languages, we use the GlobalWordnetGrid to map each predicate to concepts in the InterLingualIndex (ILI; Vossen et al., 2016; Bond et al., 2016). As explained in section 2, coreference, and therefore identity, of events is defined as a function of the identity of their components: the similarity of the action, the participants, the place and the time. The latter 3 are semantically interoperable across the NAF representations in the four languages. The events can be compared through their ILI mapping based on the wordnets in their language, as we have been doing so far for English as well.

In Figures 52 and 53, we show two fragments, in English and Spanish respectively, of the same Wikinews article with the representation of an entity, a predicate with roles and a time-expression. The English representation of the entity Boeing 787 has an external reference to the English DBpedia, whereas the Spanish entity maps to both the Spanish and the English DBpedia entries. By taking the English reference for Spanish, we can map both entity references to each other across the documents.
We can see the same for the time-expressions in both languages, which are normalized to the same value: 2005-01-29. In the case of the predicates, both examples show external references to various ontologies, among which FrameNet and ESO, and also to WordNet ILI-concepts. The predicates of these two examples can therefore be mapped to each other through the reference ili-30-02207206-v. A similar example is shown in Figure 54 for Dutch NAF, where the verb kopen is mapped to the same ILI record, and in Figure 55 for Italian NAF, in which acquisterá is mapped to the same concept.

To compare the capacity of the different language pipelines to extract the same information, we use the NewsReader MEANTIME corpus (van Erp et al., 2015). The MEANTIME corpus consists of 120 English news articles taken from Wikinews on 4 topics: Airbus, Apple, the automotive industry (GM, Chrysler and Ford) and stock market news. Each topic has 30 articles. We translated the 120 English articles to Spanish, Dutch and Italian. The texts in all languages have been annotated according to the same NewsReader annotation scheme.

We processed the English, Spanish and Dutch Wikinews articles through the respective pipelines, as described in Agerri et al. (2015). After generating the NAF files, we applied the NAF2SEM process to each language data set. We made a small modification to the original algorithm, which compares events on the basis of their lemmas. To be able to compare events across languages, we used the ILI-concept references of the predicates to represent events. Spanish, Italian and Dutch events can thus be mapped to English events provided they were linked to the same concept (see footnote 41). We generated RDF-TRiG files for the 4 Wikinews corpora for English, Spanish, Italian and Dutch. We implemented a triple counter to compare the data created for each language data set.
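The ILI-based representation of events described above can be illustrated with a minimal sketch. It is not the NAF2SEM implementation: the function names and the in-memory data layout are assumptions made for illustration, and only the lemmas and external references are taken from Figures 52 to 55. The sketch groups predicates from different languages under every ILI concept they reference, so that predicates sharing a concept can be compared.

```python
from collections import defaultdict

def ili_keys(external_refs):
    """Return the ILI identifiers among a predicate's (resource, reference) pairs."""
    return {ref for resource, ref in external_refs
            if resource == "WordNet" and ref.startswith("ili-")}

def group_by_ili(predicates_by_lang):
    """Group (language, lemma) pairs under each ILI concept they are linked to."""
    by_concept = defaultdict(set)
    for lang, predicates in predicates_by_lang.items():
        for lemma, refs in predicates:
            for ili in ili_keys(refs):
                by_concept[ili].add((lang, lemma))
    return by_concept

# Hypothetical in-memory layout; references copied from Figures 52 to 55.
predicates_by_lang = {
    "en": [("purchase", [("PropBank", "purchase.01"),
                         ("WordNet", "ili-30-02207206-v")])],
    "es": [("comprar", [("WordNet", "ili-30-02207206-v"),
                        ("WordNet", "ili-30-02646757-v")])],
    "nl": [("kopen", [("Cornetto", "rv-4101"),
                      ("WordNet", "ili-30-02207206-v")])],
    "it": [("acquisterá", [("WordNet", "ili-30-00079018-n"),
                           ("WordNet", "ili-30-02207206-v")])],
}

shared = group_by_ili(predicates_by_lang)
for lang, lemma in sorted(shared["ili-30-02207206-v"]):
    print(lang, lemma)
```

Predicates without any ILI reference do not appear in the result at all, which mirrors the limitation that events not mapped to WordNet concepts cannot be compared across languages.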
The triple counter generates statistics for the following information:

1. Entities with a DBpedia URI, with the number of mentions
2. Entities without a DBpedia URI, with the number of mentions
3. Events represented by their ILI-reference, with the number of mentions
4. Frequency counts of the roles relating events and entities
5. Frequency counts of the triples relating events with ILI-references and entities through these roles

We compared the Spanish, Italian and Dutch data against the English data, calculating the coverage of the English mentions by the other languages. Note that we cannot calculate recall through this method. If the software detects a DBpedia entity E in English in sentence A and not in the translated Spanish sentence T(A), but the same entity E is detected in another Spanish sentence T(B) while it is not detected in the English sentence B, then our current method counts 1 mention in English and 1 mention in Spanish with 100% coverage. This may still count as coverage but not as recall, because we do not know whether the Spanish mention in T(B) corresponds with the English mention in A (see footnote 42). The coverage scores we give below are thus just a rough approximation. A true comparison requires a cross-lingual annotation that also aligns the mentions of entities and events with respect to the same instances.

Footnote 41: This also means that we lump together event-instances that are normally kept separate because they do not share the same time and participant. The comparison therefore does not tell us anything about the precise cross-lingual event-coreference but merely gives a rough indication.

<entity id="e1" type="MISC">
  <references>
    <!--Boeing 787-->
    <span><target id="t8"/><target id="t9"/></span>
  </references>
  <externalReferences>
    <externalRef confidence="1.0" reference="http://dbpedia.org/resource/Boeing_787_Dreamliner" reftype="en" resource="spotlight_v1"/>
  </externalReferences>
</entity>
<predicate id="pr6">
  <!--purchase-->
  <externalReferences>
    <externalRef reference="purchase.01" resource="PropBank"/>
    <externalRef reference="obtain-13.5.2" resource="VerbNet"/>
    <externalRef reference="obtain-13.5.2-1" resource="VerbNet"/>
    <externalRef reference="Commerce_buy" resource="FrameNet"/>
    <externalRef reference="purchase.01" resource="PropBank"/>
    <externalRef reference="Buying" resource="ESO"/>
    <externalRef reference="contextual" resource="EventType"/>
    <externalRef reference="ili-30-02207206-v" resource="WordNet"/>
  </externalReferences>
  <span><target id="t28"/></span>
  <role id="rl9" semRole="A0">
    <!--Officials from the People's Republic of China-->
    <externalReferences>
      <externalRef reference="obtain-13.5.2@Agent" resource="VerbNet"/>
      <externalRef reference="Commerce_buy@Buyer" resource="FrameNet"/>
      <externalRef reference="purchase.01@0" resource="PropBank"/>
      <externalRef reference="Buying@possession-owner_2" resource="ESO"/>
    </externalReferences>
    <span><target head="yes" id="t17"/>...<target id="t24"/></span>
  </role>
  <role id="rl10" semRole="A1">
    <!--60 Boeing 787 Dreamliner aircraft-->
    <externalReferences>
      <externalRef reference="obtain-13.5.2@Theme" resource="VerbNet"/>
      <externalRef reference="Commerce_buy@Goods" resource="FrameNet"/>
      <externalRef reference="purchase.01@1" resource="PropBank"/>
    </externalReferences>
    <span><target id="t29"/>...<target head="yes" id="t33"/></span>
  </role>
  <role id="rl11" semRole="AM-LOC">
    <!--in a deal worth US$ 7.2bn-->
    <span><target head="yes" id="t34"/>...<target id="t40"/></span>
  </role>
</predicate>
<timex3 id="tmx5" type="DATE" value="2005-01-29">
  <!--today-->
  <span><target id="w183"/></span>
</timex3>

Figure 52: Example of representation of entities, events and roles from an English Wikinews fragment

<entity id="e1" type="MISC">
  <references>
    <span>
      <!--Boeing 787-->
      <target id="t8"/>
      <target id="t9"/>
    </span>
  </references>
  <externalReferences>
    <externalRef confidence="1.0" reference="http://es.dbpedia.org/resource/Boeing_787" reftype="es" resource="spotlight_v1">
      <externalRef confidence="1.0" reference="http://dbpedia.org/resource/Boeing_787_Dreamliner" reftype="en" resource="wikipedia-db-esEn"/>
    </externalRef>
  </externalReferences>
</entity>
<predicate id="pr3">
  <!--comprar-->
  <externalReferences>
    <externalRef reference="comprar.1.benefactive" resource="AnCora"/>
    <externalRef reference="get-13.5.1" resource="VerbNet"/>
    <externalRef reference="obtain-13.5.2" resource="VerbNet"/>
    <externalRef reference="obtain-13.5.2-1" resource="VerbNet"/>
    <externalRef reference="Commerce_buy" resource="FrameNet"/>
    <externalRef reference="buy.01" resource="PropBank"/>
    <externalRef reference="purchase.01" resource="PropBank"/>
    <externalRef reference="Buying" resource="ESO"/>
    <externalRef reference="contextual" resource="EventType"/>
    <externalRef reference="ili-30-02207206-v" resource="WordNet"/>
    <externalRef reference="ili-30-02646757-v" resource="WordNet"/>
  </externalReferences>
  <span><target id="t34"/></span>
  <role id="rl5" semRole="arg1">
    <!--60 aviones Boeing 787 Dreamliner-->
    <externalReferences>
      <externalRef reference="get-13.5.1@Theme" resource="VerbNet"/>
      <externalRef reference="obtain-13.5.2@Theme" resource="VerbNet"/>
      <externalRef reference="Commerce_buy@Goods" resource="FrameNet"/>
      <externalRef reference="buy.01@1" resource="PropBank"/>
      <externalRef reference="purchase.01@1" resource="PropBank"/>
    </externalReferences>
    <span><target id="t35"/><target head="yes" id="t36"/>...<target id="t39"/></span>
  </role>
  <role id="rl6" semRole="argM">
    <!--en un acuerdo por valor de 7.200 millones de US$-->
    <span><target head="yes" id="t40"/>...<target id="t49"/></span>
  </role>
</predicate>
<timex3 id="tx2" type="DATE" value="2005-01-29">
  <!--29 de enero del 2005-->
  <span><target id="w20"/>...<target id="w24"/></span>
</timex3>

Figure 53: Example of representation of entities, events and roles from a Spanish Wikinews fragment

<entity id="e2" type="MISC">
  <references>
    <span>
      <!--Airbus A320-->
      <target id="t13"/>
      <target id="t14"/>
    </span>
  </references>
  <externalReferences>
    <externalRef confidence="1.0" reference="http://nl.dbpedia.org/resource/Airbus_A320" reftype="nl" resource="spotlight_v1">
      <externalRef confidence="1.0" reference="http://dbpedia.org/resource/Airbus_A320_family" reftype="en" resource="wikipedia-db-nlEn"/>
    </externalRef>
  </externalReferences>
</entity>
<predicate id="pr6">
  <!--kopen-->
  <externalReferences>
    <externalRef reference="rv-4101" resource="Cornetto"/>
    <externalRef reference="ili-30-02207206-v" resource="WordNet"/>
  </externalReferences>
  <span><target id="t17"/></span>
  <role id="r8" semRole="Arg1">
    <!--twintig Airbus A320 passagiersvliegtuigen-->
    <span><target id="t12"/>...<target head="yes" id="t15"/></span>
  </role>
  <role id="r10" semRole="ArgM-PNC">
    <!--voor een-->
    <span><target head="yes" id="t18"/><target id="t19"/></span>
  </role>
</predicate>
<timex3 id="tmx3" type="DATE" value="2009-06-18">
  <!--donderdag-->
  <span><target id="w5"/></span>
</timex3>

Figure 54: Example of representation of entities, events and roles from a Dutch Wikinews fragment

<entity id="e4" type="ORGANIZATION">
  <references>
    <!--A320-->
    <span><target id="t9" /></span>
  </references>
  <externalReferences>
    <externalRef resource="spotlight_v1" reference="http://it.dbpedia.org/resource/Airbus_A320_family" confidence="1.0" reftype="it" source="it">
      <externalRef resource="wikipedia-db-itEn" reference="http://dbpedia.org/resource/Airbus_A320_family" confidence="1.0" reftype="en" source="it" />
    </externalRef>
  </externalReferences>
</entity>
<!--t4 acquisterá : A0[t1 China] A1[t5 venti]-->
<predicate id="pr1">
  <!--acquisterá-->
  <span><target id="t4" /></span>
  <externalReferences>
    <externalRef resource="EventType" reference="OCCURRENCE" />
    <externalRef resource="PropBank" reference="acquistare.01" />
    <externalRef resource="WordNet" reference="ili-30-00079018-n" />
    <externalRef resource="WordNet" reference="ili-30-02207206-v" />
  </externalReferences>
  <role id="rl1" semRole="A0">
    <!--China Eastern Airlines-->
    <span><target id="t1" head="yes" /><target id="t2" /><target id="t3" /></span>
  </role>
  <role id="rl2" semRole="A1">
    <!--venti nuovi jet Airbus A320-->
    <span><target id="t5" /><target id="t6" /><target id="t7" head="yes" /><target id="t8" /><target id="t9" /></span>
  </role>
</predicate>
<timex3 id="tmx1" type="DATE" value="2009-06-18">
  <!--18 giugno 2009-->
  <span><target id="w10" /><target id="w11" /><target id="w12" /></span>
</timex3>

Figure 55: Example of representation of entities, events and roles from an Italian Wikinews fragment

The NAF2SEM module is agnostic to the language of the NAF file. This means that we can process English, Spanish, Italian and Dutch NAF files as if they were different sources, just as we processed multiple English NAF files. The module will merge all entities and events according to the implemented heuristics and generate a single RDF-TRiG file for all the sources. In the ideal case, the same data should be extracted across the languages as would be extracted for any of the languages separately, because the texts are translations of each other. Merging the NAF files across languages should then result in exactly the same numbers of entities, events and triples. Merging the data sets can therefore be seen as an extreme test for the cross-lingual extraction and the compatibility of the different NLP pipelines.

Not all extracted information can be compared. Events that are not mapped to WordNet concepts (ILI-records) are represented by their labels, which differ across the languages. The same holds for so-called dark entities that are not mapped to DBpedia: they are represented by their linguistic form, which is usually different. For non-entities, i.e. expressions that are not detected as entities but that play an important role in the event, string matches across languages are very unlikely.

In Figure 56, we show the RDF-TRiG result of merging English, Spanish, Italian and Dutch NAF files for some entities. The entity Airbus was found in the NAF files for 3 languages, with 6 mentions in the English source, 7 mentions in the Spanish source and 4 mentions in the Dutch source. The entity Airbus A380, on the other hand, was only detected by the Italian (7 mentions) and Dutch (21 mentions) pipelines.
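The per-language mention counts reported for a merged entity can be derived from its gaf:denotedBy values. The sketch below is illustrative only and is not the project's triple counter: the mention URIs are hypothetical, the function names are assumptions, and the min-based overlap is one possible reading of the coverage description given earlier (1 English and 1 Spanish mention of the same instance yield 100% coverage, regardless of whether the mentions are aligned).

```python
from collections import Counter

def mentions_per_language(mention_uris):
    """Count mentions by source language, read from URIs such as
    'nwr:dutch-wikinews/<document>#char=31,37'."""
    counts = Counter()
    for uri in mention_uris:
        source = uri.split(":", 1)[1]        # drop the 'nwr:' prefix
        language = source.split("-", 1)[0]   # 'dutch-wikinews/...' -> 'dutch'
        counts[language] += 1
    return counts

def coverage_of_english(counts, language):
    """Share of English mentions matched by mentions in another language.
    Assumption: overlap is min(English count, other count), unaligned."""
    english = counts["english"]
    if english == 0:
        return 0.0
    overlap = min(english, counts[language])
    return overlap / english

# Hypothetical mention URIs for a single merged instance.
mentions = [
    "nwr:english-wikinews/doc_a#char=1,7",
    "nwr:english-wikinews/doc_a#char=93,100",
    "nwr:spanish-wikinews/doc_a#char=0,6",
    "nwr:spanish-wikinews/doc_a#char=615,621",
    "nwr:spanish-wikinews/doc_a#char=716,722",
    "nwr:dutch-wikinews/doc_a#char=31,37",
]

counts = mentions_per_language(mentions)
print(dict(counts))
print(coverage_of_english(counts, "spanish"), coverage_of_english(counts, "dutch"))
```

Because the overlap ignores which sentence a mention comes from, a score computed this way measures coverage and cannot be read as recall.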
In Figure 57, we see events detected across the languages. The first event, represented through the ILI ili-30-00634472-v, is matched across all 4 languages; the other events are matched across different subsets of languages. In the next subsections, we show the statistics for the 4 corpora and 4 languages. We also provide some statistics for the merging of data against the English results.

<http://dbpedia.org/resource/Airbus>
    rdfs:label "Airbus" , "Airbus," ;
    gaf:denotedBy
        nwr:dutch-wikinews/1816_Airbus_wins_Qatar_Airways_order_worth_15bn#char=31,37 ,
        nwr:dutch-wikinews/816_Airbus_wins_Qatar_Airways_order_worth_15bn#char=564,570 ,
        nwr:dutch-wikinews/1816_Airbus_wins_Qatar_Airways_order_worth_15bn#char=655,661 ,
        nwr:dutch-wikinews/1816_Airbus_wins_Qatar_Airways_order_worth_15bn#char=911,917 ,
        nwr:spanish-wikinews/1816_Airbus_wins#char=0,6 ,
        nwr:spanish-wikinews/1816_Airbus_wins#char=615,621 ,
        nwr:spanish-wikinews/1816_Airbus_wins#char=716,722 ,
        nwr:spanish-wikienews/1816_Airbus_wins#char=945,951 ,
        nwr:spanish-wikienews/1816_Airbus_wins#char=450,456 ,
        nwr:spanish-wikienews/1816_Airbus_wins#char=381,387 ,
        nwr:spanish-wikienews/1816_Airbus_wins#char=135,141 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=1,7 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=93,100 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=356,362 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=641,647 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=153,156 ,
        nwr:english-wikienews/Airbus_wins_Qatar_Airways_order_worth_$15bn#char=872,878 .

<http://dbpedia.org/resource/Airbus_A380>
    rdfs:label "A380" , "Airbus A380" , "Airbus 380" ;
    gaf:denotedBy
        nwr:italian-wikinews/20583-textpro.txt.txp#char=88,99 ,
        nwr:italian-wikinews/3828-textpro.txt.txp#char=112,123 ,
        nwr:italian-wikinews/3828-textpro.txt.txp#char=205,216 ,
        nwr:italian-wikinews/31769-textpro.txt.txp#char=109,120 ,
        nwr:italian-wikinews/23242-textpro.txt.txp#char=17,28 ,
        nwr:italian-wikinews/23242-textpro.txt.txp#char=122,133 ,
        nwr:italian-wikinews/25115-textpro.txt.txp#char=2011,2022 ,
        nwr:dutch-wikinews/10026_First_A380_enters_commercial_service#char=117,128 ,
        nwr:dutch-wikinews/10026_First_A380_enters_commercial_service#char=840,844 ,
        nwr:dutch-wikinews/10026_First_A380_enters_commercial_service#char=954,958 ,
        nwr:dutch-wikinews/6475_Singapore_Airlines_to_be_compensated_for_A380_delays#char=53,57 ,
        nwr:dutch-wikinews/6475_Singapore_Airlines_to_be_compensated_for_A380_delays#char=190,194 ,
        nwr:dutch-wikinews/6475_Singapore_Airlines_to_be_compensated_for_A380_delays#char=1193,1197 ,
        nwr:dutch-wikinews/555_Boeing_unveils_long-range_777#char=1742,1746 ,
        nwr:dutch-wikinews/7924_A380_makes_maiden_flight_to_US#char=48,59 ,
        nwr:dutch-wikinews/7924_A380_makes_maiden_flight_to_US#char=512,516 ,
        nwr:dutch-wikinews/7924_A380_makes_maiden_flight_to_US#char=1188,1192 ,
        nwr:dutch-wikinews/260_Airbus_launches_world_largest_passenger_plane#char=785,789 ,
        nwr:dutch-wikinews/1380_World_largest_passenger_airliner_makes_first_flight#char=84,88 ,
        nwr:dutch-wikinews/1380_World_largest_passenger_airliner_makes_first_flight#char=166,177 ,
        nwr:dutch-wikinews/1380_World_largest_passenger_airliner_makes_first_flight#char=841,845 ,
        nwr:dutch-wikinews/1380_World_largest_passenger_airliner_makes_first_flight#char=1037,1041 ,
        nwr:dutch-wikinews/8935_Boeing_unveils_new_787_Dreamliner#char=1709,1713 ,
        nwr:dutch-wikinews/2007/7/9/8935_Boeing_unveils_new_787_Dreamliner#char=1871,1882 ,
        nwr:dutch-wikinews/10021_First_Airbus_A380_delivered#char=52,56 ,
        nwr:dutch-wikinews/10021_First_Airbus_A380_delivered#char=617,621 ,
        nwr:dutch-wikinews/3235_Engine_troubles_delay_Airbus_superjumbo_tour#char=624,634 ,
        nwr:dutch-wikinews/7742_Airbus_announces_job_cuts_of_10,000#char=1403,1407 ;
    skos:prefLabel "A380" , "Airbus A380" , "Airbus 380" .

<http://dbpedia.org/resource/White_House>
    rdfs:label "Witte Huis" , "Casa Bianca" ;
    gaf:denotedBy
        nwr:italian-wikinews/16014-textpro.txt.txp#char=3,14 ,
        nwr:italian-wikinews/16014-textpro.txt.txp#char=195,206 ,
        nwr:italian-wikinews/16014-textpro.txt.txp#char=387,398 ,
        nwr:italian-wikinews/16014-textpro.txt.txp#char=981,992 ,
        nwr:dutch-wikinews/14083_Barack_Obama_presents_rescue_plan_after_GM_declaration_of_bankruptcy#char=383,393 ,
        nwr:dutch-wikinews/13143_White_House_considering_auto_rescue_plan#char=0,10 ,
        nwr:dutch-wikinews/13143_White_House_considering_auto_rescue_plan#char=175,185 ,
        nwr:dutch-wikinews/13143_White_House_considering_auto_rescue_plan#char=359,369 ,
        nwr:dutch-wikinews/13143_White_House_considering_auto_rescue_plan#char=864,874 ;
    skos:prefLabel "Witte Huis" , "Casa Bianca" .

Figure 56: RDF-TRiG representation of entities merged from English, Spanish, Italian and Dutch Wikinews

ili:ili-30-00634472-v
    a sem:Event , nwrontology:sourceEvent , fn:Coming_to_believe , fn:Reasoning ;
    rdfs:label "concluir" , "concluderen" , "concludere" , "reason" , "conclude" ;
    gaf:denotedBy
        nwr:spanish-wikinews/12047_Government_Accountability.txt#char=875,883 ,
        nwr:italian-wikinews/25429-textpro.txt.txp#char=52,60 ,
        nwr:italian-wikinews/25429-textpro.txt.txp#char=239,250 ,
        nwr:italian-wikinews/10246-textpro.txt.txp#char=1025,1035 ,
        nwr:dutch-wikinews/12047_Government_Accountability_Office_requests_rerun_of_US_Air_Force_tanker_bid#char=850,863 ,
        nwr:english-wikinews/Indonesia's_transport_minister_tells_airlines_not_to_buy_European_aircraft_due_to_EU_ban#char=1801,1807 ,
        nwr:english-wikinews/Government_Accountability_Office_requests_rerun_of_US_Air_Force_tanker_bid#char=724,732 .

ili:ili-30-01656788-v
    a sem:Event , fn:Building , fn:Creating , nwrontology:contextualEvent ;
    rdfs:label "ensamblar" , "assemblare" , "samenstellen" ;
    gaf:denotedBy
        nwr:spanish-wikinews/12047_Government_Accountability.txt#char=481,490 ,
        nwr:spanish-wikinews/11169_Northrop_Grumman.txt#char=1845,1856 ,
        nwr:italian-wikinews/21058-textpro.txt.txp#char=1979,1989 ,
        nwr:italian-wikinews/25429-textpro.txt.txp#char=461,471 ,
        nwr:dutch-wikinews/5135_Boeing_delivers_final_717_to_AirTran,_ending_Douglas_era#char=487,494 .

ili:ili-30-02680814-v
    a sem:Event , nwrontology:contextualEvent , fn:Activity_stop , eso:StoppingAnActivity , fn:Process_stop , fn:Quitting , eso:LeavingAnOrganization , fn:Halt ;
    rdfs:label "cease" , "discontinue" , "opheffen" , "cesar" , "op doeken" ;
    gaf:denotedBy
        nwr:spanish-wikinews/14084_CEO_of_GM_outlines_plan.txt#char=3049,3054 ,
        nwr:english-wikinews/CEO_of_GM_outlines_plan_for_%22New_GM%22_after_auto_company_declared_bankruptcy#char=2795,2800 ,
        nwr:english-wikinews/Ford_Taurus_to_be_revived#char=930,942 ,
        nwr:english-wikinews/Penske_Auto_selected_to_buy_General_Motors'_Saturn_unit#char=444,456 ,
        nwr:dutch-wikinews/3971_Ford_Motor_Company_cutting_30,000_jobs_by_2012#char=1309,1318 ,
        nwr:dutch-wikinews/13774_GM_and_Chrysler_receive_Canadian_loans_amid_US_restructuring_ultimata#char=1784,1793 .

ili:ili-30-00156601-v-and-ili-30-00153263-v
    a sem:Event , nwrontology:contextualEvent , fn:Change_position_on_a_scale , fn:Cause_change_of_position_on_a_scale , eso:Increasing , eso:QuantityChange ;
    rdfs:label "aumentare" , "increase" , "incrementar" ;
    gaf:denotedBy
        nwr:italian-wikinews/12718-textpro.txt.txp#char=1095,1104 ,
        nwr:spanish-wikinews/12667_Markets_rally_as_world_central_banks_infuse_cash.txt#char=912,923 ,
        nwr:english-wikinews/Shares_worldwide_surge_due_to_US_government_plan#char=129,138 ,
        nwr:english-wikinews/Shares_worldwide_surge_due_to_US_government_plan#char=503,511 ,
        nwr:english-wikinews/Markets_down_across_the_world;_Dow_Jones_falls_below_9,000#char=1356,1364 ,
        nwr:english-wikinews/Bank_of_America_reports_losses_of_over_US$2.2_billion#char=244,253 ,
        nwr:english-wikinews/Bank_of_America_reports_losses_of_over_US$2.2_billion#char=816,824 ,
        nwr:english-wikinews/Stock_markets_worldwide_fall_dramatically#char=478,486 ,
        nwr:english-wikinews/Stock_markets_worldwide_fall_dramatically#char=1574,1582 ,
        nwr:english-wikinews/US_stock_markets_have_their_best_week_since_November#char=446,454 ,
        nwr:english-wikinews/US_stock_markets_have_their_best_week_since_November#char=1266,1275 ,
        nwr:english-wikinews/Russian_stock_markets_suspended_amid_market_turmoil#char=688,696 ,
        nwr:english-wikinews/Russian_stock_markets_suspended_amid_market_turmoil#char=1046,1054 ,
        nwr:english-wikinews/Worldwide_markets_fall_precipitously#char=461,469 ,
        nwr:english-wikinews/Worldwide_markets_fall_precipitously#char=800,809 ,
        nwr:english-wikinews/Markets_rally_as_world's_central_banks_infuse_cash#char=717,725 .

ili:ili-30-01128193-v
    a sem:Event , nwrontology:contextualEvent , ili:i39702 , fn:Protecting ;
    rdfs:label "protection" , "protect" , "proteger" , "beschermen" ;
    gaf:denotedBy
        nwr:spanish-wikinews/13774_GM_and_Chrysler_receive_Canadian_loans_amid_US_restructuring_ultimata.txt#char=411,419 ,
        nwr:english-wikinews/CEO_of_GM_outlines_plan_for_%22New_GM%22_after_auto_company_declared_bankruptcy#char=252,262 ,
        nwr:english-wikinews/US_automaker_GM_reports_losses_of_$6_billion#char=841,851 ,
        nwr:english-wikinews/GM_and_Chrysler_receive_Canadian_loans_amid_US_restructuring_ultimata#char=384,391 ,
        nwr:english-wikinews/GM_and_Chrysler_receive_Canadian_loans_amid_US_restructuring_ultimata#char=2545,2555 ,
        nwr:english-wikinews/Barack_Obama_presents_rescue_plan_after_GM_declaration_of_bankruptcy#char=326,336 ,
        nwr:english-wikinews/U.S._manufacturer_General_Motors_seeks_bankruptcy_protection#char=50,60 ,
        nwr:english-wikinews/U.S._manufacturer_General_Motors_seeks_bankruptcy_protection#char=171,181 ,
        nwr:english-wikinews/Penske_Auto_selected_to_buy_General_Motors'_Saturn_unit#char=157,167 ,
        nwr:dutch-wikinews/13774_GM_and_Chrysler_receive_Canadian_loans_amid_US_restructuring_ultimata#char=416,426 .

Figure 57: RDF-TRiG representation of events merged from English, Spanish, Italian and Dutch Wikinews

Footnote 42: In some cases a single English sentence has been translated into more than one sentence in another language.

7.1 Crosslingual extraction of entities

In Table 33, we give the totals of DBpedia entities extracted for English and the other languages: Spanish, Italian and Dutch. For English, we give the unique instances (I) and the mentions (M) for each corpus. For the other languages, we give the same and, in addition, the overlapping mentions (O) and the macro (per document) and micro average of coverage.

The English figures show that there is some variation across the 4 corpora. The stock market corpus contains only a few instances and mentions, whereas the airbus and gm corpora contain the most instances and mentions. Ratios between instances and mentions differ slightly across the corpora. Spanish has a similar amount and ratio for airbus but different numbers for the other data sets, although here too the stock market corpus has the fewest instances and mentions. Overlap is highest for airbus and apple, up to 60%, 10 points lower for gm and very low for the stock market. The Italian and Dutch results are very similar to each other but, for all data sets except the stock market, lower than Spanish in terms of instances, mentions and coverage. When averaged over the data sets, the coverage across the languages is very close. This means that the pipelines are reasonably compatible and interoperable across the languages for entity detection and linking.

Table 33: DBpedia entities extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage.
I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions

          English       Spanish                            Italian                            Dutch
            I     M       I     M     O   maC   miC          I     M     O   maC   miC          I     M     O   maC   miC
airbus    157   795     142   756   489  51.6  61.5        110   446   352  32.8  44.3        121   557   360  35.6  45.3
apple      96   680     124   644   424  52.1  62.4         91   490   344  31.7  50.6         91   445   321  34.6  47.2
gm        118   757      93   627   393  35.7  51.9         76   369   244  24.1  32.2         82   540   337  31.0  44.5
stock       5    61      12    42     2   2.8   3.3         77   202    23  60.0  37.7        100   380    23  60.0  37.7
Total     376  2293     371  2069  1308  35.5  44.8        354  1507   963  37.1  41.2        394  1922  1041  40.3  43.7

Tables 34, 35, 36 and 37 show, for each of the 4 corpora, the 15 entities that are most frequent in English, together with their frequencies in Spanish, Italian and Dutch. For each entity, we show the number of mentions and the proportion of English mentions covered. If there are more mentions in Spanish, Italian or Dutch than in English, the coverage is capped at 100%. There are two interesting observations to be made. First, United States dollar, with 35, 16, 59 and 36 mentions in English across the data sets, turned out to be a systematic error in the English pipeline that is not mirrored by the other languages: the English pipeline erroneously linked mentions of the US to the dollar instead of the country. The second observation relates to the granularity of the mapping. For example, in the airbus data, Boeing is the most frequent entity in all 4 languages. The more specific entity Boeing Commercial Airplanes, however, is only detected in English and not in any of the other languages. This is because the mappings across Wikipedia from the other languages to English are more coarse-grained. The example in Figure 58 shows that this is partly due to the absence of the specific link in the DBpedias of the specific languages (Italian and Spanish) and partly due to the absence of a link from the specific page in a language to English (the Dutch example).
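The coverage figures discussed above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the project's evaluation code: the function names are hypothetical, the per-entity coverage follows the capping rule stated in the text, and the definition of micro coverage as overlap divided by English mentions, as well as the reading of the Total row as a plain average over the four data sets, are assumptions inferred from the numbers in Tables 33 and 34.

```python
from statistics import mean


def capped_coverage(english_mentions: int, other_mentions: int) -> float:
    """Percentage of English mentions covered by another language.

    Capped at 100 when the other language has more mentions than English.
    Returning 0 when English has no mentions is an assumption that matches
    the rows of Table 37 where the English count is 0.
    """
    if english_mentions <= 0:
        return 0.0
    return min(100.0, 100.0 * other_mentions / english_mentions)


def micro_coverage(overlap: int, english_mentions: int) -> float:
    """Assumed definition of miC: overlapping mentions over English mentions."""
    if english_mentions <= 0:
        return 0.0
    return 100.0 * overlap / english_mentions


# Boeing in the airbus corpus (Table 34): 131 English mentions.
boeing_es = capped_coverage(131, 136)  # Spanish has more mentions, so capped
boeing_it = capped_coverage(131, 103)

# Spanish columns of Table 33: (overlap, English mentions) per data set.
spanish = {"airbus": (489, 795), "apple": (424, 680), "gm": (393, 757), "stock": (2, 61)}
per_dataset = {name: micro_coverage(o, m) for name, (o, m) in spanish.items()}

# Assumption: the Total row averages the rounded per-data-set values.
total = mean(round(value, 1) for value in per_dataset.values())

print(round(boeing_es, 2), round(boeing_it, 2))
print({name: round(value, 1) for name, value in per_dataset.items()})
```

Under these assumptions the per-data-set values agree with the Spanish miC column of Table 33, which suggests that the Total row is an average over data sets rather than a ratio of the summed counts.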
<entity id="e14" type="ORGANIZATION">
  <references>
    <!--Boeing Commercial Airplanes-->
    <span>
      <target id="t233" />
      <target id="t234" />
      <target id="t235" />
    </span>
  </references>
  <externalReferences>
    <externalRef resource="spotlight_v1" reference="http://it.dbpedia.org/resource/Boeing" confidence="1.0" reftype="it" source="it">
      <externalRef resource="wikipedia-db-itEn" reference="http://dbpedia.org/resource/Boeing" confidence="1.0" reftype="en" source="it" />
    </externalRef>
  </externalReferences>
</entity>
<entity id="e21" type="ORG">
  <references>
    <span>
      <!--Boeing Commercial Airplanes-->
      <target id="t229"/>
      <target id="t230"/>
      <target id="t231"/>
    </span>
  </references>
  <externalReferences>
    <externalRef confidence="1.0" reference="http://es.dbpedia.org/resource/Boeing" reftype="es" resource="spotlight_v1" source="es">
      <externalRef confidence="1.0" reference="http://dbpedia.org/resource/Boeing" reftype="en" resource="wikipedia-db-esEn" source="es"/>
    </externalRef>
  </externalReferences>
</entity>
<entity id="e22" type="ORG">
  <references>
    <span>
      <!--Boeing Commercial Airplanes-->
      <target id="t210"/>
      <target id="t211"/>
      <target id="t212"/>
    </span>
  </references>
  <externalReferences>
    <externalRef confidence="0.99999964" reference="http://nl.dbpedia.org/resource/Boeing_Commercial_Airplanes" reftype="nl" resource="spotlight_v1" source="nl"/>
  </externalReferences>
</entity>

Figure 58: Cross-lingual entity linking

Table 34: DBpedia entities in the Wikinews Airbus corpus most frequent in English with Spanish, Italian and Dutch frequencies

Airbus                                   English    Spanish         Italian         Dutch
Boeing                                       131    136  100.00     103   78.63      78   59.54
Airbus                                        85    116  100.00      75   88.24      74   87.06
United States dollar                          35      0    0.00       0    0.00       0    0.00
European Union                                34     17   50.00      12   35.29      20   58.82
Boeing Commercial Airplanes                   33      0    0.00       0    0.00       0    0.00
Boeing 787 Dreamliner                         29     12   41.38       3   10.34      10   34.48
United States Air Force                       29     20   68.97       4   13.79      18   62.07
Singapore                                     18      8   44.44       9   50.00       8   44.44
France                                        16     14   87.50      10   62.50      10   62.50
Airbus A320 family                            15      3   20.00       3   20.00       4   26.67
Ryanair                                       14     12   85.71      10   71.43      10   71.43
Aeroflot                                      13     15  100.00       9   69.23       8   61.54
Government Accountability Office              12      0    0.00       1    8.33       0    0.00
Aer Lingus                                    12      9   75.00      11   91.67       9   75.00
Boeing 747                                    11      0    0.00       2   18.18       0    0.00

Table 35: DBpedia entities in the Wikinews Apple corpus most frequent in English with Spanish, Italian and Dutch frequencies

Apple                                    English    Spanish         Italian         Dutch
Apple Inc.                                   312    240   76.92     218   69.87     179   57.37
Steve Jobs                                    49     21   42.86      35   71.43      31   63.27
Steve Waugh                                   27      0    0.00       0    0.00       0    0.00
The Beatles                                   22      9   40.91       2    9.09       7   31.82
United States dollar                          16      0    0.00       0    0.00       1    6.25
Intel                                         16     11   68.75      10   62.50       4   25.00
Cisco Systems                                 14      9   64.29       7   50.00       8   57.14
Microsoft                                     12      3   25.00       0    0.00       4   33.33
James Cook                                    10      4   40.00       3   30.00       1   10.00
Mac OS X Lion                                 10      9   90.00       2   20.00       0    0.00
United Kingdom                                 8      6   75.00       3   37.50       8  100.00
Motorola                                       8      4   50.00       1   12.50       1   12.50
Software development kit                       8      4   50.00       0    0.00       5   62.50
IBM                                            8     10  100.00       6   75.00       5   62.50
Apple Worldwide Developers Conference          7      5   71.43       0    0.00       2   28.57

Table 36: DBpedia entities in the Wikinews GM, Chrysler, Ford corpus most frequent in English with Spanish, Italian and Dutch frequencies

GM                                       English    Spanish         Italian         Dutch
General Motors                               155    143   92.26     107   69.03     119   76.77
Ford Motor Company                            81     71   87.65      49   60.49      50   61.73
Chrysler                                      76     31   40.79       0    0.00      43   56.58
United States dollar                          59      0    0.00       0    0.00       0    0.00
Fiat                                          30     21   70.00       0    0.00       0    0.00
Ford Motor Company of Australia               22      0    0.00       0    0.00       0    0.00
United Auto Workers                           21      0    0.00       0    0.00       0    0.00
Daimler AG                                    21      3   14.29       3   14.29      13   61.90
Barack Obama                                  16      8   50.00       3   18.75      10   62.50
United States                                 15     70  100.00      54  100.00     107  100.00
Henderson, Nevada                             14      0    0.00       0    0.00       0    0.00
Federal government of the United States       13      0    0.00       0    0.00       0    0.00
Canada                                        12     10   83.33       6   50.00      21  100.00
Toyota                                        11     11  100.00       6   54.55       8   72.73
Clarence Thomas                                9      0    0.00       0    0.00       0    0.00

Table 37: DBpedia entities in the Wikinews stock market corpus most frequent in English with Spanish, Italian and Dutch frequencies

Stock                                    English    Spanish         Italian         Dutch
United States dollar                          36      0    0.00       0    0.00       0    0.00
United States                                 14      2   14.29      29  100.00      90  100.00
United Kingdom                                 5      0    0.00      11  100.00      11  100.00
FTSE 100 Index                                 4      0    0.00       4  100.00      12  100.00
Andy Kaufman                                   2      0    0.00       0    0.00       0    0.00
Washington, D.C.                               0      0    0.00       2    0.00       1    0.00
Buenos Aires                                   0      0    0.00       0    0.00       1    0.00
United States House of Representatives         0      1    0.00       0    0.00       1    0.00
Dow Jones Industrial Average                   0      0    0.00       9    0.00      29    0.00
Reuters                                        0      0    0.00       2    0.00       1    0.00
JPMorgan Chase                                 0      0    0.00       1    0.00       0    0.00
Afghanistan                                    0      0    0.00       1    0.00       1    0.00
Ben Bernanke                                   0      0    0.00       0    0.00       1    0.00
State (polity)                                 0      0    0.00       1    0.00       0    0.00
France                                         0      0    0.00       0    0.00       3    0.00

7.2 Crosslingual extraction of events

As explained above, we represent events through the ILI-concepts that are associated with their lemmas. This approximates the representation of the concept. Furthermore, in some cases more than one concept is assigned to a single lemma. To compare such lists of concepts, we checked whether there was at least one intersecting ILI-concept across events to decide on a match. We can see in Table 38 that there is a broader set of instances (I) than for entities. The proportions of matched event mentions from Spanish, Italian and Dutch to English are just a little bit lower than for DBpedia entities. This is promising, since matching events is more difficult than matching entities. We see again that Dutch scores a bit lower than Spanish and Italian. This is because the Spanish and Italian wordnets are a direct extension of the English wordnet and have been developed for many years, whereas the Open Dutch wordnet was built recently and partly independently. Across the different data sets, the results are very similar.

Table 38: ILI-based events extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage.
I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions

          English       Spanish                            Italian                            Dutch
            I     M       I     M     O   maC   miC          I     M     O   maC   miC          I     M     O   maC   miC
airbus    365   848     166   484   217  26.4  25.6        535   984   248  33.3  29.3        199   483   164  20.0  19.3
apple     342  1007     152   476   242  25.8  24.0        500  1090   202  29.8  20.1        170   498   170  19.1  16.9
gm        319  1140     142   387   257  24.7  22.5        504  1055   209  33.1  18.3        170   622   192  20.4  16.8
stock     283   673     140   325   163  28.6  24.2        450   895   163  34.0  24.2        147   362    92  17.9  13.7
Total    1309  3668     600  1672   879  26.4  24.1       1989  4024   822  32.6  23.0        686  1965   618  19.3  16.7

In Tables 39, 40, 41 and 42 we show the ILI-based events that are most frequent in English for the 4 corpora. If more than one ILI-record is assigned, we only list the synonyms for the first synset. For individual events, the results vary a lot across the different languages, and there does not appear to be any pattern in this. A typical case is ili-30-02207206-v[buy] in Table 41, which has a good match in Spanish, only 1 match in Italian and 0 in Dutch. The Dutch equivalent kopen is linked to a hypernym of buy and the Italian equivalent acquistare is linked to another meaning.

Table 39: ILI-based events in the Wikinews Airbus corpus most frequent in English with Spanish, Italian and Dutch frequencies

English   Spanish       Italian       Dutch         Event (Airbus)
     17     3   17.65     7   41.18     9   52.94   ili-30-01438304-v[deliver]
     17     0    0.00     3   17.65     0    0.00   ili-30-02204692-v[have]
     16     0    0.00     1    6.25     0    0.00   ili-30-00764222-v;ili-30-02657219-v;ili-30-00805376-v[agree]
     15     2   13.33     3   20.00    14   93.33   ili-30-00974367-v;ili-30-00975427-v[announce]
     14    16  100.00     3   21.43     4   28.57   ili-30-02207206-v[buy]
     14     2   14.29     0    0.00     0    0.00   ili-30-01653442-v[construct]
     13     0    0.00     1    7.69     0    0.00   ili-30-00755745-v;ili-30-00719734-v[ask;expect]
     12     0    0.00     1    8.33     2   16.67   ili-30-02413480-v;ili-30-02410855-v[work]
     12     3   25.00     0    0.00     0    0.00   ili-30-00705227-v[be after]
     10     7   70.00     0    0.00     0    0.00   ili-30-02257767-v;ili-30-00162688-v[interchange;replace]
      9     5   55.56     1   11.11     5   55.56   ili-30-02244956-v;ili-30-02242464-v[deal;sell]
      8     0    0.00     1   12.50     0    0.00   ili-30-00998399-v[record]
      8     7   87.50     5   62.50     0    0.00   ili-30-02641957-v;ili-30-00459776-v[delay]
      8     0    0.00     0    0.00     0    0.00   ili-30-01955984-v;ili-30-01957529-v;ili-30-02102398-v;ili-30-01847676-v[ride]
      8     4   50.00     6   75.00     0    0.00   ili-30-01583142-v;ili-30-01654628-v[construct;build]

Table 40: ILI-based events in the Wikinews Apple corpus most frequent in English with Spanish, Italian and Dutch frequencies

English   Spanish       Italian       Dutch         Event (Apple)
     64     0    0.00     2    3.13     2    3.13   ili-30-01224744-v;ili-30-01525666-v[control;function]
     48     1    2.08     0    0.00     0    0.00   ili-30-00674607-v[choose]
     35     2    5.71     2    5.71     0    0.00   ili-30-02204692-v[have]
     34     0    0.00     3    8.82     0    0.00   ili-30-02421374-v[free]
     34    30   88.24    10   29.41    17   50.00   ili-30-00974367-v;ili-30-00975427-v[announce]
     20    22  100.00     3   15.00    17   85.00   ili-30-02244956-v;ili-30-02242464-v[deal;sell]
     19     0    0.00     0    0.00     0    0.00   ili-30-00721889-v;ili-30-02351010-v[price]
     16     0    0.00    45  100.00     0    0.00   ili-30-02630189-v[feature]
     15     0    0.00     1    6.67    14   93.33   ili-30-00933821-v[break]
     15     0    0.00     0    0.00     2   13.33   ili-30-01642437-v[innovate]
     15     0    0.00     0    0.00     0    0.00   ili-30-02735282-v;ili-30-02501278-v;ili-30-01486312-v[suit;adjudicate;case]
     14     6   42.86     1    7.14     0    0.00   ili-30-00341917-v;ili-30-02743921-v;ili-30-01849221-v[come;come up]
     12    10   83.33     0    0.00     0    0.00   ili-30-00802318-v[allow]
     12     0    0.00     0    0.00     0    0.00   ili-30-00756338-v[claim]
     12     0    0.00     0    0.00     1    8.33   ili-30-00515154-v[process]

Table 41: ILI-based events in the Wikinews GM, Chrysler, Ford corpus most frequent in English with Spanish, Italian and Dutch frequencies

English   Spanish       Italian       Dutch         Event (GM)
    153     2    1.31     2    1.31     2    1.31   ili-30-00674607-v;ili-30-00679389-v[choose]
     60    13   21.67     3    5.00    29   48.33   ili-30-02244956-v;ili-30-02242464-v[deal;sell]
     36     1    2.78     1    2.78     2    5.56   ili-30-01621555-v;ili-30-01640207-v;ili-30-01753788-v;ili-30-01617192-v[create]
     32     8   25.00     0    0.00     0    0.00   ili-30-00705227-v[be after]
     26     0    0.00     7   26.92     0    0.00   ili-30-02204692-v[have]
     25     0    0.00     0    0.00     0    0.00   ili-30-02511551-v[order]
     24    45  100.00     0    0.00     0    0.00   ili-30-00561090-v[cut]
     24    19   79.17     4   16.67    22   91.67   ili-30-00974367-v;ili-30-00975427-v[announce]
     23     0    0.00     0    0.00     0    0.00   ili-30-02410175-v[keep on]
     18     0    0.00     0    0.00     0    0.00   ili-30-02324182-v[lend]
     16     4   25.00     4   25.00     3   18.75   ili-30-02547586-v[aid]
     15     0    0.00     0    0.00     0    0.00   ili-30-00358431-v;ili-30-00354845-v[buy the farm;die]
     12     4   33.33     6   50.00     5   41.67   ili-30-01182709-v;ili-30-02327200-v[provide;furnish]
     12     7   58.33     2   16.67     0    0.00   ili-30-02613487-v;ili-30-02297142-v[offer up;proffer]
     11     8   72.73     1    9.09     0    0.00   ili-30-02207206-v[buy]

Table 42: ILI-based events in the Wikinews stock market corpus most frequent in English with Spanish, Italian and Dutch frequencies

English   Spanish       Italian       Dutch         Event (Stock)
     40     4   10.00     1    2.50     5   12.50   ili-30-02244956-v;ili-30-02242464-v[deal;sell]
     26     0    0.00     0    0.00     0    0.00   ili-30-01307142-v;ili-30-00356649-v[even out;level off]
     21     1    4.76     0    0.00     0    0.00   ili-30-02204692-v[have]
     18     0    0.00     0    0.00     0    0.00   ili-30-00658052-v;ili-30-00660971-v[grade;rate]
     14     5   35.71     3   21.43     2   14.29   ili-30-00153263-v;ili-30-00156601-v[increase]
     13     0    0.00     0    0.00     0    0.00   ili-30-02324182-v[lend]
     12    12  100.00     3   25.00    13  100.00   ili-30-00974367-v;ili-30-00975427-v[announce]
     10     0    0.00     1   10.00     0    0.00   ili-30-02000868-v;ili-30-00589738-v;ili-30-02445925-v;ili-30-01998432-v[follow;be]
     10     2   20.00     1   10.00     0    0.00   ili-30-01645601-v[cause]
     10     0    0.00     0    0.00     0    0.00   ili-30-00721889-v;ili-30-02351010-v[price]
      9    12  100.00     6   66.67     0    0.00   ili-30-00998399-v[record]
      9     0    0.00     0    0.00     0    0.00   ili-30-02678438-v[concern]
      9     1   11.11     0    0.00     0    0.00   ili-30-02259005-v;ili-30-02260085-v[swap;trade in]
      8     4   50.00     6   75.00     3   37.50   ili-30-01778568-v;ili-30-01780434-v;ili-30-01780202-v[fear;dread]
      8     2   25.00     2   25.00     1   12.50   ili-30-00352826-v;ili-30-01620854-v[end]
      7     1   14.29     0    0.00     0    0.00   ili-30-02421374-v[free]

7.3 Crosslingual extraction of relations

Finally, we compared the actual triples extracted across the languages. The triples represent the actual statements, where we only consider triples in which the ILI-based event is the subject. Table 43 gives the predicates that are most frequent in the English data. We limited ourselves here to the generic SEM predicates (hasActor, hasTime and hasPlace), the more specific temporal relations added in NewsReader and the most frequent PropBank relations. The hasActor, hasTime and hasPlace predicates generalize over the others.
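The comparison of triples across languages can be illustrated with a small sketch. It is not the NewsReader implementation: the `Triple` type and the function names are hypothetical, and it assumes that the event part of a triple is matched with the same criterion used for events (at least one shared ILI-concept), while the predicate and the entity must be identical.

```python
from typing import FrozenSet, NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one extracted statement."""
    event_ilis: FrozenSet[str]  # ILI-concepts assigned to the event lemma
    predicate: str              # e.g. "hasActor" or "hasPlace"
    entity: str                 # e.g. an English DBpedia URI


def events_match(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Two events match if they share at least one ILI-concept."""
    return not a.isdisjoint(b)


def triples_match(t1: Triple, t2: Triple) -> bool:
    """All three elements have to match for a triple to count as shared."""
    return (
        events_match(t1.event_ilis, t2.event_ilis)
        and t1.predicate == t2.predicate
        and t1.entity == t2.entity
    )


boeing = "http://dbpedia.org/resource/Boeing"
english = Triple(frozenset({"ili-30-00975427-v", "ili-30-00974367-v"}), "hasActor", boeing)
spanish = Triple(frozenset({"ili-30-00974367-v"}), "hasActor", boeing)
other = Triple(frozenset({"ili-30-02207206-v"}), "hasActor", boeing)

print(triples_match(english, spanish))
print(triples_match(english, other))
```

Because every element has to agree, a single divergent entity link or a lemma mapped to a different synset is enough to lose the match.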
We can see that Spanish scores a little better for hasActor than Italian and Dutch, except for the stock market data set, where the Italian system has a coverage of almost 84%, which is about twice as high as for the other data sets. In the case of airbus and apple, the Italian pipeline scores high for A2 in comparison to the others, but for the stock market it is A0 and A1 that score high. For Spanish, the results are more consistent, but with a high score for A1 in the apple data set. Dutch scores lower on the actors overall, except for A0 in the stock market data set. The Dutch pipeline is apparently very successful in recovering locations compared to the others, whereas the Italian pipeline is successful in recovering temporal relations.

Table 43: Triple predicates that are most frequent in the English Wikinews corpus with coverage in Spanish, Italian and Dutch

Corpus   Role            English    Spanish         Italian         Dutch
Airbus   A0                  343    112   32.65     140   40.82     151   44.02
Airbus   A1                  388    216   55.67     178   45.88     193   49.74
Airbus   A2                   97     64   65.98      91   93.81      40   41.24
Airbus   AM-LOC               37      0    0.00       2    5.41      38  100.00
Airbus   hasActor            857    528   61.61     410   47.84     403   47.02
Airbus   hasAtTime          1494    852   57.03    1326   88.76     921   61.65
Airbus   hasFutureTime        74      0    0.00     170  100.00       0    0.00
Airbus   hasPlace             51      0    0.00       2    3.92      42   82.35
Airbus   hasTime            1568    852   54.34    1496   95.41     921   58.74
Apple    A0                  282    113   40.07     122   43.26      90   31.91
Apple    A1                  248    207   83.47     165   66.53     162   65.32
Apple    A2                   68     48   70.59      55   80.88      32   47.06
Apple    AM-LOC               21      0    0.00       7   33.33      29  100.00
Apple    hasActor            608    471   77.47     342   56.25     299   49.18
Apple    hasAtTime          1809   1021   56.44    1321   73.02     968   53.51
Apple    hasFutureTime        51      0    0.00     144  100.00       0    0.00
Apple    hasPlace             22      0    0.00       7   31.82      32  100.00
Apple    hasTime            1860   1021   54.89    1465   78.76     968   52.04
GM       A0                  307    105   34.20     101   32.90     119   38.76
GM       A1                  330    182   55.15     140   42.42     145   43.94
GM       A2                   87     51   58.62      46   52.87      22   25.29
GM       AM-LOC               29      0    0.00       3   10.34      33  100.00
GM       hasActor            734    421   57.36     287   39.10     303   41.28
GM       hasAtTime          1580    768   48.61    1371   86.77     916   57.97
GM       hasFutureTime       101      0    0.00     196  100.00       0    0.00
GM       hasPlace             29      0    0.00       3   10.34      34  100.00
GM       hasTime            1681    768   45.69    1567   93.22     916   54.49
Stock    A0                  337    162   48.07     435  100.00     371  100.00
Stock    A1                 1043    575   55.13     851   81.59     470   45.06
Stock    A2                  283     89   31.45     153   54.06      63   22.26
Stock    AM-LOC               40      0    0.00       4   10.00      52  100.00
Stock    hasActor           1714    950   55.43    1439   83.96     967   56.42
Stock    hasAtTime          1700    977   57.47    1335   78.53     875   51.47
Stock    hasPlace             42      0    0.00       4    9.52      56  100.00
Stock    hasTime            1732    977   56.41    1443   83.31     875   50.52

Table 44 shows the coverage results for the actual triples that relate ILI-based events to entities through the above predicates. We only considered the hasActor and hasPlace relations. The coverage is low, since this is a very difficult task: all 3 elements need to match exactly. Spanish results are on average 3%, and Italian and Dutch less than 1%. Since triples are usually mentioned only once, and at most a few times, in the corpora (30 articles only), it makes no sense to show frequency tables of triples. In Figure 59, we give some examples of triples shared by all 4 languages.

Table 44: ILI-based Triples extracted for English, Spanish, Italian and Dutch Wikinews with proportion of coverage, measured as macro and micro coverage.
I=instances, M=mentions, O=overlap, maC=macro-average over all document results, miC=micro-average over all mentions

          English       Spanish                            Italian                            Dutch
            I     M       I     M     O   maC   miC          I     M     O   maC   miC          I     M     O   maC   miC
airbus    775   775     369   369    24   3.1   3.1        390   390     7   1.0   0.9        381   381     6   0.8   0.8
apple     525   525     292   292    32   6.1   6.1        312   312     7   1.3   1.3        251   251     8   1.5   1.5
gm        647   647     262   262    20   3.1   3.1        273   273     1   0.2   0.2        273   273     2   0.3   0.3
stock    1463  1473     497   500     0     0     0       1312  1326     0     0     0        773   801     0     0     0
Total     853  3410     355  1420  1423   3.1   3.1        572  2287  2301   0.6   0.6        420  1678  1706   0.7   0.7

Triples in all 4 languages
ili-30-00975427-v;ili-30-00974367-v[announce] : hasActor : Boeing
ili-30-00975427-v;ili-30-00974367-v[announce] : hasActor : Airbus
ili-30-02646757-v;ili-30-02207206-v[buy] : hasActor : European Union
ili-30-00761713-v[negociate] : hasActor : Aeroflot
ili-30-02244956-v;ili-30-02242464-v[deal;sell] : hasActor : Airbus
ili-30-00634472-v[conclude] : hasActor : Boeing
ili-30-00882948-v;ili-30-00875141-v[commend;advocate] : hasActor : Airbus
ili-30-00354845-v;ili-30-00358431-v[die;buy the farm] : hasActor : Steve Jobs
ili-30-01734502-v;ili-30-00246217-v[duplicate;double] : hasActor : Apple Inc.
ili-30-00975427-v;ili-30-00974367-v[announce] : hasActor : Starbucks
ili-30-00975427-v;ili-30-00974367-v;ili-30-00820801-v;ili-30-01010118-v[announce;declare] : hasActor : United States
ili-30-01182709-v;ili-30-02327200-v;ili-30-02479323-v[provide;furnish;issue] : hasActor : General Motors
ili-30-00975427-v;ili-30-00974367-v;ili-30-00820801-v;ili-30-01010118-v[announce;declare] : hasActor : Ford Motor Company
ili-30-02244956-v;ili-30-02242464-v[deal;sell] : hasActor : Opel
ili-30-00975427-v;ili-30-00974367-v;ili-30-00820801-v;ili-30-01010118-v[announce;declare] : hasActor : General Motors

Figure 59: Identical triples across different languages

7.4 Conclusions

We described the results of cross-lingual semantic processing of text. To our knowledge, there is no other system that can perform such a task. Being able to merge the interpretation of text across languages is a big achievement and it shows the opportunities for interoperability of the NewsReader system. We have also seen that for most data types coverage still leaves room for improvement, and that differences in implementation have an impact on the comparability. Spanish results are closer to English because most of the NLP modules for English and Spanish are developed by the same group, whereas the Dutch and Italian pipelines are mostly based on different software. That does not mean that the output of the Spanish software is better than that of the Dutch software. It only means that it is more compatible. For a qualitative evaluation, we need to use the cross-lingual annotation of the Wikinews. This is reported in Agerri et al. (2015).

8 Conclusions

In this deliverable, we described the final project results on event modelling, as part of WP5 activities. We explained in detail the conversion process from NAF to SEM representation according to a batch and streaming architecture. This process explains how we get from text to RDF specifications of textual content. The core problem here is event-coreference. We described the different approaches implemented and the evaluations on cross-document coreference on the ECB+ data set. The NAF2SEM process resolving event-coreference has been applied to over 3 million car documents, generating more than half a billion triples.

To move beyond event structures, we need to relate events to time and to each other. We described our modules for extracting event relations, one for temporal relations and one for causal relations. The output of these modules can be used to create timelines, for which we organised a SemEval task in 2015 with evaluation results. Timelines form the basis for creating Storylines. Our approach has been presented at the ACL workshop on this topic, which we also organized in 2015.

Not all events are real-world events. Many expressions in news reflect perspectives on real-world events. We explained our perspective module, which takes various NAF layers as input to model the attribution relation of sources with respect to their beliefs and opinions.

Finally, we reported the results on the cross-lingual processing of news documents, obtained by comparing generated RDF-TRiG files for the Wikinews corpora for English, Spanish, Italian and Dutch.
NewsReader: ICT-316404 February 1, 2016 Event Narrative Module, version 3 9 139/148 Appendix Table 45: FrameNet frames for contextualEvents Absorb heat Abundance Abusing Adding up Adjusting Adorning Aging Amalgamation Amounting to Apply heat Arranging Arriving Assemble Assistance Attaching Attack Avoiding Becoming a member Becoming detached Behind the scenes Being attached Being employed Being in category Being in operation Being located Body movement Breathing Bringing Building Bungling Catastrophe Cause change Cause change of consistency Cause change of phase Cause change of position on a scale Cause change of strength Cause expansion Cause fluidic motion Cause harm Cause impact Cause motion Cause temperature change Cause to amalgamate Cause to be dry Cause to be sharp Cause to be wet Cause to experience Cause to fragment Cause to make noise Cause to make progress NewsReader: ICT-316404 Cause to move in place Cause to start Cause to wake Change direction Change event duration Change event time Change of consistency Change of leadership Change of phase Change operational state Change position on a scale Change posture Change tool Closure Collaboration Colonization Come together Coming up with Commerce buy Commerce collect Commerce pay Commerce sell Compatibility Competition Compliance Conquering Cooking creation Corroding Corroding caused Cotheme Create physical artwork Create representation Creating Cure Cutting Damaging Daring Death Defend Delivery Departing Destroying Detaching Dimension Dispersal Dodging Dressing Duplication Earnings and losses Eclipse Education teaching Elusive goal Emitting Employing Emptying Escaping Evading Examination Exchange Exchange currency Exclude member Excreting Expansion Expensiveness Experience bodily harm Experiencer obj Filling Fining Firing Fleeing Fluidic motion Forging Forming relationships Friction Frugality Gathering up Getting Getting up Giving Grinding Grooming Hiding objects Hiring Hit target Holding off on Hostile 
encounter Imitating Immobilization Impact Imprisonment Inchoative attaching Inchoative change of temperature Ingest substance Ingestion Inspecting Installing Institutionalization Intentional traversing Intentionally create Kidnapping Killing Knot creation Leadership Light movement Limiting Location of light Locative relation Make acquaintance Make noise Manipulate into doing Manipulation Manufacturing Mass motion Motion Motion directional Motion noise Moving in place Operate vehicle Operational testing Path shape Perception Personal relationship Piracy Placing Posture Precipitation Preserving Processing materials Prohibiting Provide lodging Quarreling Quitting Quitting a place Reading Receiving Recording Recovery Rejuvenation Releasing Removing Render nonfunctional Renting Renting out Replacing Reshaping Residence Resolve problem Resurrection Revenge Rewards and punishments Ride vehicle Robbery Rope manipulation Rotting Scouring Scrutiny Seeking Self motion Sending Separating Setting fire Shoot projectiles Shopping Sign agreement Similarity Sleep Smuggling Soaking Social event Sound movement Storing Supply Surpassing Surviving Take place of Taking Text creation Theft Translating Travel Traversing Undergo change Undressing Use firearm Visiting Waiting Waking up Wearing Weather Execution Inhibit movement Proliferating in number February 1, 2016 Event Narrative Module, version 3 140/148 Table 46: FrameNet from for sourceEvents Achieving first Adding up Adducing Agree or refuse to act Appointing Attempt suasion Bail decision Be in agreement on assessment Be translation equivalent Become silent Behind the scenes Being named Body movement Bragging Categorization Chatting Choosing Claim ownership Coming up with Commitment Communicate categorization Communication Communication manner Communication means Communication noise Communication response Compatibility Complaining Compliance Confronting problem Contacting Criminal investigation Deny permission Deserving Discussion 
Distinctiveness Encoding Eventive cognizer affecting Evidence Experiencer obj Expressing publicly Forgiveness Gesture Grant permission Have as translation equivalent Heralding Imposing obligation Judgment Judgment communication Judgment direct address Justifying Labeling Linguistic meaning Make agreement on action Make noise Making faces Manipulate into doing Motion noise Name conferral Notification of charges Omen Pardon Predicting Prevarication Prohibiting Questioning Referring by name Regard Reporting Request Respond to proposal Reveal secret Rite Seeking Sign Silencing Simple naming Speak on topic Spelling and pronouncing Statement Suasion Subjective influence Successfully communicate message Talking into Telling Text creation Verdict Appearance Categorization Chemical-sense description Locating Perception active Perception body Perception experience Seeking Trust Adopt selection Assessing Awareness Becoming aware Categorization Cause emotion Certainty Choosing Cogitation Coming to believe Daring Desiring Differentiation Emotion active Estimating Expectation Experiencer focus Experiencer obj Familiarity Feeling Feigning Grasp Importance Judgment Occupy rank Opinion Partiality Place weight on Preference Purpose Reliance Scrutiny Seeking Taking sides Topic Table 47: FrameNet frames for grammaticalEvents Accomplishment Achieving first Activity finish Activity ongoing Activity prepare Activity start Activity stop Amassing Arriving Assistance Attempt Avoiding Becoming Birth Causation Cause change Cause to continue Cause to end Coming to be Containing Cooking creation Cotheme Creating Departing Detaining Dough rising Emanating Event Evidence Execute plan NewsReader: ICT-316404 Existence Experiencer obj Grant permission Halt Have as requirement Hindering Holding off on Inclusion Influence of event on cognizer Intentionally act Intentionally affect Launch process Left to do Manipulate into doing Manufacturing Motion Operating a system Permitting Possession Preventing 
Process continue Process end Process resume Process start Process stop Reasoning Relative time Remainder Ride vehicle Self motion Setting fire Setting out Sidereal appearance State continue Storing Success or failure Successful action Taking Taking time Thriving Thwarting Topic Undergo change Using