Feature Engineering for Coreference Resolution in German
Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung
Azenbergstraße 12
D-70174 Stuttgart

Diploma Thesis No. 97

Feature Engineering for Coreference Resolution in German
Improving the link feature set of SUCRE for German by using a more linguistic background

Submitted by: Patrick Ziering
Commenced: October 1, 2010
Completed: March 31, 2011
Supervisor: Prof. Dr. Hinrich Schütze, Hamidreza Kobdani
Examiner: Prof. Dr. Hinrich Schütze

Declaration

I declare that this thesis was composed by myself and that the work contained herein is my own, except where explicitly stated otherwise in the text.

Stuttgart, March 31, 2011
Patrick Ziering

Abstract

This diploma thesis concerns link feature engineering, based on linguistic analysis, for the coreference resolution component of the SUCRE system. SUCRE's coreference resolution architecture is divided into two steps: classification and clustering. The feature research in this thesis modifies the input to the classifier (a decision tree classifier) and thereby indirectly affects the clustering step, which produces a coreference partition. The feature research comprises two parts: a linguistic analysis of misclassifications, and the implementation of new link features that model the linguistic phenomena detected in the analysis. Among others, the linguistic issues concern the indefiniteness of the anaphor, the correct antecedent for a relative pronoun, German morphology (i.e. compound words and inflected nouns), non-coreferring nouns such as units and currencies, quantified noun phrases, semantic relatedness and appositive proper names. After implementing the new link features and selecting those that perform best, the final feature set is evaluated and shows a clear improvement.
Considering the MUC-B3 F-score (the harmonic mean of the F-scores of MUC and B3, as a trade-off between the advantages and disadvantages of both measures), the performance increases from 67.9% in SemEval-2010 to 73.0% with the final configuration. In detail, MUC precision increases by about 10%, while MUC recall decreases by about 6%. With respect to B3, precision increases by about 12.4% and recall decreases by about 1.6%. The missing increase in recall arises from the fact that most of the newly implemented features are based on the analysis of false positives (rather than false negatives). Nevertheless, these results confirm the research method. There are some limitations in the expressiveness of the feature definition language used in SUCRE. By extending its expressiveness, further linguistic phenomena can be modeled in the future, and thereby further improvements of SUCRE's performance can be achieved.

Table of Contents

1 Introduction

2 The coreference resolution task and its progress in German
2.1 What is coreference resolution?
2.1.1 Coreference vs. Anaphora
2.1.2 NLP tasks which use coreference resolution
2.2 Detection of markables
2.3 Supervised coreference models based on machine learning
2.3.1 Mention-Pair Model
2.3.2 Entity-Mention Model
2.3.3 Ranking Model
2.4 Unsupervised coreference resolution
2.5 Coreference resolution for German - approaches in the past 10 years
2.5.1 Coreference Resolution with Syntactico-Semantic Rules and Corpus Statistics (Hartrumpf, 2001)
2.5.2 The Influence of Minimum Edit Distance on Reference Resolution (Strube et al., 2002)
2.5.3 A Constraint-based Approach to Noun Phrase Coreference Resolution in German Newspaper Text (Versley, 2006)
2.5.4 Optimization in Coreference Resolution Is Not Needed: A Nearly-Optimal Algorithm with Intensional Constraints (Klenner and Ailloud, 2009)
2.5.5 Extending BART to provide a Coreference Resolution System for German (Broscheit et al., 2010b)
2.6 Evaluation scores
2.6.1 MUC
2.6.2 B3 (B-Cubed)
2.6.3 CEAF
2.6.4 BLANC
2.6.5 A comparative example (borrowed from (Luo, 2005))
2.7 SemEval-2010 Task 1: Coreference Resolution in Multiple Languages

3 The SUCRE system
3.1 The project
3.2 The architecture
3.2.1 Preprocessing
3.2.2 Features in SUCRE
3.2.3 The Relational Database Model
3.2.4 Coreference Resolution
3.3 Visualization with SOMs
3.3.1 Self Organizing Map
3.3.2 Application of SOMs in coreference resolution
3.4 The multi-lingual aspect in SUCRE
3.5 Evaluation results in SemEval-2010
3.6 The dataset for German coreference resolution

4 Linguistic error analysis
4.1 The initial configuration
4.1.1 The initial results
4.1.2 The link features in the prefilter
4.1.3 The link features for the feature vectors
4.1.4 The performance of the 40 features
4.2 One problem with distance features
4.3 Error analysis in false positives
4.3.1 The second markable is indefinite
4.3.2 Wrong assignment of a relative pronoun
4.3.3 Relative proximity in context
4.3.4 Reflexive pronouns and non-subjects
4.3.5 Problems with substring-matches
4.3.6 "Es" ("it") as expletive pronoun in German
4.3.7 Units, currencies, month names, weekdays and the like
4.3.8 Problems with the alias feature
4.3.9 First markable begins with "kein"
4.3.10 Disagreement in gender and number
4.4 Error analysis in false negatives
4.4.1 Reflexive pronouns with non-subjects or considerable sentence distance
4.4.2 Semantic Relations between the markables
4.4.3 Both markables contain a common, possibly appositive proper name

5 Implementation of the features
5.1 Features for False Positives
5.1.1 The second markable is indefinite
5.1.2 Wrong assignment of a relative pronoun
5.1.3 Reflexive pronouns and non-subjects
5.1.4 Problems with substring-matches
5.1.5 "Es" ("it") as expletive pronoun in German
5.1.6 Units, currencies, month names, weekdays and the like
5.1.7 First markable begins with "kein"
5.1.8 Disagreement in gender and number
5.2 Features for False Negatives
5.2.1 Both markables contain a common, possibly appositive proper name
5.3 Features from inspirations of German approaches in (2.5)

6 Evaluation of the implemented link features
6.1 The final link feature set
6.2 The final prefilter feature set
6.3 Evaluation of improvement steps
6.4 The final scores
6.5 The performance of each feature in the final feature set
6.6 Additional evaluations

7 Summary and conclusions
7.1 Summary
7.2 Conclusion
7.3 Future work

A The Stuttgart-Tübingen tag set (STTS)

B The pseudo language for SUCRE's link feature definition
B.1 Markable keywords
B.2 Attributes
B.3 Arithmetic operations
B.4 Arithmetic predicates
B.5 Boolean operations
B.6 Functions

C A python script for computing the BLANC-score

D Upper and lower bounds / evaluation results in (Versley, 2006)

E All link errors from Chapter 4
E.1 False positives
E.1.1 The second markable is indefinite
E.1.2 Wrong assignment of a relative pronoun
E.1.3 Relative proximity in context
E.1.4 Reflexive pronouns and non-subjects
E.1.5 Problems with substring-matches
E.1.6 "Es" ("it") as expletive pronoun in German
E.1.7 Units, currencies, month names, weekdays and the like
E.1.8 Problems with the alias feature
E.1.9 First markable begins with "kein"
E.1.10 Disagreement in gender and number
E.2 False negatives
E.2.1 Reflexive pronouns with non-subjects or considerable sentence distance
E.2.2 Semantic Relations between the markables
E.2.3 Both markables contain a common, possibly appositive proper name

List of Tables

2.1 The Bell numbers B(n)
2.2 Link feature set used in (Soon et al., 2001)
2.3 Example of a feature vector in (Soon et al., 2001)
2.4 Weights and incompatible f-values for each feature used in (Cardie and Wagstaff, 1999)
2.5 F-measure results for the clusterer and some baselines on the MUC-6 datasets
2.6 Coreference resolution results in (Hartrumpf, 2001)
2.7 The initial feature set by (Strube et al., 2002)
2.8 The first evaluation in (Strube et al., 2002)
2.9 Revision of the feature set in (Strube et al., 2002)
2.10 The second evaluation in (Strube et al., 2002)
2.11 Performance comparison of (Strube et al., 2002) and (Versley, 2006)
2.12 The global ILP constraints in (Klenner and Ailloud, 2009)
2.13 CEAF-Results in (Klenner and Ailloud, 2009)
2.14 Evaluation results of Broscheit et al. (2010b)
2.15 The BLANC confusion matrix
2.16 The BLANC scores
2.17 Comparison of evaluation metrics (Luo, 2005)
2.18 Corpora used for each language in SemEval-2010
2.19 The training, development and test set of TüBa-D/Z in SemEval-2010
2.20 Comparison of architectures of BART, SUCRE, TANL-1 and UBIU in SemEval-2010
2.21 The baseline scores for German and English in SemEval-2010
2.22 Official results of SemEval-2010 for German
2.23 closed vs. open in SemEval-2010
3.1 Word Table
3.2 Markable Table
3.3 Link Table
3.4 Three features used by Burkovski et al. (2011)
3.5 Results of SUCRE and the best competitor system, TANL-1, in SemEval-2010 Task 1
3.6 The training and test set of TüBa-D/Z in this study
4.1 Confusion matrix for classification judgements
4.2 Initial results of SUCRE
4.3 The usage of MUC-B3
4.4 Cumulative performance of the 40 original features
4.5 Reversed cumulative performance of the 40 original features
4.6 Results of the new baseline
4.7 Table of the possessive pronouns
5.1 Evaluation of the feature Indef1
5.2 Evaluation of the feature Indef2
5.3 Evaluation of the feature Indef3
5.4 Evaluation of the feature Indef4
5.5 Evaluation of the feature Indef5
5.6 Evaluation of the feature Relpron1
5.7 The final set without Relpron1
5.8 Evaluation of the feature Relpron2
5.9 Evaluation of the feature Relpron3
5.10 Evaluation of the feature Reflex1
5.11 The final set with Reflex1
5.12 Evaluation of the feature Substr1
5.13 Evaluation of the feature Substr2
5.14 Substr2 with compound words
5.15 Evaluation of the feature Substr3
5.16 Evaluation of the feature Substr4
5.17 Evaluation of the feature Substr5
5.18 Evaluation of the feature Substr6
5.19 The final set with Substr6
5.20 Evaluation of the feature Es1
5.21 Evaluation of the feature Es2
5.22 Evaluation of the feature Es3
5.23 The final set without Es3
5.24 Evaluation of the feature Unit1
5.25 Evaluation of the feature Unit2
5.26 Evaluation of the feature Kein1
5.27 Evaluation of the feature Kein2
5.28 Evaluation of the feature Agree1
5.29 The final set without Agree1
5.30 Evaluation of the feature Agree2
5.31 Evaluation of the feature Agree3
5.32 The final set with Agree3
5.33 Evaluation of the feature Proper1
5.34 Evaluation of the feature Proper2
5.35 Evaluation of the feature Inspire1
5.36 The final set with Inspire1
5.37 Evaluation of the feature Inspire2
5.38 The final set with Inspire2
5.39 Evaluation of the feature Inspire3
5.40 Evaluation of the feature Inspire4
6.1 The steps from the new baseline to the final feature set
6.2 Performance of the improvement steps
6.3 The final scores
6.4 SemEval-2010 Results - German, closed, gold vs. Final scores
6.5 Cumulative performance of the final feature set
6.6 Reversed cumulative performance of the final feature set
6.7 The performance without vector feature no. 19
6.8 The performance with the final features and sentence distance
7.1 Runtime comparison
B.1 A value table for the defined boolean operators
D.1 Upper and lower bounds / evaluation results in (Versley, 2006)

CHAPTER 1
Introduction

To denote an entity, one can use different expressions and descriptions. A person can be referred to by full name and title (e.g. Bundeskanzlerin Dr. Angela Merkel), only by last name (e.g. Merkel) or, in a less formal context, by first name (e.g. Angela). It is also possible to describe people or things by their properties or functions. In the case of Angela Merkel, a definite description like die Bundeskanzlerin is just as possible as pronouns like sie, sich or ihre. Each such expression that can refer to an entity is called a markable. All markables referring to the same entity are coreferent to each other and disreferent to every markable referring to another entity. That means that Angela Merkel is coreferent to die Bundeskanzlerin but disreferent to Barack Obama.

This concept of coreference is an equivalence relation. Every group (i.e. cluster, entity) of coreferent markables constitutes an equivalence class. A pair of markables can be regarded as a link that connects either two coreferent or two disreferent markables. Thus, any markable pair represents a coreference link or a disreference link. Creating such clusters of coreferent markables is the task of coreference resolution. A possible coreference partition might be: {{Bundeskanzlerin Dr. Angela Merkel, Merkel, die Bundeskanzlerin, sie, . . . }, {Barack Obama, er, sein, . . . }, . . . }.

In the course of text comprehension, the human mind performs coreference resolution on the fly within a conversation or while reading a text. Doing this task automatically reveals several problems. One way of performing coreference resolution automatically is using machine learning methods.
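Such a partition can be represented directly as a set of clusters. The following is a minimal illustrative sketch (the markable strings are taken from the example above; the helper name is hypothetical, not part of SUCRE):

```python
# A coreference partition: every cluster is an equivalence class of markables.
partition = [
    {"Bundeskanzlerin Dr. Angela Merkel", "Merkel", "die Bundeskanzlerin", "sie"},
    {"Barack Obama", "er", "sein"},
]

def coreferent(m1, m2, partition):
    """Two markables are coreferent iff they occur in the same cluster."""
    return any(m1 in cluster and m2 in cluster for cluster in partition)
```

With this representation, Merkel and die Bundeskanzlerin come out as coreferent, while Merkel and Barack Obama remain disreferent.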
Here, one possibility is to use a classifier which labels every given markable pair with the probability of being a coreference link. Afterwards, markables connected as coreferent with high probability can be put together using an appropriate clustering method, and the coreference resolution is done.

Such a resolution system brings along various issues. A lot of knowledge is needed: Part-of-Speech tags (e.g. to identify special kinds of pronouns), grammatical information like gender or number, and semantic information like the semantic class (to exclude two incompatible markables from being coreferent). This knowledge, as well as combinations of the given information, can be modeled as a link feature between two markables. Many of those features describe the agreement or disagreement between the connected markables with respect to a specific atomic feature. For example, information about gender can produce a link feature like Both markables have the same gender. A counter-example against using this feature to indicate disreference on mismatch is that the grammatical gender of a markable is sometimes mixed up with its natural gender: Mädchen (the girl, neuter) can also be referred to by feminine pronouns like sie (she, feminine). Thus, it is not advisable to exclude two markables from being coreferent only because they differ in gender. This problem already shows one kind of difficulty in modeling coreference in natural language with such features.

By using an ordered set of link features, a link can be represented as a feature vector where each component corresponds to the value of a link feature, for example boolean values (TRUE/FALSE, i.e. 1 or 0) for a link feature like Both markables have the same number. Numeric values are also conceivable, for a link feature like the sentence distance between the two markables or the edit distance between the markables' heads (i.e.
the number of predefined string operations needed to transform one head into the other). The SUCRE system (German: Semi-Überwachte-Ko-Referenz-Erkennung), described in chapter 3, uses such link features to train a classifier with the resulting feature vectors.

The main aim of this diploma thesis is to improve the performance of SUCRE's coreference system for German by link feature engineering. The focus is set on the linguistic background of misclassified links. These markable pairs (i.e. the links) are considered in their context to see which linguistic phenomenon is responsible for the disreference (or coreference) of the respective markables. The goal is then to model this phenomenon in a link feature, in order to provide a feature set that classifies the links correctly and thereby improves SUCRE's performance. The basis for this improvement is the evaluation of the resulting partition of equivalence classes. For this evaluation, SUCRE uses the four main evaluation measures described in (2.6).

Before analyzing the misclassifications (false positives and false negatives) of the markable links in SUCRE in chapter 4, chapter 2 presents a brief overview of the coreference resolution task in general, the three main coreference models in supervised machine learning, an unsupervised coreference approach, the progress in German coreference resolution in the past 10 years and a short survey of the main evaluation scores used for coreference resolution. Chapter 3 presents the SUCRE project with its parts and goals. The goal of chapter 5 is to implement features in a specific regular definition language defined for SUCRE, modeling the linguistic phenomena detected in chapter 4. Chapter 6 evaluates the feature engineering done in chapters 4 and 5 and answers the question whether the performance of SUCRE increases with the new link feature set.
Chapter 7 finally summarizes the previous chapters and the results and gives an outline of future work.

CHAPTER 2
The coreference resolution task and its progress in German

In this chapter the coreference resolution task is presented. First, the definitions of coreference and coreference resolution are sketched briefly. One difficult task for end-to-end coreference systems (i.e. systems that take a raw text as input and return the final coreference partition) is markable detection, which is described in (2.2). In (2.3), the most common coreference models for supervised machine learning approaches are outlined. Thereafter, (2.4) presents an unsupervised method treating coreference resolution as a pure clustering task. In (2.5), the progress in German coreference resolution in the past 10 years and the state of the art are presented. In (2.6) the four most commonly used evaluation scores (i.e. MUC, B3, CEAF and BLANC) are introduced and illustrated with an example. In (2.7) the coreference resolution competition SemEval-2010 is briefly outlined.

2.1. What is coreference resolution?

When people talk about things, they have different possibilities for denoting them. This is the result of variety in natural language, which human speakers exploit to avoid repetitions (Cardie and Wagstaff, 1999). In the literature, such expressions which can refer to a real-world entity are called markables, mentions, REs (referring expressions) or CEs (coreference elements). Subsequently these expressions will be called markable, regardless of the term used in the cited literature. If two markables refer to the same entity, they are called coreferent, otherwise disreferent. Examples of coreferent markables are given in (1):

(1) a. Angela Merkel ⇔ die Bundeskanzlerin
    b. Barack Obama ⇔ der US-Präsident
    c. eine alte Frau ⇔ sie

Examples of disreferent markables are given in (2):
(2) a. der Mann ⇔ sie
    b. die Frauen ⇔ er
    c. Frau Müller ⇔ Herr Müller

There are different kinds of markables. They can be definite or even indefinite noun phrases, proper names, appositives, any kind of pronoun and so on. The coreference between two markables can be seen as an equivalence relation. That means that every markable is coreferent with itself (reflexivity); if markable a is coreferent with markable b, then markable b is coreferent with markable a (symmetry); and if the markables a, b as well as b, c are coreferent, then a, c are also coreferent (transitivity). So, clusters of coreferent markables which are disreferent to all markables outside the cluster are equivalence classes.

Creating a partition of equivalence classes over the set of all markables in a context is called coreference resolution. From a more local perspective, coreference resolution is the task of determining whether two expressions in natural language (denoting an entity) refer to the same entity in the world (Soon et al., 2001). While human listeners have little trouble assigning each markable to the appropriate entity, it is a very tough challenge for an NLP system (Cardie and Wagstaff, 1999). Entities might be expressed only once in the text and thus constitute a singleton cluster. Such a cluster does not contain coreferent markables. Other entities are expressed several times; these multi-markable entities contain coreferent markables. The search space for the right partition is very large, as the number of different partitions of n markables equals the Bell number B(n), also called the exponential numbers (Bell, 1934; Hartrumpf, 2001).
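As an illustrative aside (not part of SUCRE), these numbers can be computed with the standard Bell-triangle recurrence:

```python
def bell_numbers(n_max):
    """Return [B(1), ..., B(n_max)] via the Bell triangle:
    each row starts with the last entry of the previous row, and every
    further entry is the sum of its left neighbour and the entry above it.
    B(k) is the last entry of row k."""
    bells = []
    row = [1]                      # row 1 of the triangle
    for _ in range(n_max):
        bells.append(row[-1])
        new_row = [row[-1]]
        for above in row:
            new_row.append(new_row[-1] + above)
        row = new_row
    return bells
```

For example, bell_numbers(5) yields [1, 2, 5, 15, 52], and bell_numbers(10) already ends with 115,975, illustrating the super-exponential growth of the search space.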
Table 2.1 shows, for some example numbers of markables n, the resulting number of different possible partitions B(n):

    Number of markables n    Number of partitions B(n)
    1                        1
    2                        2
    3                        5
    4                        15
    5                        52
    10                       115,975
    15                       ≈ 1.38 × 10^9
    20                       ≈ 5.17 × 10^13
    25                       ≈ 4.64 × 10^18

    Table 2.1.: The Bell numbers B(n)

In this thesis, the focus lies on identity coreference resolution for any kind of noun phrase. Other possibilities are part-whole or similarly complex semantic relations rather than identity, and the coreference of situations expressed by clauses rather than just noun phrases (Hartrumpf, 2001). Much work has been done in the past in this area and the related area of anaphora resolution (2.1.1), but "most of the work on supervised coreference resolution has been developed for English [. . . ] due to the availability of large corpora such as ACE" (Broscheit et al., 2010b). In (2.5) the development of German coreference resolution and some of its approaches are presented.

2.1.1. Coreference vs. Anaphora

As coreference resolution has often been confused with anaphora resolution (Elango, 2005), a clear distinction has to be made. Two markables are coreferent if they refer to the same entity. This can also be true for a markable m1 that is the anaphoric antecedent of markable m2, which might be the reason for the frequent confusion of the two tasks. But a markable A is said to be the anaphoric antecedent of markable B if and only if it is required for getting the meaning of B (Elango, 2005). This relation ((A, B) ∈ R ⇔ A is the anaphoric antecedent of B) is neither reflexive, nor symmetric, nor transitive; thus, being-the-anaphoric-antecedent-of is not an equivalence relation, and no partition into equivalence classes is possible. Coreferential links can be anaphoric relations (3a), but there are coreferential links where the first markable is not required for the interpretation of the second markable (3b). Some anaphoric relations, such as bound anaphora, are not coreferent (3c) (Elango, 2005).
Another kind of anaphoric relation which is not coreferent is bridging, where the relation between antecedent and anaphor is one of meronymy or holonymy, like in (3d), where den Raum and die Tür stand in this anaphoric relation.

(3) a. Der Mann_i sieht sich_i im Spiegel.
    b. Ein alter Mann_i schläft auf einer Parkbank. Am Morgen wachte der Mann_i auf.
    c. Jeder Hund hat seinen Tag.
    d. Der Junge trat in den Raum. Die Tür schloss sich automatisch.

2.1.2. NLP tasks which use coreference resolution

Many natural language processing (NLP) applications require coreference resolution (Cardie and Wagstaff, 1999). Numerous NLP tasks detect attributes, actions and relations between entities. For this purpose, all information about a given entity has to be discovered, so a first step is to group together all markables referring to a given entity. Thus, coreference resolution is an important prerequisite for tasks like textual entailment and information extraction (Bengtson and Roth, 2008). It also has applications in areas like question answering, machine translation, automatic summarization and named entity extraction (Elango, 2005).

2.2. Detection of markables

“The ultimate goal for a coreference system is to process unannotated text.” (Bengtson and Roth, 2008) Such a system is called end-to-end. Developing such a resolver requires the detection of markables. But there are problems concerning markable detection: for example, markables are often nested, their boundaries mismatch with the gold standard, they are missed, or additional markables are detected (Bengtson and Roth, 2008). In order to get the input for a coreference classifier, the markables in the text have to be extracted. To obtain these, some preprocessing steps have to be taken. Soon et al. (2001) propose the NLP modules shown in figure 2.1.
Figure 2.1.: NLP modules for markable detection in (Soon et al., 2001)

The mentioned steps are tokenization, sentence segmentation, morphological processing, POS tagging, NP identification, NER, nested NP extraction and determination of the semantic class. The result of these steps is well-defined boundaries of the markables and information about the markables which is used for subsequent feature generation. Soon et al. (2001) used a POS tagger, an NP identifier and a named entity recognizer, all based on Hidden Markov Models (HMMs). NPs and named entities are merged in such a way that if an NP overlaps with a named entity, their boundaries are adjusted so that the NP subsumes the named entity. The nested NP extraction module determines nested noun phrases for each noun phrase identified so far. It divides nested NPs into two groups:

1. Nested NPs from possessive NPs (e.g. {{his}NP long-range strategy}NP, {{Eastern’s}NP parent}NP)
2. Nested NPs that are modifier nouns or prenominals (e.g. {{wage}NP reductions}NP, {{Union}NP representatives}NP)

Finally, the set of markables is the union of the sets of extracted NPs, named entities and nested NPs. For non-named-entities, the semantic class is determined (Soon et al., 2001).

2.3. Supervised coreference models based on machine learning

Three coreference models are most common for implementing coreference resolution as a supervised machine learning task.

2.3.1. Mention-Pair Model

Issues with mention-pair models

This model is based on a binary classifier which determines whether two markables are coreferent or not (Ng, 2010). Although this approach is very popular, it has some disadvantages. As the coreference relation is an equivalence relation, it is transitive (i.e. (coref(A, B) ∧ coref(B, C)) ⇒ coref(A, C)).
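The NP/NE boundary merging described above can be sketched as follows (a toy sketch; the function name and the (start, end) token-offset span representation are my own assumptions):

```python
def merge_np_ne(nps, nes):
    """Adjust NP boundaries so that each NP subsumes any named
    entity it overlaps with; spans are (start, end) token offsets,
    end exclusive, as assumed here."""
    merged = []
    for start, end in nps:
        for ne_start, ne_end in nes:
            if start < ne_end and ne_start < end:  # spans overlap
                start = min(start, ne_start)
                end = max(end, ne_end)
        merged.append((start, end))
    return merged
```

For example, an NP spanning tokens 0-3 that overlaps a named entity spanning tokens 2-5 is widened to span 0-5.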
This property cannot be modeled, as it is possible to classify A and B as coreferent and B and C as coreferent, but A and C as disreferent. Therefore it is necessary to perform a separate clustering step in order to arrive at a final coreference partition (Ng, 2010). A second issue is the task of generating training instances. A training instance is a pair of markables and the corresponding class (i.e. coreferent/disreferent). A simple approach is to generate all possible pairs of markables within a training document. But this method yields an extremely unbalanced class distribution, as most markable pairs are not coreferent. Thus, there have to be training instance creation methods that reduce the class skewness (Ng, 2010).

Training instance creation models

A possible way of reducing class skewness in the training instances has been proposed by Soon et al. (2001): for a given markable mk, a positive instance is created between mk and its closest preceding antecedent mj. Negative instances are created between mk and each markable that occurs between mj and mk. An example given by Soon et al. (2001): given a context with six markables in sequential order A0, A1, a, b, B1, A2, where A0, A1 and A2 are coreferent with each other and A1 is the closest preceding antecedent of A2, Soon et al. (2001) create a positive training instance from the markable pair <A1, A2> (rather than from <A0, A2>) and negative instances from the markable pairs <a, A2>, <b, A2> and <B1, A2>. Ng and Cardie (2002) modify this method: if the anaphor mk is non-pronominal, then the positive training instance is created between mk and its closest non-pronominal antecedent. With this modification, no training instance pairs a non-pronominal anaphor with a pronominal antecedent. The reason is that it is not easy for a machine learner to learn from an instance where the antecedent of a non-pronominal markable is a pronoun (Ng, 2010).
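The instance creation scheme of Soon et al. (2001) can be sketched as follows (an illustrative sketch; representing markables as ids in textual order and the gold coreference as a mapping from markable to entity id are my own assumptions):

```python
def soon_training_pairs(markables, chains):
    """Training-pair creation following Soon et al. (2001).

    markables: list of markable ids in textual order.
    chains: dict mapping markable id -> gold entity id.
    For each anaphor m_k with a closest preceding antecedent m_j:
    one positive pair (m_j, m_k) and negative pairs (m, m_k) for
    every markable m strictly between m_j and m_k."""
    pairs = []  # (antecedent, anaphor, label)
    for k, mk in enumerate(markables):
        # find the closest preceding markable of the same entity
        antecedent_idx = None
        for j in range(k - 1, -1, -1):
            if chains[markables[j]] == chains[mk]:
                antecedent_idx = j
                break
        if antecedent_idx is None:
            continue  # first mention of an entity: no instances
        pairs.append((markables[antecedent_idx], mk, True))
        for j in range(antecedent_idx + 1, k):
            pairs.append((markables[j], mk, False))
    return pairs
```

On the example from the text (A0, A1, a, b, B1, A2 with A0, A1, A2 coreferent), this yields the positive pair <A1, A2> and the negative pairs <a, A2>, <b, A2> and <B1, A2>.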
Another possibility for reducing class skewness is the use of a prefilter (i.e. a hard constraint) that filters out markable pairs that are very unlikely to be classified as coreferent because of obvious incompatibility, e.g. in gender or number agreement. One possible representation of a training instance is a feature vector, as used by Soon et al. (2001). Here, each dimension corresponds to a feature. Table 2.2 shows the feature set proposed by Soon et al. (2001).

Feature (abbreviated in (Soon et al., 2001))   | Description
Sentence distance (DIST)                       | The distance between m1 and m2 in terms of sentences
Pronominality of the antecedent (I_PRONOUN)    | Returns TRUE if m1 is a pronoun
Pronominality of the anaphor (J_PRONOUN)       | Returns TRUE if m2 is a pronoun
String match (STR_MATCH)                       | Returns TRUE if m1 and m2 match as strings
Definiteness of the anaphor (DEF_NP)           | Returns TRUE if m2 starts with "the"
Demonstrative anaphor (DEM_NP)                 | Returns TRUE if m2 starts with "this", . . .
Number agreement (NUMBER)                      | Returns TRUE if m1 and m2 agree in number
Semantic class agreement (SEMCLASS)            | Returns TRUE if m1 and m2 are in the same class
Gender agreement (GENDER)                      | Returns TRUE if m1 and m2 agree in gender
Both-Proper-Names (PROPER_NAME)                | Returns TRUE if both markables are proper names
Alias (ALIAS)                                  | Returns TRUE if m1 is an alias of m2 or vice versa
Appositive anaphor (APPOSITIVE)                | Returns TRUE if m2 is in apposition to m1

Table 2.2.: Link feature set used in (Soon et al., 2001)

Given the excerpt “. . . Frank Newman, 50, vice chairman and . . . ”, Soon et al.
(2001) illustrate their feature set with the feature vector corresponding to the markables Frank Newman and vice chairman, shown in table 2.3:

Feature      | Value | Comments
DIST         | 0     | m1 and m2 are in the same sentence
I_PRONOUN    | −     | m1 is not a pronoun
J_PRONOUN    | −     | m2 is not a pronoun
STR_MATCH    | −     | m1 and m2 do not match
DEF_NP       | −     | m2 is not a definite noun phrase
DEM_NP       | −     | m2 is not a demonstrative noun phrase
NUMBER       | +     | m1 and m2 are both singular
SEMCLASS     | 1     | m1 and m2 are both persons
GENDER       | 1     | m1 and m2 are both males
PROPER_NAME  | −     | only m1 is a proper name
ALIAS        | −     | m2 is not an alias of m1
APPOSITIVE   | +     | m2 is in apposition to m1

Table 2.3.: Example of a feature vector in (Soon et al., 2001)

Coreference classifiers

After creating a training set, a learning algorithm can be trained. The most popular algorithms for this task are decision tree induction systems (e.g. C5, (Quinlan, 1993)). Alternatives are rule learners (e.g. RIPPER, (Cohen, 1995)) and memory-based learners (e.g. TiMBL, (Daelemans et al., 2003)), besides statistical learners such as maximum entropy models (Berger et al., 1996), voted perceptrons (Freund and Schapire, 1999) and support vector machines (Joachims, 1999).

Clustering algorithms

After training a classifier and applying it to a test sample, the classifier’s decisions have to be coordinated and transformed into a coreference partition (Ng, 2010). The two most common coreference clustering algorithms are closest-first clustering (Soon et al., 2001) and best-first clustering (Ng and Cardie, 2002). For a given markable mk, closest-first clustering chooses as antecedent the closest preceding markable that is classified as coreferent with mk. If there is no suitable antecedent for mk, the coreference chain ends. In order to improve the precision of closest-first clustering, best-first clustering chooses as antecedent to mk the preceding markable that is most probably coreferent with it.
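The two clustering strategies can be sketched as follows (a toy sketch; `is_coref` and `coref_prob` stand in for the trained pairwise classifier, and the threshold parameter is my own assumption):

```python
def closest_first(markables, is_coref):
    """Closest-first clustering (Soon et al., 2001): link each markable
    to the nearest preceding markable the classifier accepts."""
    links = {}
    for k in range(len(markables)):
        for j in range(k - 1, -1, -1):
            if is_coref(markables[j], markables[k]):
                links[markables[k]] = markables[j]
                break  # stop at the closest accepted antecedent
    return links

def best_first(markables, coref_prob, threshold=0.5):
    """Best-first clustering (Ng and Cardie, 2002): link each markable
    to the preceding markable with the highest coreference probability."""
    links = {}
    for k in range(len(markables)):
        cands = [(coref_prob(markables[j], markables[k]), markables[j])
                 for j in range(k)]
        if cands:
            p, best = max(cands)
            if p >= threshold:
                links[markables[k]] = best
    return links
```

With pairwise probabilities p(m1, m3) = 0.9 and p(m2, m3) = 0.6, closest-first resolves m3 to m2 while best-first resolves m3 to m1, illustrating the precision-oriented difference.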
One problem with these two clustering algorithms is that they are too greedy: “clusters are formed based on a small subset of pairwise decisions made by the model” (Ng, 2010). Furthermore, coreference classifications are preferred over disreference classifications: consider the markables m1, m2 and m3 occurring in this order. If m2 is chosen as closest (or best) preceding antecedent of m3 and m1 as closest (or best) preceding antecedent of m2, then all three markables are assigned to the same cluster regardless of a possible disreference between m1 and m3. There are several algorithms addressing these problems. For example, correlation clustering (Bansal et al., 2002) creates a partition that respects as many classification decisions as possible. Graph partitioning algorithms (Nicolae and Nicolae, 2006) operate on a weighted graph in which each vertex corresponds to a markable and each edge has a weight corresponding to the coreference probability. The Dempster-Shafer rule (Dempster, 2008) combines both coreference and disreference classification decisions for creating a partition (Ng, 2010). Although there are a lot of coreference clustering algorithms, only few researchers have tried to compare their effectiveness. Ng and Cardie (2002) report that best-first clustering outperforms closest-first clustering, while Nicolae and Nicolae (2006) show that their minimum-cut-based graph partitioning algorithm performs better than best-first clustering.

Combining classification and clustering

One problem that occurs when using classification and clustering as two separate steps is that they are trained independently of each other. Even if the classification is improved, this might not take effect on clustering-level accuracy (Ng, 2010), that is, “overall performance on the coreference task might not improve”.
McCallum and Wellner (2004) and Finley and Joachims (2005) remove the classification step and treat coreference resolution as a supervised clustering task: a similarity metric is learned in order to maximize the clustering accuracy (Ng, 2010).

The flaws of the mention-pair model

In contrast to the entity-mention model (2.3.2) or the ranking model (2.3.3), the mention-pair model has some weak points. The first problem that Ng (2010) mentions is that the candidate antecedents for a markable are examined independently of each other. The model does not assess how well one candidate antecedent fits a markable compared with other candidates, so there is no way of finding the most probable candidate antecedent among all candidates. The second problem is insufficient expressiveness: the information contained in the two markables alone may not be enough to decide whether they are coreferent, in particular if one markable is a pronoun or another noun phrase that “lacks descriptive information such as gender” (e.g. “Clinton”) (Ng, 2010).

2.3.2. Entity-Mention Model

The advantage of the entity-mention model

The entity-mention model attacks the second problem of the mention-pair model (i.e. the lack of expressiveness). McCallum and Wellner (2003) use the following example to present the mention-pair model’s shortcoming: assume there are three markables Mr Clinton (m1), Clinton (m2) and she (m3). With respect to proximity and no mismatch in atomic features like gender or number, m2 and m3 are classified as coreferent. Due to an exact string matching feature, m1 and m2 are classified as coreferent. But now, due to transitivity, the markables Mr Clinton (m1) and she (m3) end up in the same cluster. The reason for this is the independence of the markable-pair classifications (Ng, 2010).
If the model knows that Mr Clinton (m1) and Clinton (m2) are considered coreferent, it will not classify the markable pair (m2, m3) as coreferent. This is the basic idea of the entity-mention model: it classifies whether a markable mk is coreferent with a cluster Cj of markables mj1 . . . mjn that precede mk.

The training instances

A training instance of the entity-mention model corresponds to a markable mk and a cluster Cj of markables preceding mk. It is labeled positive if mk should be added to Cj and negative otherwise. Such an instance can be represented by so-called cluster-level features. These features can be regarded as a combination of a link feature (i.e. a feature as used in the mention-pair model) and a quantifier (e.g. all, most, any). For example, the link feature Both markables have the same gender can be combined with the quantifier all to create a cluster-level feature that has the value YES if mk agrees with all markables mji in Cj with respect to gender; otherwise its value is NO. These cluster-level features increase the expressiveness over the mention-pair model.

2.3.3. Ranking Model

The advantage of the ranking model

Alternative names for this kind of model are tournament model and twin-candidate model. The entity-mention model solves the mention-pair model’s problem of lacking expressiveness but does not address the comparison of one candidate antecedent with others. This problem is attacked by the ranking model, which makes it possible to determine which candidate antecedent is the most probable. Ranking embodies a more natural form of coreference resolution than classification, since all preceding candidate antecedents are considered simultaneously. A markable is resolved to the candidate antecedent with the highest rank (Ng, 2010).
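The quantifier-based cluster-level features of the entity-mention model (2.3.2) can be sketched as follows (an illustrative sketch; representing markables as attribute dicts is my own assumption):

```python
def cluster_feature(link_feature, quantifier, mk, cluster):
    """Combine a link feature with a quantifier (all/most/any) to
    obtain a cluster-level feature value for markable mk and the
    preceding cluster, as in the entity-mention model."""
    values = [link_feature(mj, mk) for mj in cluster]
    if quantifier == "all":
        return all(values)
    if quantifier == "any":
        return any(values)
    if quantifier == "most":
        return sum(values) > len(values) / 2
    raise ValueError("unknown quantifier: %s" % quantifier)
```

For instance, combining a same-gender link feature with the quantifier all yields YES only if mk agrees in gender with every markable in the cluster.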
The training and testing of a ranking model

A training instance for the ranking model corresponds to a markable mk and two candidate antecedents mi and mj, one of which is an antecedent of mk and the other is not (Ng, 2010). Its label indicates whether the first or the second markable fits better as antecedent of mk. In the testing step, given a markable mk, each pair of candidate antecedents of mk is applied to the ranking model. The final antecedent of mk is the candidate that was classified most often as the better one for mk.

Variants of ranking models

Thanks to advances in machine learning, all candidate antecedents can be considered simultaneously. This so-called mention ranker consistently outperforms the mention-pair model although both models have the same expressiveness (Ng, 2010). Rahman and Ng (2009) propose a cluster-ranking model that exploits cluster-level features in order to increase expressiveness. This model ranks the clusters of markables preceding a given markable ml in order to resolve ml. Such ranking models address both shortcomings of the mention-pair model: (1) the independent consideration of candidate antecedents and (2) the insufficient expressiveness due to only using link features (Ng, 2010).

2.4. Unsupervised coreference resolution

The result of the coreference resolution task is a partition into coreference sets. Thus, it is natural to consider a clustering algorithm that puts all coreferent markables into one cluster, creating the final partition without using a pairwise classifier beforehand. In the subsequent section, this method is exemplified by the approach of Cardie and Wagstaff (1999), “Noun Phrase Coreference as Clustering”, which focusses on the resolution of base (i.e. simplex) noun phrases.
Advantages of the clustering approach

In contrast to other learning and non-learning approaches, the clustering method has several advantages:

• Clustering is unsupervised. Thus, there is no need for annotated training data.
• The method is domain-independent.
• The method makes it possible to flexibly combine local constraints with global constraints.
  Local constraints: These constraints are only used within one closed markable pair.
  Global constraints: These also regard the correlation with other markables in the environment (e.g. the cluster).

Instance representation and feature set

Every markable (rather than every markable pair) in the corpus is represented as a feature vector. For this representation, Cardie and Wagstaff (1999) extract markables and their feature values automatically. They use 11 features (i.e. local constraints), each corresponding to a dimension of the search space:

1. Words: The words in the markable are treated as a feature.
2. Head word: The last word in a markable is regarded as the head noun (cf. right-hand head rule).
3. Position: The numeric ID of the markable; all markables are enumerated from the beginning of the text.
4. Pronoun type: Pronouns get a specific type (e.g. POSSESSIVE); other markables get None.
5. Article: The article is Indefinite, Definite or None.
6. Appositive: This binary feature returns TRUE in the case that the markable is an apposition and FALSE otherwise.
7. Number: The number value of the markable can be singular or plural.
8. Proper name: This binary feature returns TRUE in the case that the markable is a proper name and FALSE otherwise.
9. Semantic class: The semantic class of the markable is based on WordNet (e.g. Human, Company, . . . ).
10. Gender: The grammatical gender of the markable can have the values Masculine, Feminine, Neuter or Either.
11. Animacy: Markables with the semantic class Human or Animal are animate, all others are inanimate.
Given the sample text:

John Simon, Chief Financial Officer of Prime Corp. since 1986, saw his pay jump 20%, to $1.3 million, as the 37-year-old also became the financial-services company’s president.

Cardie and Wagstaff (1999) extract 11 markables (John Simon, Chief Financial Officer, Prime Corp., 1986, his, pay, 20%, $1.3 million, the 37-year-old, the financial-services company, president) and their features. The resulting feature vectors are shown in figure 2.2.

Figure 2.2.: Some feature vectors used in (Cardie and Wagstaff, 1999)

The distance measure

The basic idea of clustering coreferent markables is that they are somehow similar to each other, i.e. they have a small distance to each other in the search space. The clusterer may create a cluster from all markables that are close to each other (e.g. that are within a coreference radius r). Cardie and Wagstaff (1999) define the distance between two markables m1 and m2 as shown in formula 2.1:

distance(m1, m2) = Σ_{f ∈ F} w_f · incompatible_f(m1, m2)    (2.1)

Here, F is the set of features described above. The function incompatible_f returns a value between 0 and 1 denoting the incompatibility of m1 and m2 with respect to feature f. The weight w_f corresponds to the importance of the feature. It ranges between −∞ and +∞.
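Formula 2.1 can be sketched in Python; the handling of the infinite weights follows the description by Cardie and Wagstaff (1999) in the surrounding text (representing the feature set as a list of (weight, function) pairs is my own assumption):

```python
import math

def distance(m1, m2, features):
    """Distance of formula 2.1: sum over all features f of
    w_f * incompatible_f(m1, m2). A +inf weight on an incompatible
    pair signals certain disreference and dominates a -inf weight
    (certain coreference)."""
    if any(w == math.inf and fn(m1, m2) > 0 for w, fn in features):
        return math.inf   # certain disreference wins
    if any(w == -math.inf and fn(m1, m2) > 0 for w, fn in features):
        return -math.inf  # certain coreference
    # infinite weights only matter above; sum the finite terms
    return sum(w * fn(m1, m2) for w, fn in features if math.isfinite(w))
```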
The weight +∞ is used when impossible markable pairs have to be filtered out (e.g. on a number/proper_name/animacy mismatch). On the other hand, the weight −∞ indicates certain coreference between the respective markables (e.g. on an appositive match). In the case that both kinds of addends occur (i.e. . . . + (−∞) + . . . + ∞ + . . .), the positive addend (i.e. ∞) indicating certain disreference has more importance and overrides the negative one; thus, the distance value becomes ∞. Another remarkable weight is the coreference radius r. This weight indicates a preference for regarding m1 and m2 as disreferent. But they might be clustered together anyway, “if there is enough other evidence that they are similar”. Table 2.4 shows for each feature f its weight w_f and its incompatible_f value.

Feature f       | Weight w_f | incompatible_f value
Words           | 10.0       | (# of mismatching words) / (# of words in the longer NP)
Head Noun       | 1.0        | 1 if the head nouns differ; else 0
Position        | 5.0        | (difference in position) / (maximum difference in document)
Pronoun         | r          | 1 if NPi is a pronoun and NPj is not; else 0
Article         | r          | 1 if NPj is indefinite and not appositive; else 0
Words–Substring | −∞         | 1 if NPi subsumes (entirely includes as a substring) NPj
Appositive      | −∞         | 1 if NPj is appositive and NPi is its immediate predecessor; else 0
Number          | ∞          | 1 if they do not match in number; else 0
Proper Name     | ∞          | 1 if both are proper names, but mismatch on every word; else 0
Semantic Class  | ∞          | 1 if they do not match in class; else 0
Gender          | ∞          | 1 if they do not match in gender (allows either to match masc or fem); else 0
Animacy         | ∞          | 1 if they do not match in animacy; else 0

Table 2.4.: Weights and incompatible_f values for each feature used in (Cardie and Wagstaff, 1999)

The clustering algorithm

The clustering algorithm combines the global constraints and the local constraints and creates a partition of coreference clusters. It is given in algorithms 1 and 2.
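The clustering procedure given below as algorithms 1 and 2 can also be sketched in Python (an illustrative sketch; `distance` is assumed to be a pairwise distance function as in formula 2.1):

```python
import math

def coreference_clustering(markables, distance, r):
    """Greedy clustering of Cardie and Wagstaff (1999): each markable
    starts in its own cluster; two clusters are merged whenever a pair
    of their markables lies within the coreference radius r and all
    cross-pairs are compatible (no pair at distance infinity)."""
    clusters = {m: {m} for m in markables}

    def all_ms_compatible(ci, cj):
        # corresponds to All_m's_compatible (algorithm 2)
        return all(distance(a, b) != math.inf for a in ci for b in cj)

    # anaphors from last to first, candidate antecedents right to left
    for j in range(len(markables) - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            mi, mj = markables[i], markables[j]
            ci, cj = clusters[mi], clusters[mj]
            if ci is not cj and distance(mi, mj) < r \
                    and all_ms_compatible(ci, cj):
                merged = ci | cj
                for m in merged:
                    clusters[m] = merged
    return clusters
```

With d(a, b) = 1, d(b, c) = 1 and d(a, c) = 3, a radius of r = 2 still puts all three markables into one cluster via the transitive chain, since 3 < ∞; only a distance of ∞ blocks the merge.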
Algorithm 1 The clustering algorithm

Coreference_Clustering(mn, mn−1, ..., m1)
 1: Let r be the coreference radius
 2: Each markable mi belongs to its own cluster ci: ci ← {mi}
 3: for mj := mn to m1 do
 4:   for mi := mj−1 to m1 do
 5:     d ← distance(mi, mj)
 6:     ci ← cluster_of(mi)
 7:     cj ← cluster_of(mj)
 8:     if d < r ∧ All_m's_compatible(ci, cj) then
 9:       cj ← ci ∪ cj
10:     end if
11:   end for
12: end for

First, every markable constitutes its own cluster. Each markable is compared with every preceding markable. If the distance between two markables mi and mj is less than the coreference radius, their clusters are checked for merging (cf. algorithm 2): if the distance between every markable in the cluster of mi and every markable in the cluster of mj is not ∞, the clusters are merged.

Algorithm 2 All_m's_compatible

All_m's_compatible(ci, cj)
1: for mα ∈ ci do
2:   for mβ ∈ cj do
3:     if distance(mα, mβ) = ∞ then
4:       return FALSE
5:     end if
6:   end for
7: end for
8: return TRUE

This way, the clustering algorithm automatically includes the transitive closure (i.e. distance(mi, mj) < r ∧ distance(mj, mk) < r ∧ All_m's_compatible(ci, cj) ⇒ cluster_of(mi) = cluster_of(mj) = cluster_of(mk)). Certainly, it may be that distance(mi, mk) ≥ r, so that mi and mk would not have been considered coreferent from the local perspective; but as long as their distance is less than ∞, they can be added to the same cluster. So, the clustering algorithm takes a global perspective by creating a coreference chain. One problem with this algorithm is its greediness: every markable is linked with every compatible preceding markable within the radius r. To solve this problem, Cardie and Wagstaff (1999) propose the following modifications:

1. For every markable mj, the algorithm stops when the first compatible antecedent mi is found.
2.
For every markable mj, the algorithm ranks all possible candidate antecedents and chooses only the best one.
3. The algorithm ranks the coreference links and proceeds in the ranked order.

Results of the clustering approach in (Cardie and Wagstaff, 1999)

Cardie and Wagstaff (1999) use two different corpora in two different variants for the evaluation: the “dry-run” MUC-6 coreference corpus and the “formal-evaluation” MUC-6 coreference corpus, with 30 documents each. The variants are “official” and “adjusted”. In the first variant, all possible markables are considered; since Cardie and Wagstaff (1999) only extract “base markables” (i.e. simplex markables, without appositions or the like), the recall is too low there. In the other variant, they adjust the results to the pure resolution of “base markables”. As evaluation score, they use MUC (Vilain et al., 1995) (cf. (2.6.1)). The coreference radius r is set to 4. Table 2.5 shows the MUC F-measures for all four settings. The clustering results are compared to three baseline systems: (1) all markables are coreferent with each other; (2) two markables corefer if they match in at least one word; (3) two markables corefer if they match in their head word. The clustering algorithm outperforms all baseline systems.

Algorithm        | Dryrun Official | Dryrun Adjusted | Formal Official | Formal Adjusted
Clustering       | 52.8            | 64.9            | 53.6            | 63.5
All One Class    | 44.8            | 50.2            | 41.5            | 45.7
Match Any Word   | 44.1            | 52.8            | 41.3            | 48.8
Match Head Noun  | 46.5            | 56.9            | 45.7            | 54.9

Table 2.5.: F-measure results for the clusterer and some baselines on the MUC-6 datasets

2.5. Coreference resolution for German - approaches in the past 10 years

Most work on supervised coreference resolution has been done for English, as in (Soon et al., 2001), (Ng and Cardie, 2002) or (Yang et al., 2003).
One reason for preferring English was the availability of coreferentially annotated corpora like ACE (Walker et al., 2006) and OntoNotes (Weischedel et al., 2008). Given suitable German corpora like TüBa-D/Z (Hinrichs et al., 2005b), research on German coreference resolution becomes possible. Broscheit et al. (2010b) mention increasing efforts towards the development of a robust coreference resolution system for German in the past years, as in (Stuckardt, 2004), (Schiehlen, 2004), (Kouchnir, 2004) and (Hinrichs et al., 2005a), in particular for anaphora resolution. Versley (2006) researched names and definite noun phrases. The full coreference resolution task has been worked on by Hartrumpf (2001), who uses a collection from the German newspaper Süddeutsche Zeitung that has been annotated according to MUC guidelines; Strube et al. (2002), who use a corpus containing 242 short German texts about sights, historic events and persons in Heidelberg; and Klenner and Ailloud (2009) and Broscheit et al. (2010b), who use TüBa-D/Z. As this diploma thesis focusses on full coreference resolution, this section only regards the last five sources in detail and subsequently presents the respective approaches.

2.5.1. Coreference Resolution with Syntactico-Semantic Rules and Corpus Statistics - (Hartrumpf, 2001)

Hartrumpf (2001) presents a hybrid approach (“CORUDIS - COreference RUles with DIsambiguation Statistics”) that combines “syntactico-semantic” rules with statistics derived from an annotated corpus. He addresses the full MUC coreference task (Hirschman and Chinchor, 1997) with this hybrid approach. As the reason for using syntactico-semantic rules, Hartrumpf (2001) mentions the exploitation of traditional linguistic knowledge. One argument for using corpus statistics is the disambiguation of the many alternatives that would emerge with a purely rule-based approach.

Coreference rules

The syntactico-semantic rules define whether two markables are able to corefer.
Thus, they “license possible coreference”. The rules can be language-dependent or universal; Hartrumpf (2001) focusses on a rule adaption for German. Figure 2.3 shows an example of a rule Hartrumpf (2001) used for German coreference resolution. Each rule has a unique name (id) (e.g. ident.n perspro for the identity between a noun and a personal pronoun) and a premise (pre.), which is a conjunction of several constraints. Some of these concern one constituent (i.e. a markable) and are called constituent constraints (c-constraints), whereas others concern both markables and are called interconstituent constraints (ic-constraints). The first two constraints in figure 2.3 are c-constraints, whereas the other six are ic-constraints. These constraints use, among others, the feature CAT that describes the syntactic category (e.g. n for noun or perspro for personal pronoun) and the features NUM, PERS and GEND that describe the grammatical attributes number, person and gender. The last feature used in figure 2.3 is called entity and describes the “semantic classification comprising the semantic sort [. . . ] and semantic Boolean features” (Hartrumpf, 2001). Besides these features, there are several predicates used in the coreference rules. Two of them are used in figure 2.3: =/2 and c-command/2. The first takes two values and returns TRUE in the case that they are unifiable. The second also takes two arguments and returns TRUE if the first argument c-commands (cf. Government and Binding Theory) the second. The function not negates the value returned by a predicate (e.g. TRUE→FALSE).

id     ident.n perspro
pre.   (c1 cat) n
       (c2 cat) perspro
       (= (c1 num) (c2 num))
       (= (c1 pers) (c2 pers))
       (= (c1 gend) (c2 gend))
       (= (c1 entity) (c2 entity))
       (not (c-command c1 c2))
       (not (c-command c2 c1))
desc.  same gender - anaphoric
exam.  Der Mann liest [das Buch]_i. Er versteht [es]_i nicht.
Figure 2.3.: A coreference rule used by Hartrumpf (2001)

So, the coreference rule ident.n perspro licenses coreference between a noun-headed markable and a personal pronoun if they are unifiable with respect to number, person, gender and entity and neither markable c-commands the other.

Coreference annotated corpus

Hartrumpf (2001) uses a collection from the German newspaper Süddeutsche Zeitung that is annotated with coreference information according to MUC guidelines, adapted from English to German, by inserting SGML tags into the corpus. An example is given in figure 2.4, encoding the sentence “Das Mädchen liest die Zeitung; danach geht sie mit ihr ins Büro”:

<s><coref id="125t129"><w>Das</w> <w>Mädchen</w></coref>
<w>liest</w>
<coref id="143t147"><w>die</w> <w>Zeitung</w></coref>
<w>;</w> <w>danach</w> <w>geht</w>
<coref ref="125t129" type="ident"><w>sie</w></coref>
<w>mit</w>
<coref ref="143t147" type="ident"><w>ihr</w></coref>
<w>ins</w> <w>Büro</w> <w>.</w></s>

Figure 2.4.: Coreference annotation in (Hartrumpf, 2001)

The algorithm of the coreference resolution system CORUDIS

The basis for the algorithm is constituted by three kinds of objects:

• All possible anaphors (i.e. all detected markables)
• All candidate antecedents for a markable mj (i.e. all markables preceding mj)
• All coreference rules (Hartrumpf (2001) uses 18 rules)

1. Markable detection and feature extraction: Each sentence is parsed independently. If the parse fails, a chunk parser is used instead. In this case, no full parse is available and thus predicates like c-command/2 are ignored.
2. Collection of all possible coreference rule activations: All rule premises are tested on all markable pairs, assuming that c1 precedes c2. As the rules have disjoint premises, for each markable pair there is at most one coreference rule activated.
3.
Selection of one antecedent candidate for each anaphor:
a) All possible and licensed partitions are created incrementally: the starting point is a singleton anaphor. For this singleton cluster, each licensed antecedent candidate is added separately to that cluster in order to get an extended partition. This process stops when all possible anaphors have been investigated.
b) As the number of possible partitions given a non-tiny number of markables (cf. table 2.1) is enormous, the partitions have to be filtered at an early stage of generation. Hartrumpf (2001) mentions four criteria for this pruning step:
sentence distance: The distance between two markables measured in sentences must be below the limit for the respective coreference rule.
paragraph distance: The distance between two markables measured in paragraphs must be below the limit for the respective rule. Usually, pronouns are at most 2 paragraphs apart from their antecedents, whereas there is no limit for proper names.
semantic compatibility: All markables in a cluster have to bear compatible semantics (e.g. the entity feature).
partition scoring: Alternatives with a low score will be discarded.
c) Hartrumpf (2001) describes the score for a partition in the following way: it is the sum of all estimated probabilities for merging a currently investigated anaphor m with one antecedent candidate out of C = ⟨c1, c2, . . . , ci, . . . , ck⟩, where the index i indicates the distance to m. Each coreference between m and ci is licensed by a rule ri. These three items can be represented in a coreference alternative (m, ci, ri). In order to weight those coreference alternatives, they have to be transformed into a more abstract version that can be compared with those in the corpus. This transformation from the triple (m, ci, ri) to a type-based representation is done by an abstraction function a:

a(m, ci, ri) := (i, ri) = ai (∈ A)
(2.2)

Consider A to be a list of abstracted coreference alternatives for a possible anaphor m. Then, the probability that ai is the closest correct antecedent for m can be estimated by the relative frequency (formula (2.3)), where f(i, A) is the absolute frequency of ai winning as the closest correct antecedent in the context of abstracted coreference alternatives A:

rf(i, A) := f(i, A) / Σ_{l=1}^{k} f(l, A)    (2.3)

For sparseness problems, a backed-off estimation can be used: if no statistical values are available for a context A, the context gets scaled down one by one until the frequency becomes positive. If this never happens, all candidates receive equal scores. Finally, the (possibly backed-off) estimation rf_b(i, A), where b indicates the number of backoffs (starting with b = 0), is used as estimation for the probability that ci is the closest correct antecedent for m given the antecedent candidates C:

p(i | C) ≈ rf_b(i, A)    (2.4)

The evaluation of CORUDIS

The evaluation is done by using a 12-fold cross-validation for 502 anaphors (Hartrumpf, 2001). Table 2.6 shows precision, recall and the harmonic f-measure for predicted coreference links. Three different methods have been evaluated: (1) the full coreference task including the markable detection, (2) the “markable-relative” method, which only uses successfully identified markables, and (3) the baseline model: selection of the closest licensed candidate that fulfills the aforementioned distance and compatibility constraints.

method                                            precision  recall  f-measure
(1) coreference (incl. markable identification)     0.82      0.55     0.66
(2) markable-relative coreference evaluation        0.82      0.76     0.79
(3) baseline: always closest candidate              0.42      0.46     0.44

Table 2.6.: Coreference resolution results in (Hartrumpf, 2001)

As there had not been any German evaluation results for the MUC coreference task yet, a comparison to other approaches was not possible. But, as Hartrumpf (2001) argues, the presented results are competitive with the f-measure (≈ 60%) for English in MUC-7.

Conclusion

This paper shows one of the first approaches to German coreference resolution. Nevertheless, Hartrumpf (2001) achieves impressive results. There are a few similarities between Hartrumpf (2001)’s architecture and the one used for SUCRE (3.2). He implements coreference rules that discard markable pairs that cannot represent coreference links. This, and the features for pruning the space of possible partitions, is echoed by the prefilters in SUCRE. Instead of using pure identity of feature values, Hartrumpf (2001) implements a unification check. This could be a good way of solving some complications in SUCRE that come up with the feature value unknown. Hartrumpf (2001) uses some complex features, for instance the semantic sort, the extension type feature (it returns 0 in the case of an individual, 1 in the case of a set and 2 in the case of a set of sets) or other complex features based on “extensional and intensional layer features like CARD (cardinality)” (Hartrumpf, 2001). For implementing such features in SUCRE, external information sources would be needed. Hartrumpf (2001) also uses some distance features, for instance the distance between the respective markables in terms of sentences.

2.5.2. The Influence of Minimum Edit Distance on Reference Resolution (Strube et al., 2002)

Strube et al. (2002) use a coreference system that adjusts the algorithm from (Soon et al., 2001) for German data.
They present some experiments on coreference resolution based on all anaphoric expressions, including definite noun phrases, proper names and personal, possessive and demonstrative pronouns. They evaluated the performance of a given feature set on different types of NP forms (e.g. pronouns, proper names, definite NPs, . . . ). By adding two further features based on edit distance, Strube et al. (2002) significantly outperform their first attempt.

The data used in (Strube et al., 2002)

Strube et al. (2002) use a corpus containing 242 short German texts about sights, historic events and persons in Heidelberg. The corpus has a total of 36,924 tokens and the texts have an average length of 151 tokens. In the first part of the annotation, an automatic POS tagger and an NP chunker were used. The POS tagging of the texts was done using TnT (Brants, 2000). Afterwards, the markables were detected by the NP chunker Chunkie (Skut and Brants, 1998). The markables were labeled with several attributes like NP form using TnT. In the second part, the annotation is corrected manually and the coreference information as well as further features like semantic class are annotated. In the third part, Strube et al. (2002) create a suitable input for a machine learning algorithm by combining each anaphor with all potential antecedents. Afterwards, all pairs are discarded if they fall into one of the groups described below:
• The second markable is an indefinite noun phrase.
• One markable is embedded into the other one.
• Both markables have different semantic class annotations (given that none of the expressions is a pronoun).
• Either markable is not annotated with 3rd person singular or plural.
• Both markables have different agreement values (given that the anaphor is a pronoun, as German allows cases where a non-pronominal anaphor disagrees in grammatical gender).

After the filtering step, each pair consisting of an anaphor mj and its closest antecedent mi is labeled as a positive instance, whereas each pair of the anaphor and a non-antecedent that is closer to it than its closest antecedent is labeled as a negative instance. The other markable pairs (i.e. anaphors paired with non-antecedents or with true antecedents that are further apart than the closest antecedent) are not considered at all (cf. (Soon et al., 2001)). This results in 242 texts with 72,093 valid instances of markable pairs.

The initial feature set

Table 2.7 shows the initial feature set used by Strube et al. (2002). They divide their features into three groups: (1) one feature on the document level with the respective document number, (2) four features each for antecedent and anaphor, checking the grammatical function, the form of the noun phrase, the agreement attributes like person, gender or number, and the semantic class like human, concrete object and abstract object, and (3) six features on the coreference level (distance, syntactic parallelism and string matching). The semantic class feature is needed since in German, gender and semantic class do not always agree as they do in English (objects can be annotated as masculine or feminine in German). This feature achieves the same as the gender feature in English.

Document level features
 1. doc_id             document number (1 . . . 250)
NP-level features
 2. ante_gram_func     grammatical function of antecedent (subject, object, other)
 3. ante_npform        form of antecedent (definite NP, indefinite NP, personal pronoun, demonstrative pronoun, possessive pronoun, proper name)
 4. ante_agree         agreement attributes for person, gender, number
 5. ante_semanticclass semantic class of antecedent (human, concrete object, abstract object)
 6. ana_gram_func      grammatical function of anaphor (subject, object, other)
 7. ana_npform         form of anaphor (definite NP, indefinite NP, personal pronoun, demonstrative pronoun, possessive pronoun, proper name)
 8. ana_agree          agreement attributes for person, gender, number
 9. ana_semanticclass  semantic class of anaphor (human, concrete object, abstract object)
Coreference-level features
10. wdist              distance between anaphor and antecedent in words (1 . . . n)
11. ddist              distance between anaphor and antecedent in sentences (0, 1, > 1)
12. mdist              distance between anaphor and antecedent in markables (1 . . . n)
13. syn_par            anaphor and antecedent have the same grammatical function (yes, no)
14. string_ident       anaphor and antecedent consist of identical strings (yes, no)
15. substring_match    one string contains the other (yes, no)

Table 2.7.: The initial feature set by (Strube et al., 2002)

The first evaluation

As classifier, Strube et al. (2002) use a C5.0 decision tree “with standard settings for pre and post pruning”. Features that have discrete values are used in a binary way (e.g. for ante_npform, a binary feature can ask Is ante_npform in {PPER, PPOS, PDS}?). They used a 10-fold cross-validation and achieved an f-measure of 59.97% (precision ≈ 88.6%, recall ≈ 45.32%). As this result is not satisfying, Strube et al. (2002) investigated the performance of the features and figured out that feature no. 7, ana_npform, is the most important one. In the next step, they split the entire dataset into subsets that only contain markable pairs with anaphors of a particular form. The classifier is trained on each of those data sets.
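The instance-creation scheme described above (the closest antecedent forms a positive instance, every markable in between a negative one, everything else is skipped) can be sketched as follows. This is an illustrative rendering, not the authors' code; the markable representation and the gold `entity_of` mapping are assumptions.

```python
# Sketch of Soon et al. (2001)-style training-instance creation:
# markables are given in text order, entity_of maps each markable to its
# gold entity id (assumed data layout, for illustration only).

def make_instances(markables, entity_of):
    instances = []
    for j, mj in enumerate(markables):
        # find the closest preceding markable of the same entity
        closest = None
        for i in range(j - 1, -1, -1):
            if entity_of[markables[i]] == entity_of[mj]:
                closest = i
                break
        if closest is None:
            continue  # anaphor has no antecedent: no instances at all
        instances.append((markables[closest], mj, True))
        # every markable between the closest antecedent and the anaphor
        # becomes a negative instance
        for i in range(closest + 1, j):
            instances.append((markables[i], mj, False))
    return instances

entity_of = {"m1": "A", "m2": "B", "m3": "A"}
inst = make_instances(["m1", "m2", "m3"], entity_of)
# ("m1", "m3") is positive, the intervening "m2" yields a negative pair
```

Note that "m2" never forms a positive instance at all: since no antecedent of entity B precedes it, it is skipped rather than paired.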
As table 2.8 shows, the worst performance is achieved for definite noun phrases (defNP) and demonstrative pronouns (PDS) with about 15% f-measure. The results for proper names (NE) are moderate with 65.14%. Only the personal pronouns (PPER) and possessive pronouns (PPOS) perform well, with f-measures of 82.79% and 84.94%, respectively. Of the three problematic data sets, the demonstrative pronouns can be ignored as they appear in only 0.87% of the positive cases, whereas definite noun phrases occur in 38.19% and proper names in 31.05% of the positive cases.

        Precision  Recall   F-measure
defNP    87.34%     8.71%    15.84%
NE       90.83%    50.78%    65.14%
PDS      25.00%    11.11%    15.38%
PPER     88.12%    78.07%    82.79%
PPOS     82.69%    87.31%    84.94%
all      88.60%    45.32%    59.97%

Table 2.8.: The first evaluation in (Strube et al., 2002)

Revision of the initial feature set

As a first step in the revision, Strube et al. (2002) figure out why the dataset of the definite noun phrases performs so badly. One reason, they suppose, is that the algorithm is based on surface features and has no access to world knowledge. Another point is that the string-based features no. 14 and 15 (string_ident and substring_match) have a high precision but a low recall. Thus, they try to find a way of improving the recall without losing too much precision. Considering examples like <“Philips”, “Kurfürst Philip”>, <“vier Schülern”, “die Schüler”>, <“die alte Universität”, “der alten Universität”> or <“diese hervorragende Bibliothek”, “dieser Bibliothek”>, they attempt to weaken the string-based features. As they prefer cheap features (without the need for a syntactic analysis), they decide to use the minimum edit distance (MED) (Wagner and Fischer, 1974). It computes the similarity between two strings as the minimum number of edit operations (insertion, deletion, substitution) that are necessary for transforming one string into the other.
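The MED computation and the two percentage features derived from it can be sketched in a few lines. The dynamic-programming `levenshtein` helper below is a standard textbook implementation, not the authors' code, and the example pair is invented.

```python
# Minimum edit distance (Wagner and Fischer, 1974) via dynamic
# programming, plus the two derived features following table 2.9:
# ante_med = 100 * (m - ops) / m and ana_med = 100 * (n - ops) / n,
# where m and n are the lengths of antecedent and anaphor.

def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def med_features(antecedent, anaphor):
    ops = levenshtein(antecedent, anaphor)
    m, n = len(antecedent), len(anaphor)
    return 100 * (m - ops) / m, 100 * (n - ops) / n

# "Buch" -> "Bücher": substitute u/ü, insert e, insert r = 3 operations
ante_med, ana_med = med_features("Buch", "Bücher")
```

Both directions share the same operation count, so the two features differ exactly when the strings differ in length, as the text notes.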
They append two further MED features to their initial feature set, one for each editing direction. Although both directions have the same number of edit operations, they yield different values when antecedent and anaphor differ in length. The new features are given in table 2.9, where m represents the length of the antecedent, n the length of the anaphor and (i + d + s) the total number of edit operations.

New coreference-level features
16. ante_med   minimum edit distance to anaphor:    ante_med = 100 · (m − (i + d + s)) / m
17. ana_med    minimum edit distance to antecedent: ana_med = 100 · (n − (i + d + s)) / n

Table 2.9.: Revision of the feature set in (Strube et al., 2002)

The second evaluation

The second evaluation reveals some significant improvements, shown in parentheses in table 2.10. The overall f-measure gains about 8%. This improvement can be attributed to the much better performance on the data sets defNP and NE, with f-measure improvements of about 18% and 11%. For the demonstrative pronouns there is no change, and for the other two pronominal forms there is a slight deterioration.

        Precision  Recall   F-measure
defNP    69.26%    22.47%   33.94% (+18.10%)
NE       90.77%    65.68%   76.22% (+11.08%)
PDS      25.00%    11.11%   15.38% (±0.00%)
PPER     85.81%    77.78%   81.60% (−1.19%)
PPOS     82.11%    87.31%   84.63% (−0.31%)
all      84.96%    56.65%   67.98% (+8.01%)

Table 2.10.: The second evaluation in (Strube et al., 2002)

Conclusion

In contrast to (Hartrumpf, 2001), the approach of Strube et al. (2002) shows many similarities to SUCRE. Both are guided by (Soon et al., 2001), i.e. they constitute a mention-pair model (cf. (2.3.1)). As corpus, Strube et al. (2002) use a self-annotated corpus of German texts including the annotation of semantic classes. SUCRE, on the other hand, does not provide semantic class annotation within the TüBa-D/Z corpus. As pre-filters, Strube et al.
(2002) use several features that have some disadvantages when applied to TüBa-D/Z: no markable pairs are considered in which m2 is indefinite. Although there is a clear vote for disreference in this case, there are some coreference links with an indefinite anaphor in TüBa-D/Z, for instance the markable pair <Robert Musil, eines Robert Musils> (cf. (4.3.1)). Moreover, Strube et al. (2002) discard all markable pairs in which a markable is not in third person and thus ignore all direct speech. Furthermore, they argue that a disagreement in gender with a pronominal anaphor indicates disreference. Considering a pair like <ein Mädchen, sie>, this restriction cannot hold. By adding semantic class annotation to SUCRE’s German dataset, the system could adopt Strube et al. (2002)’s filter for semantic class disagreement. The feature set used in (Strube et al., 2002) shows several similarities with the one used in SUCRE (cf. (4.1.3)). There are features for checking the NP form (e.g. whether the markable’s part-of-speech tag is a pronoun, a named entity or a common noun) or for checking the syntactic function of the markables. Strube et al. (2002) use three features for the distance between m1 and m2: one in terms of words (or tokens), one in terms of sentences and one in terms of markables. As will be shown in (4.2), the distance features lead to problems in SUCRE. Strube et al. (2002) show the necessity of a more complex string matching feature (e.g. by the use of minimum edit distance). In the feature development provided by this diploma thesis, this insight will be confirmed.

2.5.3. A Constraint-based Approach to Noun Phrase Coreference Resolution in German Newspaper Text - (Versley, 2006)

General remarks

Versley (2006) presents a state-of-the-art system that is based on hard constraints (i.e. constraints that filter out impossible candidate antecedents) and soft constraints (indicators that, if accumulated, can vote against a candidate being the right antecedent).
Those constraints are weighted and the weights are estimated with a maximum entropy model. Versley (2006) uses this system to get new insights into the impact of “commonly used resolution constraints” and to explore new constraints that might support “the resolution of non-same-head anaphoric definite descriptions”. His exploration is based on the question why the resolution of pronominal and non-pronominal anaphora differs so much with respect to accuracy. Is it due to the lower complexity of pronominal resolution, or does non-pronominal resolution contain “different kinds of non-anaphoricity”? One reason for this difference is that some definite descriptions are unique in the context and thus not anaphoric in the sense of anaphoricity (cf. (2.1.1)). However, it is possible that these markables corefer if they are mentioned several times. Using some heuristics for determining whether a markable is unique, the resolution results can be improved (Versley, 2006). Therefore, Versley (2006) focusses on coreference resolution of proper names and definite noun phrases. He works on the TüBa-D/Z corpus (Hinrichs et al., 2005b).

The table of results

In order to see the impact of the hard constraints, Versley (2006) uses upper and lower bounds for precision and recall (Pmax, Rmax, Pmin, Rmin) for each variant, based on the dataset that results from filtering by the hard constraints. In addition to the values for precision and recall, the system also returns the perplexity of the classifier decisions. The complete result table of (Versley, 2006) is given in appendix D. Subsequently, results of various steps are mentioned where relevant.

The weights of the soft constraints

After filtering impossible candidates using the hard constraints, the remaining candidates are ranked with the use of weighted soft constraints.
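This ranking can be sketched as a dot-product scorer with a loglinear (softmax) reading, which is what formulas 2.5 and 2.6 below state precisely. The weights and feature vectors here are invented for illustration.

```python
import math

# Sketch of candidate scoring: each candidate's feature vector f(y) is
# scored by the dot product <w, f(y)>; the best-scoring candidate wins,
# and exponentiating and normalizing the scores yields the loglinear
# model used to estimate the weights.  Candidate names and values are
# made up.

def score(w, fy):
    return sum(wi * fi for wi, fi in zip(w, fy))

def best_candidate(w, candidates):
    return max(candidates, key=lambda y: score(w, candidates[y]))

def loglinear_probs(w, candidates):
    exps = {y: math.exp(score(w, fy)) for y, fy in candidates.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

w = [1.0, -0.5]
cands = {"cand_a": [2.0, 1.0], "cand_b": [0.5, 2.0]}
winner = best_candidate(w, cands)      # scores: 1.5 vs. -0.5
probs = loglinear_probs(w, cands)      # normalized, sums to 1
```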
Therefore, each candidate y is represented as a vector of numerical features f(y), where f is a function that maps the candidate to its feature vector representation. This vector is multiplied with the vector of feature weights w in a Euclidean dot product, ⟨·, ·⟩, to get a final score for the candidate. Now, the task is to choose the candidate with the largest score out of the candidate set Y (see formula 2.5):

ŷ = argmax_{y∈Y} ⟨w, f(y)⟩    (2.5)

P̂(y) := e^⟨w, f(y)⟩ / Σ_{y′∈Y} e^⟨w, f(y′)⟩    (2.6)

“To choose the constraint weights”, he interprets the score as a loglinear model (see formula 2.6). For more details on this loglinear model, see (Versley, 2006).

The soft constraints and their performance

In knowledge-poor approaches like (Strube et al., 2002) (cf. (2.5.2)), the coreference of nouns and proper names is determined by string or substring match. Versley (2006) uses both knowledge-poor and knowledge-rich (i.e. using semantic classes and relations) constraints for the resolution of nominals. Due to German compounding and morphology, Versley (2006) considers two markables to have the same head if they share at least one letter-4-gram. This leads to an upper bound recall of 76.5%. In order to raise the upper bound precision, he adds a constraint checking number agreement, which yields a precision of 52.1% (the same as for identical heads). Additionally, Versley (2006) uses some heuristics due to Vieira and Poesio (2000) for filtering out markable pairs that share the same concept but have different modifiers (das blaue Auto vs. das rote Auto). This improves the upper bound precision to 56.4%. As proper names differ from common nouns (e.g. they uniquely refer to an entity), they are treated differently: two proper names only match if they share the same name, not merely the same common noun (e.g. Bundeskanzler Helmut Kohl vs. Bundeskanzler Gerhard Schröder).
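The letter-4-gram head-matching heuristic just described can be sketched as follows. This is an illustrative reconstruction, not Versley (2006)'s code; lowercasing before comparison is an assumption.

```python
# Same-head heuristic for German: because of compounding and inflection,
# two heads count as matching if they share at least one character
# 4-gram (e.g. inflected forms of the same noun share most 4-grams).

def char_ngrams(word, n=4):
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def same_head_4gram(head1, head2):
    # words shorter than 4 characters yield no 4-grams and never match
    return bool(char_ngrams(head1) & char_ngrams(head2))

match = same_head_4gram("Fußgängerin", "Fußgänger")   # inflection survives
no_match = same_head_4gram("Auto", "Pkw")             # unrelated heads
```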
Proper names occur very often “with modifiers that are indicative for uniqueness” and thus, modifiers are disregarded in the case of a proper name. Apart from ranking candidates by their sentence distance, it is possible to ignore them if they are too far apart from the anaphor, using an appropriate hard constraint, which improves the overall precision by more than 5%, up to a total of 67.8%. Another possibility is used by Vieira and Poesio (2000) (“a loose segmentation”): here, an antecedent is considered if it is within a window of a certain number of sentences or has been mentioned several times. This yields a similar precision but a smaller loss of recall (Versley, 2006). Besides the resolution of same-head cases, Versley (2006) involves so-called coreference bridging (Vieira and Poesio, 2000), which enables matching markables with different heads (cf. example (4)). The entities mentioned in m1 and m2 are again mentioned in m3 and m4, but this time they are neither pronominal nor in a same-head relation but in a “semantically poorer form” (e.g. the female pedestrian vs. the woman) or synonymous (the car vs. the automobile).

(4) a. Lebensgefährliche Körperverletzungen hat sich [eine 88jährige Fußgängerin]m1 bei einem Zusammenstoß mit [einem Pkw]m2 zugezogen.
    b. [Die Frau]m3 hatte [das Auto]m4 beim Überqueren der Waller Heerstraße offensichtlich übersehen.

In coreference bridging, the grammatical gender may differ, as is the case with m2 (DER Pkw) and m4 (DAS Auto). A further problem is posed by unique descriptions, which only corefer “if they are repeated verbatim” (Vieira and Poesio, 2000). But this restriction might be loosened by the discourse context or world knowledge, as shown in example (5). Here, m5 and m6 are unique descriptions and not verbatim identical but corefer with respect to world knowledge.

(5) a. [Nikolaus W.
Schües]m5 bleibt Präsident der Hamburger Handelskammer.
    b. [Der Geschäftsführer der Reederei “F. Laeisz”]m6 wurde gestern für drei Jahre wiedergewählt.

There is no way to treat the case exemplified in (5), but semantic relatedness as in (4) can be handled in the following way: Versley (2006) classifies all markables into five semantic classes (PERSON, ORGANIZATION, EVENT, TEMPORAL-ENTITY and OTHERS) and involves features for semantic class match. Additionally, he enables “more fine-grained lexical knowledge” with the use of a graph distance measure for hypo-/hypernymy in the GermaNet graph, together with a “hard recency limit of 4 sentences, number agreement” and some heuristics for unique descriptions as in (Vieira and Poesio, 2000). Another feature, which approximates the information status, is one that checks the syntactic role. Discourse-given referents usually occur sentence-initially (i.e. in the canonical subject position). Thus, by approximating the theme (rather than the rheme) of a sentence, one can conclude that the subject is more likely to be discourse-old than objects or prepositional phrases. Finally, Versley (2006) uses a statistical model of the selectional preferences of verbs, based on “the intuition that it should be possible to exchange two coreferent descriptions against each other” in the same context: 11 million sentences are parsed with a PCFG parser (Versley, 2005) in order to get the pairs <subject, verb> and <object, verb>. Afterwards, models for both relations are trained with the soft clustering system LSC1, based on the EM algorithm. Versley (2006) computes the logarithm q of the probability of how well the anaphor fits in the contexts of the antecedent and vice versa. If the antecedent is likely to occur in the context of the anaphor or vice versa, then q is near or even above zero, whereas if one markable does not fit in the other’s context, then q is negative.
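A very rough sketch of such an exchangeability score follows. It is not Versley (2006)'s actual model, which trains LSC soft clusters on the parsed <subject, verb> and <object, verb> pairs; here, q simply compares an assumed conditional probability of the noun in the context against its overall frequency, so unseen combinations come out negative. All probability values are invented.

```python
import math

# Toy exchangeability score: q > 0 if the noun is more likely in the
# given <verb, role> context than overall, q < 0 if it does not fit.
# p_cond and p_marg stand in for a trained selectional-preference model.

EPS = 1e-6  # floor probability for unseen <noun, context> combinations

def q_score(p_cond, p_marg, noun, context):
    return math.log(p_cond.get((noun, context), EPS) / p_marg[noun])

p_cond = {("Siegerin", ("disqualifizieren", "OBJ")): 3e-3}
p_marg = {"Siegerin": 1e-3, "Mark": 5e-3}

q_pos = q_score(p_cond, p_marg, "Siegerin", ("disqualifizieren", "OBJ"))
q_neg = q_score(p_cond, p_marg, "Mark", ("entlassen", "SUBJ"))  # unseen
```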
Example (6) shows a negative and a positive instance. In (6a), the noun Arbeiterwohlfahrt as a subject of the verb entlassen cannot be exchanged with the noun Mark as a subject of the verb fliessen. This leads to a negative q-value of −5.9. On the other hand, the noun Siegerin as an object of the verb disqualifizieren can be exchanged with a person name as subject of the verb landen. This leads to a positive q-value of +1.0.

(6) a. <Arbeiterwohlfahrt_SUBJ, entlassen>  <Mark_SUBJ, fliessen>  q = −5.9
    b. <Siegerin_OBJ, disqualifizieren>  <PERSON_SUBJ, landen>  q = +1.0

The results compared to (Strube et al., 2002)

The final version of Versley (2006)’s coreference system has 70% overall recall and 61.0% overall precision. The complete results are listed in table D.1 in appendix D.

1 http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/LSC.html

Table 2.11 shows a comparison of the performance of (Strube et al., 2002) (cf. table 2.10 in (2.5.2)) and (Versley, 2006) with respect to proper names and definite noun phrases.

                        f-score
                        (Strube et al., 2002)  (Versley, 2006)
Proper names                  76.22%                87.5%
Definite noun phrases         33.94%                46.9%

Table 2.11.: Performance comparison of (Strube et al., 2002) and (Versley, 2006)

So, Versley (2006) outperforms Strube et al. (2002) in the categories proper names and definite noun phrases.

Conclusion

Among the approaches presented in this diploma thesis, (Versley, 2006) is the first to use the TüBa-D/Z corpus for German coreference resolution. He implements hard and weighted soft constraints that are reminiscent of the prefilter features and the vector features used in SUCRE (cf. (4.1.2), (4.1.3)). Versley (2006) focusses on the feature design for proper names and definite noun phrases and thereby disregards pronouns, whereas within this diploma thesis, SUCRE performs a full coreference resolution.
Here again, the substring matching feature is more refined than a plain exact string match or substring match: Versley (2006) checks for a common 4-gram between the heads of m1 and m2. So, he also realized that a simple string matching feature is not expressive enough. One interesting soft constraint Versley (2006) implements is a check for noun phrases with the same concept but different modifiers (das blaue Auto vs. das rote Auto), using a heuristic of Vieira and Poesio (2000). This extension could also raise the performance of SUCRE, although the tripartite markable word access (i.e. first word, head word, last word) might be insufficient (e.g. for the prenominal adjectives). Versley (2006) uses a very impressive implementation of so-called “coreference bridging”, including semantic classes and semantic relations, by applying a distance measure for the hyponymy relation in GermaNet. This implementation can also be useful for detecting semantic relations in SUCRE (cf. (4.4.2)). Following Versley (2006), since discourse-given entities usually occur in the canonical subject position, the information status can be approximated by checking whether m2 is a subject. This feature will be addressed again in chapter 5.

2.5.4. Optimization in Coreference Resolution Is Not Needed: A Nearly-Optimal Algorithm with Intensional Constraints - (Klenner and Ailloud, 2009)

The architecture

Klenner and Ailloud (2009)’s architecture contains two parts. The first part constitutes the memory-based pairwise classifier TiMBL (= Tilburg Memory-Based Learner) (Daelemans et al., 2003), whose output is the input for the second part, a modification of a Zero-One Integer Linear Programming (ILP) system based on the Balas algorithm (Balas et al., 1965). This system is used as the clustering method, searching for a global optimum of consistency in the resulting coreference partition.

The corpus

The data is extracted from the TüBa-D/Z corpus (Hinrichs et al., 2005b).
They use a corpus version with about 1,100 German newspaper texts and about 25,000 sentences. In their statistics, Klenner and Ailloud (2009) mention 13,818 anaphoric relations, 1,031 cataphoric relations (the case where the anaphor precedes the antecedent) and 12,752 true coreferent markable pairs. Considering the pronouns, there are 3,295 relative pronouns, 8,929 personal pronouns, 2,987 reflexive pronouns and 3,021 possessive pronouns.

The feature set

For the pairwise classifier TiMBL, Klenner and Ailloud (2009) use the following features:
• The distance in terms of sentences and markables
• The part-of-speech tag of the markables’ heads
• The grammatical functions (i.e. subject/object/. . . )
• The agreement in grammatical functions (e.g. <subject + subject>, <object + object>, . . . )
• String match between the markables’ heads
• Which markable (if any) is a pronoun?
• The word form, given that the part-of-speech tag of a markable is a pronoun
• The salience of non-pronominal markables
• The semantic class of the markables’ heads

Figure 2.5.: Feature set used in (Klenner and Ailloud, 2009)

Training instance creation

As the dataset contains several long texts, the question arises how to create training samples (i.e. coreference and disreference links). Two personal pronouns er, for example, one at the beginning and one at the end of a text, do not indicate coreference unless there are enough further mentions of that referent in between, building “a long chain of coreference ’renewals’ that lead somehow from the first” er to the second. Thus, the instance generation algorithm of Klenner and Ailloud (2009) uses a 3-sentence window within which the candidate pairs are generated.
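The windowed pair generation can be sketched as follows; the markable representation as (sentence index, markable id) pairs is an assumption for illustration, not the authors' data layout.

```python
# Sketch of candidate-pair generation restricted to a 3-sentence window:
# a pair is only generated if anaphor and candidate antecedent are less
# than `window` sentences apart.  Markables are assumed to be listed in
# text order as (sentence_index, markable_id) tuples.

def window_pairs(markables, window=3):
    pairs = []
    for j, (sj, mj) in enumerate(markables):
        for i in range(j):
            si, mi = markables[i]
            if sj - si < window:  # at most 2 sentences back for window=3
                pairs.append((mi, mj))
    return pairs

ms = [(0, "m1"), (1, "m2"), (4, "m3")]
pairs = window_pairs(ms)   # only (m1, m2); m3 is too far from both
```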
The need for global constraints

A main advantage of the ILP approach is that “prescriptive linguistic knowledge” can be modeled as global constraints over a set of possible solutions. ILP finds the optimal result out of this set, taking the global constraints into account. One such global constraint is transitivity (cf. 2.1, 2.3.2). The following example (a translation of an example given by Klenner and Ailloud (2009)) illustrates the need for the transitivity constraint, in particular when other pairwise restrictions like the binding theory are involved:

(7) [Er]m1 erzählt [ihm]m2, dass [er]m3 [ihn]m4 zutiefst bewundert.

Based on the binding theory, m1 and m2 as well as m3 and m4 are disreferent. That can be modeled as a feature for a pairwise classifier (e.g. one that checks for a violation of a binding constraint). One weak point of the pairwise classifier is its local perspective: it cannot control the consistency of the coreference state of three or more markables. So, the classifier can rightly predict m3 and m4 to be disreferent but might falsely predict both (m1 and m3) and (m1 and m4) to be coreferent, since there is no binding constraint preventing that step. Thus, keeping in mind the symmetric nature of coreference (i.e. coref(m1, m3) ⇔ coref(m3, m1)), there is need for the global transitivity constraint: (coref(m3, m1) ∧ coref(m1, m4)) ⇒ coref(m3, m4). But coref(m3, m4) violates the binding constraint. Therefore, it is not possible that both m3 and m4 are coreferent with one and the same third markable.

A brief once-over in ILP

The basic idea of ILP is to minimize “a weighted linear function” (the so-called objective function) of variables xi:

F(x1, . . . , xn) = w1·x1 + . . . + wn·xn    (2.7)

In the case of Zero-One ILP, the variables xi are binary (i.e. they can only be instantiated by 0 or 1).
Balas’ approach (Balas et al., 1965) sorts the addends in F according to their weights w_i. This way, two basic principles can be followed when minimizing the objective function:

• A solution with as few 1’s as possible is preferred.
• If a global constraint forces a variable x_i to 1, then the index i should be very small.

The algorithm uses a depth-first strategy in the search tree and checks for any constraint violation “of the branches partially explored”. Additionally, it balances the minimal cost min found so far against the cheapest solution yielded by following the current branch. If setting all variables x_m for m > i to 0 returns a cheaper solution than min, it is useful to follow that branch; otherwise, the algorithm uses backtracking for finding other solutions. This leads to an exponential run time complexity in the worst case.

The constraint-based ILP model of Klenner and Ailloud (2009)

Within the ILP framework, the probabilities for the markable pairs that are returned by TiMBL are used as weights. Klenner and Ailloud (2009) define the classification costs in formula 2.8, where |neg_ij| is the number of negative samples that are similar to a markable pair ⟨i, j⟩ according to TiMBL’s metric, whereas |pos_ij| is the respective number for positive examples:

w_ij = |neg_ij| / |neg_ij ∪ pos_ij| (2.8)

If there are few or even no negative instances, the cost w_ij is small, but if there are many more negative instances than positive ones, the cost gets high. Now, Klenner and Ailloud (2009) propose an objective function (formula 2.9), where O_0.5 is the set containing the pairs ⟨i, j⟩ with a weight less than or equal to 0.5 and c_ij represents the binary variable of setting the markable pair ⟨i, j⟩ (i < j) to coreferent, whereas c_ji is the complement of c_ij (i.e. the boolean expression for the disreference of ⟨i, j⟩). The weights for the disreference variables are the complement to 1.0 (i.e. (1 − w_ij)).
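Formula 2.8 amounts to the share of negative samples among the retrieved neighbours; a one-line sketch, with the counts standing in for the sizes of TiMBL’s neighbour sets (assumed disjoint, so |neg ∪ pos| = n_neg + n_pos):

```python
def classification_cost(n_neg, n_pos):
    """w_ij of formula 2.8: the share of negative samples among the
    training samples similar to pair <i, j> (neighbour counts as a k-NN
    classifier such as TiMBL would return them)."""
    return n_neg / (n_neg + n_pos)

# Few negative neighbours -> low cost for a coreference decision:
low = classification_cost(1, 9)    # 0.1
high = classification_cost(9, 1)   # 0.9
```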
This enables the penalization of setting c_ji to 1 when w_ij ≤ 0.5.

min: Σ_{⟨i,j⟩ ∈ O_0.5} w_ij · c_ij + (1 − w_ij) · c_ji (2.9)

The global constraints

Given the notions c_ij and c_ji, Klenner and Ailloud (2009) can describe some global constraints in terms of equations over sums of c_ij (cf. table 2.12).

#  Constraint     Equation                                          Description
1  Exclusivity    ∀⟨i,j⟩ ∈ O_0.5: c_ij + c_ji = 1                   A pair ⟨i,j⟩ is either co- or disreferent.
2  Clause bound   ∀i,j: clause_bound(i,j) ⇒ c_ji                    A pair ⟨i,j⟩ is disreferent if clause_bound
3  NP bound       ∀i,j: np_bound(i,j) ⇒ c_ji                        A pair ⟨i,j⟩ is disreferent if np_bound
4  Transitivity   ∀i,j,k (i<j<k): c_ij + c_jk ≤ c_ik + 1            (coref(i,j) ∧ coref(j,k)) ⇒ coref(i,k)
                  ∀i,j,k (i<j<k): c_ik + c_jk ≤ c_ij + 1            (coref(i,k) ∧ coref(k,j)) ⇒ coref(i,j)
                  ∀i,j,k (i<j<k): c_ij + c_ik ≤ c_jk + 1            (coref(j,i) ∧ coref(i,k)) ⇒ coref(j,k)
5  BEC            ∀j: pos(j) ∈ {PPOSAT, PRELS} ⇒ Σ_i c_ij ≥ 1       Boundness enforcement constraint: every
                                                                    rel. or poss. pronoun has an antecedent i

Table 2.12.: The global ILP constraints in (Klenner and Ailloud, 2009)

The exclusivity constraint enforces that any markable pair has to be either coreferent or disreferent. If it is not coreferent, it has to be disreferent and vice versa. For the case of binding theory, Klenner and Ailloud (2009) introduce two new predicates: clause_bound and np_bound. If two markables occur in the same subclause (see m3 and m4 in example (7)) and none of them is a reflexive pronoun, a possessive pronoun or an apposition of the other, they are clause bound and thus disreferent. If two markables occur in the same noun phrase (e.g. ihr_i Auto_j), they are np bound and also disreferent. As shown before, transitivity can compensate for some shortcomings of the local perspective of a pairwise classifier.
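The extensional reading of the transitivity constraint of table 2.12 generates three inequalities for every markable triple; a small sketch of that enumeration:

```python
from itertools import combinations

def transitivity_constraints(n):
    """Enumerate the extensional transitivity constraints of table 2.12:
    for every triple i < j < k, three inequalities of the shape
    c_ab + c_cd <= c_ef + 1, returned as triples of index pairs
    ((a, b), (c, d), (e, f))."""
    out = []
    for i, j, k in combinations(range(n), 3):
        out.append(((i, j), (j, k), (i, k)))  # coref(i,j) & coref(j,k) => coref(i,k)
        out.append(((i, k), (j, k), (i, j)))  # coref(i,k) & coref(k,j) => coref(i,j)
        out.append(((i, j), (i, k), (j, k)))  # coref(j,i) & coref(i,k) => coref(j,k)
    return out

# n markables yield C(n, 3) * 3 inequalities, e.g. n = 5 -> 30:
cons = transitivity_constraints(5)
```

This enumeration makes the combinatorial blow-up tangible that motivates the intensional treatment discussed next.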
As symmetry cannot be modeled in terms of c_xy (since c_yx = ¬c_xy), the transitivity constraint is split up into three equations. One problem comes up when treating this constraint extensionally, as one has to check it over the whole candidate set. This results in n!/(3!(n−3)!) · 3 equations, given n markables. The so-called boundness enforcement constraint forces an anaphor that constitutes a relative or possessive pronoun to be coreferent with at least one antecedent. Klenner and Ailloud (2009) figured out that in the case of a generic ILP model (Klenner and Ailloud, 2009; Althaus et al., 2004) most of the constraints in table 2.12 can be used intensionally and thus run time complexity can be reduced. For instance, one can treat transitivity intensionally by only maintaining evolving clusters: if the algorithm tries to add a new markable to an existing cluster, it has to be compatible with all members. Compatibility can be defined in terms of grammatical attributes and the binding theory: two markables are compatible if they agree in several grammatical attributes like person, number or gender and if they are neither clause bound nor np bound. Thus, instead of checking all markable pairs, Klenner and Ailloud (2009) check the markable pairs only “on demand”.

Optimization is not needed

The experiments concerning the optimization described by Klenner and Ailloud (2009) are skipped, as they do not provide relevant information for this diploma thesis. In short, Klenner and Ailloud (2009) compare the result of the first iteration of Balas’ algorithm (i.e. “Balas-First”) with the following results of the optimization step with respect to the objective function in formula 2.9. They figured out “that in more than 90% of all cases, Balas-First already constitutes the optimal solution”, which means “that the time-consuming search for a less expensive solution ended without further success” (Klenner and Ailloud, 2009).
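The intensional treatment of transitivity described above can be sketched as incremental clustering with a compatibility test; `agree` and `bound` are hypothetical stand-ins for the actual agreement and boundness checks:

```python
def add_to_clusters(clusters, markable, agree, bound):
    """Grow clusters while checking transitivity intensionally: a new
    markable joins the first cluster all of whose members it is compatible
    with (agreement holds, no clause/NP boundness); otherwise it opens a
    new cluster.  `agree` and `bound` are hypothetical predicate stand-ins."""
    for cluster in clusters:
        if all(agree(markable, m) and not bound(markable, m) for m in cluster):
            cluster.append(markable)
            return clusters
    clusters.append([markable])    # no compatible cluster: open a new one
    return clusters

# Toy run: agreement on gender only, no boundness anywhere.
agree = lambda a, b: a["gender"] == b["gender"]
bound = lambda a, b: False
clusters = []
for m in ({"gender": "masc"}, {"gender": "fem"}, {"gender": "masc"}):
    add_to_clusters(clusters, m, agree, bound)
# clusters: [[masc, masc], [fem]]
```

Because compatibility is verified against every member before a markable joins, transitive inconsistencies can never enter a cluster, which is exactly the “on demand” checking described above.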
Moreover, the F1-score of the optimal solution was even slightly worse than the F1-score of Balas-First (about 0.086%). Therefore, Klenner and Ailloud (2009) argue that there is no need for optimization in coreference resolution using ILP. For more details on their argumentation and on the experiments leading them to this controversial conclusion, see Klenner and Ailloud (2009).

Evaluation

For the evaluation, Klenner and Ailloud (2009) use a 5-fold cross-validation with two variants: all markables vs. true markables. The results are computed for the evaluation score CEAF (cf. (2.6.3)) and compared with the TiMBL baseline, obtained by adding all markables appearing in a predicted coreference link to the same cluster. Table 2.13 shows the results. The improvement in terms of F1-score in the variant all markables is 2.4% and in the variant true markables, it is even 7.43%. Here, Balas-First obviously outperforms the results of the sole pairwise classifier TiMBL.

             all markables         true markables
             TiMBL    B-First      TiMBL    B-First
Precision    66.52    72.05        73.81    84.10
Recall       57.76    58.00        69.28    74.31
F-measure    61.83    64.27        71.47    78.90

Table 2.13.: CEAF results in (Klenner and Ailloud, 2009)

Conclusion

The approach by Klenner and Ailloud (2009) impressively shows the necessity of a good clustering method when using a mention-pair model. As pairwise classifier, they apply the memory-based classifier TiMBL. Afterwards, they use an ILP framework for creating coreference chains that are in line with global constraints which cannot be regarded in clustering methods like the best-first clustering that SUCRE uses. For instance, the global constraint clause_bound cannot be captured by best-first clustering: given the markables m1, m2 and m3, where m2 and m3 are in the same subclause, and the classifier’s output <(m3,m2), 0.2>, <(m3,m1), 0.6> and <(m2,m1), 0.7>.
Then, best-first clustering would chain m3 and m1 and then m2 and m1. Transitively, m2 and m3 end up in the same cluster and thereby violate a clause bound constraint. Therefore, a further improvement in SUCRE might be to replace best-first clustering with an ILP framework or any other clustering method that provides the inclusion of global constraints.

2.5.5. Extending BART to provide a Coreference Resolution System for German - (Broscheit et al., 2010b)

In contrast to coreference systems that concern English and the systems of Hartrumpf (2001) (2.5.1), Strube et al. (2002) (2.5.2), Versley (2006) (2.5.3), Klenner and Ailloud (2009) (2.5.4) and others, Broscheit et al. (2010b) provide a freely available system that enables researchers to explore new coreference techniques for German.

The architecture of BART

The architecture is based on the toolkit BART (Versley et al., 2008), which was originally implemented as a “modularized version” of (Versley, 2006) and others. It combines these state-of-the-art approaches with features based on syntax and semantics. Thus, its design is very modular and makes it possible to do feature engineering separately with the use of different knowledge sources and to improve coreference resolution as a machine learning problem. Broscheit et al. (2010b) extend BART for German coreference resolution.

The corpus

For the extension of BART for coreference resolution in German, a German dataset is needed. For this reason, Broscheit et al. (2010b) use version 4 of the TüBa-D/Z corpus (Hinrichs et al., 2005b). This version of TüBa-D/Z comprises 32,945 sentences with 144,942 markables. These markables constitute 52,386 coreference links and 14,073 clusters (Broscheit et al., 2010b).

The classifier

Broscheit et al. (2010b) use a pairwise classifier whose input are feature vectors of markable pairs as proposed by Soon et al. (2001).
They apply several methods for classifying markable pairs: besides J48, an implementation of the C4.5 decision tree learning algorithm, they apply a maximum entropy classifier and an architecture that contains a separate classifier for pronouns and non-pronouns (“split”).

Training instance creation

In the preprocessing step, they convert the corpus into the MMAX2 data format (Müller and Strube, 2006) that is used in BART. The markables and their grammatical attributes are extracted by using the information given in the parse trees (i.e. minimal and maximal noun projections, number, gender, person, semantic class, . . . ). Markables with one of the following grammatical functions are excluded from the final markable set:

• Appositions and additional parts of a name (e.g. doctoral degree): [Ute Wedemeier]_m1, [stellvertretende Vorsitzende der AWO]_m2 becomes one markable with respective spans.
• Expressions constituting predicates in a copula construction. Those have the dependency label PRED: [John]_m1 ist [ein Bauer]_m2
• NPs that are governed by the comparative or predicative conjunction als: [Peter]_m1 arbeitet [als Bauarbeiter]_m2
• Vorfeld-es and other non-referring es-pronouns: [Ich]_m1 finde [es]_m2 schade, . . . Pronouns such as it for English and es for German are very often non-referring.

Broscheit et al. (2010b) create the feature vectors as described in Soon et al. (2001).

The feature set

Broscheit et al. (2010b) reimplement the feature set used in (Klenner and Ailloud, 2009) (cf. figure 2.5) with distance, part-of-speech, grammatical function and string matching. Features for binding theory, which had been implemented as ILP constraints in (Klenner and Ailloud, 2009), are reimplemented as features for the binary classifier. Additionally, they use the semantic class approach proposed by Versley (2006) (cf. (2.5.3)). Broscheit et al.
(2010b) use three methods for determining the semantic class: (1) they look up the semantic class in an appropriate lexicon like GermaNet; (2) in the case of a proper name, the markables are checked for honorifics (e.g. Dr. phil.), organizational suffixes (e.g. GmbH) and the like; finally, a gazetteer lookup is done (Broscheit et al., 2010b); (3) at the end, the markables are checked for morphological patterns like acronyms (e.g. CDU) or binnen-I gender-neutral forms (e.g. SchneiderInnen). Broscheit et al. (2010b) add some further features for coreference in German:

1. 1/2 person: returns TRUE if both markables are first or second person, and FALSE otherwise.
2. Speech: returns TRUE if both markables are inside quoted speech, and FALSE otherwise.
3. Node distance: returns the number of clause and PP nodes along the path between the markables in the parse tree.
4. Partial match: returns TRUE if there is a substring match between the markables’ heads, and FALSE otherwise.
5. GermaNet relation: returns the relation between the markables in GermaNet (i.e. NOT_RELATED, SIGNIFICANTLY_RELATED or STRONGLY_RELATED) (for more details, see (Broscheit et al., 2010b)).

The evaluation results

In creating the final partition, Broscheit et al. (2010b) use closest-first clustering of the instances classified as coreferent. They apply the first 1100 documents from the TüBa-D/Z corpus and evaluate using 5-fold cross-validation. The evaluation results of Broscheit et al.
(2010b) are given in table (2.14):

                                     MUC scorer            CEAF
Feature set                        P     R     F1        P     R     F1
Best baseline (MaxEnt “split”)   75.6  80.8  78.1      63.2  67.0  65.0
+ 1/2 Person                     76.2  80.9  78.4      63.6  67.4  65.4
+ Node distance                  75.7  80.9  78.2      63.3  67.1  65.1
+ Partial match                  77.8  81.3  79.5      64.4  68.3  66.3
+ GermaNet relation              76.4  80.6  78.5      63.0  66.8  64.8
+ all features                   78.4  82.2  80.2      66.3  70.3  68.3
Klenner and Ailloud (2009)         -     -     -       69.3  73.8  71.5

Table 2.14.: Evaluation results of Broscheit et al. (2010b)

This table shows that, given the best baseline feature set using the maximum entropy classifier separated by markable class (“split”), the additional features increase the performance, but the system nevertheless does not outperform the one developed by Klenner and Ailloud (2009) (cf. table 2.13).

Conclusion

Among SUCRE and four other systems, BART participated in the SemEval-2010 coreference resolution competition (cf. (2.7)). It cannot be compared with SUCRE as it uses external information sources like GermaNet and gazetteer lookups, whereas SUCRE only performs “closed”. It is based on the TüBa-D/Z corpus and also uses semantic class information. The best performance is achieved with the maximum entropy classifier (rather than the decision tree method used in SUCRE). A possible prefilter used in (Broscheit et al., 2010b) is the check for predicates in a copula construction. Such a feature will be checked in chapter 5. Moreover, Broscheit et al. (2010b) mention an easy way of filtering non-referring es-pronouns from the markable set. However, this is not applicable in SUCRE and thus, based on this lack of annotation for es-pronouns, there will be false positives as described in (4.3.6). Broscheit et al. (2010b) implement the binding theory mentioned in (Klenner and Ailloud, 2009) (see 2.5.4) as a feature for the pairwise classifier. Such a feature based on the clause-bound property described in (Klenner and Ailloud, 2009) will be checked in chapter 5.
Furthermore, the feature that checks whether both markables are in first or second person will also be implemented in chapter 5. The feature which checks whether the respective markable is in quoted speech cannot be implemented in SUCRE, as the relational database model described in (3.2.3) does not provide information about the context of a markable (i.e. what tokens (e.g. quotation marks) are before or after).

2.6. Evaluation scores

There is a trend towards using several scores for evaluating a coreference system. This has the advantage of working against the bias inherent in a particular score (Ng, 2010). None of the scores discussed below is fully adequate and their measures are not commensurate (Recasens and Hovy, 2010). Assume the following example (borrowed from (Recasens and Hovy, 2010)) of the output of a coreference system (figure 2.6) and the corresponding gold partition in (figure 2.7). Recasens and Hovy (2010) treat the term entity as a set of coreferent markables.

Figure 2.6.: Example of a system partition
Figure 2.7.: Example of a gold partition

Given 14 true markables m1, . . . , m14, the coreference system returns singleton entities for m1, m2, m3, m8, m10, m11 and m13. It predicts the markable m4 to be coreferent with m6 and m5 to be coreferent with m12. Moreover, the markables m7, m9 and m14 occur in a three-markable entity. If this output is compared with the gold partition, there are some obvious errors: some links are missed and others are wrongly predicted. For example: the entity S9 misses the markable m14 that is present in the corresponding true entity G11. On the other hand, the predicted entity S10 contains this wrongly assigned markable m14. The disreferent markables m4 and m6 (cf. G4 and G5) are erroneously linked in S8.
The issue of evaluating the performance of such a coreference system is the uncertainty of how to compare the true set of entities (figure 2.7) with the predicted set of entities (figure 2.6). The following questions about evaluation lead to different evaluation measures (Recasens and Hovy, 2010):

• Shall the measure focus on the number of correct coreference links?
• Shall the measure use each equivalence class (entity) as the unit of evaluation?
• Are singleton entities rewarded the same way as it is done for multi-markable entities?

2.6.1. MUC

The official MUC-scoring (Message Understanding Conference) algorithm was developed by Vilain et al. (1995). It is the oldest and most widely used measure. MUC was defined within the MUC-6 and MUC-7 evaluation tasks on coreference resolution (Recasens and Hovy, 2010). Its f-score is the harmonic mean of precision and recall, which are based on the identification of unique coreference links (Stoyanov et al., 2009):

F1_MUC = 2 · P_MUC · R_MUC / (P_MUC + R_MUC) (2.10)

MUC is based on the idea that the minimum number of links that is necessary for setting up the predicted set of entities or the true set of entities (i.e. the sum of the numbers of arcs in the spanning trees of the set’s implicit equivalence graphs (Vilain et al., 1995)) is the total number of markables minus the number of entities:

minimum number of links (system) = pl = Σ_Si (|Si| − 1) (2.11)
minimum number of links (gold) = tl = Σ_Gi (|Gi| − 1) (2.12)

where Si and Gi refer to the predicted and true entities (cf. figures 2.6, 2.7). Based on these minimum numbers of links, pl and tl, the MUC-score counts each necessary coreference link that occurs both in the predicted set of entities and in the true set of entities (i.e. the number of common links). “To obtain recall, this number is divided by the minimum number of links required to specify” the gold partition.
“To obtain precision, it is divided by the minimum number of links required to specify” the predicted partition (Recasens and Hovy, 2010). Although the minimum number of needed links is constant, “there are combinatorially many [...] spanning trees for a given equivalence class” (Vilain et al., 1995). Thus, the notion of common links is not unique. One way of computing recall is proposed by Vilain et al. (1995): for each true entity Gi, the number of missing links missed(Gi) is computed as in equation (2.13):

missed(Gi) = |p(Gi)| − 1 (2.13)

where p(Gi) is the system’s partition of Gi (i.e. the set of predicted entities covering all markables in Gi). The number of correct links in Gi, correct(Gi) (cf. 2.12), is computed by |Gi| − 1. The recall for Gi is computed as stated in (2.14):

R_MUC(Gi) = (correct(Gi) − missed(Gi)) / correct(Gi) (2.14)

Summing over all true entities, the overall MUC recall is given in (2.15):

R_MUC = Σ_Gi (|Gi| − |p(Gi)|) / tl (2.15)

For scoring precision, Vilain et al. (1995) compute the same score but switch their “notion of where the base sets come from”, i.e. they use Si instead of Gi in all equations:

missed(Si) = |p(Si)| − 1 (2.16)
P_MUC(Si) = (correct(Si) − missed(Si)) / correct(Si) (2.17)
P_MUC = Σ_Si (|Si| − |p(Si)|) / pl (2.18)

This means that MUC-precision (P_MUC) counts the number of links that have to be removed in order to get a graph in which no disreferent markables are connected, whereas MUC-recall (R_MUC) counts the minimum number of links that have to be added to ensure that all markables referring to a given entity are connected in the graph (Bengtson and Roth, 2008). The MUC-score has two weak points. Since it is based on the minimum number of missing resp. wrong links, the result is often counterintuitive: if one classifies a markable into the wrong entity, it will be penalized with one error in P_MUC and R_MUC.
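The Vilain et al. (1995) computation can be sketched over partitions given as lists of sets of markable ids; markables of a key entity not covered by any response entity count as implicit singletons:

```python
def muc(gold, system):
    """MUC score in the style of Vilain et al. (1995).  `gold` and `system`
    are partitions over the same markables, as lists of sets of ids."""
    def score(keys, response):
        num = den = 0
        for entity in keys:
            # p(entity): the response entities that intersect `entity`;
            # uncovered markables count as implicit singleton parts.
            parts = [r & entity for r in response if r & entity]
            covered = set().union(*parts) if parts else set()
            n_parts = len(parts) + len(entity - covered)
            num += len(entity) - n_parts        # correct - missed links
            den += len(entity) - 1              # minimum number of links
        return num / den if den else 0.0

    r = score(gold, system)   # recall: partition the gold entities
    p = score(system, gold)   # precision: the same with roles switched
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy partitions: merging two gold entities costs precision, not recall.
gold = [{1, 2, 3}, {4, 5}]
system = [{1, 2, 3, 4, 5}]
p, r, f = muc(gold, system)
# p = 0.75, r = 1.0
```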
On the other hand, if one falsely merges two entities, it only counts as one error in P_MUC, although it is further away from the real answer than the first case (Recasens and Hovy, 2010). This problem results in a too lenient penalization of systems that return “overmerged entities” (i.e. equivalence classes with too many markables). The second problem arises from the fact that the MUC-score only considers coreference links. The score does not reward singleton clusters that are correctly identified, because there are no coreference links in these clusters (Ng, 2010). If one adds a singleton entity to the predicted set of entities, the MUC-score does not show any effect unless the added markable is misclassified into a multi-markable entity.

2.6.2. B3 (B-Cubed)

The B3 (B-Cubed) f-measure (Bagga and Baldwin, 1998) scores the overlap of predicted clusters and true clusters. It is the harmonic mean of precision (P) and recall (R):

P_B3 = (1/N) Σ_{d ∈ D} Σ_{m ∈ d} (c_m / p_m) (2.19)
R_B3 = (1/N) Σ_{d ∈ D} Σ_{m ∈ d} (c_m / t_m) (2.20)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) (2.21)

where c_m is the number of markables appearing both in the predicted and in the true cluster of m; p_m is the number of markables in the predicted cluster of m and t_m is the number of markables in the true cluster of m. The documents d are out of a document set D and N represents the number of markables in D. The B3 f-measure is able to measure the effect of singleton entities and penalizes the clustering of too many markables in the same entity. It gives more weight to the splitting or merging of larger entities. Moreover, B3 gives equal weights to all types of entities and markables (Bengtson and Roth, 2008). Stoyanov et al. (2009) mention complications using B3: it presumes that the gold standard and the coreference system response are clusterings over the same set of markables.
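For true markables and a single document, formulas 2.19 and 2.20 reduce to the following sketch:

```python
def b_cubed(gold, system):
    """B3 (Bagga and Baldwin, 1998) for one document over true markables:
    per-markable overlap fractions c_m / p_m and c_m / t_m, averaged."""
    g = {m: c for c in gold for m in c}      # markable -> its true cluster
    s = {m: c for c in system for m in c}    # markable -> predicted cluster
    n = len(g)
    p = sum(len(g[m] & s[m]) / len(s[m]) for m in g) / n
    r = sum(len(g[m] & s[m]) / len(g[m]) for m in g) / n
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# The same overmerged toy partitions as in the MUC discussion:
gold = [{1, 2, 3}, {4, 5}]
system = [{1, 2, 3, 4, 5}]
p, r, f = b_cubed(gold, system)
# p = 0.52, r = 1.0
```

Note how B3 penalizes the merge more heavily than MUC does (precision 0.52 instead of 0.75), reflecting its per-markable rather than per-link view.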
But this does not hold when the system uses a markable detector (2.2), as is the case with an end-to-end coreference system. Stoyanov et al. (2009) propose to tag each markable m with twin(m) if it appears both in the predicted partition and in the gold standard. Untagged markables are regarded as twinless. They suggest two ways of using B3 with twinless markables. One way is called B3_all: here, all markables are retained, but for twinless extracted markables, the precision fraction is 1/p_m and the recall fraction is 1/t_m. The other way is called B3_0: it rejects all twinless extracted markables but penalizes recall by setting the corresponding recall fraction to 0. Back to true markables, Recasens and Hovy (2010) describe the following shortcoming of B3: the score “squeezes up” too high in the case of many singletons. As it rapidly approaches 100%, there is little numerical space for comparing clusterings. Considering the high amount of singleton entities, this issue becomes more substantial: in the Spanish AnCora-Es corpus, about 86% of all markables are singletons and in the English ACE-2004 corpus, the proportion is about 61%.

2.6.3. CEAF

The CEAF (Constrained Entity-Alignment F-measure) algorithm was developed by Luo (2005). He “considers that B3 can give counterintuitive results due to the fact that an entity can be used more than once when aligning the entities” in the predicted set of entities and the true set of entities (Recasens and Hovy, 2010). In example (2.6.5), the B3-recall for the system’s output (c) (figure 2.11) is 100%, although the true entities have not been found. On the other hand, in the predicted partition (d) (figure 2.12), the precision is 100% although there are wrongly predicted entities. In CEAF, as Luo (2005) argues, there is a best one-to-one mapping between the entities of both partitions.
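This best one-to-one alignment can be sketched by exhaustive search over entity alignments, maximizing the number of shared markables (cf. formula 2.22); this is only feasible for toy partitions, so treat it as an illustration rather than Luo’s actual algorithm:

```python
from itertools import permutations

def ceaf_m(gold, system):
    """Mention-based CEAF by brute-force alignment: find the one-to-one
    mapping between system and gold entities that maximizes the number of
    shared markables.  Factorial in the number of entities."""
    n = sum(len(e) for e in gold)                  # total markables
    small, large = sorted((system, gold), key=len)
    best = max(sum(len(a & b) for a, b in zip(small, perm))
               for perm in permutations(large, len(small)))
    return best / n    # for true markables, P = R = F1 = this value

# Toy partitions: the best alignment shares 5 of 6 markables.
gold = [{1, 2, 3}, {4, 5}, {6}]
system = [{1, 2}, {3, 4, 5}, {6}]
score = ceaf_m(gold, system)
```

Each entity participates in at most one aligned pair, which is exactly the constraint that distinguishes CEAF from B3.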
Every predicted cluster is mapped on at most one true entity. The best such alignment is the one that maximizes a given similarity measure. Depending on this similarity measure, Luo (2005) distinguishes between the mention-based CEAF (CEAF-M) and the entity-based CEAF (CEAF-E). CEAF-M is the most widely used CEAF-score. It uses the φ3 similarity function employed by Luo (2005). In the case of true markables, the precision P_CEAF-M and the recall R_CEAF-M are identical. They correspond to the number of common markables between every two aligned entities divided by the total number of markables. Let φ be the alignment function that maps each predicted cluster Si on the most similar true entity and let N be the total number of markables. Then precision, recall and thereby f-score are defined as:

P_CEAF-M = R_CEAF-M = F1_CEAF-M = Σ_Si |Si ∩ φ(Si)| / N (2.22)

Recasens and Hovy (2010) mention that CEAF handles singleton entities as poorly as B3 does. This can be seen in the fact that the B3 and CEAF results are higher than MUC on corpora that contain singleton markables. Another problem occurs with CEAF-E: in this way of alignment, correct coreference links might be ignored if the entity finds no corresponding entity in the true set of entities (Recasens and Hovy, 2010). A third problem is that all entities have equal weights, regardless of the number of markables they contain. This results in an equal penalization for a wrong entity composed of two small entities or composed of a small and a large entity (Recasens and Hovy, 2010).

2.6.4. BLANC

The “BiLateral Assessment of Noun-Phrase Coreference” is a variation of the Rand index (Rand, 1971), created to suit the coreference task by addressing some observed shortcomings and to obtain a fine granularity that allows a better discrimination between coreference systems (Recasens and Hovy, 2010). It rewards both coreference and disreference links by averaging the corresponding F-measures.
It gives weight to singletons (the main problem with MUC) and does not inflate the score with the singletons’ presence, as is the case with B3 and CEAF (Recasens and Hovy, 2010). BLANC is based on the Rand index, which divides the sum of all coreferent (N11) and disreferent (N00) links that come up both in the predicted set of entities and in the true set of entities (i.e. N00 + N11) by the number of all coreferent and disreferent links (i.e. the constant N(N−1)/2, where N is the total number of markables):

Rand = (N00 + N11) / (N(N−1)/2) (2.23)

BLANC modifies this approach “such that every decision of coreferentiality is assigned equal importance” (Recasens and Hovy, 2010). This way, it addresses the disequilibrium between coreferent markables and singletons. In contrast to other evaluation measures, which have to compare partitions with different numbers of clusters (B3) or different numbers of coreference links (MUC), BLANC uses the fact that the number of coreference links together with the number of disreference links constitutes a constant value across the predicted set of entities and the true set of entities (Recasens and Hovy, 2010). There are two kinds of “decisions” that best describe the intuition of BLANC:

• Coreference decision:
  1. Coreference link (c): if the markable pair contains coreferent markables
  2. Disreference link (d): if the markable pair contains disreferent markables
• Correctness decision:
  3. Right link (r): if the markable pair is coreferent or disreferent both in the predicted set of entities and the true set of entities
  4. Wrong link (w): if the markable pair is coreferent in the predicted set of entities and disreferent in the true set of entities or vice versa

These two decisions can be combined to a judged coreference system output for a markable pair, which resembles a binary classifier’s output (true-positive, true-negative, . . . ): rc, wd, wc, rd.
Table (2.15) shows the BLANC confusion matrix containing these combinations. L corresponds to the constant number of markable pairs (i.e. coreference + disreference links):

L = N(N−1)/2 = rc + wc + rd + wd (2.24)

                           True set of entities
Predicted set of entities  Coreference   Disreference   Sums
Coreference                rc            wc             rc + wc
Disreference               wd            rd             wd + rd
Sums                       rc + wd       wc + rd        L

Table 2.15.: The BLANC confusion matrix

Considering the big amount of singletons, the BLANC score is bilateral: precision, recall and f-measure are computed separately for coreference links and disreference links. Finally, the average of both (i.e. the arithmetic mean) is the final score. Thus, neither coreference links nor disreference links contribute more than 50% to the final score (Recasens and Hovy, 2010). The formulas for the BLANC score are given in table (2.16):

Score    Coreference                   Disreference                  BLANC
P        Pc = rc / (rc + wc)           Pd = rd / (rd + wd)           P_BLANC = (Pc + Pd) / 2
R        Rc = rc / (rc + wd)           Rd = rd / (rd + wc)           R_BLANC = (Rc + Rd) / 2
F1       F1c = 2·Pc·Rc / (Pc + Rc)     F1d = 2·Pd·Rd / (Pd + Rd)     F1_BLANC = (F1c + F1d) / 2

Table 2.16.: The BLANC scores

In some baseline partitions, e.g. when the predicted set of entities or the true set of entities contains only singletons or only a single entity, the denominators of Pc/Rd or Pd/Rc are zero and thus these scores are undefined.
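The regular-case formulas of table 2.16 can be sketched as follows; the special baseline variations with zero denominators are not handled here, they simply score 0:

```python
from itertools import combinations

def blanc(gold, system):
    """BLANC in the regular case (table 2.16): link-level F-scores for
    coreference and disreference links, averaged.  Partitions are lists
    of sets over the same markables."""
    def links(partition):
        cluster = {m: i for i, c in enumerate(partition) for m in c}
        coref, disref = set(), set()
        for a, b in combinations(sorted(cluster), 2):
            (coref if cluster[a] == cluster[b] else disref).add((a, b))
        return coref, disref

    gc, gd = links(gold)
    sc, sd = links(system)
    rc, wc = len(sc & gc), len(sc & gd)   # right / wrong coreference links
    rd, wd = len(sd & gd), len(sd & gc)   # right / wrong disreference links

    def f1(right, wrong_p, wrong_r):
        p = right / (right + wrong_p) if right + wrong_p else 0.0
        r = right / (right + wrong_r) if right + wrong_r else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    return (f1(rc, wc, wd) + f1(rd, wd, wc)) / 2

perfect = blanc([{1, 2}, {3}], [{1, 2}, {3}])   # 1.0
split = blanc([{1, 2}, {3}], [{1}, {2}, {3}])   # 0.4
```

The second call shows the bilateral averaging at work: splitting the only coreferent pair zeroes out the coreference half of the score while the disreference half stays high.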
For this reason, there are some special variations:

• If the predicted set of entities contains only a single entity and
  – the true set of entities also contains a single entity ⇒ F1_BLANC = 100%
  – the true set of entities contains only singletons ⇒ F1_BLANC = 0%
  – the true set of entities contains links of both types ⇒ Pd, Rd, F1d = 0%
• If the predicted set of entities contains only singletons and
  – the true set of entities also contains only singletons ⇒ F1_BLANC = 100%
  – the true set of entities contains a single entity ⇒ F1_BLANC = 0%
  – the true set of entities contains links of both types ⇒ Pc, Rc, F1c = 0%
• If the true set of entities contains both coreference and disreference links and
  – the predicted set of entities contains no right coreference link (rc = 0) ⇒ Pc, Rc, F1c = 0%
  – the predicted set of entities contains no right disreference link (rd = 0) ⇒ Pd, Rd, F1d = 0%
• If the predicted set of entities contains both coreference and disreference links and
  – the true set of entities contains a single entity ⇒ F1_BLANC = Pc, Rc, F1c
  – the true set of entities contains only singletons ⇒ F1_BLANC = Pd, Rd, F1d

There is still one weak point in BLANC that comes up when there are partitions near the baselines sketched above. Assume that all links in the true set of entities are disreferent but one. The predicted set of entities contains only disreferent links. Given a large set of markables, this should result in a good score. The issue is that BLANC assigns equal importance to the two types of links and thus the single coreference link in the true set of entities gets equal weight as the disreferent ones. This leads to a too strict penalization. One way of solving this problem is the introduction of a weighted BLANC-score with a parameter α:

BLANC_α = α · Fc + (1 − α) · Fd (2.25)

In the default version of BLANC, the α parameter would be 0.5. By increasing α, the weight for coreference links gets larger, by decreasing α (i.e.
increasing (1 − α)) the weight for disreference links gets larger. For the case above, using α = 0.1 would relax the severity.

2.6.5. A comparative example (borrowed from (Luo, 2005))

In the following section, an example of a true partition and four system partitions will be presented. The coreference-link-based metric MUC and the cluster-based metrics B3 and CEAF as well as the coreference/disreference link averaged score (BLANC) are used to measure the performance of each system.

Figure 2.8.: True partition
Figure 2.9.: System partition (a)
Figure 2.10.: System partition (b)
Figure 2.11.: System partition (c)
Figure 2.12.: System partition (d)

The comparison of the evaluation metrics when applied to the system outputs in figures (2.9) up to (2.12) with the true entities in figure (2.8) is given in table (2.17).

MUC

The MUC-score considers the missing or wrong coreference links. In system partition (a) and system partition (b), there are 9 links in common with the true partition. One further link is added and no link is missing. Thus, the measures will be the following for system partition (a):

P_MUC = Σ_Si (|Si| − |p(Si)|) / pl = ((5 − 1) + (7 − 2)) / 10 = 9/10 = 90% (2.26)
R_MUC = Σ_Gi (|Gi| − |p(Gi)|) / tl = ((5 − 1) + (2 − 1) + (5 − 1)) / 9 = 9/9 = 100% (2.27)
F1_MUC = 2 · P_MUC · R_MUC / (P_MUC + R_MUC) = 2 · 0.9 · 1 / (0.9 + 1) ≈ 94.7% (2.28)

and for system partition (b):

P_MUC = Σ_Si (|Si| − |p(Si)|) / pl = ((10 − 2) + (2 − 1)) / 10 = 9/10 = 90% (2.29)
R_MUC = Σ_Gi (|Gi| − |p(Gi)|) / tl = ((5 − 1) + (2 − 1) + (5 − 1)) / 9 = 9/9 = 100% (2.30)
F1_MUC = 2 · P_MUC · R_MUC / (P_MUC + R_MUC) = 2 · 0.9 · 1 / (0.9 + 1) ≈ 94.7% (2.31)

In system partition (c), all markables refer to one entity.
There are also 9 links in common with the true partition, but two further links are added and again no link is missing:

P_MUC = Σ_Si (|S_i| − |p(S_i)|) / pl = (12 − 3) / 11 = 9/11 ≈ 82%   (2.32)
R_MUC = Σ_Gi (|G_i| − |p(G_i)|) / tl = ((5 − 1) + (2 − 1) + (5 − 1)) / 9 = 9/9 = 100%   (2.33)
F1_MUC = 2 · P_MUC · R_MUC / (P_MUC + R_MUC) = 2 · 0.82 · 1 / (0.82 + 1) ≈ 90.0%   (2.34)

Here, it becomes apparent that overmerged clusters are underpenalized. In system partition (d) all markables are singletons that refer to unique entities. There are no links in common, no further links are added and all true coreference links are missing. In this case, all measures (precision, recall, f-score) are defined to be 0%.

P_MUC = Σ_Si (|S_i| − |p(S_i)|) / pl = ((1 − 1) + . . . + (1 − 1)) / 0 = 0%   (2.35)
R_MUC = Σ_Gi (|G_i| − |p(G_i)|) / tl = ((5 − 5) + (2 − 2) + (5 − 5)) / 9 = 0%   (2.36)
F1_MUC = 2 · P_MUC · R_MUC / (P_MUC + R_MUC) = 0%   (2.37)

B3 (B-Cubed)

The B3-score counts the overlapping markables for a given entity and returns the average over all markables. Thus, in contrast to MUC, the coreference links are ignored. Subsequently, the true entity containing (1, . . . , 5) is called Gold1, the true entity containing 6 and 7 is called Gold2 and the one containing the other markables (8, . . . , C) is called Gold3. In system partition (a), the true entities Gold2 and Gold3 are merged. This has no effect on B3's recall but on its precision:

P_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / p_m) = (1/12) · (5 · 5/5 + 2 · 2/7 + 5 · 5/7) = (1/12) · 64/7 ≈ 76.19%   (2.38)
R_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / t_m) = (1/12) · (5 · 5/5 + 2 · 2/2 + 5 · 5/5) = 12/12 = 100%   (2.39)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) = 2 · 0.7619 · 1 / (0.7619 + 1) ≈ 86.5%   (2.40)

In system partition (b), the true entities Gold1 and Gold3 are merged. Again, this merging has no effect on the recall but on the precision. This time, the number of wrongly assigned markables is greater.
Thus, the precision is smaller than in the case above:

P_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / p_m) = (1/12) · (5 · 5/10 + 2 · 2/2 + 5 · 5/10) = (1/12) · 7 ≈ 58.33%   (2.41)
R_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / t_m) = (1/12) · (5 · 5/5 + 2 · 2/2 + 5 · 5/5) = 12/12 = 100%   (2.42)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) = 2 · 0.5833 · 1 / (0.5833 + 1) ≈ 73.7%   (2.43)

In system partition (c), every true entity (i.e. Gold1, Gold2 and Gold3) is merged. As in the cases before, the recall remains steady at 100%. The precision is again smaller than in the cases above:

P_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / p_m) = (1/12) · (5 · 5/12 + 2 · 2/12 + 5 · 5/12) = (1/12) · 54/12 = 37.5%   (2.44)
R_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / t_m) = (1/12) · (5 · 5/5 + 2 · 2/2 + 5 · 5/5) = 12/12 = 100%   (2.45)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) = 2 · 0.375 · 1 / (0.375 + 1) ≈ 54.5%   (2.46)

In system partition (d), there are no merged true entities. Quite the contrary: each predicted entity is a singleton. In this case, the precision is at 100% and the recall decreases rapidly:

P_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / p_m) = (1/12) · (5 · 1/1 + 2 · 1/1 + 5 · 1/1) = 12/12 = 100%   (2.47)
R_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / t_m) = (1/12) · (5 · 1/5 + 2 · 1/2 + 5 · 1/5) = (1/12) · 3 = 25%   (2.48)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) = 2 · 1 · 0.25 / (1 + 0.25) = 40%   (2.49)

To illustrate the effect of only singletons in the true set of entities, assume that figure (2.12) is the true set of entities and figure (2.11) is the system's output. The true set of entities and the system's output are completely converse baselines that “correspond to very bad coreference resolution systems and, ideally, should be given low scores on an adequate evaluation metric” (Kobdani and Schütze, 2010b) (i.e. every markable is a singleton vs. every markable is coreferent with each other).
Nonetheless, the B3 recall is 100% and the f-score reaches about 15.4%:

P_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / p_m) = (1/12) · (12 · 1/12) = 1/12 ≈ 8.33%   (2.50)
R_B3 = (1/N) Σ_d Σ_{m∈d} (c_m / t_m) = (1/12) · (12 · 1/1) = 12/12 = 100%   (2.51)
F1_B3 = 2 · P_B3 · R_B3 / (P_B3 + R_B3) = 2 · 0.0833 · 1 / (0.0833 + 1) ≈ 15.4%   (2.52)

CEAF

As CEAF uses a one-to-one alignment, there are predicted entities that have no corresponding true entity. These entities are marked below as N0. In system partition (a), the entity containing (6, . . . , C) is most similar to the true entity Gold3. Thus, the two markables 6 and 7 are ignored. The CEAF-M value for precision and recall, and thus also for the F1-score, is:

P/R/F1_CEAF-M = Σ_Si |S_i ∩ φ(S_i)| / N = 5/12 + 0/12 + 5/12 = 10/12 ≈ 83.33%   (2.53)

In system partition (b), the entity containing (1, . . . , 5, 8, . . . , C) is most similar to Gold1 or Gold3. The predicted cluster containing the two markables 6 and 7 is aligned to Gold2 and, since CEAF uses a one-to-one alignment, Gold1 or Gold3 is ignored. The CEAF-M values are:

P/R/F1_CEAF-M = Σ_Si |S_i ∩ φ(S_i)| / N = 5/12 + 2/12 + 0/12 = 7/12 ≈ 58.33%   (2.54)

In system partition (c), all true entities are merged. Thus, only one true entity can be aligned with the merged predicted entity. As this can be one of the true entities with the most markables (i.e. Gold1 or Gold3), this time seven markables will be ignored:

P/R/F1_CEAF-M = Σ_Si |S_i ∩ φ(S_i)| / N = 5/12 + 0/12 + 0/12 = 5/12 ≈ 41.7%   (2.55)

In system partition (d), every predicted entity is a singleton cluster. This is most problematic for the one-to-one alignment in CEAF, as only three markables can be considered:

P/R/F1_CEAF-M = Σ_Si |S_i ∩ φ(S_i)| / N = (3 · 1 + 9 · 0) / 12 = 3/12 = 25%   (2.56)

BLANC

Since BLANC uses both coreference links and disreference links, one convenient way of computing the values rc, rd, wc, wd is using a script.
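Such a script can be sketched as follows. This is a minimal illustration (not the script from appendix C), assuming entities are given as Python sets of markable IDs; the function name is my own:

```python
from itertools import combinations

def blanc_counts(gold, pred):
    """Count right/wrong coreference (rc, wc) and disreference (rd, wd)
    links between a gold and a predicted partition.

    gold, pred: lists of sets of markable IDs covering the same markables.
    A markable pair is a coreference link iff both IDs share a set."""
    def coref_pairs(partition):
        pairs = set()
        for cluster in partition:
            pairs.update(combinations(sorted(cluster), 2))
        return pairs

    markables = sorted(set().union(*gold))
    all_pairs = set(combinations(markables, 2))
    gc, pc = coref_pairs(gold), coref_pairs(pred)
    rc = len(gc & pc)              # coreferent in both partitions
    wc = len(pc - gc)              # predicted coreferent, truly disreferent
    wd = len(gc - pc)              # predicted disreferent, truly coreferent
    rd = len(all_pairs - gc - pc)  # disreferent in both partitions
    return rc, wc, rd, wd
```

From these counts, P_c = rc/(rc+wc), R_c = rc/(rc+wd), P_d = rd/(rd+wd) and R_d = rd/(rd+wc) follow directly; for system partition (a) the sketch yields rc = 21, wc = 10, rd = 35, wd = 0, matching the values used below.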
In appendix C, the python code for computing these values is presented. In system partition (a), Gold2 and Gold3 are merged. Therefore, all true coreference links are present, but further false coreference links are added. Thus, the recall of coreference links is 100% and the precision is less. With respect to the disreference links, no predicted disreference link connects truly coreferent markables. Hence, the precision of disreference links is 100% and the recall is less because of the two merged true entities:

P_c = rc / (rc + wc) = 21 / (21 + 10) ≈ 67.74%   (2.57)
R_c = rc / (rc + wd) = 21 / (21 + 0) = 100%   (2.58)
F1_c = 2 · P_c · R_c / (P_c + R_c) = 2 · 0.6774 · 1.0 / (0.6774 + 1.0) ≈ 80.77%   (2.59)
P_d = rd / (rd + wd) = 35 / (35 + 0) = 100%   (2.60)
R_d = rd / (rd + wc) = 35 / (35 + 10) ≈ 77.78%   (2.61)
F1_d = 2 · P_d · R_d / (P_d + R_d) = 2 · 1.0 · 0.7778 / (1.0 + 0.7778) ≈ 87.5%   (2.62)
BLANC = (F1_c + F1_d) / 2 = (0.8077 + 0.875) / 2 ≈ 84.13%   (2.63)

System partition (b) shows the same effect as system partition (a) does: two true entities are merged, so the recall of coreference links and the precision of disreference links are 100%, whereas the corresponding precision and recall are less:

P_c = rc / (rc + wc) = 21 / (21 + 25) ≈ 45.65%   (2.64)
R_c = rc / (rc + wd) = 21 / (21 + 0) = 100%   (2.65)
F1_c = 2 · P_c · R_c / (P_c + R_c) = 2 · 0.4565 · 1.0 / (0.4565 + 1.0) ≈ 62.68%   (2.66)
P_d = rd / (rd + wd) = 20 / (20 + 0) = 100%   (2.67)
R_d = rd / (rd + wc) = 20 / (20 + 25) ≈ 44.44%   (2.68)
F1_d = 2 · P_d · R_d / (P_d + R_d) = 2 · 1.0 · 0.4444 / (1.0 + 0.4444) ≈ 61.53%   (2.69)
BLANC = (F1_c + F1_d) / 2 = (0.6268 + 0.6153) / 2 ≈ 62.11%   (2.70)

In system partition (c), the predicted set of entities comprises only one single entity and the true entity set contains both coreference and disreference links. Thus, as described in (2.6.4), P_d, R_d and F1_d get zero.
As all markable pairs in the predicted set of entities constitute a coreference link, the recall of coreference links is 100%:

P_c = rc / (rc + wc) = 21 / (21 + 45) ≈ 31.82%   (2.71)
R_c = rc / (rc + wd) = 21 / (21 + 0) = 100%   (2.72)
F1_c = 2 · P_c · R_c / (P_c + R_c) = 2 · 0.3182 · 1.0 / (0.3182 + 1.0) ≈ 48.28%   (2.73)
BLANC = (F1_c + F1_d) / 2 = (0.4828 + 0) / 2 = 24.14%   (2.74)

In system partition (d), the predicted set of entities consists of only singleton entities, but the true partition does contain coreference links. In this special case, the scores P_c, R_c and F1_c get zero. As the predicted partition only comprises disreference links, the recall of disreference links is 100%:

P_d = rd / (rd + wd) = 45 / (45 + 21) ≈ 68.18%   (2.75)
R_d = rd / (rd + wc) = 45 / (45 + 0) = 100%   (2.76)
F1_d = 2 · P_d · R_d / (P_d + R_d) = 2 · 0.6818 · 1.0 / (0.6818 + 1.0) ≈ 81.08%   (2.77)
BLANC = (F1_c + F1_d) / 2 = (0 + 0.8108) / 2 = 40.54%   (2.78)

Comparison of the four evaluation metrics

System response   MUC-F1   B3-F1   CEAF   BLANC
(a)               94.7     86.5    83.3   84.13
(b)               94.7     73.7    58.3   62.11
(c)               90.0     54.5    41.7   24.14
(d)                0.0     40.0    25.0   40.54

Table 2.17.: Comparison of evaluation metrics (Luo, 2005)

When evaluating system partitions (a) and (b), MUC scores 94.7% for both, indicating wrong coreference links, but it does not distinguish which entities (i.e. coreference chains) are falsely connected. This distinction is made by the other three scorers. As B3 counts the overlapping markables for the entity containing a certain markable, the partitions (a) and (b) show no negative effect on the recall but on the precision. The precision, and thereby the f-score, is smaller in (b) than in (a), as the number of falsely assigned markables is greater. This bias is also discernible with CEAF and BLANC. CEAF uses a one-to-one alignment of entities and thereby ignores those alignments with the smaller markable overlap. Generally, CEAF scores these partitions more critically than B3.
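The CEAF-M values used in this comparison can be reproduced with a small script. The sketch below uses a greedy one-to-one pairing and is illustrative only; the exact metric computes an optimal assignment (e.g. with the Kuhn-Munkres algorithm), but the greedy choice agrees with it on these small example partitions:

```python
def ceaf_m_greedy(gold, pred):
    """CEAF-M via a greedy one-to-one alignment: repeatedly match the
    gold/predicted entity pair with the largest markable overlap, then
    divide the summed overlaps by the number of markables N."""
    n = sum(len(cluster) for cluster in gold)
    gold, pred = list(gold), list(pred)  # work on copies
    total = 0
    while gold and pred:
        overlap, gi, pj = max(
            (len(g & p), i, j)
            for i, g in enumerate(gold) for j, p in enumerate(pred))
        if overlap == 0:          # remaining entities share no markables
            break
        total += overlap
        gold.pop(gi)
        pred.pop(pj)
    return total / n
```

On the example, system partition (a) yields 10/12 and partition (d) yields 3/12, matching equations (2.53) and (2.56).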
BLANC rewards both coreference and disreference links. It scores (a) better than (b), as the number of false links is smaller in (a) than in (b). So, MUC is the only scorer that is not able to penalize (b) more than (a) (e.g. for merging more incompatible markables). The two system baseline partitions (c) and (d) (all in one cluster; all singletons) show the most divergent scores. As MUC only regards coreference links, a cluster comprising all markables has only two wrong coreference links (compared to the gold partition). Therefore, MUC scores 90.0%. In the case of only singletons, there are no coreference links and MUC is undefined; here, it is defined to be 0.0%. Other than MUC, B3 is able to evaluate these baselines in a more realistic manner: (c) does not receive a very good score and (d) gets a score greater than 0. As with the partitions (a) and (b), again, CEAF scores more critically and gives (c) and (d) scores that are about 13.0–15.0 points smaller than with B3. BLANC scores these baselines differently. As the disreference links outnumber the coreference links, the balanced BLANC (α = 0.5) gives more penalty to (c) than to (d), as there are more misclassified links in (c) than in (d).

2.7. SemEval-2010 Task 1: Coreference Resolution in Multiple Languages

The subsequent section gives a brief overview of the SemEval-2010 task 1, described in (Recasens et al., 2010), with a slight focus on the participants for German, in particular SUCRE. All information about the competition is extracted from (Recasens et al., 2010). The main goal of SemEval-2010 task 1, “Coreference Resolution in Multiple Languages”, was to perform and evaluate the results of coreference resolution for six languages: Catalan, Dutch, English, German, Italian and Spanish. There were four evaluation settings based on the properties closed/open and gold-standard/regular. SemEval-2010 provides the four most commonly used evaluation scores (cf. (2.6)): MUC, B3, CEAF and BLANC.
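Of the four scores just listed, B3 admits a particularly compact implementation. The following is a minimal sketch following the definition in (2.6), with partitions given as Python sets and function names of my own:

```python
def b_cubed(gold, pred):
    """B^3 precision and recall: for every markable, compare the predicted
    entity containing it with the gold entity containing it."""
    def cluster_of(partition):
        # map each markable to the set (entity) it belongs to
        return {m: cluster for cluster in partition for m in cluster}

    g, p = cluster_of(gold), cluster_of(pred)
    n = len(g)
    precision = sum(len(g[m] & p[m]) / len(p[m]) for m in g) / n
    recall = sum(len(g[m] & p[m]) / len(g[m]) for m in g) / n
    return precision, recall
```

On the comparative example of section 2.6.5, system partition (a) yields a precision of 64/84 ≈ 76.19% and a recall of 100%, as in equations (2.38) and (2.39).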
Questions that should be answered with SemEval's results are:

1. Is it possible to construct a coreference resolver for several languages without a huge amount of tuning?
2. Is an optimal linguistic annotation a must for getting good coreference resolution results, or is automatically annotated information sufficient?
3. To what extent are the different evaluation scores similar, and do they provide the same ranking?

The four evaluation settings are defined as follows:

Closed: In this setting, the systems are only allowed to use the information given within the dataset.

Open: The participants are allowed to use external resources like Wikipedia or WordNet to improve the preprocessing information used for the coreference resolution task.

Gold-standard: Here, the systems used information from the gold-standard annotation of grammatical attributes like lemma, part-of-speech and dependency relations, morphological features like gender, number, case, . . . and additionally true markables.

Regular: The systems only used the information extracted from automatic predictors. For example, for German, the lemmas were predicted by TreeTagger (Schmid, 1995), the part-of-speech tags and the morphology by RFTagger (Schmid and Laws, 2008) and the dependency relations by MaltParser (Hall and Nivre, 2008).

For this coreference resolution task, results have been submitted by six participants: SUCRE (Kobdani and Schütze, 2010a), RelaxCor, TANL-1 (Attardi et al., 2010), UBIU (Zhekova and Kübler, 2010), Corry-(B,C,M) and BART (Broscheit et al., 2010a). For each language a unique corpus with different annotation is used (table 2.18).
Language              Corpus
Catalan and Spanish   The AnCora corpora: a Catalan and Spanish treebank of 500k words, source: newspapers
Dutch                 The KNACK-2002 corpus: 267 documents from the Flemish weekly magazine “Knack”
English               The OntoNotes Release 2.0 corpus: 300k words from The Wall Street Journal and 200k words from the TDT-4 collection
German                The TüBa-D/Z corpus: 794k words from the newspaper “die tageszeitung (taz)”
Italian               The LiveMemories corpus: texts from the Italian Wikipedia, blogs, news articles and the like

Table 2.18.: Corpora used for each language in SemEval-2010

The corpora have been transformed into a specific data format in order to get a common representation across all six languages. One excerpt of the task dataset is given in figure 2.13. This data format contains several columns with gold-standard and regular annotation with morphological, syntactic and semantic information:

• The first column corresponds to the token-ID.
• The second column is the word form of the token.
• Columns 3/4 correspond to the gold/automatic annotation of the lemma.
• Columns 5/6 correspond to the gold/automatic annotation of the part-of-speech tag.
• Columns 7/8 correspond to the gold/automatic annotation of some morphological features like case, number or gender.

1 Frau Frau Frau NN NN cas=n|num=sg|gend=fem cas=n|num=sg|gend=fem 3 3 SUBJ SUBJ _ _ _ _ (815
2 K. K. K. NE NE cas=n|num=sg|gend=fem cas=n|num=sg|gend=* 1 1 APP APP _ _ _ _ 815)
3 hörte hören hören VVFIN VVFIN _ per=3|num=sg|temp=past|mood=ind 0 0 ROOT ROOT _ _ _ _ _
4 zu zu zu PTKVZ PTKVZ _ _ 3 3 AVZ AVZ _ _ _ _ _
5 . . . $. $. _ _ 4 4 -PUNCT- -PUNCT- _ _ _ _ _

Figure 2.13.: A sentence in the original SemEval-2010 task dataset

• Columns 9/10 correspond to the gold/automatic annotation of the ID of the syntactic head (this is 0 in the case of a tree root).
• Columns 11/12 correspond to the gold/automatic annotation of the dependency relation to the head described in columns 9/10.
• Columns 13/14 correspond to the gold/automatic annotation of the named entity type in open-close notation (if available).
• Columns 15/16 correspond to the gold/automatic annotation of the predicate semantic class (if available).
• The last column corresponds to the coreference relation in the open-close notation.

For example, in figure 2.13, the markable 815 starts with token 1, “Frau”, and ends with token 2, “K.”. Thereafter, the dataset is divided into training set, development set and test set. Table (2.19) shows the sizes of these sets for the German corpus, TüBa-D/Z.

                   #documents   #sentences   #tokens
Training set       900          19,233       331,614
Development set    199          4,129        73,145
Test set           136          2,736        50,287

Table 2.19.: The training, development and test set of TüBa-D/Z in SemEval-2010

The systems have different architectures and machine learning methods. Table 2.20 compares the participants for German in terms of differences in architecture. The second column indicates the system architectures. BART uses closest-first clustering, whereas SUCRE uses best-first clustering. A further description of SUCRE is given in chapter 3. One significant difference between BART and the other mentioned systems for German is the usage of external resources: BART uses GermaNet and gazetteers, and the others do not use such resources at all. As a first step in the evaluation, two baseline systems were analyzed with respect to the language (i.e. the corpus). Table 2.21 shows these scores for German and English.
System   System architecture                                ML Methods                      External Resources
BART     Closest-first model                                MaxEnt                          GermaNet & gazetteers
SUCRE    Best-first clustering, relational database model   Decision trees, Naive Bayes,    -
         and regular feature definition language            SVM and MaxEnt
TANL-1   Highest entity-mention similarity                  MaxEnt                          -
UBIU     Pairwise model                                     MBL                             -

Table 2.20.: Comparison of architectures of BART, SUCRE, TANL-1 and UBIU in SemEval-2010

                  CEAF               MUC                B3                 BLANC
                  R    P    F1      R     P    F1     R     P    F1      R    P    F1
SINGLETONS: Every markable constitutes a single entity
German           75.5 75.5 75.5    0.0   0.0  0.0    75.5  100  86.0    50.0 49.4 49.7
English          71.2 71.2 71.2    0.0   0.0  0.0    71.2  100  83.2    50.0 49.2 49.6
ALL-IN-ONE: All markables are grouped together into one cluster
German            8.2  8.2  8.2  100    24.8 39.7   100    2.4   4.7    50.0  0.6  1.1
English          10.5 10.5 10.5  100    29.2 45.2   100    3.5   6.7    50.0  0.8  1.6

Table 2.21.: The baseline scores for German and English in SemEval-2010

These baselines show some limitations of the evaluation scores used, which have been further described in (2.6). The differences in these baseline scores reveal differences in the distribution of the entities in the respective corpus. Kobdani and Schütze (2010b) describe this indication as follows: “the system tendency to incorrectly generate larger clusters is penalized in B3 and CEAF-M metrics, and to incorrectly generate singleton clusters is penalized in MUC metric”. This means, in the case of German and English, the corpora TüBa-D/Z and OntoNotes turn out to be slightly different in the entities' distribution: as the scores in the SINGLETONS-baseline are slightly better for German, TüBa-D/Z contains more singleton entities than OntoNotes. On the other hand, in the case of the ALL-IN-ONE-baseline, the values for English are better, and thus OntoNotes contains more coreferent markables. The baseline scores were hard to beat by the participating systems. Table 2.22 shows the results of SemEval-2010 for German.
Here, SUCRE, TANL-1 and UBIU only participated in the closed setting, whereas BART only participated in the open setting. Therefore, SUCRE can only be compared with TANL-1 and UBIU. It turns out that SUCRE performs best in closed × regular for the languages English, German and Italian. Surprisingly, SUCRE did not outperform the values of the SINGLETONS-baseline (cf. tables 2.21 and 2.22) for CEAF (72.9 vs. 75.5) and B3 (81.1 vs. 86.0). TANL-1 usually wins with respect to these scores. Considering the three posed questions above, the following results have been discovered:

1. With respect to the language, English is the one with the most participants, with fifteen entries. German comes in second place with eight entries (cf. table 2.22). Catalan/Spanish, Italian and Dutch have fewer entries. English was the winner in ranking the overall results, followed by German in second place. Reasons for this ranking are the differences in the respective corpora (e.g. the size; here, German has the largest corpus) and the fact that most systems are originally developed for English.

German            CEAF               MUC                B3                 BLANC
                  R    P    F1      R    P    F1      R    P    F1       R    P    F1
closed × gold
SUCRE            72.9 72.9 72.9    74.4 48.1 58.4    90.4 73.6 81.1     78.2 61.8 66.4
TANL-1           77.7 77.7 77.7    16.4 60.6 25.9    77.2 96.7 85.9     54.4 75.1 57.4
UBIU             67.4 68.9 68.2    22.1 21.7 21.9    73.7 77.9 75.7     60.0 77.2 64.5
closed × regular
SUCRE            60.6 59.2 59.9    49.3 35.0 40.9    69.1 60.1 64.3     52.7 59.3 53.6
TANL-1           50.9 48.2 49.5    10.2 31.5 15.4    47.2 54.9 50.7     50.2 63.0 44.7
UBIU             39.4 51.9 44.8     9.5 11.4 10.4    41.2 53.7 46.6     50.2 54.4 48.0
open × gold
BART             67.1 66.7 66.9    70.5 40.1 51.1    85.3 64.4 73.4     65.5 61.0 62.8
open × regular
BART             61.4 61.2 61.3    61.4 36.1 45.5    75.3 58.3 65.7     55.9 60.3 57.3

Table 2.22.: Official results of SemEval-2010 for German

2.
As the gold-standard setting performed significantly better than the regular one, an optimal linguistic annotation turns out to be necessary for good coreference resolution results. However, Recasens et al. (2010) relativize this insight as a direct effect of markable detection, whose quality falls rapidly in the regular setting. RelaxCor for English is the only participant that reveals a slight improvement by using external resources (open). The corresponding values are given in table 2.23. Therefore, if SUCRE's annotation is extended with external resources like GermaNet, there might also be a slight improvement.

3. The rankings of the participants differ with respect to the considered evaluation score. For example, in German, closed × gold (cf. table 2.22), with respect to CEAF and B3 the ranking is: TANL-1 > SUCRE > UBIU, whereas with respect to MUC, SUCRE outperforms both: SUCRE > TANL-1 > UBIU, and finally with respect to BLANC, TANL-1 is the weakest: SUCRE > UBIU > TANL-1. In general, there is a correlation between CEAF and B3, but a lack of correlation between CEAF and MUC with respect to recall. Therefore, the evaluation score has to be defined appropriately or combined with others (cf. MUC-B3-score in chapter 4).

English           CEAF               MUC                B3                 BLANC
                  R    P    F1      R    P    F1      R    P    F1       R    P    F1
closed × gold
RelaxCor         75.6 75.6 75.6    21.9 72.4 33.7    74.8 97.0 84.5     57.0 83.4 61.3
open × gold
RelaxCor         75.8 75.8 75.8    22.6 70.5 34.2    75.2 96.7 84.6     58.0 83.8 62.7

Table 2.23.: closed vs. open in SemEval-2010

CHAPTER 3 The SUCRE system

The term SUCRE (in German SÜKRE) is an acronym for the German title Semi-Überwachte KoReferenz-Erkennung, which is English for semi-supervised coreference resolution (Kessler, 2010). This chapter is devoted to this coreference system and is organized as follows: in the first section, some main facts about the SUCRE project are presented. The second section introduces the architecture of SUCRE as described in (Kobdani and Schütze, 2010a).
The third section gives a short overview of the idea of Self Organizing Maps (SOMs) for the visualization of coreference features. In the fourth section, the multi-lingual aspect of SUCRE is presented. Here, Kobdani and Schütze (2010b) show that the architecture of SUCRE, providing a relational database model and a regular definition language, is capable of implementing features that can be used in several languages. The fifth section summarizes the results of SUCRE in SemEval-2010, introduced in (2.7). Finally, the sixth section presents the dataset extracted from the TüBa-D/Z corpus that is used for German coreference research in SUCRE.

3.1. The project

The SUCRE project is financed by the German Research Foundation (Deutsche Forschungsgemeinschaft (DFG)). The project heads are Prof. Dr. Hinrich Schütze, Prof. Dr. Hans Kamp and Prof. Dr. Gunther Heidemann. Two institutes of the University of Stuttgart1 take part in this project: the Institute for Natural Language Processing2 (Institut für Maschinelle Sprachverarbeitung (IMS)) and the Institute for Visualization and Interactive Systems3 (Institut für Visualisierung und Interaktive Systeme (VIS)). The project began in September 2009 with an initial duration of two years (Kessler, 2010). One goal of the project described by Kessler (2010) is progress in the interactive visualization of coreference features. As described in (3.3), the visualization with Self Organizing Maps simplifies the semi-supervised annotation of large documents. Additionally, insights for new features can be drawn from the visualization results (cf. (3.3)). Kessler (2010) presents the modules of the project (figure 3.1), where the visualization module has been worked out by the Institute for Visualization and Interactive Systems, the feature extraction module is part of both institutes' work and the remaining modules are tasks of the Institute for Natural Language Processing.
In the remainder of this chapter, the focus lies on the modules developed by the Institute for Natural Language Processing.

1 http://www.uni-stuttgart.de
2 http://www.ims.uni-stuttgart.de
3 http://www.vis.uni-stuttgart.de

Figure 3.1.: The module architecture in the SUCRE project

3.2. The architecture

Beyond the resolution of individual nouns and pronouns, SUCRE performs full coreference resolution. The unique architecture of SUCRE reveals a new approach to the feature engineering of coreference resolution with the use of a relational database model and a regular feature definition language (Kobdani and Schütze, 2010a). SUCRE enables flexible feature engineering by converting a raw text within a preprocessing step into a relational database model. This model provides fast and flexible ways of implementing new features (Kobdani et al., 2010). Feature engineering plays an important role in coreference resolution. Thus, a system with which researchers can implement additional features using a regular definition language has a great advantage concerning effort. It is possible to extract features from the text as well as to import external features (e.g. semantic relationships from an ontological information source like GermaNet). The modular architecture provides a clear separation of data storage, feature engineering and machine learning algorithms (Kobdani et al., 2010). As a result, SUCRE allows the use of any externally available classification method (Kobdani and Schütze, 2010a).

Figure 3.2.: The coreference architecture of SUCRE

Figure 3.2 shows the architecture of the full coreference resolution task that results in a coreference partition (“Markable Chains”). The architecture can be divided into two main steps: preprocessing (3.2.1) and coreference resolution (3.2.4). The latter comprises the “Pair Estimation” (figure 3.3) and the “Chain Estimation” (the final decoding step).

Figure 3.3.: The pair estimation of SUCRE

3.2.1.
Preprocessing

In the preprocessing step the text corpus is processed and transformed into the relational database model (Kobdani et al., 2010). There are two kinds of preprocessing executions: prelabeled and unlabeled (Kessler, 2010). In the first case, the system uses an annotated corpus and extracts all information out of it. In the second case, all information is gained by using NLP tools like a tokenizer or a part-of-speech tagger, as was the case in the regular evaluation settings of SemEval-2010 (cf. (2.7)). In the course of the feature engineering provided in this diploma thesis, true markables with gold annotation are used instead.

Preliminary text conversion: Here, the raw input text is transformed into a format in which tokens (i.e. words, punctuation marks, . . . ) and sentences are recognized and marked up.
• Tokenization
• Sentence boundary detection

Extracting atomic word features: Here, the atomic word features (cf. (3.2.2)) are extracted from the tokens identified in the previous step. For instance:
• Part-of-speech tags
• Lemmas
• Grammatical/natural gender
• Grammatical number
• Parse information

Markable detection: In this step, the markables are identified based on the previous step (e.g. all noun phrases from the parse information are regarded as potential markables (Kobdani and Schütze, 2010b)). For more details on the issue of markable detection, see (2.2).

Extracting atomic markable features: After all markables are identified, for each markable its atomic markable features (cf. (3.2.2)) are extracted using the information from the previous steps (e.g. atomic word features).
• Named entity
• Alias
• Syntactic role
• Semantic class

3.2.2. Features in SUCRE

There are two kinds of features available in SUCRE: atomic features and link features.

Atomic Features: SUCRE defines atomic features for words and markables.
Atomic word features are the position in the corpus and the numeric ID of the document, paragraph or sentence. In addition to that, atomic word features might be the part-of-speech tag (e.g. NN for common noun), the grammatical or natural gender (i.e. male, female, neuter), the number (e.g. singular or plural), the semantic class, the word type (e.g. in the case of a pronoun, the type of pronoun), the case (nominative, genitive, dative or accusative) or the person (i.e. first, second, third). Atomic markable features can be the number of words in the markable, named entity (i.e. whether the markable is a proper name), alias (i.e. whether the markable constitutes an alias form, e.g. an acronym), the syntactic role (e.g. subject or object) or the semantic class (e.g. person, organization, event, . . . ) (Kobdani and Schütze, 2010a).

Link Features: Link features are defined over a pair of markables m1 and m2. In many cases, the most important word in a markable is its head word (e.g. for pronouns), but sometimes the head word is not expressive enough for resolving a markable. For example, in the markable pair (das Buch_m1, ein Buch_m2), the distinguishing feature is the beginning word; with the markable pair (ein Student aus Deutschland_m1, ein Student aus Frankreich_m2), it is the last word that differs and indicates disreference. In some cases, all words in a markable have to be considered. SUCRE's feature definition language uses keywords to specify the word selection of a markable: m1 and m2 refer to the first and second markable; m1b and m2b, m1h and m2h, and m1e and m2e refer to the first word, the head word and the last word of the first and second markable. More details on markable keywords are provided in appendix B, (B.1). Some of the functions available for each keyword are exact and substring matching (case-sensitive and case-insensitive), edit distance, alias, word relations, markable parse tree path and absolute value (Kobdani and Schütze, 2010a).
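A few of these keyword functions can be approximated in ordinary Python. The helpers below mirror the definition language only informally; they are illustrative sketches with my own names, not SUCRE's implementation:

```python
def seqmatch(a, b):
    """Case-sensitive exact string match (cf. the seqmatch keyword)."""
    return a == b

def edit_distance(a, b):
    """Levenshtein distance between two word strings, computed with the
    standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def sentence_distance_gt_one(m1b_stcnum, m2b_stcnum):
    """Mirrors the feature {abs(m2b.stcnum-m1b.stcnum)>1}: true iff the
    markables are more than one sentence apart."""
    return abs(m2b_stcnum - m1b_stcnum) > 1
```

For example, seqmatch("Buch", "Buch") holds while seqmatch("Buch", "buch") does not, and edit_distance("Buch", "Bach") is 1.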
Some examples of link features (extracted from the feature set given for feature engineering in German coreference resolution in section 4.1) are presented in example (8):

(8) a. {abs(m2b.stcnum-m1b.stcnum)>1}
       → The distance between the two markables is bigger than one sentence
    b. {alias(m1h,m2a)||alias(m1a,m2h)}
       → The head of one markable is an alias of the other markable
    c. {(seqmatch(m1h,m2h))}
       → Exact string match of both markables' heads

Further information about the feature definition language is given in appendix B.

3.2.3. The Relational Database Model

The result of the preprocessing step (3.2.1) is a relational database model. It is a common structure that includes all data used for the coreference resolution, for instance: the text corpus, the results of the preprocessing, relations between textual entities, classification results and the like. As is usually the case in NLP, the values of attributes of textual entities and the relationships between those entities form the basis for features. Thus, the relational database model constitutes the “natural formalism for supporting the definition and extraction of features” (Kobdani et al., 2010). A minimal model for running the system consists of three tables: word table (table 3.1), markable table (table 3.2) and link table (table 3.3). In the word table, the Word-ID constitutes the word's index (its numeric ID), i.e. the position of the token in the corpus. This ID uniquely identifies the word. Foreign keys are Document-ID, Paragraph-ID

Word Table
Word-ID           Primary Key
Document-ID       Foreign Key
Paragraph-ID      Foreign Key
Sentence-ID       Foreign Key
Word-String       Attribute
Word-Feature-0    Attribute
Word-Feature-1    Attribute
...
Word-Feature-N    Attribute

Markable Table
Markable-ID         Primary Key
Begin-Word-ID       Foreign Key
End-Word-ID         Foreign Key
Head-Word-ID        Foreign Key
Markable-Feature-0  Attribute
Markable-Feature-1  Attribute
...
Markable-Feature-N Attribute Table 3.2.: Markable Table Table 3.1.: Word Table Link Table Link-ID Primary Key First-Markable-ID Foreign Key Second-Markable-ID Foreign Key Coreference-Status Attribute Confidence-Status Attribute Table 3.3.: Link Table and Sentence-ID. They point to the primary keys of the corresponding document, paragraph or sentence table. Containing the word ID and the word string, the word table is able to reconstruct the raw text of the corpus, as it knows the words’ linear order, and any other (tagged) format using the word features (part-of-speech tag, number, gender, . . . ). The word features can be defined and added to the word table in the preprocessing step. Figure 3.4 shows an excerpt of the word table created for the TüBa-D/Z corpus. The information in the columns are ordered: word-ID, word string, document-ID, paragraph-ID, sentence-ID, part-ofspeech tag, number, gender, case, person. 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 Für diese Behauptung hat Beckmeyer bisher keinen Nachweis geliefert . 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 65 65 65 65 65 65 65 65 65 65 APPR PDAT NN VAFIN NE ADV PIAT NN VVPP $. unknown unknown singular unknown singular unknown unknown singular unknown unknown unknown unknown female unknown male unknown unknown male unknown unknown unknown unknown accusative unknown nominative unknown unknown accusative unknown unknown unknown unknown unknown unknown unknown unknown unknown unknown unknown unknown Figure 3.4.: Example of a word table in the TüBa-D/Z corpus In the markable table, the Markable-ID is the primary key constituting the unique index of each markable. The Begin-Word-ID, Head-Word-ID and End-Word-ID are foreign keys, i.e. they refer to the primary keys in the word tables corresponding to the first word, the head word and the last word in the markable. Like the word features, the markable features can be defined and added in the preprocessing 53 Chapter 3. The SUCRE system step. 
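The minimal relational model of tables 3.1–3.3 can be sketched as a small SQLite schema. The column names follow the tables above; the SQL types and the fixed set of feature columns are illustrative assumptions, and the example rows are taken from the running Beckmeyer example.

```python
import sqlite3

# Minimal sketch of SUCRE's relational model (tables 3.1-3.3).
# Types and the fixed feature columns are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (
    word_id      INTEGER PRIMARY KEY,  -- position of the token in the corpus
    document_id  INTEGER,              -- foreign keys into document/
    paragraph_id INTEGER,              -- paragraph/sentence tables
    sentence_id  INTEGER,
    word_string  TEXT,
    pos TEXT, number TEXT, gender TEXT, "case" TEXT, person TEXT
);
CREATE TABLE markable (
    markable_id   INTEGER PRIMARY KEY,
    begin_word_id INTEGER REFERENCES word(word_id),
    end_word_id   INTEGER REFERENCES word(word_id),
    head_word_id  INTEGER REFERENCES word(word_id),
    named_entity  TEXT
);
CREATE TABLE link (
    link_id            INTEGER PRIMARY KEY,
    first_markable_id  INTEGER REFERENCES markable(markable_id),
    second_markable_id INTEGER REFERENCES markable(markable_id),
    coreference_status TEXT,    -- 'coref' / 'disref'
    confidence_status  INTEGER  -- confidence in percent
);
""")

# Rows from the running example:
conn.execute("INSERT INTO word VALUES (1206,4,4,65,'Beckmeyer',"
             "'NE','singular','male','nominative','unknown')")
conn.execute("INSERT INTO markable VALUES (323,1206,1206,1206,'unknown')")
conn.execute("INSERT INTO link VALUES (1152,322,323,'disref',100)")
```

A feature such as "head of the markable" then becomes a simple join from the markable table to the word table via Head-Word-ID.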
Figure 3.5 shows the markable table for the three markables contained in the word table of figure 3.4. The columns are ordered: markable-ID, document-ID, word-ID of the first word, word-ID of the last word, word-ID of the head word, named entity class. For instance, markable m323 corresponds to the proper name Beckmeyer; therefore, its first, last and head word are the same. As there is no named entity classification available for TüBa-D/Z, the last column is always annotated as unknown.

322  4  1203  1204  1204  unknown
323  4  1206  1206  1206  unknown
324  4  1208  1209  1209  unknown

Figure 3.5.: Example of a markable table in the TüBa-D/Z corpus

In the link table, the Link-ID is the primary key. It discriminates the link from all others. The foreign keys are the markable-IDs of the first and second markable connected by the respective link. They refer to the primary keys in the corresponding markable tables (Kobdani and Schütze, 2010a). The coreference status and the confidence status indicate whether the connected markables are coreferent or disreferent and how confident this classification is. Figure 3.6 shows an excerpt of the link table containing the markable m323 mentioned in the word and markable tables above. The columns are ordered: link-ID, markable-ID of the antecedent, markable-ID of the anaphor, coreference status, confidence status. As shown in figure 3.6, markable m323, the proper name Beckmeyer, is coreferent with markable m314, the noun phrase Bremens Häfensenator, and with markable m335, the complex proper name Uwe Beckmeyer, but disreferent with markable m322, the noun phrase diese Behauptung. As this link table represents the gold standard, the confidence status is always 100%.

1144  314  323  coref   100
1152  322  323  disref  100
1153  323  335  coref   100

Figure 3.6.: Example of a link table in the TüBa-D/Z corpus

3.2.4. Coreference Resolution

After the relational database model has been created, the coreference resolution step can be started.
Link Generator
In the training step of the classifier, SUCRE creates positive and negative training samples. For each adjacent coreferent markable pair, say <mi, mj>, a positive instance is created. Negative training instances are generated by pairing mj with all preceding markables mα that are disreferent to mj. For decoding, the system generates all possible markable pairs inside a window of 100 markables (Kobdani and Schütze, 2010a). The output of this module is a list of generated links that is saved in the link table (cf. table 3.3) (Kobdani and Schütze, 2010b).

Filtering
As disreference links outnumber coreference links by far, this step reduces the number of disreference links with the use of prefilters (cf. (4.1.2)). Thereby, links that definitely connect two disreferent markables are filtered out, for instance links whose antecedent and anaphor mismatch in number.

Link Feature Extractor
In this step, the values of the defined link features (cf. (3.2.2)) are extracted in order to create the dataset for training and testing a pairwise classifier. Link features are defined in the regular definition language (see appendix B). The final samples are represented as link feature vectors; each component of such a vector corresponds to a link feature.

Learning (Training)
SUCRE provides four classifiers that take the feature vectors as input: Decision Tree (Quinlan, 1993), Naive Bayes, Support Vector Machine and Maximum Entropy. The best results were achieved with the decision tree classifier (Kobdani and Schütze, 2010a). Therefore, it is used as the basis for the feature engineering in chapters 4 to 6.

Classification and Decoding
After classifying each test sample as coreferent or disreferent, the decoding (clustering) step starts. Here, the coreference chains (i.e. the clusters) are generated based on the pairwise decisions.
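The training-sample generation described under "Link Generator" above can be sketched as follows. The text does not spell out whether negative instances are generated only for anaphoric markables (those with a coreferent antecedent), so that detail, like all names in the sketch, is an assumption.

```python
# Sketch of SUCRE's training-link generation: one positive instance per
# markable and its closest preceding coreferent markable, and negative
# instances pairing the anaphor with every preceding disreferent markable.
# `entity_of` maps each markable to its gold entity id (name is mine).

def generate_training_links(markables, entity_of):
    positives, negatives = [], []
    for j, mj in enumerate(markables):
        # closest preceding coreferent markable m_i (adjacent pair in the chain)
        i = next((k for k in range(j - 1, -1, -1)
                  if entity_of[markables[k]] == entity_of[mj]), None)
        if i is None:
            continue  # assumption: non-anaphoric markables yield no samples
        positives.append((markables[i], mj))
        # all preceding disreferent markables pair up negatively with m_j
        negatives.extend((markables[k], mj) for k in range(j)
                         if entity_of[markables[k]] != entity_of[mj])
    return positives, negatives
```

For a document with markables m1..m4 where m1 and m3 corefer, this yields the positive pair (m1, m3) and the negative pair (m2, m3).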
In SUCRE, best-first clustering is used. That means that for a given markable mj, the best predicted antecedent mi (i.e. the one with the highest confidence status) is chosen. The starting point is the end of the document, moving leftwards (Kobdani and Schütze, 2010a). If the number of markables within a document exceeds a predefined threshold, the search is limited, which yields better efficiency and results (Kobdani and Schütze, 2010b).

3.3. Visualization with SOMs

One traditional possibility for visually annotating a text with coreference information is a text-based visualization (e.g. the GATE framework (Cunningham et al., 2002)). The problem with this kind of visualization is that the annotation of large documents (e.g. books or detailed reports) requires a lot of time (Burkovski et al., 2011). Moreover, it does not provide a visualization of the feature space or of similarities between links. Furthermore, text-based visualizations are limited by the number of lines/colors that a user can distinguish when annotating coreference information (Burkovski et al., 2011). Another kind of coreference visualization is constituted by Self Organizing Maps (SOMs). Here, unsupervised machine learning methods are combined with visualization and interaction techniques (Burkovski et al., 2011). The SOM visualizes coreference information gained by pairwise models. It provides an interactive presentation of the feature space. Thereby, the user can explore the feature space and is able to annotate data with coreference information quickly. This approach aims to boost and advance human coreference recognition (Burkovski et al., 2011).

3.3.1. Self Organizing Map

A Self Organizing Map is a "type of artificial neural network", first described in (Kohonen, 1982), with which one can visualize "high dimensional data" (Burkovski et al., 2011). It can be used as an unsupervised machine learning method. Burkovski et al.
(2011) describe the SOM in a formal way: nodes (i.e. the neurons in this neural network) are connected to other nodes within a low dimensional topology. A neuron n_i ∈ N is defined by its particular location r_i ∈ R^d_topol, where d_topol is the dimension of the topology, and its weight vector w_i ∈ R^d_in, where d_in is the dimension of the input vectors x ∈ R^d_in.

Training
At the beginning (training time t = 0), all weight vectors w_i(0) are initialized based on some data knowledge or randomly. During training, an input vector x_k is chosen randomly and is assigned to the node n_j (the best matching unit (BMU)) whose weight vector has the smallest distance with respect to, say, Euclidean distance (cf. formula 3.1):

j = arg min_i ||x_k − w_i(t)||    (3.1)

After each assignment, every weight vector w_i is updated with a learning rule (formula 3.2), where α(t) is a "time decreasing learning coefficient" that controls the influence of the input on the training and h_ij denotes the neighborhood function, a distance measure between the node n_i, which corresponds to the weight vector w_i, and the BMU n_j:

w_i(t + 1) = w_i(t) + h_ij(t) α(t) (x_k − w_i(t))    (3.2)

A common neighborhood function is given in formula 3.3 (cf. (Burkovski et al., 2011)), where σ(t) is the radius of the neighborhood, which also decreases with time t (Kessler, 2010):

h_ij(t) = exp(−||r_j − r_i||² / (2σ(t)²))    (3.3)

"Graphically speaking, the weight vectors change their places in feature space to get closer to the data points and take their neighbors with them" (Burkovski et al., 2011). After each iteration, the training time t is increased by one (t ← t + 1) and thereby α(t) and h_ij(t) decrease. Then the next input vector, x_{k+1}, is chosen. These steps are repeated until a limit of iterations is reached or α(t) falls below a predefined threshold.

Visualization
There are several ways to visualize this neural network.
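Before turning to the visualization, the SOM training loop of formulas 3.1–3.3 can be sketched as a short NumPy routine. The concrete decay schedules for α(t) and σ(t) are assumptions; the text only requires both to decrease with time.

```python
import numpy as np

# Sketch of the SOM training loop (formulas 3.1-3.3). Linear decay of
# alpha(t) and sigma(t) is an assumption; only "decreasing with t" is given.

def train_som(data, positions, t_max=1000, alpha0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n_nodes = len(positions)
    weights = rng.random((n_nodes, data.shape[1]))  # w_i(0): random init
    for t in range(t_max):
        alpha = alpha0 * (1.0 - t / t_max)          # time-decreasing learning rate
        sigma = sigma0 * (1.0 - t / t_max) + 1e-3   # shrinking neighborhood radius
        x = data[rng.integers(len(data))]           # random input vector x_k
        j = np.argmin(np.linalg.norm(x - weights, axis=1))    # BMU, formula 3.1
        d2 = np.sum((positions - positions[j]) ** 2, axis=1)  # ||r_j - r_i||^2
        h = np.exp(-d2 / (2.0 * sigma ** 2))                  # formula 3.3
        weights += (h * alpha)[:, None] * (x - weights)       # formula 3.2
    return weights
```

Here `positions` holds the node locations r_i in the low dimensional topology (e.g. a 2D grid), while `weights` lives in the input space R^d_in.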
The most commonly used one is the so-called "U-matrix", where the positions of nodes and their neighborhood relations are represented as components (cf. figure 3.7). Here, u_i represents the position of a node and u_ij is the distance between the nodes n_i and n_j in the feature space: u_ij = ||w_i − w_j||.

Figure 3.7.: U-matrix in (Burkovski et al., 2011)

High U-matrix values indicate that the weight vectors w_i of neighboring nodes n_i are far apart in the feature space. Burkovski et al. (2011) represent the U-matrix as a graph (figure 3.8). All U-matrix values are indicated with a gray scale color: black represents a high value and light gray represents a low value. The number with which a node n_i is labeled indicates the number of assigned input vectors x for which n_i is the BMU. The user can click on each node and receive information about the assigned links (Burkovski et al., 2011). Moreover, it is possible to focus on a single feature and reduce the SOM to a component plane (cf. figures 3.9, 3.10 and 3.11). Thereby, the user can gain insight into the impact of a specific feature.

Figure 3.8.: Graph of a U-matrix in (Burkovski et al., 2011)

3.3.2. Application of SOMs in coreference resolution

The subsequent discussion refers to three features, which are described in table 3.4:

Head match        1 if the heads of the markables match, otherwise 0.
WordNet distance  Jaccard coefficient of the WordNet hypernym sets of both markables.
Markable span     1 if one markable spans the other, otherwise 0.

Table 3.4.: Three features used by Burkovski et al. (2011)

Burkovski et al. (2011) discuss three applications of the visualization with SOMs:

1. SOMs make it possible to represent "high dimensional coreference data and their features in a low dimensional space". This way, the user is able to better understand the distribution of coreference data in the feature space. Figure 3.8 shows the U-matrix graph for the proportion of coreference links (i.e.
feature vectors) assigned to the matrix nodes. The dark nodes indicate high numbers of assigned coreference links, whereas light gray nodes comprise none or just a few coreference links. Clusters or regions of nodes are defined as areas with a high density of high-numbered nodes, separated by nodes without any link assignments or by black edges (i.e. edges with a high U-matrix value). The regions A−D contain predominantly coreference links, whereas region E also involves disreference links, as there are light gray nodes. If the user wants to know which feature is accountable for a cluster, he or she can use the component planes (figures 3.9, 3.10 and 3.11).

Figure 3.9.: The component plane for head match
Figure 3.10.: The component plane for WordNet distance
Figure 3.11.: The component plane for markable span

The component plane in figure 3.9 is based on the head match feature. As this is a good indicator for coreference, the regions A, B and D are mainly dark in this component plane. Thus, they are constituted by this feature. Figure 3.10 shows the component plane for the WordNet distance feature. Here, the user can see that this feature is responsible for region C, as region C is predominantly black in this plane. Region B does not have high values in this component plane; thus, the WordNet distance feature has no positive impact on the creation of region B.

2. The user can gain new insights for designing new features by exploring the SOMs. Some regions in figure 3.8 show both gray nodes and black nodes. Here, the features do not separate coreference and disreference links well enough. One way of approaching this is a closer look into the gray nodes. There, the user can see details on the links assigned to a node and can find reasons why both coreference and disreference links are assigned to the same node. For instance, figure 3.11 shows the component plane for the markable span feature.
This feature is an indicator for disreference. Nonetheless, region E in figure 3.11 mainly contains coreference links. If the user checks some nodes within this region, the reason becomes obvious: most of these coreference links arise because the second markable is an apposition to the first markable. This can be addressed by introducing an apposition feature. In general, the "inspection of nodes with mixed links helps the user to understand what these links have in common and what new feature may separate them" (Burkovski et al., 2011).

3. As the annotation of large documents in a text-based visualization is not time efficient, SOMs provide a great advantage with the option of annotating similar data with coreference information. Given strong indicators for coreference like the component plane for the head match feature (figure 3.9), the user is able to identify regions. It is then possible to annotate whole clusters with the right coreference information by checking, for the nodes in the respective regions, which kind of links (coreference/disreference) predominates. If the user is not sure about the right class, he or she can also label the clusters with confidence values. These confidence values may afterwards be used for a supervised learning algorithm (Burkovski et al., 2011).

3.4. The multi-lingual aspect in SUCRE

The relational database model used in SUCRE provides a flexibility in feature engineering that makes it possible to use one and the same feature set for several languages. As described in (2.7), SUCRE provides coreference resolution for all six languages presented in SemEval-2010: Catalan, Dutch, English, German, Italian, and Spanish. In (Kobdani and Schütze, 2010b), the question is addressed to what extent it is possible "to define a common feature set that can be used for different languages". There, they focus on four languages: Dutch, German, Italian, and Spanish.
With respect to the multi-lingual aspect, three main categories of features are defined, of which only the first two are considered relevant for a feature set applicable to different languages:

Identical features: An identical feature is a feature that is identical for all languages with respect to concept and definition, for instance the distance between two markables in terms of sentences, or exact or substring match. Kobdani and Schütze (2010b) discriminate three kinds of identical features:

1. String-based features:
a) Exact string match of the markables' heads
b) Head of m1 contained in m2
c) Head of m2 contained in m1
d) Any word in m1 is contained in m2 or vice versa
e) Substring match of the markables' heads
f) Partial match of the head of m1 with any word in m2
g) Partial match of the head of m2 with any word in m1
h) Partial match of any word in m1 with any word in m2 or vice versa
i) Edit distance between the markables' heads

2. Distance features:
a) Distance between m1 and m2 in terms of sentences
b) Distance between m1 and m2 in terms of words

3. Span features:
a) One markable is included in the other one
b) m1 overlaps with m2

As the results based on this feature set outperform the baselines (i.e. SINGLETONS and ONE CLUSTER), Kobdani and Schütze (2010b) argue "that these link features should be in the common feature set of the four languages".

Universal features: A universal feature is a feature that is identical for all languages with respect to concept but often has different realizations due to different annotation styles and lexical/grammatical differences, for example pronoun type features, semantic class, number or definiteness of a noun phrase. Kobdani and Schütze (2010b) propose four groups of universal features that are defined for each language:

1.
Noun type features: m1/m2 is
a) a common noun
b) a proper noun
c) definite
d) indefinite
As adding noun type features to the identical feature set increases the MUC-F1 by more than 10% (averaged over all four languages), Kobdani and Schütze (2010b) confirm the addition of noun type features to the feature set.

2. Pronoun type features: m1/m2 is
a) a first person pronoun
b) a second person pronoun
c) a third person pronoun
Although some more language specific variants of these features are even better in German and Spanish, the pronoun type features improve the MUC-F1 score by about 5%. Therefore, the pronoun type features should be part of a multi-lingual feature set.

3. Grammatical features: m1/m2 is
a) a subject
b) a direct object
c) an indirect object
These six features (three for each markable) improve the MUC-F1 score by about 8%. Therefore, it is advisable to include the grammatical features in the common feature set.

4. Agreement features: m1 and m2 agree with respect to
a) number
b) natural gender
c) semantic class
These features yield a 7% increase in the MUC-F1 score. Thus, they should be added to the final multi-lingual feature set too.

Language specific features: A language specific feature is defined just for one specific language, for example the grammatical gender for German.

The overall results for the four languages are shown among the results of SemEval-2010 (SUCRE, gold annotation) in table 3.5. They show that the multi-lingual feature set presented by Kobdani and Schütze (2010b) achieves competitive results for coreference resolution.

3.5. Evaluation results in SemEval-2010

SUCRE participated in SemEval-2010 Task 1 on Coreference Resolution in Multiple Languages in the gold and regular closed annotation tracks. It achieved the best results in several categories, including the regular closed annotation tracks for English, German and Italian. Further information about this competition is given in (2.7).
Table 3.5 shows the results of SUCRE and the best competitor system, TANL-1. The four main evaluation measures (CEAF, MUC, B3 and BLANC) are used. Additionally, the score for markable detection (MD), which is at 100% in the gold annotation, is shown above the other scores. SUCRE's results in the gold closed annotation track for English and German are the best in MUC and BLANC.

                              ca     de     en     es     it     nl
SUCRE (Gold Annotation)
  MD-F1                      100    100    100    100    98.4   100
  CEAF-F1                    68.7   72.9   74.3   69.8   66.0   58.8
  MUC-F1                     56.2   58.4   60.8   55.3   45.0   69.8
  B3-F1                      77.0   81.1   82.4   77.4   76.8   67.0
  BLANC                      63.6   66.4   70.8   64.5   56.9   65.3
SUCRE (Regular Annotation)
  MD-F1                      69.7   78.4   80.7   70.3   90.8   42.3
  CEAF-F1                    47.2   59.9   62.7   52.9   61.3   15.9
  MUC-F1                     37.3   40.9   52.5   36.3   50.4   29.7
  B3-F1                      51.1   64.3   67.1   55.6   70.6   11.7
  BLANC                      54.2   53.6   61.2   51.4   57.7   46.9
TANL-1 (Gold Annotation)
  MD-F1                      100    100    100    100    N/A    N/A
  CEAF-F1                    70.5   77.7   75.6   66.6   N/A    N/A
  MUC-F1                     42.5   25.9   33.7   24.7   N/A    N/A
  B3-F1                      79.9   85.9   84.5   78.2   N/A    N/A
  BLANC                      59.7   57.4   61.3   55.6   N/A    N/A
TANL-1 (Regular Annotation)
  MD-F1                      82.7   59.2   73.9   83.1   55.9   34.7
  CEAF-F1                    57.1   49.5   57.3   59.3   45.8   17.0
  MUC-F1                     22.9   15.4   24.6   21.7   42.7   8.3
  B3-F1                      64.6   50.7   61.3   66.0   46.4   17.0
  BLANC                      51.0   44.7   49.3   51.4   59.6   32.3

Table 3.5.: Results of SUCRE and the best competitor system, TANL-1, in SemEval-2010 Task 1

This shows that SUCRE has been optimized in order to achieve good results on the four evaluation measures (Kobdani and Schütze, 2010a). For the improvement of SUCRE's performance within this diploma thesis, the values for German with gold annotation in table 3.5 (SUCRE and the best competitor, TANL-1) have to be considered.

3.6. The dataset for German coreference resolution

As the results of the feature research in this diploma thesis should be comparable to the results of SemEval-2010, the dataset of SemEval is used.
As mentioned in (2.7), for German coreference resolution the TüBa-D/Z corpus ("Tübinger Baumbank des Deutschen / Zeitungskorpus") is used, developed at the University of Tübingen (Hinrichs et al., 2005a). It comprises 794k words from the newspaper "die tageszeitung (taz)" and is a syntactically hand-annotated corpus. The annotation scheme is divided into four levels of syntactic constituency:

1. lexical level
2. phrasal level
3. level of topology
4. sentence level

The annotation contains information about morphology, part-of-speech tags, lemmas, grammatical functions, named entities and anaphoric resp. coreference relations (SfS Universität Tübingen, 2010). For the purpose of SUCRE, the dataset extracted from TüBa-D/Z contains about 455k words (cf. table 2.19 in (2.7)). The training and development sets of SemEval-2010 are merged into the new training set used in this study. Table 3.6 shows the number of documents, sentences and tokens for both sets:

               #documents  #sentences  #tokens
Training set   1,099       23,362      404,759
Test set       136         2,736       50,287
Total          1,235       26,098      455,046

Table 3.6.: The training and test set of TüBa-D/Z in this study

The dataset is annotated with the gold annotation of the original SemEval-2010 dataset (cf. figure 2.13 in (2.7)) and already transformed into the relational database model (cf. (3.2.3)). Figure 3.12 shows a further excerpt of the original task dataset from SemEval-2010 (cf. (2.7)).

1 Er er er PPER PPER cas=n|num=sg|gend=masc|per=3 per=3|cas=n|num=sg|gend=masc 2 2 SUBJ SUBJ _ _ _ _ (191)
2 wird werden werden VAFIN VAFIN _ per=sg|num=pres|temp=ind 0 0 ROOT ROOT _ _ _ _ _
3 wissen wissen wissen VVINF VVINF _ _ 2 2 AUX AUX _ _ _ _ _
4 , , , $, $, _ _ 3 3 -PUNCT- -PUNCT- _ _ _ _ _
5 warum warum warum PWAV PROAV _ _ 3 3 S ADV _ _ _ _ _
6 . . . $. $. _ _ 5 5 -PUNCT- -PUNCT- _ _ _ _ _

Figure 3.12.: Another sentence in the original SemEval-2010 task dataset

The token ID (column 1), the word form of the token (column 2), the gold part-of-speech tag (column 5) and the gold morphological features (column 7) are inserted into the word table.

CHAPTER 4
Linguistic error analysis

In this chapter, the classification results of SUCRE are considered. The pairwise classifier labels true coreferent/disreferent links with a confidence value. If the value is below 50%, the link is classified as disreferent. If the value is greater than or equal to 50%, the link is predicted to be coreferent. As the links have true labels (coref/disref with 100% confidence status, cf. link table 3.3), the classification decisions can be divided into true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Table 4.1 shows the confusion matrix for these classification judgements.

Gold annotation  Calc-Prob ≥ 50  Calc-Prob < 50  Sums
Coreference      TP              FN              TP+FN
Disreference     FP              TN              FP+TN
Sums             TP+FP           FN+TN           TP+FP+TN+FN

Table 4.1.: Confusion matrix for classification judgements

Misclassifications are the false positives (i.e. disreference links misclassified as coreferent) and the false negatives (i.e. coreference links misclassified as disreferent). Each class of misclassification is analysed linguistically: the errors are inspected and reasons for the misclassifications are searched for. Moreover, this analysis tries to answer the following questions:

1. What are frequent problems occurring with false positives and false negatives, including some examples?
2. What features lead to these problems?
3. Is there any linguistic background for a given misclassification?
4. How could those problems be solved (i.e. is it possible to implement a new link feature for this linguistic phenomenon)?
5. If necessary, what modifications/extensions of the pseudo-language have to be done in order to implement the new feature?
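The 50% thresholding behind table 4.1 can be sketched as follows; the function name and the representation of links as (gold label, confidence) pairs are mine, not SUCRE's.

```python
# Sketch of the classification split in table 4.1: each link carries a
# gold label ('coref'/'disref') and a predicted confidence value; the 50%
# threshold divides the links into TP, FP, TN and FN.

def confusion(links):
    """links: iterable of (gold_label, confidence_percent) pairs."""
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for gold, prob in links:
        predicted_coref = prob >= 50.0
        if gold == "coref":
            cells["TP" if predicted_coref else "FN"] += 1
        else:
            cells["FP" if predicted_coref else "TN"] += 1
    return cells
```

The misclassifications analysed in this chapter are exactly the links that end up in the FP and FN cells.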
SUCRE's output includes a list of all correct and incorrect link classifications together with information about the markables and their positions. For a better error analysis, each link gets a unique ID, which consists of the first and second markable-ID separated by an x. Depending on the calculated probability and the gold value, the links are divided into TP, FP, TN and FN (cf. table 4.1). The optimal goal would be to move every link from false positive to true negative and from false negative to true positive.

In (4.1), the initial configuration is presented in terms of the first results, the initial feature set for filtering definitely disreferent markable links (prefilters) and the feature set for creating the feature vectors for the classifier's input (link features). The features are presented in the pseudo-language and described with a paraphrase. Tables 4.4 and 4.5 show how they contribute to the current evaluation scores. (4.2) describes the differences between the baseline of the initial configuration and the baseline given by SemEval-2010. One step, the removal of three features, provides a new baseline that is close to the one achieved in SemEval-2010. Afterwards, in (4.3), the misclassifications among the false positives are analysed, based on the new baseline introduced in (4.2), in terms of different classes of linguistic or processing problems. The leading question is: why were two disreferent markables classified as coreferent? Some solutions (i.e. new link features) for the problems are proposed; these features will be implemented later in chapter 5. The analysis of the false positives provides further restrictions on the feature set in order to improve precision without losing too much recall. In section (4.4), a similar analysis is done for the false negatives. Here, the leading question is: why were two coreferent markables classified as disreferent?
The goal in this step is to increase recall without losing too much precision.

4.1. The initial configuration

4.1.1. The initial results

Running SUCRE with the initial feature settings, the evaluation scores are:

MUC-correct               2469
MUC-Precision             0.403036
MUC-Recall                0.74909
MUC-f-score               0.524093
MUC-B3-f-score            0.617821
B3-all                    13446
B3-Precision              0.645682
B3-Recall                 0.901311
B3-f-score                0.752376
CEAFM-all                 13446
CEAFM-Precision           0.649784
CEAFM-Recall              0.649784
CEAFM-f-score             0.649784
CEAFE-correct             6298.88
CEAFE-Precision           0.860503
CEAFE-Recall              0.62058
CEAFE-f-score             0.721109
BLANC-Attraction-f-score  0.204744
BLANC-Repulsion-f-score   0.973537
BLANC-Precision           0.559756
BLANC-Recall              0.761964
BLANC-f-score             0.645392
RAND-accuracy             0.948779

Table 4.2.: Initial results of SUCRE

For more details on the meanings of the evaluation scores, see (2.6). The MUC-B3-F-measure is the harmonic mean of MUC's and B3's F-measures. The reason for this combination is described in (2.6). For instance, given the two baseline scores for MUC and B3 in German (cf. table 2.21 from SemEval-2010 (2.7)), the MUC-B3 metric is an acceptable tradeoff, as shown in table 4.3. In the case that there are no coreference links, MUC as well as MUC-B3 return 0% or even "nan", whereas in the case that both scores (MUC and B3) are non-zero, their harmonic mean rewards both correct coreference links and correct singleton entities:

              MUC                B3                 MUC-B3
              R     P     F1     R     P     F1     R     P     F1
SINGLETONS    0.0   0.0   0.0    75.5  100   86.0   0.0   0.0   0.0
ALL-IN-ONE    100   24.8  39.7   100   2.4   4.7    100   4.4   8.4

Table 4.3.: The usage of MUC-B3

4.1.2. The link features in the prefilter

Links which cannot be coreferent are caught by link features which model this incompatibility. Those links are filtered out before the feature vectors are created (cf. (3.2.4)).

1. {(m1h.f1==f1.singular)&&(m2h.f1==f1.plural)} → The first markable is singular but the second markable is plural

2.
{(m1h.f1==f1.plural)&&(m2h.f1==f1.singular)} → The first markable is plural but the second markable is singular

3. {(abs(m2b.stcnum-m1b.stcnum)>2)&&((m1h.f0==f0.P~)||(m2h.f0==f0.P~))} → The distance between the two markables is bigger than two sentences and one markable is a pronoun

4. {(abs(m2b.stcnum-m1b.stcnum)>0)&&((m1h.f0==f0.PRF)||(m2h.f0==f0.PRF)||(m1h.f0==f0.PRELS)||(m2h.f0==f0.PRELS)||(m1h.f0==f0.PRELAT)||(m2h.f0==f0.PRELAT))} → The markables are not in the same sentence and one markable is a reflexive pronoun or a relative pronoun

5. {(m2b.f0==f0.PIS)||(m2b.f0==f0.PIAT)||(m2b.f0==f0.PIDAT)} → The second markable is an indefinite pronoun

6. {(m1h.f2==f2.female)&&(m2h.f2==f2.male)} → The first markable is female but the second markable is male

7. {(m1h.f2==f2.male)&&(m2h.f2==f2.female)} → The first markable is male but the second markable is female

8. {(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)&&((m1h.f0==f0.P~)&&(m2h.f0==f0.P~))} → The two markables differ in gender, which is not unknown, and both markables are pronouns

4.1.3. The link features for the feature vectors

1. {abs(m2b.stcnum-m1b.stcnum)==0} → Both markables are in the same sentence

2. {abs(m2b.stcnum-m1b.stcnum)==1} → The second markable is in the subsequent sentence

3. {abs(m2b.stcnum-m1b.stcnum)>1} → The distance between the two markables is bigger than one sentence

4. {alias(m1h,m2a)||alias(m1a,m2h)} → The head of one markable is an alias of the other markable

5. {(seqmatch(m1h,m2h))} → Exact string match of both markables' heads

6. {(strmatchlc(m1h,m2h))} → Case-insensitive substring match of both markables' heads

7. {strmatchlc(m2b,ein)&&(m2b.f0==f0.ART)} → The second markable starts with an indefinite article

8. {((m1b.txtpos<=m2b.txtpos)&&(m2e.txtpos<=m1e.txtpos))||((m2b.txtpos<=m1b.txtpos)&&(m1e.txtpos<=m2e.txtpos))} → The first markable includes the second or vice versa

9.
{(m1b.txtpos<=m2b.txtpos)&&(m1e.txtpos<=m2e.txtpos)&&(m1e.txtpos>=m2b.txtpos)} → The first markable precedes the second markable but they overlap

10. {(m2b.txtpos<=m1b.txtpos)&&(m2e.txtpos<=m1e.txtpos)&&(m2e.txtpos>=m1b.txtpos)} → The second markable precedes the first markable but they overlap

11. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)} → Both markables are proper names

12. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NN)} → The first markable is a proper name and the second markable is a common noun

13. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NE)} → The first markable is a common noun and the second markable is a proper name

14. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NN)} → Both markables are common nouns

15. {(m1h.rewtag == rewtags.SUBJ) && (m2h.rewtag == rewtags.SUBJ)} → Both markables are subjects

16. {(m1h.rewtag != rewtags.SUBJ) && (m2h.rewtag != rewtags.SUBJ)} → Neither the first nor the second markable is a subject

17. {(m1h.f0==f0.PDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))} → The first markable is an attributive demonstrative pronoun and both markables have the same number or one's is unknown

18. {(m1h.f0==f0.PDS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))} → The first markable is a substituting demonstrative pronoun and both markables have the same number or one's is unknown

19. {(m1h.f0==f0.PIDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))} → The first markable is an attributive indefinite pronoun and both markables have the same number or one's is unknown

20. {(m1h.f0==f0.PIS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))} → The first markable is a substituting indefinite pronoun and both markables have the same number or one's is unknown

21. {(m1h.f0==f0.PPER)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))} → The first markable is a personal pronoun and both markables have the same number or one's is unknown

22.
{(m1h.f0==f0.PRF)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The first markable is a reflexive pronoun and both markables have the same number or one's is unknown

23. {(m1h.f0==f0.PPOSS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The first markable is a substituting possessive pronoun and both markables have the same number or one's is unknown

24. {(m1h.f0==f0.PPOSAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The first markable is an attributive possessive pronoun and both markables have the same number or one's is unknown

25. {(m1h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The first markable is an attributive relative pronoun and both markables have the same number or one's is unknown

26. {(m1h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The first markable is a substituting relative pronoun and both markables have the same number or one's is unknown

27. {(m2h.f0==f0.PDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is an attributive demonstrative pronoun and both markables have the same number or one's is unknown

28. {(m2h.f0==f0.PDS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a substituting demonstrative pronoun and both markables have the same number or one's is unknown

29. {(m2h.f0==f0.PIDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is an attributive indefinite pronoun and both markables have the same number or one's is unknown

30. {(m2h.f0==f0.PIS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a substituting indefinite pronoun and both markables have the same number or one's is unknown

31.
{(m2h.f0==f0.PPER)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a personal pronoun and both markables have the same number or one's is unknown

32. {(m2h.f0==f0.PRF)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a reflexive pronoun and both markables have the same number or one's is unknown

33. {(m2h.f0==f0.PPOSS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a substituting possessive pronoun and both markables have the same number or one's is unknown

34. {(m2h.f0==f0.PPOSAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is an attributive possessive pronoun and both markables have the same number or one's is unknown

35. {(m2h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is an attributive relative pronoun and both markables have the same number or one's is unknown

36. {(m2h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))}
→ The second markable is a substituting relative pronoun and both markables have the same number or one's is unknown

37. {(m1h.f1==m2h.f1)&&(m1h.f1!=f1.unknown)}
→ Both markables have the same number, which is not unknown

38. {(m1h.f2==m2h.f2)&&(m1h.f2!=f2.unknown)}
→ Both markables have the same gender, which is not unknown

39. {(m1h.f3==m2h.f3)&&(m1h.f3!=f3.unknown)}
→ Both markables have the same case, which is not unknown

40. {(m1h.f4==m2h.f4)&&(m1h.f4!=f4.unknown)}
→ Both markables have the same person, which is not unknown

4.1.4. The performance of the 40 features

By running a script that iteratively adds one link feature to the link feature set and runs SUCRE on that feature set, one obtains evaluation scores for every iteration. Put together, these scores result in table 4.4 and, for the reversed order of iterations, in table 4.5. The order of the features 1-40 (cf.
4.1.3) is derived from the original SUCRE feature set, which was provided as a starting point for this study. One drawback of such a large feature set is that these tables are hard to read. One reason is that the link features depend on one another. A decrease of the MUC-B3-F1-score after adding a feature to the feature set therefore does not indicate a bad feature performance in general, as the feature might only contribute in combination with features that are added later on. Analogously, an increase of the score after adding a feature may also reflect a positive interaction with features already in the set.

Nonetheless, one trend is clear: with very few features, the MUC-score is low (because of a small recall) and the B3-score is high (because no or few coreference links are created). By adding features, the MUC-score increases and with it the MUC-B3-score.

[Table 4.4 lists, for the cumulative feature sets 1, 1-2, ..., 1-40, the MUC scores (correct links C, precision P, recall R, F1), the B3 scores (C, P, R, F1) and the combined MUC-B3-F1. Its column layout was flattened during text extraction; with the full set 1-40, MUC-F1 is 0.5241, B3-F1 is 0.7524 and MUC-B3-F1 is 0.6178.]

Table 4.4.: Cumulative performance of the 40 original features
[Table 4.5 lists, for the reversed cumulative feature sets 40, 40-39, ..., 40-1, the same columns as table 4.4 (MUC C, P, R, F1; B3 C, P, R, F1; MUC-B3-F1). Its column layout was likewise flattened during text extraction; the best MUC-B3-F1 in the table is 0.6646, reached with the feature set 40-5.]

Table 4.5.: Reversed cumulative performance of the 40 original features

4.2. One problem with distance features

As the MUC-F1-score (58.4%) and the B3-F1-score (81.1%) in SemEval-2010 would result in the MUC-B3-F1-score 67.9% (cf. formula 4.1), the initial MUC-B3-F1-score (61.78%) is far below this baseline.

F1(MUC-B3) = (2 · F1(MUC) · F1(B3)) / (F1(MUC) + F1(B3)) = (2 · 0.584 · 0.811) / (0.584 + 0.811) = 0.947248 / 1.395 ≈ 0.67903    (4.1)

The best result that is visible in tables 4.4 and 4.5 is 66.46%. Considering the results in table 4.5, there is a rapid decrease when adding the first three features. These boolean features describe the distance between the markables in terms of sentences:

1. {abs(m2b.stcnum-m1b.stcnum)==0}
2. {abs(m2b.stcnum-m1b.stcnum)==1}
3. {abs(m2b.stcnum-m1b.stcnum)>1}

But as Hobbs (1986) found, 98% of the antecedents of a pronoun are in the same or in the previous sentence. Moreover, McEnery et al. (1997) showed that in 86.64% of cases the antecedent is within a window of three sentences (Kobdani and Schütze, 2010b). Furthermore, all approaches presented in chapter 2 use a feature for sentence distance.
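Formula 4.1 is simply the harmonic mean of the two F-scores, so the 67.9% quoted above can be reproduced in a few lines of Python (the function name is illustrative, not part of SUCRE):

```python
def muc_b3_f1(f1_muc: float, f1_b3: float) -> float:
    """Harmonic mean of the MUC and B3 F1-scores (formula 4.1)."""
    return 2 * f1_muc * f1_b3 / (f1_muc + f1_b3)

# SemEval-2010 scores from section 4.2: MUC-F1 = 58.4%, B3-F1 = 81.1%
print(round(muc_b3_f1(0.584, 0.811), 5))  # -> 0.67903
```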
The German approaches in (2.5) that are based on the TüBa-D/Z corpus likewise employ several distance features. It is therefore not plausible why these three features perform so badly. One assumption is that there is an inconsistency between the original SemEval dataset and the underlying TüBa-D/Z dataset. This inconsistency has to be investigated in future work. For the research in this diploma thesis, one solution is to remove the three features on distance in terms of sentences and to rerun SUCRE with only 37 features. Thereby, a baseline is created that is similar to the one achieved in SemEval-2010. Table 4.6 shows the scores of the new baseline.

MUC-correct 2390 / MUC-Precision 0.460767 / MUC-Recall 0.725121 / MUC-f-score 0.56348
B3-all 13446 / B3-Precision 0.73673 / B3-Recall 0.898049 / B3-f-score 0.80943
MUC-B3-f-score 0.664425
CEAFM-all 13446 / CEAFM-Precision 0.718132 / CEAFM-Recall 0.718132 / CEAFM-f-score 0.718132
CEAFE-correct 7139.44 / CEAFE-Precision 0.864444 / CEAFE-Recall 0.703393 / CEAFE-f-score 0.775647
BLANC-Attraction-f-score 0.315814 / BLANC-Repulsion-f-score 0.987133
BLANC-Precision 0.611977 / BLANC-Recall 0.742423 / BLANC-f-score 0.670918
RAND-accuracy 0.974741

Table 4.6.: Results of the new baseline

For the subsequent analysis, these scores are considered the baseline.

4.3. Error analysis in false positives

This section focuses on the analysis of false positives, that is, on figuring out why two disreferent markables are classified as coreferent. The most frequently occurring link errors are divided into main groups; for each group, a solution could move the links to true negative (TN). Each group provides some self-created examples and/or examples from SUCRE's output. Further examples for each group are given in appendix E.1.

4.3.1. The second markable is indefinite

In some misclassifications the second markable is indefinite.
The first markable occurs as a definite common noun whose head exactly matches the head of the second markable. Such a problem occurs when a feature like feature no. 5 (exact string match of both markables' heads) outweighs a feature like feature no. 7 (second markable starts with an indefinite article). The same problem might occur with feature no. 6 (case-insensitive substring match of both markables' heads).

(9) a. Erst in den 70er Jahren entstanden Übersetzungen und Inszenierungen , die jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden . Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m1 , auch weiterhin ( ein Geheimtip )m2 . D. N. (ID: 60x61); (Calc-Prob:52)

b. Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen ( der Inszenierung )m1 wichtige Glanzpunkte . [. . . ] Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit ( einer Langhoff-Inszenierung )m2 seine Arbeit aufnehmen . (ID: 218x250); (Calc-Prob:52)

c. Das Bühnenbild von Peter Schubert erzielt mit großen Stahlkäfigen zwar Wirkung , die lachsfarbene Wandbespannung erscheint aber für die Atmosphäre ( des Stückes )m1 fast wieder zu schick . [. . . ] Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen der Inszenierung wichtige Glanzpunkte . Diese wenigen atmosphärischen Momente lassen ( ein dichtes , interessantes Stück )m2 erahnen , das aber in dieser Fassung weit unter dem Möglichen inszeniert scheint . (ID: 208x221); (Calc-Prob:52)

d. Gott guckt uns nicht zu , ( der )m1 hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn seiner Grotske “Wunderworte “erklären . [. . . ] Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit seiner Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ( ein einträgliches Geschäft )m2 hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . (ID: 74x92); (Calc-Prob:51)

e. Bei der Polizei erfuhr ( die alte Dame )m1 , daß es sich bei ihrem Fall nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne . [. . . ] Sie hat dabei noch Glück gehabt . ( Eine andere alte Dame , der gleiches widerfuhr )m2 , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose . (ID: 389x432); (Calc-Prob:83)

f. Die sonderbare Art und Weise der Hausbesetzung kommt noch aus Zeiten , als aufgrund bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von ( wilden Mietern )m1 besetzt wurden . [. . . ] Inzwischen klagt dieser beim Obersten Gerichtshof , dessen Richter vorsichtshalber auch gleich in Urlaub gefahren sind . Einer von ihnen erklärte einer Zeitung schon mal anonym , es sei durchaus rechtens , wenn man ( wilde Mieter )m2 auf eigene Faust rauswerfe . (ID: 442x492); (Calc-Prob:52)

In example (9a), there is no apparent reason for creating a coreference link except that the first and the second markable are adjacent. There is indeed a copulative construction, but such constructions are not annotated as coreferent in the corpus.
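The recurring pattern in (9) is a head string match that is not blocked by the indefiniteness of the second markable. A minimal sketch of such a guarded match, outside SUCRE's feature definition language (the function and variable names are illustrative):

```python
# Forms of the German indefinite article; note that an indefinite plural noun
# phrase bears no article at all (cf. (9f)), which this simple check misses.
INDEFINITE_ARTICLES = {"ein", "eine", "einem", "einen", "einer", "eines"}

def head_match_not_indefinite(m1_head: str, m2_tokens: list[str], m2_head: str) -> bool:
    """Exact head match that abstains when m2 starts with an indefinite article."""
    if m2_tokens and m2_tokens[0].lower() in INDEFINITE_ARTICLES:
        return False
    return m1_head.lower() == m2_head.lower()

# Modeled on (9b): "der Inszenierung" ... "einer Langhoff-Inszenierung"
print(head_match_not_indefinite("Inszenierung",
                                ["einer", "Langhoff-Inszenierung"],
                                "Langhoff-Inszenierung"))  # -> False
print(head_match_not_indefinite("Inszenierung",
                                ["der", "Inszenierung"],
                                "Inszenierung"))  # -> True
```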
Example (9b) shows a typical combination of a substring match of the markables' heads and an indefinite determiner attending the second markable. This kind of string match will be discussed in (4.3.5). In (9c), the same problem occurs with an exact string match of the heads. These two examples might be solved by modifying the triggering features (i.e. the string-matching features) such that the second markable must not be indefinite.

Example (9d) combines a masculine demonstrative pronoun with an indefinite neuter common noun. This mismatch of gender is discussed in (4.3.10). Apart from indefinite pronouns, any kind of pronoun is definite, and thus this example again shows a shift from definite to indefinite.

In (9e), one interesting detail is the keyword andere, which indicates that the first and the second markable, even if string matched, are disreferent. Similar keywords might be neue~, alternative~ or spätere~, where the tilde stands for any inflectional suffix, including none. Therefore, a further modification of the string-matching features might be a conjunct that returns TRUE only if the second markable does not contain any of the aforementioned keywords.

The last example, (9f), shows a difficult case of indefiniteness, since indefinite common nouns in plural do not bear any determiner starting with ein.

Considering all these cases of false positives with an indefinite second markable, the question arises whether there is any coreferent case in which the second markable is indefinite. To check this, all TPs are analyzed as to whether the second markable starts with ein. Some results are presented in example (10):

(10) a. Der Gang zum Sozialamt wird zum Spießrutenlauf . “Manchmal stelle ( ich )m1 mir vor , daß alle in den Büros freundlich zu mir sind “, beschreibt ( ein Obdachloser )m2 seinen Tagtraum während der Wartezeit . Freundlichkeit hält auch der Diplom-Psychologe Klaus Hartwig * für eine der wichtigsten Voraussetzungen im Umgang zwischen Sachbearbeitern und Hilfeempfängern . (code: 15067-15067x15083-15084); (prob:53)

b. ( Neue Studie zum Rauchen )m1 [. . . ] Was Columbus vor 500 Jahren von den Indianern als Genußmittel nach Europa brachte , ist längst zum schädlichen Laster degradiert : der Tabak . ( Eine neue Studie des britischen Epidemiologen Richard Peto )m2 hat dieses Risiko jetzt exakt quantifiziert . (code: 18844-18847x18872-18879); (prob:83)

c. Naheliegende Frage : Warum wehrt sie sich dann so vehement gegen ( ein generelles Verbot für Zigarettenreklame )m1 ? [. . . ] Warum wird überhaupt noch geworben , wenn es so nutzlos ist ? Acht von zwölf Gesundheitsministern der EG haben sich im November des Vorjahres für ( ein generelles Verbot von Zigarettenreklame )m2 ausgesprochen , vier ( die Minister der Bundesrepublik , Großbritanniens , Dänemarks und der Niederlande ) waren dagegen . (code: 29128-29132x29159-29163); (prob:83)

d. in seinen Zeitschriften publizieren so “unbekannte Talente “wie Franz Kafka , Robert Walser , ( Robert Musil )m1 und andere . [. . . ] Darüber hinaus ist noch ein von Anne Gabrisch 1987 herausgegebener Auswahlband erhältlich : Franz Blei - Porträts . Die fehlende Nachwirkung Bleis dürfte ihren Grund darin haben , daß der Schriftsteller Blei keinen wiedererkennbaren Stil , keinen durchgängigen Erzählton besitzt : weder den ironisch-epischen eines Thomas Mann , den essayistisch-analytischen ( eines Robert Musil )m2 noch den skeptisch-melancholischen eines Joseph Roth . (code: 33832-33833x34016-34018); (prob:83)

e. Aber Schmidt-Braul konnte weder ein zeitweilig favorisiertes Management-buy-out-Verfahren zuwege bringen , da er keine Finanzierungsquellen aufzuschließen vermochte , noch kompetente Käufer für den Verlag interessieren . Dort , wo ( sich )m1 dennoch ( eine Verkaufsvariante )m2 abzeichnete , schoß die Treuhand quer , so bei der Bewerbung der Volker-Spieß-Gruppe aus Westberlin , die zuvor schon den Morgenbuch-Verlag übernommen hatte . Hier spielte nämlich die Immobilie eine entscheidende Rolle , die eigentlich zu Volk und Welt gehörte , aber inzwischen von der Treuhand für die Bundesregierung beansprucht wird , handelt es sich bei dem Haus in der Nähe des Postdamer Platzes doch um ein lukratives 50-Millionen-Objekt . (code: 36712-36712x36714-36715); (prob:52)

f. Harald Wolf , PDS-Sprecher für Stadtplanung , monierte darüber hinaus , daß es bisher keine seriöse Untersuchung zum Innenstadtring gebe , die ( eine Öffnung der Oberbaumbrücke für den Individualverkehr )m1 nahelegt . [. . . ] An einem » Runden Tisch « sollten Bürger , Initiativen , Verwaltungen und Bezirkspolitiker beteiligt werden , fordert der Umweltverband . ( Eine Öffnung der Oberbaumbrücke für den Autoverkehr )m2 werde die Klimabelastung verschärfen - insbesondere den Sommersmog . (code: 42728-42734x42778-42784); (prob:83)

As example (10) shows, there are indeed some coreferent links with an indefinite second markable. In (10a), there is a shift from direct to indirect speech and thus from first person, definite (all personal pronouns are definite), to third person, indefinite. Here, the modifications of the string-matching features described above would not affect this correct classification. Example (10b) joins a headline with a first mention of the entity in the subjacent text. Such a headline can be captured as a common noun in singular without any determiner. In example (10c), the coreference can be captured because all words in both markables are the same. This property can be used as an additional factor for the modified string-matching feature. Example (10d) uses a special expression of a proper name.
eines Robert Musil can be captured as coreferent with Robert Musil by checking whether the head of the second markable is a proper name. One kind of coreference link with an indefinite second markable is given in (10e). Here, the reflexive pronoun sich is cataphoric and its antecedent is on its right. There is no string match, so any modification of the string-matching features would not affect this instance. Example (10f) contains a markable pair similar to the one in (10c). Only the last words differ, but both are compound words with the same head.

4.3.2. Wrong assignment of a relative pronoun

A relative pronoun always refers to an antecedent in the leftward matrix clause. But it would be too simple to exclude all cases where a relative pronoun as first markable precedes the second markable: although a relative pronoun always refers to a leftward markable, it might be coreferent with a markable on its right-hand side in the same sentence. This can be the case when the relative pronoun constitutes a grammatical function which is referred to by a reflexive pronoun or a possessive pronoun, as in (11).

(11) a. Der Hund, dem sein Herrchen sein Stöckchen gibt,. . .
b. Der Mann, der sich rasierte, . . .

In (11) all underlined markables are coreferent. So, for a relative pronoun as first markable, the features have to enforce that the second markable is a reflexive pronoun or a possessive pronoun. Many false positives concerning relative pronouns show a connection between a relative pronoun as first markable and a succeeding common noun as second markable. The examples in (12) show some cases of wrong assignment of the relative pronoun.

(12) a. Der neben Garcia Lorca bedeutendste spanische Dramatiker des 20. Jahrhunderts wurde für das deutschsprachige Theater spät entdeckt .
Erst in den 70er Jahren entstanden Übersetzungen und ( Inszenierungen )m1 , ( die )m2 jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden . Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt Valle-Inclan , der Exzentriker der Moderne , auch weiterhin ein Geheimtip . (ID: 52x54); (Calc-Prob:51)

b. Aus seinem umfänglichen dichterischen Schaffen ragen vor allem die von ihm kreierten esperpentos heraus , Schauerpossen , die die von Leidenschaft und Gewalt deformierte Gesellschaft wie durch einen Zerrspiegel betrachten . Zu ( diesem Genre )m1 gehören neben den Wunderworten ( ( die )m2 im Original als Tragikomödie untertitelt ist ) auch die Dramen Glanz der Boheme und die Trilogie Karneval der Krieger . Es sind sperrige , sprachgewaltige Grotesken , die Mystik und Mythen karikieren und eine erhebliche Fortschrittsskepsis ausdrücken . (ID: 32x33); (Calc-Prob:51)

c. Denn das darf man nur , wenn man eine Ersatzwohnung ... Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende ( kommunaler Sozialmieter )m1 , ( die )m2 ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . Die Stadt hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben . (ID: 512x514); (Calc-Prob:51)

d. Kürzungspläne des Senats , etwa die angedachte und schließlich verworfene Schließung der Deutschen Oper , würden nur das Angebot mindern und wirkten sich » schädigend auf den Tourismus aus « . Abseits aller kulturellen Angebote hält Berlin für Busch-Petersen , ( der )m1 bis 1989 » hinter der Mauer ausgehalten hat « , einen gewaltigen Trumpf in der Hand , wie er erst kürzlich wieder feststellen konnte : ( Ein schwedischer Gast , den er durch den Ostteil führte )m2 , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen . Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von dem , was gewesen ist , sollten wir erhalten « . (ID: 1430x1439); (Calc-Prob:53)

Example (12a) shows a difficult case, since this combination of markables is absolutely possible. The relative pronoun die can be used for plural markables of any gender, and thus die can refer both to Übersetzungen und Inszenierungen and to Inszenierungen alone, as both markables are in plural. In general, this problem cannot be solved in an easy manner; possibly, even by considering the whole context including world knowledge, this uncertainty might remain unresolved. One solution might lie in the annotation. If one considers the entries for the relative pronoun and the corresponding antecedent in the word table, the gender value both leaps to the eye (cf. figure 4.1).

183  Übersetzungen   1  1  10  NN     plural   female   nominative  unknown
184  und             1  1  10  KON    unknown  unknown  unknown     unknown
185  Inszenierungen  1  1  10  NN     plural   female   nominative  unknown
187  die             1  1  10  PRELS  plural   both     nominative  unknown

Figure 4.1.: The relative pronoun referring to a conjunction

One indicator that the relative pronoun refers to both Übersetzungen and Inszenierungen might be the gender value both, which is implemented in various corpora in SUCRE, although it might not be necessary given the gender value unknown. It is mainly used for die, as it can refer both to plural masculine antecedents and to plural feminine antecedents. As the two antecedent candidates Übersetzungen und Inszenierungen and Inszenierungen have different syntactic structures, this is a task of syntactic disambiguation rather than of pure coreference resolution. Given a syntactic parser with an adequate disambiguation module, the gender could, by definition, be set to both to indicate the reference to a conjunction, as the connected markables might have different gender values (although this is not the case in the example in figure 4.1, where both markables are female).
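Agreement checks over word-table rows like those in figure 4.1 can be sketched as follows; the field names are illustrative rather than SUCRE's, and the values unknown and both are treated as wildcards, in line with the discussion above:

```python
# Morphological values treated as compatible with anything.
WILDCARDS = {"unknown", "both"}

def agree(v1: str, v2: str) -> bool:
    """Two morphological values are compatible if equal or if either is a wildcard."""
    return v1 == v2 or v1 in WILDCARDS or v2 in WILDCARDS

def compatible(row1: dict, row2: dict) -> bool:
    """Number/gender/case compatibility of two word-table entries."""
    return all(agree(row1[f], row2[f]) for f in ("number", "gender", "case"))

# Rows modeled on figure 4.1: "Inszenierungen" and the relative pronoun "die".
inszenierungen = {"number": "plural", "gender": "female", "case": "nominative"}
die_rel = {"number": "plural", "gender": "both", "case": "nominative"}
print(compatible(inszenierungen, die_rel))  # -> True: "both" matches any gender
```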
So, a possible feature is one that checks whether m2 is a relative pronoun and then whether m1 is annotated as part of a conjunction. If both conditions hold and the gender value for m2 is both, the feature votes for disreference between m1 and m2. As the gender value both also applies to markables that do not denote entities with a clear gender (e.g. Staatsangehörige), an additional check might be the gender of m1: if m1 does not have the gender value both, the feature votes for disreference between m1 and m2; otherwise, it proceeds in the aforementioned way. However, the annotation of the relative pronoun die always contains the gender value both if it is in plural. Here, the annotation could be refined to solve this problem.

In example (12b), the antecedent of the relative pronoun precedes it but is not adjacent. In German, it is unusual to have markables between a relative pronoun and its preceding antecedent (only verb forms, a comma/parenthesis or a preposition may intervene). This problem can be addressed by checking whether there is any markable between m1 and m2 or whether the distance between m1 and m2 is greater than 2 (i.e. a comma/parenthesis and a possible preposition).

Example (12c) combines the relative pronoun in m2 with its possible antecedent on the left, which is, however, embedded in another markable that is the correct antecedent of m2. The head of m1 has the relation tag GMOD, indicating a modifier in genitive. This combination can be checked in a feature, but it is also possible that this kind of markable connection constitutes a correct coreference link.

In (12d), the relative pronoun is found in m1. As mentioned above with example (11), this can only be the case if the second markable is a possessive or reflexive pronoun. One kind of example that frequently occurs in the true positives is the inclusion of m2 in m1, as in example (13a).
Here, a feature might check whether m2 is within m1, given that m2 is a relative pronoun.

(13) a. 1929 wurde er wegen seiner Gegnerschaft zur Diktatur Primo de Riveras kurzfristig inhaftiert , leitete ab 1933 dann aber die spanische Kunstakademie in Rom . Aus seinem umfänglichen dichterischen Schaffen ragen vor allem die von ihm kreierten esperpentos heraus , ( Schauerpossen , ( die )m2 die von Leidenschaft und Gewalt deformierte Gesellschaft wie durch einen Zerrspiegel betrachten )m1 . Zu diesem Genre gehören neben den Wunderworten ( die im Original als Tragikomödie untertitelt ist ) auch die Dramen Glanz der Boheme und die Trilogie Karneval der Krieger . (ID: 31x26); (Calc-Prob:51)

4.3.3. Relative proximity in context

One frequent problem is a reference between markables m1 and m2 which is admittedly possible (i.e. there is no incompatibility between them), but where m2 is nevertheless disreferent to m1 and refers to an entity denoted by a markable between m1 and m2. Some examples of these false positives are given in (14):

(14) a. Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn ( seiner )m1 Grotske “Wunderworte “erklären . Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit ( seiner )m2 Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . (ID: 80x85); (Calc-Prob:83)

b. In der Inszenierung von Armin Holz in den Kammerspielen des Deutschen Theaters scheint dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden .
Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , ( sich )m2 erst einmal sinnhaft vorzustellen . Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . (ID: 156x163); (Calc-Prob:52) c. Hamburg ( ap ) - Ein zwei Jahre alter Schäferhund namens “Prinz “hat im Hamburger Stadtteil Altona eine Wohnung besetzt . ( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag ( sein )m2 zukünftiges Heim gezeigt . Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . (ID: 325x328); (Calc-Prob:51) d. ( Sie )m1 hat dabei noch Glück gehabt .[. . . ] Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose . Dann starb ( sie )m2 dort . (ID: 428x435); (Calc-Prob:53) e. Bei der Polizei erfuhr die alte Dame , daß es sich bei ihrem Fall nicht um ( ein Vergehen handele , welches von Amts wegen verfolgt werden könne )m1 .[. . . ] Helena begab sich zu Gericht . ( Dieses )m2 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen . (ID: 396x400); (Calc-Prob:52) f. Daß ( Nawrocki )m1 von dieser bigotten Inszenierung profitiert , ist weder sein Verdienst noch von ihm gewollt . Mit ( ihrem )m2 neuen Geschäftsführer steht die privatwirtschaftliche Marketing GmbH nunmehr gleichgewichtig neben der öffentlich-rechtlichen Olympia GmbH . An finanzieller Potenz und Aktionsradius wird sie ihr bald überlegen sein .
(ID: 668x673); (Calc-Prob:51) Example (14a) couples two possessive pronouns that show an exact string match: seiner. But m1 refers to der spanische Dichter Ramon del Valle-Inclan and m2 to dem kleinen galizischen Dorf, a markable that lies between m1 and m2 . In (14b) the personal pronoun er is combined with the reflexive pronoun sich. Such a combination across different sentences is excluded by the prefilters (4.1.2), but in the current sentence, there are finite embedded clauses and thus several subjects. A reflexive pronoun can only refer to a subject (cf. (4.3.4)). In this case, the reflexive pronoun refers to an implicit subject, because it occurs in an infinitival subordinated clause, whose subject is controlled by the superordinated accusative object den Figuren des Stücks (cf. control verbs in (4.4.1)). The markable pair in (14c) is an illustrative example of an ambiguity. Markable m2 (sein) can refer to Der 24jährige Besitzer as well as to dem Tier. Semantically, however, an animal is presumably more interested in its own home than in that of its owner. So, the given reading is unlikely and markable m2 has to corefer with the intermediate markable dem Tier. In example (14e), markable m2 is a demonstrative pronoun. Demonstrative pronouns tend to corefer with the most salient compatible expression. One indicator of salience is proximity in context: the closer a compatible expression, the more salient it is. Thus, the markable Gericht is more salient than m1 (ein Vergehen . . . ). An exception, where a more salient markable lies between the connected markables, is given in (14f). Here, m2 is cataphoric and thus corefers with the succeeding antecedent die privatwirtschaftliche Marketing GmbH. At this point it is not possible to implement a feature that captures this phenomenon, since link features are only defined on two markables m1 and m2 . However, a possible feature would be one that returns 1 (i.e.
TRUE) if none of the markables between m1 and m2 is compatible with m2 with respect to all distinctive atomic features like gender, number, person or semantic class. Given the constants m1 and m2 referring to antecedent and anaphor, a first order predicate logic representation of this feature might be: ∀m3 (((m1b.txtpos < m2b.txtpos) ∧ (m1e.txtpos < m2e.txtpos) ∧ (m1b.txtpos < m3b.txtpos) ∧ (m1e.txtpos < m3e.txtpos) ∧ (m3b.txtpos < m2b.txtpos) ∧ (m3e.txtpos < m2e.txtpos)) ⇒ ((m3h.f1 != m2h.f1) ∨ (m3h.f2 != m2h.f2) ∨ (m3h.f4 != m2h.f4) ∨ (m3h.semcls != m2h.semcls))) It says that for all markables m3 (in a given set of markables) if m1 precedes m2 and m1 precedes m3 and m3 precedes m2 (i.e. there is the linear order m1 < m3 < m2 ), then m3 is incompatible with m2 in at least one of the features number, gender, person or semantic class. If they are compatible in all of those features, the implication's consequence is false and the feature returns 0 (i.e. FALSE) (as (1 ⇒ 0) ⇔ 0). A possible way of introducing such a universally quantified m3 is the shift from markable pairs to triples containing m1 , m2 and a set M that contains all markables in between. The effect of this final feature (with an implicitly universally quantified m3 ) will be shown by example (14d), repeated in example (15): (15) ( Sie )m1 hat dabei noch Glück gehabt .[. . . ] Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose . Dann starb ( sie )m2 dort . (ID: 428x435); (Calc-Prob:53) Given the markables ( Sie )m1 and ( sie )m2 , every further markable between m1 and m2 has to be incompatible with m2 in at least one of the atomic features: gender, number, person or semantic class. One possible markable between m1 and m2 is “Eine andere alte Dame”m3 . This markable has the atomic features <gender, female>, <number, singular>, <person, third> and <semantic class, person>.
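The quantified feature can also be sketched in code. This is an illustrative Python rendering only, under the assumption that markables are dicts carrying token positions and the four atomic features; the field names are invented, not SUCRE's.

```python
# Sketch of the feature: True (a vote for coreference) iff every markable m3
# strictly between m1 and m2 is incompatible with the anaphor m2 in at least
# one atomic feature; a fully compatible intervening m3 yields False.

ATOMIC_FEATURES = ("gender", "number", "person", "semcls")

def no_compatible_markable_between(m1, m2, all_markables):
    for m3 in all_markables:
        if m3 is m1 or m3 is m2:
            continue
        # m3 must lie strictly between m1 and m2 in the text
        if not (m1["end"] < m3["start"] and m3["end"] < m2["start"]):
            continue
        if all(m3[f] == m2[f] for f in ATOMIC_FEATURES):
            return False        # compatible m3 found: (1 => 0) <=> 0
    return True
```

Applied to example (15), the intervening markable Eine andere alte Dame matches sie in all four features, so the feature returns False.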
All of these atomic features match to ( sie )m2 . Thus the link feature returns 0 - one vote for m1 and m2 being disreferent. One has to remark that this solution just captures anaphoric (rather than cataphoric) coreference relations. Thus, example (14f) would not be captured. But cataphoric relations are less common than anaphoric relations (e.g. in the TüBa-D/Z version used in (Klenner and Ailloud, 2009), there are 13,818 anaphoric relations but just 1,031 cataphoric relations) and will be ignored in this solution. 4.3.4. Reflexive pronouns and non-subjects According to Canoo.net (2011c), there are three kinds of reflexive verb constructions: true reflexive verbs, reflexive variants of a verb and reflexively used verbs. True reflexive verbs are those that inherently subcategorize a reflexive pronoun (e.g. “sich schämen” (“to be embarrassed”)). Those governed reflexive pronouns are semantically empty and cannot be replaced by a personal pronoun. Nevertheless, these pronouns are still caught as markables in the TüBa-D/Z corpus. The same issue arises with reflexive variants of a verb. Although the full verb can be used with non-reflexive pronouns, the meaning of the verb is (usually) changed by the use of a reflexive pronoun (“sich ärgern” vs. “jemanden ärgern” (“to be annoyed” vs. “to bother somebody”)). On the other hand, the reflexively used verbs govern a referring reflexive pronoun (e.g. “sich rasieren” (“to shave”)). These reflexive pronouns always refer to the subject in the same sentence (cf. (Canoo.net, 2011c)). The false positives in (16) have one problem in common: they combine a reflexive pronoun with a non-subject (e.g. an accusative or dative object): (16) a. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen .
Man hatte ( sich )m1 am Ende ( einer erfolgreichen Spielzeit )m2 von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen . (ID: 239x240); (Calc-Prob:51) b. So verteilte die russische Delegation eine Prognose über die Entwicklung der russischen Wirtschaft . Beim Haushaltsdefizit orientierten ( sich )m1 die amtlichen Statistiker noch an der GaidarCamdessus-Vereinbarung : Es soll ( sich )m2 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen . Gegenwärtig jedoch liegt es bei 17 Prozent . (ID: 8248x8252); (Calc-Prob:83) c. Es wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt . Der Kern , um ( den )m1 das Karussell ( sich )m2 drehte , löst sich somit auf . (ID: 8376x8378); (Calc-Prob:52) d. Hat es den Münchner Rummel überhaupt gegeben ? Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat . Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m1 mit ( einem IWFStandardprogramm )m2 die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn es nur die konkrete Zusage gab , die erste IWFKredittranche von einer Milliarde Dollar im August zu überweisen .
(ID: 8206x8207); (Calc-Prob:51) Although the correct subject precedes the reflexive pronoun in example (16a), the reflexive pronoun is linked with the succeeding genitive modifier of a PP-adjunct. In (16b) two reflexive pronouns are connected. A reflexive pronoun, which is classified as accusative object, cannot be the subject of a sentence. Thus, both reflexive pronouns need a subject. In the first case, the subject is directly adjacent (die amtlichen Statistiker) and in the second case, the subject is the preceding pronoun (Es). Actually, it is possible to combine a relative pronoun with a reflexive pronoun (cf. example (11)) but this relative pronoun has to constitute the subject of the relative clause. This is not the case in (16c). Here, the relative pronoun is an accusative object in a PP-adjunct, whereas the correct subject is das Karussell. In example (16d) the correct antecedent of the reflexive pronoun is the markable die Schwierigkeiten Rußlands and not, as proposed, the dative object of a PP-adjunct. To model this issue, the features concerning reflexive pronouns (i.e. the features no. 22 and no. 32 in (4.1.3)) have to require the other (non-reflexive) markable to be a subject. 4.3.5. Problems with substring-matches In German, compound words are not separated by white space as they are in English. Thus any compound word AB containing the common nouns A and B substring-matches with A or B: (17) a. SchäferhundAB - HundB b. VertragB - ArbeitsvertragAB c. HundA - HundehalterAB Example (17a) is a possible link of coreferent markables that returns a positive substring-matching feature value. The markables have the same head (both are dogs). On the other hand the markables in example (17b) might be coreferent but it is unlikely since usually the more informative markable precedes its repeated mention, which contains only enough information to be coreferred with the first markable - in an extreme case, this is a pronoun (cf.
“Typically, but not always, names and other descriptions are shortened in subsequent mentions” - (Versley, 2006)). In example (17c) there is a case which definitely connects two disreferent markables since their heads are different (i.e. a dog vs. a dog owner). A list of further examples of false positives with a positive substring-match extracted from the corpus is shown in example (18). (18) a. Das gefiel ( dem Hund )m1 so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ .[. . . ] Erst die Feuerwehr konnte beide durch das Fenster befreien . Herrchen wollte ( den Hundefänger )m2 holen . (ID: 331x345); (Calc-Prob:52) b. Um so mehr , als man das Absurde an dieser Praxis noch auf die Spitze treiben kann . Im Fall Helena G. verurteilte das Gericht ( die wilde Mieterin )m1 zur Zahlung von ( Miete )m2 . Doch hätte die nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können . (ID: 502x503); (Calc-Prob:52) c. Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit ( erhobenem Arm )m1 “unterschrieben hat .[. . . ] Schließlich stehen sie im Dienst einer großen und gerechten Sache , dem Bankkonto des IOC . Also , wenn wir die Briten richtig verstanden haben wollen , handelt es sich bei Juan und seinen 94 Komplizen aus ( dem Lausanner Marmorpalast )m2 um die korrupteste Bande auf Gottes Erdboden . (ID: 577x623); (Calc-Prob:52) d. Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei ( 1,98 m )m1 .[. . . ] Der Keniate Yobes Ondieki lief über 5.000 Meter in 13:03,58 Minuten eine neue Jahresweltbestleistung . Damit war er über fünf Sekunden schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in ( seinem )m2 deutschen Rekordlauf am 6. Juni in Sevilla .
(ID: 8433x8441); (Calc-Prob:53) e. Wie Nawrocki gestern erklärte , erhält Fuchs keine Abfindung , da ( sein Vertrag )m1 regulär am 15. August ausläuft .[. . . ] Nawrocki selbst erhält für seinen Doppeljob kein zusätzliches Salär . ( Sein Geschäftsführervertrag mit der Marketing GmbH )m2 ist unbefristet . (ID: 814x836); (Calc-Prob:52) In (18a) the substring hund matches both markables, but as the second markable is a compound word, they have different heads (dog vs. dog catcher). This mismatch can be handled by checking whether one markable starts or ends with the other markable. For this case, some new functions have to be introduced to the pseudo language (cf. (5.1.4) and appendix B). One problem with this approach concerns markables with the same head but with inflectional suffixes. For this reason, the feature has to be modified such that it returns 0 (FALSE) if one markable starts with the other and the remaining material is not an inflectional suffix. Another case of substring match is morphological derivation. For example in (18b), the noun Mieterin is a derivation of Miete; thus they have the substring Miete in common but refer to entirely different entities. However, this is not always true: in example (19), one markable is the diminutive of the other and might corefer with it. (19) Kindm1 - Kindleinm2 At this point, it is not exactly clear how to solve this problem. One way could be to take into account inflectional suffixes but no derivational suffixes except diminutives. Example (18c) shows a surprising case of two markables that accidentally have three sequential letters in common: Arm vs. Marmorpalast. This problem also occurs with pronouns or other markables that consist of a very short string. An extreme case is given in (18d). Here, the abbreviation for meter, m, allegedly corefers with the possessive pronoun seinem, which ends with the inflectional suffix ∼em.
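A compound-aware head comparison along these lines can be sketched as follows. This is a rough illustration only: German compounds are head-final, and the suffix inventory below is a small invented sample, not SUCRE's actual list.

```python
# Sketch: two noun heads "match" iff one equals, or is a compound ending in,
# the other, ignoring case and at most one inflectional suffix on either
# side. So Schäferhund ~ Hund and Bürgerombudsmanns ~ Ombudsmann, but not
# Hund ~ Hundehalter (different heads) or Miete ~ Mieterin (derivation).

SUFFIXES = ("es", "en", "s", "n", "e")       # illustrative, not exhaustive

def strip_inflection(noun):
    for suffix in SUFFIXES:                   # longest suffixes first
        if noun.endswith(suffix) and len(noun) > len(suffix) + 2:
            return noun[:-len(suffix)]
    return noun

def heads_compound_match(head1, head2):
    a, b = head1.lower(), head2.lower()
    if len(a) < len(b):
        a, b = b, a                           # a: longer (compound) candidate
    for long_form in (a, strip_inflection(a)):
        for short_form in (b, strip_inflection(b)):
            if long_form.endswith(short_form):
                return True
    return False
```

Note that an end-anchored comparison already rejects accidental internal matches like Arm vs. Marmorpalast; the requirement that the compound precede the shorter mention would be a separate check on markable order.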
One way of solving this problem might be a comparison of the relative lengths of the two markables. If one markable has length 3 or less, the other markable must not have four or six times its length, as is the case above. In addition, if two markables string-match, none of them may be a pronoun. As discussed with example (17b), it is unlikely that a more informative compound word succeeds a coreferent word. This is also the case in (18e). Here, Vertrag might be coreferent with the succeeding Geschäftsführervertrag but this coreference is unlikely. For this reason, the feature sketched above for example (17a) has to require that the compound word precedes the non-compound word. Some interesting cases of true coreference between two markables that substring-match are given in (20): (20) a. Daß Einbruch auch strafbar ist , wenn der Einbrecher nicht mit einem Sack auf dem Rücken und einer Maske vor dem Gesicht das Weite sucht , ist eine Erkenntnis , die auch nach Ansicht ( des polnischen Bürgerombudsmanns )m1 die Auffassungsgabe der polnischen Polizei bei weitem übersteigt .[. . . ] Einer von ihnen erklärte einer Zeitung schon mal anonym , es sei durchaus rechtens , wenn man wilde Mieter auf eigene Faust rauswerfe . ( Ombudsmann Tadeusz Zielinski )m2 will indessen nicht einsehen , daß gewöhnliche Einbrecher besser behandelt werden als die rechtmäßigen Eigentümer bzw. Mieter . (ID: 479x494); (Calc-Prob:52) b. Als ob das irgend etwas mit Sport zu tun hätte , daß ( Hans Anton )m1 ein paar Jährchen in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem Arm “unterschrieben hat .[. . . ] Na bittschön , Vollbeschäftigung in Herzogenaurach , und irgendwelche Schuhe müssen die Sportler ja anziehen ! ( Der Spanier-Hansl )m2 kümmert sich wenigstens . (ID: 571x593); (Calc-Prob:52) c. ( BUNDESRAT )m1 [. . . ] Berlin .
( Der Bundesrat )m2 soll sich auf Initiative Berlins für einen verbesserten Kündigungsschutz für Gewerbetreibende einsetzen . (ID: 1205x1209); (Calc-Prob:52) Example (20a) shows a combination of valid compound word linkage and an inflectional suffix: Bürgerombudsmanns has the inflectional suffix ∼s but corefers with Ombudsmann, which has no such suffix. In this case, the modified string-matching feature would deny that m1 ends with m2 . The markable pair in (20b) presents a similar problem. Here, a familiar form of the proper name Hans with the suffix ∼l is used within a compound word. At this point, it is not clear how to combine these two issues as the pseudo language is not expressive enough to use string concatenation (e.g. 'm2h + s' or 'm1h + l') as input for a function returning TRUE in the case that one string ends with the other. The markables in (20c) reveal a less problematic case. The first markable BUNDESRAT is the capitalized form of the second markable Bundesrat. One way of solving this is to change the case-sensitive exact string matching feature (i.e. seqmatch) to a case-insensitive one (i.e. seqmatchlc). 4.3.6. “Es” (“it”) as expletive pronoun in German The pronoun es differs from others (e.g. er, sie) in several ways. Canoo.net (2011a) mentions five different uses of es: Personal pronoun: “es” replaces expressions referring to a real-world entity: (21) Wo ist das Telefon? Es steht auf dem Tisch. Placeholder for a clause: “es” can be used as a placeholder for a clause that constitutes the subject or object: (22) Es freut uns, dass unsere Mannschaft gewonnen hat. Provisional subject: The so-called “Vorfeld-es” can be used as a provisional subject that precedes the actual subject in the sentence. This way, the linear order of constituents in a German declarative sentence (e.g. the finite verb at the second position) is ensured: (23) Es steht ein Schrank im Gang.
Formal subject: “es” functions as a merely formal subject of impersonally used verbs. In this kind of use, es is semantically empty and has no reference: (24) a. Es regnet stark. b. Es handelt sich um ein Missverständnis. c. Wie geht es dir? Mir geht es gut. d. Es gibt gute Sachen. Formal object: “es” can also be used as a formal accusative object in some idiomatic expressions: (25) a. Wir hatten es eilig. b. Sie haben es im Leben sehr weit gebracht. The only kind of use of the pronoun “es” that is of particular interest in the coreference resolution task is the personal pronoun. In contrast to English, this kind of usage is less common in German, as a lot of entities with a neuter natural gender are expressed with a common noun that has the grammatical gender female or male (cf. der Computer (male) or die Glühbirne (female)). Thus, the other four kinds of es-usages described above do not constitute anaphoric markables in the sense of noun phrase resolution we have defined for SUCRE. Nevertheless, they are detected as markables in the corpus and therefore, a new group of false positives has been discovered. Example (26) shows one instance of this group: (26) ( Es )m1 gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes nicht herausstellt . Allein schon vom Blick auf die Landkarte , in der die große Stadt Berlin mitten im Land Brandenburg liegt , macht überdeutlich , daß ( es )m2 in Zukunft eine sehr enge wirtschaftliche , verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden Länder geben wird . Dies ist auch aus alternativer Sicht wünschenswert . (ID: 12125x12135); (Calc-Prob:53) A possible solution for these non-referring pronouns might be a lexical feature that checks whether the governing verb is a form of geben, regnen or handeln. However, the expressiveness of SUCRE's feature definition language is limited to the word-ID of the governing verb, not to its string representation.
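The proposed lexical filter can be sketched as follows, under the assumption (contrary to the current feature definition language) that the lemma of the governing verb were accessible; the verb inventory is illustrative.

```python
# Hypothetical filter: "es" governed by a typically impersonal verb is
# treated as non-referring (a vote against coreference). The lemma access
# and the verb list are assumptions for illustration only.

IMPERSONAL_VERBS = {"geben", "regnen", "handeln"}

def may_be_referring_es(pronoun, governing_verb_lemma):
    if pronoun.lower() != "es":
        return True                     # the filter only concerns "es"
    return governing_verb_lemma not in IMPERSONAL_VERBS
```

In example (26), m2 is governed by a form of geben, so the filter would vote against linking it to any antecedent.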
As soon as the expressiveness has been increased, the performance of such a feature has to be checked. However, up to now there is no clue indicating whether a given pronoun es is a referring personal pronoun or a non-referring one. Thus the only possibility to capture this phenomenon is to create a feature which returns FALSE in the case that a personal pronoun equals es. 4.3.7. Units, currencies, month names, weekdays and the like Sometimes the head of a markable is a dimensional unit, a currency, a month name, a weekday or the like. Although it is not impossible that some of these markables are coreferring, this is rather unusual: (27) a. Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard Schwaetzer ( FDP ) selbst mit ( ihren circa 15.000 Mark monatlichen Bruttoeinkommen )m1 und ihrem nicht unerheblichen Steuersatz . Die Ministerin besaß bis Anfang 1991 eine Wohnung in der Bonner Riemannstraße in Bonn , mit deren Erwerb sie ( bis zu tausend Mark Steuern )m2 im Monat sparen konnte . Auf großstädtische Verhältnisse umgerechnet nehmen sich solche Summen noch ganz anders aus . (ID: 1097x1108); (Calc-Prob:83) b. ( Nur zehn Prozent der umgewandelten Wohnungen )m1 sind an die dort wohnenden Mieter verkauft worden , ein Drittel der Mieter hat die Wohnung verlassen müssen , viele andere haben mit Kündigungsprozessen zu kämpfen .[. . . ] Denn das ist häufig der Fall . ( Zwischen 60 und 70 Prozent der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den Mieter herausbekommen , um die Wohnung , die dann im Preis steigt , besser verkaufen zu können . (ID: 988x1062); (Calc-Prob:83) c. Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über ( 400 Meter Hürden )m1 hinnehmen .[. . . ] Seine 48,18 Sekunden reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen .
Über ( 800 Meter )m2 setzte sich der Kenianer William Tanui mit 1:43,62 Minuten gegen den Weltjahresbesten Johnny Gray ( USA , 1:44,19 ) durch . (ID: 8476x8481); (Calc-Prob:83) d. Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde ( 20,10 Sekunden )m1 benötigte .[. . . ] Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über 400 Meter Hürden hinnehmen . ( Seine 48,18 Sekunden )m2 reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen . (ID: 8473x8478); (Calc-Prob:83) In (27a) the former German currency Mark constitutes the markable. In this case, it only expresses an amount of money that is earned monthly and no specific entity, as it might within the context of, say, a bank robbery. The subsequent examples (27b), (27c) and (27d) each contain two allegedly coreferent markables that express percentages, local distances and time spans. None of them are usually coreferring. Thus, a simple way of handling this issue is the creation of a link feature that returns FALSE in the case that the head of the first or second markable exact string matches with one string of a particular set of keywords (e.g. {Mark, Dollar, Prozent, Meter, Juli, Sekunden, Mittwoch, . . . }) and TRUE otherwise. An alternative version is to create a prefilter feature that discards all markable pairs which contain a unit or any other keyword as a markable head. 4.3.8. Problems with the alias feature The alias function returns TRUE in the case that one markable is the alias of the other and FALSE otherwise. Two possible TRUE-cases are given in example (28). (28) a.
Fußballclubm1 - FCm2 b. Deutsche Markm1 - DMm2 Another positive example of the alias function is in (29), although here, the alias is even expressed in the corresponding markable: (29) Berlin ( taz ) - ( Die staatliche Zentralstelle für Sicherheitstechnik ( ZFS ) in )m1 Düsseldorf hat einen Riesenauftrag an Land gezogen . Die Wissenschaftler ( der ZFS )m2 dürfen in den nächsten Jahren bis zu 1.000 Atommüllfässer aufmachen und ihren strahlenden Inhalt überprüfen . Die Fässer sollen zur Konditionierung aus dem Atommüllager Gorleben in eine Lagerhalle nach Duisburg-Wanheim transportiert werden . (ID: 6064x6067); (Calc-Prob:50) The alias function fires six times in the set of false positives and just once in the set of true positives. Three of those false positives are given in (30): (30) a. Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFRNationalspieler wurde vom ( FC St. Pauli Hamburg )m1 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen .[. . . ] TENNIS. In der ersten Runde ( des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main )m2 trifft die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf auf Außenseiter Neuseeland . (ID: 8573x8583); (Calc-Prob:50) b. WECHSEL Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : ( Der frühere CSFRNationalspieler )m1 wurde vom ( FC St. Pauli Hamburg )m2 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen . Leihgebühr : 100.000 Mark . (ID: 8572x8573); (Calc-Prob:50) c. Vorweg : ( Sieben spannende Seiten Widmungen an die Menschen , die Torsten Schmidt als Junkies knipste )m1 :[. . . ] S. P. Ausstellung : bis 6. 8. im Schlachthof ; Buch : “Ich bin einmalig , und daß ich lebe , das freut mich .
Menschen in der Drogenszene “, Rasch und Röhring Verlag , ( 29.80 DM )m2 (ID: 2927x2954); (Calc-Prob:50) In example (30a) the Fußballclub, “FC”, is misinterpreted as Federation Cup and in (30b) as frühere CSFR-Nationalspieler, where the F corresponds to frühere and the C to CSFR-Nationalspieler. The most unlikely alias interpretation is given in example (30c). Here, the Deutsche Mark, “DM”, is linked with a large markable that contains the sequential words die and Menschen. Thus, one possibility is to completely remove the alias feature since it has an insufficient predictive power. 4.3.9. First markable begins with “kein“ Markables that begin with the quantifier kein cannot refer to any other markable outside the quantifier's scope. Example (31) shows an instance of such a true coreference. Here, all underlined markables are coreferent: (31) Kein Hund, dem sein Herrchen sein Stöckchen gibt, ärgert sich. A negative example is given in (32): (32) » ( Keine Gewalt )m1 « war ein Slogan der großen Novemberdemonstrationen .[. . . ] Im puren , physischen Sinn war es auch eine weitgehend » gewaltlose « Revolution . ( Die andere Gewalt , die Gewalt der Geschichte , des Alltags , unserer Gefühle und Vorurteile )m2 , entlud sich dagegen , und sie entlädt sich immer noch . (ID: 1881x1892); (Calc-Prob:83) Here, there is an exact string match between the markables' heads but m2 is not in the scope of the first markable's quantifier, kein. Thus, the only possible anaphor for a markable starting with kein is a relative pronoun, a possessive pronoun or a reflexive pronoun. These pronouns have to occur in the same sentence as m1 . A possible feature for this mismatch returns TRUE in the case that the first markable begins with kein and the second markable is a pronoun of the kinds described above and is in the same sentence. It returns FALSE otherwise. The same solution can be applied to the quantifier jede∼. 4.3.10.
Disagreement in gender and number This group of false positives is very surprising since a mismatch in gender or number should have been filtered out in the pre-filters (4.1.2). There are several problems that are responsible for this mismatch. Most of the time, the problem is that some kinds of attributive pronouns (e.g. attributive possessive pronouns (PPOSAT)) are labeled unknown with respect to some atomic features (e.g. <gender, unknown>). First, one remark on the annotation of attributive pronouns, e.g. possessive pronouns. There are, in a sense, two kinds of annotation for a possessive pronoun. The first one constitutes the syntactical agreement with the respective NP-head and the second one constitutes the coreferential agreement with the antecedent. In example (33) the possessive pronoun seine syntactically agrees with the noun Schuhe and coreferentially agrees with the subject Peter. One can say that this coreferential agreement is a necessary condition for coreference but not a sufficient one, as there might be another entity with the same coreferential agreement features. In the first case (syntactical agreement), the possessive pronoun in (33) has the number value plural and the gender value male. These values can be considered as based on morphological suffixes: as seine ends with an e, it cannot agree with an NP-head in singular and non-female, because this e-suffix indicates either plural number or female gender in the singular. This annotation is the one that is returned by a parser and that is the only important one for grammaticality in German, as the noun phrases sein Schuhe or seine Schuh are not well-formed. But this annotation is not relevant for coreference resolution. Here, another kind of annotation is needed that is based on the coreferential agreement.
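The stem-based coreferential agreement can be sketched as a lookup following the standard German possessive paradigm; the representation and all names below are invented for illustration.

```python
# Sketch: derive the admissible (number, gender) values of the ANTECEDENT
# from the stem of an attributive possessive pronoun. None means "any
# gender". The stem inventory and representation are illustrative.

STEM_CONSTRAINTS = {
    "unser": {("plural", None)},
    "euer":  {("plural", None)},
    "eur":   {("plural", None)},         # inflected forms: eure, eurem, ...
    "mein":  {("singular", None)},
    "dein":  {("singular", None)},
    "sein":  {("singular", "male"), ("singular", "neuter")},
    "ihr":   {("singular", "female"), ("plural", None)},
}

def antecedent_constraints(pronoun):
    form = pronoun.lower()
    for stem in sorted(STEM_CONSTRAINTS, key=len, reverse=True):
        if form.startswith(stem):
            return STEM_CONSTRAINTS[stem]
    return None                           # not an attributive possessive
```

For instance, seinem restricts the antecedent to a singular male or neuter entity, while ihrer allows a singular female or a plural antecedent, exactly the ambiguity discussed below for the TüBa-D/Z false positives.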
Here, the possessive pronoun in example (33) has the number value singular and the gender value male or neutral, which are exactly the complementary annotation possibilities to those of the syntactical agreement of seine. The coreferential agreement can be considered to be based on the morphological stem: as seine starts with sein (rather than ihr) it cannot agree with an antecedent which is singular and female or plural (e.g. Maria/die Männer : seine). This annotation is usually not returned by a German parser. (33) ( Peter )m1 holt ( seine )m2 Schuhe ab. Figure 4.2 shows another excerpt of the original task dataset from SemEval-2010 (cf. (2.7)). However, the reason why almost all attributive pronouns are labeled unknown with respect to the grammatical attributes is that there is no gold annotation for them in the original SemEval dataset (cf. column 7 in line 5 in figure 4.2); there is only the automatically parsed annotation (column 8). Thus, the system returns unknown due to the lack of gold annotation.
1 Heute heute heute ADV ADV _ _ 2 2 ADV ADV _ _ _ _ _
2 wählen wählen wählen VVFIN VVFIN _ per=3|num=pl|temp=pres|mood=ind 0 0 ROOT ROOT _ _ _ _ _
3 die die d ART ART cas=n|num=pl|gend=fem cas=n|num=pl|gend=fem 4 4 DET DET _ _ _ _ (179
4 Schottinnen Schottin Schottin NN NN cas=n|num=pl|gend=fem cas=n|num=pl|gend=fem 2 2 SUBJ SUBJ _ _ _ _ 179)
5 ihr ihr ihr PPOSAT PPOSAT _ cas=a|num=sg|gend=neut 6 6 DET DET _ _ _ _ (4|(179)
6 Parlament Parlament Parlament NN NN cas=a|num=sg|gend=neut cas=a|num=sg|gend=neut 2 2 OBJA OBJA _ _ _ _ 4)
7 . . . $. $. _ _ 6 6 -PUNCT- -PUNCT- _ _ _ _ _
Figure 4.2.: Another sentence in the original SemEval-2010 task dataset
The excerpt in figure 4.2 shows the original dataset of the sentence “Heute wählen die Schottinnen ihr Parlament”. As described above, the automatic annotation (column 8), which is returned by a German parser, corresponds to the syntactical agreement (i.e.
the possessive pronoun ihr has the number value singular and the gender value neutral, as it agrees with the NP-head Parlament). The right antecedent for the possessive pronoun is the noun phrase die Schottinnen, which has completely different values in terms of case, number and gender. Consequently, the syntactic agreement of possessive pronouns is irrelevant for coreference resolution, but it is possible to narrow down the achievable attribute values by considering the morphological stem of the possessive pronoun. Canoo.net (2011b) presents table 4.7:

                        singular    plural
1st person              mein∼       unser∼
2nd person              dein∼       euer∼
3rd person   male       sein∼       ihr∼
             female     ihr∼        ihr∼
             neuter     sein∼       ihr∼

Table 4.7.: Table of the possessive pronouns

Therefore, if the possessive pronoun starts with sein, the attribute values are <number,singular>, <person,3> and <gender,male/neutral>, and so on. A few examples of the false positives based on this unknown-disagreement are given in (34). For each example, the attributes are examined and the problem is figured out. Afterwards, a solution is proposed:

(34) a. Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen .[. . . ] Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . So kann die Groteske nicht zu ( ihrer )m2 Wirkung kommen , kann nichts von der Form ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen .
(ID: 156x179); (Calc-Prob:53)

581 er    2 2 29 PPER   singular male    nominative third
692 ihrer 2 2 31 PPOSAT unknown  unknown unknown    unknown

Problem: The possessive pronoun ihrer is labeled unknown for the attributes number, gender, case and person. The personal pronoun er is singular and male, but ihrer has just two other options: (singular and female) or (plural and unknown). The only restriction given in the pre-filters (4.1.2) is a mismatch between female and male.

Solution: One solution is to re-annotate those pronouns manually. Since it is not possible to decide which attribute set, (singular and female) or (plural and unknown), applies to a possessive pronoun with the stem ihr (cf. table 4.7), one way is to keep the gender unknown and to introduce a new number value both_ihr together with a new restriction in the pre-filters: if a markable has the number both_ihr, the other markable must not be male or neuter if it is singular. An alternative would be a pre-filter that checks whether the string representation of a markable starts with ihr, instead of introducing a new number value. Both ways are possible and would lead to the same result. For pursuing the goal of filling the annotation gap, the first option with the number value both_ihr is preferred over the second one, which leaves the number value unknown.

b. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen . Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im ( Deutschen Theater )m1 , mit ( der )m2 übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen .
(ID: 242x243); (Calc-Prob:51)

953 Theater 2 2 41 NN    singular neutral dative unknown
956 der     2 2 41 PRELS singular female  dative unknown

Problem: The first markable Deutschen Theater has the gender value neutral, whereas the second markable der is the dative form of a relative pronoun with the gender value female. Thus, there is a mismatch between female and neutral. But there is no restriction in the pre-filters for a mismatch between male and neutral or between female and neutral given that one markable is not a pronoun (cf. feature no. 8 in (4.1.2)), as there are neuter common nouns in German (e.g. Mädchen) that can be referred to by female pronouns.

Solution: Introducing four further features in the pre-filters (one for each combination of a mismatch between male/female and neutral, given that the first markable's head is not within a special list of keywords (e.g. Mädchen)) solves this problem.

c. Der zweieinhalbstündige Theaterabend in den Kammerspielen des Deutschen Theaters blieb dann auch entsprechend unentschieden . ( Viele Gedanken )m1 , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt , wie zum Beispiel die Inzest-Anspielung zwischen Pedro und ( seiner )m2 Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) . Ramon del Valle-Inclans Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als ein weiterer Gag sein kann . (ID: 188x193); (Calc-Prob:51)

727 Gedanken 2 2 33 NN     plural  male    nominative unknown
747 seiner   2 2 33 PPOSAT unknown unknown unknown    unknown

Problem: The possessive pronoun seiner is singular and male/neutral (cf. table 4.7), but the common noun Gedanken is plural. So, there is a mismatch between singular and plural that is caused by the insufficient annotation of attributive possessive pronouns.
Solution: To solve this problem, one can annotate all possessive pronouns with the stem sein as singular and introduce a new gender value non_fem together with corresponding pre-filters that prevent the inclusion of markable pairs containing sein and any markable with the gender female.

4.4. Error analysis in false negatives

This section focusses on the analysis of false negatives, i.e. on figuring out why two coreferent markables are misclassified as disreferent. Again, the most frequent and clearest link errors are grouped into main groups, for which one or several possible solutions are proposed and later implemented in chapter 5 in order to move the links to True Positive (TP). Each group provides some examples from SUCRE's output. Further examples for each group are given in appendix E.2.

4.4.1. Reflexive pronouns with non-subjects or considerable sentence distance

In this group, two main problems are revealed. First, based on the architecture of SUCRE, there are two kinds of false negatives (FN) that have to be distinguished from each other. Second, there are more complex syntactic structures that force the system to allow a reflexive pronoun to corefer with, say, an accusative object in the same (global) sentence.

(35) a. Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß er die ( ihm )m1 anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .[. . . ] Der Spanier-Hansl kümmert sich wenigstens . Um ( sich )m2 und auch um seine Kumpels . (ID: 558x595); (Calc-Prob:0)

b. Auch beim Nato-Verbündeten Türkei , in die ( Außenminister Klaus Kinkel )m1 in der kommenden Woche reisen wird , werde unvermindert gefoltert .[. . . ] Immer mehr Menschen “verschwinden “oder werden von “offenkundig staatlich geduldeten Todesschwadronen “ermordet , sagte Deile . Er appellierte an Kinkel , ( sich )m2 für die politischen Häftlinge einzusetzen .
(ID: 5023x5031); (Calc-Prob:0)

c. Immer mehr Menschen “verschwinden “oder werden von “offenkundig staatlich geduldeten Todesschwadronen “ermordet , sagte Deile . Er appellierte an ( Kinkel )m1 , ( sich )m2 für die politischen Häftlinge einzusetzen . SEITE 8 (ID: 5030x5031); (Calc-Prob:7)

In example (35a), the output of SUCRE proposes to classify the markables ihm and sich as coreferent (which actually has not been done), although there are several sentences between them and ihm obviously does not constitute the subject. In (35b), the subject in m1, Außenminister Klaus Kinkel, is repeated several sentences later as Kinkel, adjacent to m2. These proposals are very surprising, as one pre-filter feature (no. 4 in (4.1.2)) excludes such links from being classified at all. The clarification for this confusion is that there are two kinds of false negatives (FN) based on the architecture of SUCRE (cf. (3.2)). First, the links created by the link generator are filtered (cf. (4.1.2)) and afterwards used for training/testing. Here, one can use the term false negatives (FN) for the group of instances that have been misclassified as negative (i.e. disreferent). Another usage of false negatives (FN) comes up after the clustering step, where SUCRE uses best-first clustering to create coreference chains out of the pairwise decisions of the aforementioned classifier. Now, the system might consider two markables mi and mj to be disreferent because they do not occur in the same predicted cluster. This prediction is compared with the true partition (gold-standard coreference information). If mi and mj belong to the same true cluster, they should have been clustered (rather than classified) as coreferent. Thus, the link connecting mi and mj is an instance of this second usage of false negatives (FN).
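The two usages of the term can be made explicit in a short sketch. The following Python fragment is illustrative only; the data structures (links as frozensets of markable IDs, clusters as sets) do not correspond to SUCRE's actual formats:

```python
from itertools import combinations

def classifier_fns(gold_pairs, predicted_labels):
    """First usage: filtered links that are coreferent in the gold
    standard but were classified as disreferent."""
    return {link for link, label in predicted_labels.items()
            if label == "disreferent" and link in gold_pairs}

def cluster_fns(true_clusters, predicted_clusters):
    """Second usage: markable pairs that share a true cluster but
    do not share a predicted cluster."""
    def pairs(clusters):
        return {frozenset(p) for cluster in clusters
                for p in combinations(sorted(cluster), 2)}
    return pairs(true_clusters) - pairs(predicted_clusters)
```

A link removed by the pre-filters never reaches the classifier and can therefore only appear in the second set, which is precisely the situation of examples (35a) and (35b).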
And this is exactly the case in examples (35a) and (35b), as SUCRE's output of false negatives is produced after the clustering step and corresponds to the comparison of predicted and true partition. In order to improve the classification part of SUCRE, its output has to be restricted to the links that are created for the classifier's input. Example (35c) reveals another problem, which concerns complex syntactic structure. Here, an infinitival embedded clause contains the second markable, sich. This clause functions as an argument of the finite main verb appellierte. This kind of non-finite clause has no overt subject, and therefore there is no explicit antecedent for the reflexive pronoun. In most syntactic theories, the embedded infinite main verb einzusetzen nevertheless needs a subject. Most of the time, this subject is the subject or the direct/indirect object of the superordinate (matrix) clause. This is the case when the infinitival clause is an argument of a so-called control verb. This group of verbs is split up into verbs that enable the control of subjects and those that enable the control of objects. Example (36) shows some examples of control:

(36) a. Peter_i verspricht Maria_j , sich_i umzudrehen. (→ subject control verb)
b. Maria_j bittet Peter_i , sich_i umzudrehen. (→ direct object control verb)
c. Maria_j empfiehlt Peter_i , sich_i umzudrehen. (→ indirect object control verb)
d. Peters_i Versuch, sich_i umzudrehen, missfiel Maria_j . (→ deverbal subject control noun)

Example (36b) shows the same pattern of direct object control as in (35c). The verb versprechen (cf. example (36a)), on the other hand, enables subject control, whereas the verb empfehlen (cf. example (36c)) enables indirect object control. A special kind of control is found with deverbal nouns that are derived from control verbs. In (36d), the noun Versuch is derived from the control verb versuchen, which enables subject control.
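A lookup-based sketch of this control information, with small illustrative verb lists (not an exhaustive lexicon), might look as follows:

```python
# Illustrative samples; a real system would use a larger lexicon.
SUBJECT_CONTROL = {"versprechen", "versuchen", "drohen"}
OBJECT_CONTROL = {"bitten", "empfehlen", "appellieren", "auffordern"}

def controlled_antecedent_function(matrix_verb_lemma):
    """Return the grammatical function in the matrix clause with
    which a reflexive pronoun inside the embedded infinitival
    clause corefers, or None for non-control verbs."""
    if matrix_verb_lemma in SUBJECT_CONTROL:
        return "SUBJ"
    if matrix_verb_lemma in OBJECT_CONTROL:
        return "OBJ"  # direct or indirect object
    return None
```

For (35c), the matrix verb appellieren would license coreference between sich and the object Kinkel, while versprechen in (36a) would point back to the subject.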
One way of solving the problem of false negatives concerning reflexive pronouns in embedded clauses governed by control verbs is to check whether there is an intermediate dependency relation between a control verb (or a deverbal noun derived from a control verb) and the reflexive pronoun, i.e. a dependency relation between the control verb and the main verb governing the reflexive pronoun. This can be done by using a list of subject control verbs and a list of object control verbs and then assigning coreference to the markable if it is annotated with the right grammatical function, and disreference otherwise.

4.4.2. Semantic Relations between the markables

This group is based on the idea that two markables can corefer although they do not share any syntactic or string-based features. This is often the case when both markables have common nouns as heads that do not substring-match with each other. For implementing features that check for a specific semantic relation, the features need access to an appropriate knowledge base with ontological information, such as GermaNet. The examples below show some false negatives that occur because of such an implicit relation between two common nouns that cannot be captured string-based:

(37) a. Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter ( des Hundehalters )m1 versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m2 . Erst die Feuerwehr konnte beide durch das Fenster befreien . (ID: 335x340); (Calc-Prob:10)

b. Der 24jährige Besitzer hatte dem Tier am Vortag ( sein zukünftiges Heim )m1 gezeigt .[. . . ] Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ .
Als ein Bekannter des Hundehalters versuchte , ( die Wohnung )m2 zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . (ID: 329x337); (Calc-Prob:5)

c. Der 24jährige Besitzer hatte ( dem Tier )m1 am Vortag sein zukünftiges Heim gezeigt . Das gefiel ( dem Hund )m2 so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . (ID: 326x331); (Calc-Prob:5)

d. Daraufhin weigerte sich ( Daimler-Benz )m1 , ihn nach Abschluß seiner Ausbildung als Schlosser zu übernehmen - der Artikel sei nämlich ein “Bekenntnis zur Gewalt “. Es sei zu befürchten , daß der junge Mann in bestimmten Situationen auch im Betrieb Gewalt befürworten werde , argumentierte ( das Unternehmen )m2 . Das Bundesarbeitsgericht teilte den Standpunkt und wies die Klage auf Einstellung ab . (ID: 6010x6022); (Calc-Prob:7)

e. Aufgrund des gleichen Paragraphen gibt es in ( Warschau )m1 inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . ( Die Stadt )m2 hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben . Jemanden vor die Tür setzen dürfen die allerdings auch nicht . (ID: 511x519); (Calc-Prob:5)

In example (37a), the markables des Hundehalters and des Besitzers should have been coreferred, since Hundehalter is a compound word with the head Halter, which is a synonym of Besitzer. Thus, a possible feature has to check the semantic relation of synonymy with respect to the heads of the markables (which may be compound words). If there is a positive relation, the feature returns TRUE, otherwise FALSE. Another kind of semantic relationship is given in examples (37b) and (37c).
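A compound-aware synonymy check along these lines could be sketched as follows; the synonym table is a toy stand-in for a real GermaNet query, and the suffix match is only a crude approximation of German compound analysis:

```python
# Toy synonym pairs standing in for a GermaNet lookup.
SYNONYMS = {frozenset({"halter", "besitzer"})}

def heads_are_synonyms(head1, head2):
    """TRUE if the markable heads (or the final segment of a
    compound head, e.g. Hundehalter -> Halter) are synonyms."""
    h1, h2 = head1.lower(), head2.lower()
    for pair in SYNONYMS:
        a, b = tuple(pair)
        # The suffix match handles compounds such as 'Hundehalter'.
        if (h1.endswith(a) and h2.endswith(b)) or \
           (h1.endswith(b) and h2.endswith(a)):
            return True
    return False
```

For inflected forms such as the genitive Hundehalters, an additional lemmatization step would be required before the suffix check.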
Here, the pairs of the respective markable heads, <Heim,Wohnung> as well as <Tier,Hund>, show the semantic relation of hyponymy or hypernymy: an apartment (Wohnung) is a type of home (Heim), and a dog (Hund) is a type of animal (Tier). Therefore, one can implement another feature that checks whether the markables' heads are in a hyponymy relation with each other. If there is a positive relation, again the feature returns TRUE, otherwise FALSE. The markable pairs in (37d) and (37e) contain a proper name (Daimler-Benz or Warschau) and a common noun that groups this proper name into a special category (e.g. a company (Unternehmen) or a city (Stadt)). In the case that the ontological knowledge base (e.g. GermaNet) does not provide any information about the given proper name, a named entity recognizer is needed for this purpose (which is not available for the TüBa-D/Z corpus), together with the aforementioned knowledge about synonymy and hyponymy. An implementation of such a feature would return TRUE in the case that one markable is a named entity and its category (predicted by NER) is (a synonym/hyponym of) the head of the other markable, and FALSE otherwise.

4.4.3. Both markables contain a common, possibly appositive proper name

This group arose from the fact that proper names (in particular surnames) are very often expressed appositively to a head (a common noun or a first name). However, the respective features in the given feature set (no. 11-13 in (4.1.3)) just check for proper names in the markables' heads. Some false negatives for this problem are given in (38):

(38) a. ( Ramon Valle-Inclan )m1 [. . . ] Erst in den 70er Jahren entstanden Übersetzungen und Inszenierungen , die jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden .
Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m2 , auch weiterhin ein Geheimtip . (ID: 1x60); (Calc-Prob:13)

b. Als mir Anfang des Jahres ( Martin Flug )m1 sein Manuskript Treuhand-Poker - Die Mechanismen des Ausverkaufs auf den Verlagstisch legte , schien mir manches recht überzogen , und ich bat ihn , mir die haarsträubendsten Geschichten mit Dokumenten zu belegen , da ich wenig Lust verspürte , gleich nach Erscheinen verklagt zu werden .[. . . ] Doch in jedem einzelnen Fall konnte er mich von der Sauberkeit seiner Recherche überzeugen , und die Tatsache , daß bis heute - drei Monate nach der Erstauslieferung - keine Einstweiligen Verfügungen bei uns herniedergegangen sind , scheinen ihm zusätzlich recht zu geben . Heute würde ich wahrscheinlich nicht mehr so skeptisch fragen , denn das , was ich in den letzten Wochen in meiner unmittelbaren Umgebung , der Ostberliner Verlagsszene , erlebt habe , stellt ( Flugs )m2 Report noch um einiges in den Schatten . (ID: 9647x9680); (Calc-Prob:9)

c. Im November 1990 sollen sie ( den Angolaner Amadeu Antonio )m1 so zusammengeschlagen haben , daß der 28jährige starb .[. . . ] Sie wehrte sich beredt gegen die Verteidiger , die ihrer Darstellung nicht glauben wollten . Durch ihre Schilderung sind die Vorgänge , die ( Amadeu Antonio )m2 das Leben kosteten , noch weniger klar als vorher . (ID: 5116x5211); (Calc-Prob:5)

d. Daran ist dann offensichtlich auch der Versuch von Schmidt-Braul gescheitert , ( den Luftfahrtunternehmer Dornier )m1 für die Übernahme zu gewinnen . ( Silvius Dornier )m2 wußte aus seinen zähen Verhandlungen mit Daimler-Benz , denen er einen Großteil seiner Aktien verkauft hatte , daß auch bei der Treuhand etwas rauszuschlagen wäre . Wenn sie schon die teure Immobilie einsackte , sollte sie wenigstens noch den Verlag entschulden und etwas Geld für die Anschubfinanzierung locker machen .
(ID: 9868x9870); (Calc-Prob:7)

e. ( Frieda Mermet )m1 heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer Bergkaff ist sie . Beim Walser-Ensemble , das die Briefe an ( die » Liebe Frau Mermet )m2 « auf die Bühne bringt , hockt die Holde tatsächlich im Dichterolymp . Ein goldener Bilderrahmen , ganz Gelsenkirchener Barock , schwebt über der Szene . (ID: 13022x13029); (Calc-Prob:12)

f. “Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich Bundeskanzler Helmut Kohl auf seiner Pressekonferenz mit ( dem russischen Präsidenten Boris Jelzin )m1 am Mittwoch in München verteidigen . Er hatte ( seinen neuen “Duzfreund “Boris )m2 nämlich aus Versehen als sowjetischen Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen . WAFFENSCHMUGGEL IM KLEINEN RAHMEN (ID: 10091x10097); (Calc-Prob:8)

In example (38a), the head of m1 is the forename Ramon, whereas the surname Valle-Inclan is the last word in m1 but the head in m2. The same problem is shown in (38b), but here the surname is inflected, so that a simple exact string match would not have sufficient predictive power. A similar example is given in (38c). Here, again, the head of m2 is given appositively in m1, but this time they share a common last word. One possibility of solving this problem is a modification of the given features that check for proper names (cf. no. 11-13 in (4.1.3)) such that they check whether the head of one markable is a proper name and, if so, whether there is a substring match between this head and any word of the other markable. If both are the case, the feature returns TRUE, otherwise FALSE. In (38d) and (38e), there are more complex cases, because the NE-heads of m2 (or m1) are not present in m1 (or m2) at all.
In this case, the feature sketched above has to be modified to simply check whether the head and the last word of the markable are proper names that occur in the other markable. But this is not possible if the common proper name is neither the head nor the first or last word. Admittedly, one can check the number of common words, but there is no possibility to check for all common words whether they are proper names. Here, the pseudo-language should be extended. For instance, the implementation of bound variables that provide word attributes (e.g. f0 for part-of-speech) could solve that problem (cf. example (39)):

(39) seqmatch(X,m1a)&&seqmatch(X,m2a)&&(X.f0==NE)

Another case is shown in example (38f): both markables' heads are common nouns (Präsidenten vs. Duzfreund) that do not share any common features. They are not even in any semantic relation. This case can nevertheless be captured by the aforementioned feature, because at least one proper name is one markable's last word.

CHAPTER 5

Implementation of the features

In this chapter, the features that are proposed in the linguistic analysis in chapter 4, as well as in some approaches presented in (2.5), are implemented in SUCRE's regular link feature definition language. In (5.1), all features that are proposed in the analysis of the false positives are implemented in their groups (as arranged in chapter 4). In (5.2), those features from the false negatives analysis for which an implementation is possible are coded. Finally, in (5.3), the implementations of the four features from the German approaches in (2.5) are presented. For each feature, the idea and its function are described briefly. Further descriptions of the syntax and functions of the feature definition language are presented in appendix B. Every feature is exemplified by two instances, mostly a coreferent and a disreferent one. Finally, the feature is evaluated as a modification of the baseline feature set.
However, sometimes this evaluation is misleading, as an improvement with respect to the baseline does not always mean a positive contribution to the final feature set. The reason for this lies in the dependencies of a link feature on others. As the final feature set largely contains non-original features, the dependencies among the final features might be quite different from those among the original features. However, sometimes the trend is steady. At the end of each group, the best performing feature, or a combination of features, is chosen for the final feature set.

5.1. Features for False Positives

5.1.1. The second markable is indefinite

This group addresses the indefiniteness of m2 in the case of exact string match and substring match (cf. (4.3.1)).

Indef1: The feature Indef1 extends the string matching feature by disallowing m2 to start with ein:

a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}

The element (strmatchlc(m2b,ein)==0) checks whether there is a substring match between the first word in m2 and the string ein, thereby capturing all indefinite articles like ein, eine, einer, eines, einem, . . . . If this is true, then strmatchlc(m2b,ein) is 1 and the equation 1==0 returns FALSE. In (40), two examples are given:

(40) a. <Ein Mannm1 , Der Mannm2 >
b. <Die Fraum1 , Eine Fraum2 >

In (40a), m2 does not start with an indefinite article and the heads of m1 and m2 string match. Therefore, the feature Indef1 returns TRUE. In (40b), m2 starts with an indefinite article and thus, although the heads of m1 and m2 string match, the feature Indef1 returns FALSE. Table 5.1 shows the result of the comparison of the new baseline against the addition of Indef1.
Feature set      MUC-B3-F-score
New baseline     66.44%
+ Indef1         68.39%
Difference:       1.95%

Table 5.1.: Evaluation of the feature Indef1

Indef2: The feature Indef2 extends the feature Indef1 by checking for a keyword like andere that indicates disreference:

a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&(strmatchlc(andere,m2a)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&(strmatchlc(andere,m2a)==0)}

The element (strmatchlc(andere,m2a)==0) checks whether there is a substring match between the words in m2 and the string andere. If this feature performs well, one can decide to extend it by further keywords like neue, alternative, . . . . In (41), two examples are given:

(41) a. <Ein Mannm1 , Der Mannm2 >
b. <Ein Mannm1 , Der andere Mannm2 >

As shown above, example (41a) returns TRUE for the elements in Indef1. The item (strmatchlc(andere,m2a)==0) returns TRUE, as the predicate strmatchlc(andere,m2a) returns FALSE and thus 0==0 is true. Therefore, the overall feature returns TRUE. In example (41b), strmatchlc(andere,m2a) returns TRUE, therefore 1==0 is false and the overall feature returns FALSE. Table 5.2 shows the result of the comparison of the new baseline against the addition of Indef2.

Feature set      MUC-B3-F-score
New baseline     66.44%
+ Indef2         66.81%
Difference:       0.37%

Table 5.2.: Evaluation of the feature Indef2

Indef3: The feature Indef3 extends the feature Indef1 by allowing for m1 being a headline, i.e. lacking a determiner:

a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m1b.rewtag != rewtags.DET))}
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m1b.rewtag != rewtags.DET))}

The element ((strmatchlc(m2b,ein)==0)||(m1b.rewtag != rewtags.DET)) has to be understood as an implication: if m1 starts with a determiner, then m2 may not start with an indefinite article.
As headlines often lack determiners, a repetition in the following text might corefer with the headline phrase although it starts with an indefinite article. In (42), two examples are given:

(42) a. <Neue Studie zum Rauchenm1 , Eine neue Studiem2 >
b. <Eine neue Studiem1 , Eine Studiem2 >

The example in (42a) returns TRUE, as m1 lacks a determiner, whereas the example in (42b) returns FALSE due to the fact that both markables contain an indefinite article (i.e. a determiner). Table 5.3 shows the result of the comparison of the new baseline against the addition of Indef3.

Feature set      MUC-B3-F-score
New baseline     66.44%
+ Indef3         67.86%
Difference:       1.42%

Table 5.3.: Evaluation of the feature Indef3

Indef4: The feature Indef4 extends the feature Indef1 by covering the case that m1 and m2 are proper names and m2 adds an indefinite article journalistically:

a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))}
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))}

The element ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE)) has to be understood as an implication: if m2 starts with an indefinite article, then it has to be a proper name. Proper names usually corefer if there is a string match. In (43), two examples are given:

(43) a. <Robert Musilm1 , eines Robert Musilm2 >
b. <Ein Mannm1 , eines Mannesm2 >

The example in (43a) returns TRUE, as m2 starts with an indefinite article but also contains a proper name as its head. In (43b), on the other hand, there is no proper name and thus the feature returns FALSE. Table 5.4 shows the result of the comparison of the new baseline against the addition of Indef4.

Feature set      MUC-B3-F-score
New baseline     66.44%
+ Indef4         68.43%
Difference:       1.99%

Table 5.4.: Evaluation of the feature Indef4

Indef5: The feature Indef5 allows m2 to start with an indefinite article in the case that all words in both markables are the same.
As shown in example (10c), if both markables are exactly the same, a coreference relation can also hold if they start with an indefinite article.

a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(seqmatch(m1a,m2a)==max(m1.wrdlen,m2.wrdlen)))}
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(seqmatch(m1a,m2a)==max(m1.wrdlen,m2.wrdlen)))}

The predicate max(m1.wrdlen,m2.wrdlen) returns the length of the longer markable. The predicate seqmatch(m1a,m2a) returns the number of common words in m1 and m2. As one markable might be included in the other one, the idea is to compare the number of common words against the length of the longer markable (in the case that they differ). Again, the element ((strmatchlc(m2b,ein)==0)||(seqmatch(m1a,m2a)==max(m1.wrdlen,m2.wrdlen))) has to be understood as an implication: if m2 starts with an indefinite article, then m1 and m2 have to be exactly the same. In (44), two examples are given:

(44) a. <Ein generelles Verbotm1 , Ein generelles Verbotm2 >
b. <Ein generelles Verbotm1 , Ein absolutes Verbotm2 >

The example in (44a) returns TRUE: although m2 starts with an indefinite article, m1 and m2 are exactly the same. With example (44b), the feature returns FALSE, as m1 and m2 differ in one word: max(m1.wrdlen,m2.wrdlen) returns 3 and seqmatch(m1a,m2a) returns 2; thus 2==3 is FALSE, and so is the overall feature result. Table 5.5 shows the result of the comparison of the new baseline against the addition of Indef5.

Feature set      MUC-B3-F-score
New baseline     66.44%
+ Indef5         68.33%
Difference:       1.89%

Table 5.5.: Evaluation of the feature Indef5

The selection of the final features

One has to keep in mind that the features Indef2 up to Indef5 are all based on Indef1. They are grouped together, as all of them constitute a kind of exception to the string match or to a disreference based on an indefinite anaphor, and are therefore intended to relativize the prediction of Indef1.
So, if one of the features performs worse than Indef1 (68.39%), it does not perform sufficiently. This is the case for the features Indef2 (66.81%), Indef3 (67.86%) and Indef5 (68.33%). The feature Indef4 performs slightly better (68.43%). As it also contains Indef1, Indef4 is selected as final feature.

5.1.2. Wrong assignment of a relative pronoun

This group concerns the problem of the relative pronouns. Given that m1 or m2 is a relative pronoun, its anaphor or antecedent has to be a specific pronoun or at a specific position (cf. (4.3.2)).

Relpron1: The feature Relpron1 is the following implication: if m1 is a relative pronoun, then it has to constitute the subject and m2 has to be a reflexive pronoun, a possessive pronoun or a relative pronoun too:

a) {(((m2h.f0==f0.PRF)||(m2h.f0==f0.PPOS∼)||(m2h.f0==f0.PREL∼))&&(m1h.rewtag==rewtags.SUBJ))||(m1h.f0!=f0.PRELS)}

The term PPOS∼ stands for attributive and substituting possessive pronouns. In (45), two examples are given:

(45) a. Ein Hund, derm1 seinm2 Herrchen beißt . . .
b. Ein Hund, derm1 (sein Herrchen)m2 beißt . . .

Example (45a) returns TRUE, as m1 is a relative pronoun which constitutes the subject of the verb beißen and m2 is an attributive possessive pronoun. In (45b), on the other hand, the head of m2 is a common noun and thus the feature returns FALSE. Table 5.6 shows the result of the comparison of the new baseline against the addition of Relpron1. Although there is a deterioration of 3.11%, this feature is considered for the final feature set, as it contributes to the final score (i.e. its removal would worsen the score by about 0.09%, cf. table 5.7).
Feature set       MUC-B3-F-score
New baseline      66.44%
+ Relpron1        63.33%
Difference:       -3.11%

Table 5.6.: Evaluation of the feature Relpron1

Feature set       MUC-B3-F-score
Final set         73.06%
- Relpron1        72.97%
Difference:       -0.09%

Table 5.7.: The final set without Relpron1

Relpron2: The feature Relpron2 is the following implication: if m2 is a relative pronoun, then the distance between m2 and m1 has to be less than or equal to 3. The reason for the number 3 is that usually at least a comma separates the relative pronoun from its preceding antecedent. Sometimes, the relative pronoun is within a prepositional phrase; then, there are two tokens between the antecedent and m2. Thus, the distance between m1 and m2 is at most 3 in terms of tokens.

a) {(abs(m2h.txtpos-m1e.txtpos)<=3)||(m2h.f0!=f0.PRELS)}

In (46), two examples are given:

(46) a. (Ein Hund)m1, derm2 . . .
     b. Zu (diesem Genre)m1 gehören neben den Wunderworten (diem2 . . .

In example (46a), m2 is a relative pronoun and its antecedent is two tokens apart. Therefore, Relpron2 returns TRUE. The markable m2 in (46b), however, is six tokens apart. Thus, the feature returns FALSE. Table 5.8 shows the result of the comparison of the new baseline against the addition of Relpron2.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Relpron2        65.26%
Difference:       -1.18%

Table 5.8.: Evaluation of the feature Relpron2

Relpron3: The feature Relpron3 tries to capture the possibility of m2 referring to an embedded noun phrase. In example (12c), the noun phrase Tausende kommunaler Sozialmieter corefers with the succeeding relative pronoun die. However, m1 was set to the embedded phrase kommunaler Sozialmieter. As this is not always wrong, this feature constitutes just a proposal for solving such problems.

a) {(m2h.f0!=f0.PRELS)||(m1h.rewtag!=rewtags.GMOD)}

The feature says: if m2 is a relative pronoun, then m1 may not be a genitive modifier.
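The implication pattern shared by these features can be paraphrased in Python with Relpron2's distance check. This is a hedged sketch with hypothetical parameter names; txtpos stands for the token position in the text, as in the feature above.

```python
def relpron2(m1_end_pos, m2_head_pos, m2_pos_tag):
    """If m2 is a relative pronoun (PRELS), its head must be at most
    3 tokens away from the end of the antecedent m1."""
    if m2_pos_tag != "PRELS":
        return True                        # implication is vacuously true
    return abs(m2_head_pos - m1_end_pos) <= 3

# Example (46a): "(Ein Hund), der ..." -- the antecedent ends at token 1,
# the relative pronoun is token 3 (the comma is token 2).
print(relpron2(1, 3, "PRELS"))   # True
# A relative pronoun six tokens away from its candidate antecedent:
print(relpron2(1, 7, "PRELS"))   # False
```
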
Table 5.9 shows the result of the comparison of the new baseline against the addition of Relpron3.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Relpron3        64.28%
Difference:       -2.16%

Table 5.9.: Evaluation of the feature Relpron3

The selection of the final features

Although Relpron1 shows a bad performance in the feature set of the new baseline, it contributes to the final feature set, as a removal worsens the final score slightly. Relpron2 and Relpron3 both show a deterioration that is also visible with the final feature set. Therefore, Relpron1 is the only feature selected for the final feature set.

5.1.3. Reflexive pronouns and non-subjects

Reflex1: This single group concerns the case that a reflexive pronoun is related to a non-subject. Examples of such misclassifications are given in (4.3.4). A simple way of solving this (and thereby disregarding complex structures like syntactic control) is the following feature:

a) {((m2h.f0==f0.PRF)&&(m1h.rewtag == rewtags.SUBJ))||
   ((m1h.f0==f0.PRF)&&(m2h.rewtag == rewtags.SUBJ))}

The feature returns TRUE in the case that m2 is a reflexive pronoun and m1 is a subject or vice versa (for the case of cataphoric reflexive pronouns). Table 5.10 shows the result of the comparison of the new baseline against the addition of Reflex1. Although the addition to the base feature set shows a slight improvement, the feature does not perform well with the final features (cf. table 5.11).

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Reflex1         66.69%
Difference:        0.25%

Table 5.10.: Evaluation of the feature Reflex1

Feature set       MUC-B3-F-score
Final set         73.06%
+ Reflex1         72.79%
Difference:       -0.27%

Table 5.11.: The final set with Reflex1

The selection of the final features

As an addition of Reflex1 to the final feature set would result in a deterioration of 0.27%, no features from this group are selected for the final feature set.

5.1.4. Problems with substring-matches

The features that are implemented in this group are based on an important issue: German morphology. As a simple string matching feature often does not suffice, two further topics will be focused on in this group: inflectional suffixes and German compound words. The latter differ from English, as a German compound word is always realized as one string (cf. (4.3.5)). For some German compound words like Schäferhundm1 vs. Hundm2, two new functions have been added to the function set of SUCRE's pseudo language (cf. appendix B) in order to increase expressiveness: bswitch (bswitchlc) and eswitch (eswitchlc).

bswitchlc(m1h,m2h) returns true in the case that the first markable's head begins with the second markable's head (e.g. "Hundehalter"m1h - "Hund"m2h)

eswitchlc(m1h,m2h) returns true in the case that the first markable's head ends with the second markable's head (e.g. "Schäferhund"m1h - "Hund"m2h)

The following features are based on the idea that the substring matching feature strmatch/2 is only meaningful if the exact string matching feature seqmatch/2 does not take effect.

Substr1: This feature checks whether the head of m2 is the compound head of the head of m1 (i.e. whether m1h ends with m2h). It is a modification of the feature Indef1, which has been implemented in (5.1.1).

a) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)
   &&eswitchlc(m1h,m2h)}

Substr1 returns TRUE if the head of m1 ends with the head of m2 and m2 does not start with an indefinite article. It returns FALSE otherwise. The reason why this feature ignores the other way round (i.e. m2h ending with m1h) is that repeated mentions of a referent are usually less informative than preceding ones. In (47), two examples are given:

(47) a. <Ein Schäferhundm1, Der Hundm2>
     b. <Ein Hundehalterm1, Der Hundm2>

With example (47a), Substr1 returns TRUE, as the head of m1 ends with the head of m2 and there is no indefinite article at the beginning of m2.
In example (47b), the head of m1 starts with the head of m2. So, the feature returns FALSE. Table 5.12 shows the result of the comparison of the new baseline against the addition of Substr1.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr1         68.94%
Difference:        2.50%

Table 5.12.: Evaluation of the feature Substr1

Substr2: This feature addresses inflectional suffixes. It requires that the markables' heads case-sensitively substring match but do not exact string match and that m2 is definite. There are two possibilities: m1h is an inflected version of m2h or vice versa. For instance, if m1h is inflected, then it begins with m2h and ends with one of three possible inflectional suffixes: s, es and e. To be able to exclude an inflected compound word, the threshold for the edit distance between m1h and m2h is set to 3:

a) {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)
   &&(strmatchlc(m2b,ein)==0)&&(editdist(m1h,m2h)<3) &&
   ((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||
   eswitch(m1h,e))) ||
   (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||
   eswitch(m2h,e))) )}

In (48), two examples are given:

(48) a. <Ein Hundm1, des Hundesm2>
     b. <Der Hundm1, des Hundehaltersm2>

With the markable pair in (48a), the feature returns TRUE, as m2h starts with m1h and ends with the inflectional suffix es. Although, with respect to the suffix s, the same holds in example (48b), the feature returns FALSE there, as the edit distance between m1h and m2h is 8 (rather than being below 3). However, this feature ignores compound words and the case in which both markables are (differently) inflected. Table 5.13 shows the result of the comparison of the new baseline against the addition of Substr2. If an option for compound words (i.e. eswitchlc(m1h,m2h)) is offered, the performance is worse than with a check for inflection alone (cf. table 5.14).
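The new switch functions and Substr2's edit-distance threshold can be approximated in Python. This is a sketch, not SUCRE's implementation: editdist is assumed to be the standard Levenshtein distance, and the heads are passed as plain strings.

```python
def bswitchlc(h1, h2):
    """True iff h1 begins with h2 (case-insensitive)."""
    return h1.lower().startswith(h2.lower())

def eswitchlc(h1, h2):
    """True iff h1 ends with h2 (case-insensitive)."""
    return h1.lower().endswith(h2.lower())

def editdist(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

INFLECTIONAL_SUFFIXES = ("s", "es", "e")

def substr2(h1, h2):
    """One head is the other plus an inflectional suffix, and the
    edit distance stays below the threshold of 3."""
    if h1 == h2 or editdist(h1, h2) >= 3:
        return False
    return ((bswitchlc(h1, h2) and h1.endswith(INFLECTIONAL_SUFFIXES)) or
            (bswitchlc(h2, h1) and h2.endswith(INFLECTIONAL_SUFFIXES)))

# Example (48): "Hund"/"Hundes" is accepted; "Hund"/"Hundehalters"
# is rejected because the edit distance is 8.
print(substr2("Hund", "Hundes"))        # True
print(substr2("Hund", "Hundehalters"))  # False
```
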
Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr2         70.37%
Difference:        3.93%

Table 5.13.: Evaluation of the feature Substr2

Feature set          MUC-B3-F-score
New baseline         66.44%
+ Substr2 + c.w.     69.82%
Difference:           3.38%

Table 5.14.: Substr2 with compound words

Substr3: The feature Substr3 is an extension of the previous feature Substr2: besides the aforementioned inflectional suffixes, the feature allows the suffixes chen and lein, which function as diminutives (denoting a small version of the respective common noun). Thus, the edit distance threshold is set to 5:

a) {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)
   &&(strmatchlc(m2b,ein)==0)&&(editdist(m1h,m2h)<5) &&
   ((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||
   eswitch(m1h,e)||eswitch(m1h,chen)||eswitch(m1h,lein))) ||
   (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||
   eswitch(m2h,e)||eswitch(m2h,chen)||eswitch(m2h,lein))) )}

Table 5.15 shows the result of the comparison of the new baseline against the addition of Substr3.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr3         70.30%
Difference:        3.86%

Table 5.15.: Evaluation of the feature Substr3

Substr4: This feature is based on example (18c), where the noun Arm is related to the noun Marmorpalast because of a substring match. The idea is that if the length of one markable is less than or equal to 3, then the other markable may not be more than twice as long:

a) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
   (((editdist(m1h,#)>3)||(editdist(m2h,#)<=2*editdist(m1h,#)))&&
   ((editdist(m2h,#)>3)||(editdist(m1h,#)<=2*editdist(m2h,#))))}

The feature Substr4 returns TRUE if the heads of m1 and m2 substring match and the length of mi is bigger than 3 or the length of mj is less than or equal to twice the length of mi. For computing the length of a markable, its edit distance to a dummy like "#" that definitely cannot occur in the markable is used.
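The length trick works because for a head that does not contain "#", editdist(head,"#") equals len(head). In Python, where a length function is available, the Substr4 condition reduces to the following sketch:

```python
def substr4_length_ok(h1, h2):
    """If one head has at most 3 characters, the other may be at most
    twice as long; heads longer than 3 characters are unrestricted.
    len() replaces the thesis' editdist(head, "#") length trick."""
    return ((len(h1) > 3 or len(h2) <= 2 * len(h1)) and
            (len(h2) > 3 or len(h1) <= 2 * len(h2)))

# Example (49): "Arm"/"Armes" passes; "Arm"/"Marmorpalast" fails.
print(substr4_length_ok("Arm", "Armes"))         # True
print(substr4_length_ok("Arm", "Marmorpalast"))  # False
```
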
The element ((editdist(m1h,#)>3)||(editdist(m2h,#)<=2*editdist(m1h,#))) is the implication: if the length of m1h is less than or equal to 3, then the length of m2h has to be less than or equal to twice the length of m1h. In (49), two examples are given:

(49) a. <Der Armm1, des Armesm2>
     b. <Der Armm1, der Marmorpalastm2>

In both examples in (49), the heads of m1 and m2 substring match. With example (49a), Substr4 returns TRUE, as the length of m2h is not greater than twice the length of m1h, given that m1h has the length 3. On the other hand, in (49b) m2h has the length 12, whereas m1h has the length 3. Therefore, Substr4 returns FALSE. Table 5.16 shows the result of the comparison of the new baseline against the addition of Substr4.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr4         67.63%
Difference:        1.19%

Table 5.16.: Evaluation of the feature Substr4

Substr5: As pronouns are a very small closed class of words, a string match between them is quite meaningless. The feature Substr5 extends Indef1 by the condition that neither m1 nor m2 is a pronoun:

a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
   ((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
   ((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}

Here, the feature Substr5 is divided into two parts: one for exact string match and one for substring match. The feature Substr5a returns TRUE if the heads of m1 and m2 exact string match and neither is a pronoun, and FALSE otherwise. Substr5b checks for the same with the substring matching feature. In (50), two examples are given:

(50) a. <Die Mutterm1, Die Mutterm2>
     b. <1,98 mm1, seinemm2>

In example (50a), m1 and m2 are exactly the same. They substring match and exact string match and neither is a pronoun; thus, both Substr5a and Substr5b return TRUE.
In example (50b), the head of m1 is an abbreviation for the unit Meter, and this m is part of the inflectional suffix em of the possessive pronoun sein. So, Substr5a returns FALSE, as there is no exact string match, and Substr5b returns FALSE, as m2 is a pronoun. Table 5.17 shows the result of the comparison of the new baseline against the addition of Substr5.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr5         67.64%
Difference:        1.20%

Table 5.17.: Evaluation of the feature Substr5

Substr6: This feature is based on a discovery in example (20c), where the headline noun BUNDESRAT does not exact string match with the noun Bundesrat. Substr6 constitutes the simple modification of case-sensitive exact string match to case-insensitive exact string match. So, instead of the feature {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}, Substr6 proposes:

a) {seqmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}

Thereby, a markable pair like the one in (20c) would be handled as an exact string match. Table 5.18 shows the result of the comparison of the new baseline against the addition of Substr6. There is a slight improvement concerning the new baseline, but a deterioration in the case of the final feature set.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Substr6         66.45%
Difference:        0.01%

Table 5.18.: Evaluation of the feature Substr6

Feature set       MUC-B3-F-score
Final set         73.06%
+ Substr6         72.94%
Difference:       -0.12%

Table 5.19.: The final set with Substr6

The selection of the final features

This group has presented six features concerning string match. The feature Substr1 shows a smaller improvement than Substr2. The idea of diminutives in Substr3 does not work well enough, as Substr2 performs better. Although Substr4 improves the score, it is not compatible with Substr2, and as Substr2 outperforms it, Substr4 is disregarded. The feature Substr5 can be combined with Substr2 and shows good performance as well.
The last feature, Substr6, shows a minimal improvement over the new baseline, but it worsens the final score. Therefore, for the final feature set, Substr2 and Substr5 are selected.

5.1.5. "Es" ("it") as expletive pronoun in German

This group addresses the pronoun es, which is very often non-referring (cf. (4.3.6)). As there are no clues for figuring out whether an es-pronoun refers or not, the following feature just checks for the presence of such pronouns and returns TRUE if they are not present.

Es1: The feature Es1 checks whether both markables are different from es. If so, it returns TRUE, otherwise FALSE:

a) {(seqmatchlc(m1h,es)==0)&&(seqmatchlc(m2h,es)==0)}

This feature is very restrictive and disallows any occurrence of the pronoun es, although it might be referential. In (51), two examples are given:

(51) a. <Ein Autom1, Das Autom2>
     b. <Das Autom1, esm2>

In example (51a), neither m1 nor m2 equals es. Thus, the feature returns TRUE. In (51b), m2 might corefer with m1; however, the feature returns FALSE. Table 5.20 shows the result of the comparison of the new baseline against the addition of Es1.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Es1             66.01%
Difference:       -0.43%

Table 5.20.: Evaluation of the feature Es1

Es2: The feature Es2 checks whether at least one markable is different from es. If so, it returns TRUE, otherwise FALSE:

a) {(seqmatchlc(m1h,es)==0)||(seqmatchlc(m2h,es)==0)}

Now, the feature is less restrictive and allows one markable to be equal to es. However, if both markables equal es, Es2 returns FALSE. In (52), two examples are given:

(52) a. <Ein Autom1, esm2>
     b. <Esm1, esm2>

Now, example (52a) is accepted and the feature returns TRUE. In (52b), both markables equal es and the feature returns FALSE. Table 5.21 shows the result of the comparison of the new baseline against the addition of Es2.
Feature set       MUC-B3-F-score
New baseline      66.44%
+ Es2             66.44%
Difference:        0.00%

Table 5.21.: Evaluation of the feature Es2

Es3: In contrast to Es1 and Es2, Es3 is distributed over the personal pronoun features of m1 and m2. The modification of the personal pronoun features restricts them in as much as they return TRUE only if the respective personal pronoun differs from es:

a) {(m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)}
b) {(m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)}

Table 5.22 shows the result of the comparison of the new baseline against the addition of Es3. This addition shows a deterioration of 0.21%. However, this is not the case with the final feature set, as a removal also worsens the score there (cf. table 5.23).

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Es3             66.23%
Difference:       -0.21%

Table 5.22.: Evaluation of the feature Es3

Feature set       MUC-B3-F-score
Final set         73.06%
- Es3             72.76%
Difference:       -0.30%

Table 5.23.: The final set without Es3

The selection of the final features

The addition of feature Es1 shows a deterioration of the new baseline. As this is also the case with the final feature set, Es1 will be ignored. Es2 does not show any effect on the new baseline and thus, it will not be considered as a final feature. Although there is also a deterioration with Es3, it will be selected as a final feature, since it contributes to the final score (a removal worsens the final score by about 0.3%).

5.1.6. Units, currencies, month names, weekdays and the like

This group addresses some keywords which are very unlikely to corefer with others. Among these keywords are units like Meter or Prozent, currencies like Dollar or Mark, month names, weekdays and the like (cf. (4.3.7)).

Unit1: This feature checks for inequality of every keyword with markable m1.
The focus on m1 is sufficient, as this problem only occurs in cases of an exact string match. This feature is used as a link feature (as opposed to the prefilter feature Unit2).

a) {((seqmatchlc(m1h,Mark))==0)&&((seqmatchlc(m1h,Meter))==0)
   &&((seqmatchlc(m1h,Prozent))==0)&&((seqmatchlc(m1h,Sekunden))==0)
   &&((seqmatchlc(m1h,Juli))==0)&&((seqmatchlc(m1h,Milliarden))==0)
   &&((seqmatchlc(m1h,Dollar))==0)&&((seqmatchlc(m1h,Jahr))==0)
   &&((seqmatchlc(m1h,Mittwoch))==0)}

The feature returns TRUE in the case that the head of m1 is neither identical to Mark, nor to Meter, . . . , nor to Mittwoch. If m1 equals any of the keywords, the feature returns FALSE. The set of keywords is based on the false positives described in the previous chapter. For an adaptation to other corpora, corresponding keywords like Montag, Dienstag, Januar, Februar, Stunden, Millionen, . . . have to be added. For the sake of simplicity, they are left out in this study. Table 5.24 shows the result of the comparison of the new baseline against the addition of Unit1.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Unit1           66.42%
Difference:       -0.02%

Table 5.24.: Evaluation of the feature Unit1

Unit2: This feature checks for equality of any keyword with markable m1. In contrast to Unit1, Unit2 is used as a prefilter feature.

a) {seqmatchlc(m1h,Mark) || seqmatchlc(m1h,Meter) ||
   seqmatchlc(m1h,Prozent) || seqmatchlc(m1h,Sekunden) ||
   seqmatchlc(m1h,Juli) || seqmatchlc(m1h,Milliarden) ||
   seqmatchlc(m1h,Dollar) || seqmatchlc(m1h,Jahr) ||
   seqmatchlc(m1h,Mittwoch)}

The feature returns TRUE in the case that the head of m1 is equal to one of the keywords and FALSE otherwise. Table 5.25 shows the result of the comparison of the new baseline against the addition of Unit2.
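The keyword test behind both Unit1 and Unit2 boils down to set membership. The following sketch uses exactly the keyword list given above; for other corpora the set would have to be extended as described.

```python
# Keywords taken from the false-positive analysis above.
UNIT_KEYWORDS = {"mark", "meter", "prozent", "sekunden", "juli",
                 "milliarden", "dollar", "jahr", "mittwoch"}

def unit1(m1_head):
    """Link-feature version: TRUE iff the head matches no keyword."""
    return m1_head.lower() not in UNIT_KEYWORDS

def unit2(m1_head):
    """Prefilter version: TRUE iff the head matches a keyword
    (the markable pair is then discarded)."""
    return m1_head.lower() in UNIT_KEYWORDS

print(unit2("Prozent"))  # True  -> pair is discarded
print(unit1("Hund"))     # True  -> keyword-free, link is kept
```
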
Feature set       MUC-B3-F-score
New baseline      66.44%
+ Unit2           66.88%
Difference:        0.44%

Table 5.25.: Evaluation of the feature Unit2

The selection of the final features

The evaluations in tables 5.24 and 5.25 show that the usage of Unit2 (i.e. the prefilter version) performs better than the usage of the vector feature Unit1. Thus, Unit2 is selected as final feature for the final prefilter feature set.

5.1.7. First markable begins with "kein"

This group concerns markables that start with the quantifiers kein or jede, as the possibilities of coreferring markables for these are very restricted (cf. (4.3.9)). The only possible anaphors are a relative pronoun, a possessive pronoun or a reflexive pronoun.

Kein1: This feature checks whether the anaphor is a reflexive, a relative or a possessive pronoun:

a) {((strmatchlc(m1b,kein)||strmatchlc(m1b,jede))==0)||((m2h.f0==f0.PRF)||
   (m2h.f0==f0.PRELS)||(m2h.f0==f0.PPOSAT))}

This feature expresses the implication: if m1 is quantified with kein or jede, then m2 has to be a pronoun of the kinds mentioned above. In (53), two examples are given:

(53) a. (Jeder Hund)m1, derm2 sein Herrchen beißt . . .
     b. <Keine Gewaltm1, Die andere Gewaltm2>

In example (53a), Kein1 returns TRUE, as m1 starts with the quantifier Jeder but m2 is a relative pronoun. On the other hand, in example (53b), m2 is a definite noun phrase. Thus, Kein1 returns FALSE. Table 5.26 shows the result of the comparison of the new baseline against the addition of Kein1.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Kein1           66.37%
Difference:       -0.07%

Table 5.26.: Evaluation of the feature Kein1

Kein2: This feature is based on the insight that a link containing an antecedent that is quantified by kein or jede is predominantly proposed because of a positive string matching feature.
Therefore, Kein2 is a modification of the string matching features, entirely excluding markables that start with such a quantifier:

a) {strmatchlc(m1h,m2h)&&
   (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)}
b) {seqmatch(m1h,m2h)&&
   (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)}

In (54), two examples are given:

(54) a. <Eine Gewaltm1, Die Gewaltm2>
     b. <Keine Gewaltm1, Die andere Gewaltm2>

In example (54a), m1 is not quantified by kein or jede and thus Kein2 returns TRUE, whereas in (54b) the feature again returns FALSE. Table 5.27 shows the result of the comparison of the new baseline against the addition of Kein2.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Kein2           66.61%
Difference:        0.17%

Table 5.27.: Evaluation of the feature Kein2

The selection of the final features

As the performance of Kein2 in the base feature set is better than that of Kein1 (and the same holds for the final feature set), Kein2 is selected as final feature.

5.1.8. Disagreement in gender and number

This group is based on the discovery made in (4.3.10). Although there are several prefilter features checking for a disagreement in gender or number, some markable pairs, in particular with an attributive possessive pronoun, are listed as false positives. The reason for this was a suboptimal annotation of the original SemEval-2010 dataset. The steps presented in this group contain re-annotations with the use of heuristics as well as the introduction of new prefilter features and extensions of the grammatical attributes number and gender.

Agree1: This step comprises the modification of the number attribute, the modification of the word table and the introduction of two new prefilter features.

1.
New number value both_ihr: The number value both_ihr, denoting the two possibilities of possessive pronouns starting with ihr∼, is added to the existing number values singular, unknown and plural:

   0                 0
   1 singular        1 singular
   2 unknown    ⇒    2 unknown
   3 plural          3 plural
                     4 both_ihr

2. The modification of attributive possessive pronouns with the stem ihr∼ in the word table: in each entry, the unknown in column 7 is transformed into both_ihr:

   692 ihrer 2 2 31 PPOSAT unknown unknown unknown unknown
   ⇓
   692 ihrer 2 2 31 PPOSAT both_ihr unknown unknown unknown

3. Finally, two new prefilter features for the number value both_ihr are introduced in the prefilters:

a) {(m1h.f1==f1.both_ihr)&&((m2h.f2==f2.male)||
   (m2h.f2==f2.neutral))&&(m2h.f1==f1.singular)}
b) {(m2h.f1==f1.both_ihr)&&((m1h.f2==f2.male)||
   (m1h.f2==f2.neutral))&&(m1h.f1==f1.singular)}

The features discard a markable pair if m1 has the number value both_ihr, whereas m2 has the number value singular and the gender value male or neuter, or vice versa with m2 being annotated with both_ihr. Table 5.28 shows the result of the comparison of the new baseline against the addition of Agree1. Although the modification step Agree1 shows a bad performance on the new baseline, if it were undone in the final configuration, the score would be worse by about 0.23%.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Agree1          65.43%
Difference:       -1.01%

Table 5.28.: Evaluation of the feature Agree1

Feature set       MUC-B3-F-score
Final set         73.06%
- Agree1          72.83%
Difference:       -0.23%

Table 5.29.: The final set without Agree1

Agree2: This step only includes the introduction of four new prefilter features that check for a disagreement between the gender values male/female and neuter. Additionally, a special case like the markable pair <Das Mädchen, sie> can be handled with a check for each such keyword.
Here, only Mädchen is used:

a) {(m1h.f2==f2.male)&&(m2h.f2==f2.neutral)}
b) {(m1h.f2==f2.neutral)&&(m2h.f2==f2.male)}
c) {(m1h.f2==f2.female)&&(m2h.f2==f2.neutral)&&
   (seqmatch(m2h,Mädchen)==0)}
d) {(m1h.f2==f2.neutral)&&(m2h.f2==f2.female)&&
   (seqmatch(m1h,Mädchen)==0)}

The features return TRUE (i.e. discard a markable pair) if m1 and m2 show a mismatch in terms of the gender value neuter. In addition to that, the latter two features check whether the neuter markable differs from the string Mädchen. Table 5.30 shows the result of the comparison of the new baseline against the addition of Agree2.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Agree2          66.96%
Difference:        0.52%

Table 5.30.: Evaluation of the feature Agree2

Agree3: This step comprises the modification of the gender attribute, the modification of the word table and the introduction of two new prefilter features.

1. New gender value non_fem: The gender value non_fem, denoting the incompatibility of possessive pronouns starting with sein∼ with female antecedents, is added to the existing gender values:

   0                 0
   1 male            1 male
   2 unknown         2 unknown
   3 female     ⇒    3 female
   4 neutral         4 neutral
   5 both            5 both
                     6 non_fem

2. Modification of attributive possessive pronouns with the stem sein∼ in the word table: in each entry, the unknown in column 8 is transformed into non_fem:

   747 seiner 2 2 33 PPOSAT unknown unknown unknown unknown
   ⇓
   747 seiner 2 2 33 PPOSAT singular non_fem unknown unknown

3. Finally, two new prefilter features for the gender value non_fem are introduced in the prefilters:

a) {(m1h.f2==f2.non_fem)&&(m2h.f2==f2.female)}
b) {(m2h.f2==f2.non_fem)&&(m1h.f2==f2.female)}

The features discard a markable pair if m1 has the gender value non_fem and m2 is female, or vice versa. Table 5.31 shows the result of the comparison of the new baseline against the addition of Agree3.
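The gender-mismatch prefilters of this group, here the four Agree2 checks, can be condensed into a single Python predicate. This is a sketch with hypothetical string labels mirroring the attribute values above; TRUE means the markable pair is discarded.

```python
def agree2_discard(g1, g2, h1="", h2=""):
    """Discard on a male/neuter or female/neuter gender mismatch,
    except for the lexical exception 'Mädchen' (grammatically neuter,
    but with a female referent)."""
    if (g1, g2) in {("male", "neutral"), ("neutral", "male")}:
        return True
    if g1 == "female" and g2 == "neutral" and h2 != "Mädchen":
        return True
    if g1 == "neutral" and g2 == "female" and h1 != "Mädchen":
        return True
    return False

print(agree2_discard("male", "neutral"))                   # True
# <Das Mädchen, sie>: the exception keeps the pair.
print(agree2_discard("neutral", "female", h1="Mädchen"))   # False
```
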
As this addition shows a bad performance for the baseline as well as for the final configuration (cf. table 5.32), Agree3, which addresses the possessive pronouns sein∼, seems to behave differently from Agree1 for the possessive pronouns ihr∼.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Agree3          66.02%
Difference:       -0.42%

Table 5.31.: Evaluation of the feature Agree3

Feature set       MUC-B3-F-score
Final set         73.06%
+ Agree3          73.00%
Difference:       -0.06%

Table 5.32.: The final set with Agree3

The selection of the final features

Since Agree1 shows a positive contribution to the final configuration, this modification step is used for it. Agree2 also shows good performance. The only issue concerns Agree3. Here, there is no clear reason for the deterioration with the new baseline and in the final configuration. This has to be figured out in future work (cf. (7.3)). Thus, for the final configuration, Agree1 and Agree2 are used.

5.2. Features for False Negatives

5.2.1. Both markables contain a common, possibly appositive proper name

This group is based on the insight that proper names are not always the head of a markable, but may be an apposition to a forename which is not mentioned in the other markable or to a common noun (cf. (4.4.3)). As the features for proper names in the original feature set concern only the markables' heads, an extension is needed.

Proper1: This feature modifies the original feature that handles markables whose heads are proper names. Proper1 additionally checks whether the head of one markable is contained in the other:

a) {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)
   &&(seqmatch(m1h,m2a)||seqmatch(m2h,m1a))}

This feature returns TRUE in the case that both markables' heads are proper names and the head of one markable is contained in the other, and FALSE otherwise. In (55), two examples are given:

(55) a. <(Ramonm1h Valle-Inclan)m1, (Valle-Inclanm2h, der Exzentriker der Moderne)m2>
     b. <(Peterm1h Müller)m1, (Mariam2h Maier)m2>

In example (55a), the feature returns TRUE, as both markables' heads are proper names (i.e. Ramon and Valle-Inclan) and the head of one markable (i.e. m2h: Valle-Inclan) occurs in the other markable (here, as the last word in m1). In example (55b), however, there is no head of one markable occurring in the other and thus, Proper1 returns FALSE. Table 5.33 shows the result of the comparison of the new baseline against the addition of Proper1.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Proper1         66.80%
Difference:        0.36%

Table 5.33.: Evaluation of the feature Proper1

Proper2: This feature is created as a named entity can also have a common noun as head (cf. example (38c): "den Angolanerm1h Amadeu Antonio" vs. "Amadeum2h Antonio"). Thus, it also checks the last word of each markable for a match in the other markable:

a) {((m1h.f0==f0.NE)&&(strmatchlc(m1h,m2a)))||
   ((m2h.f0==f0.NE)&&(strmatchlc(m2h,m1a)))||
   ((m1e.f0==f0.NE)&&(strmatchlc(m1e,m2a)))||
   ((m2e.f0==f0.NE)&&(strmatchlc(m2e,m1a)))}

Proper2 returns TRUE if the head or last word of m1 is a proper name that occurs in m2 or vice versa, and FALSE otherwise. In (56), two examples are given:

(56) a. <(den Luftfahrtunternehmerm1h Dornier)m1, (Silviusm2h Dornier)m2>
     b. <(Der Fußball-Profim1h Manuel Neuer, der . . . )m1, (Der Nationaltorwartm2h Manuel Neuer, der . . . )m2>

The feature Proper2 returns TRUE in the case of example (56a): m1 (resp. m2) contains a proper name as its last word that occurs in m2 (resp. m1). However, this feature cannot handle common proper names that are neither head nor last word of a markable, as is the case in example (56b), where a finite verb of a relative clause determines the last word of each markable.
For this, the relational database model or the expressiveness of the feature definition language has to be extended (cf. (4.4.3); (7.3)). Table 5.34 shows the result of the comparison of the new baseline against the addition of Proper2.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Proper2         65.64%
Difference:       -0.80%

Table 5.34.: Evaluation of the feature Proper2

The selection of the final features

The performance of Proper1 yields an improvement over the new baseline, whereas Proper2 worsens the score. As this trend is also obvious in the final feature set, the feature Proper1 is selected as final feature.

5.3. Features from inspirations of German approaches in (2.5)

This group contains features that were inspired by some of the German approaches presented in section (2.5).

Inspire1: The idea of this feature is to simply check whether m2 constitutes a subject. The motivation is information status: discourse-given referents are usually mentioned sentence-initially in the canonical subject position. Therefore, if m2 is a subject, it is more likely coreferent with a preceding markable. The original features just focus on the commonality of grammatical functions or case but do not focus on the subject role of m2:

a) {(m2h.rewtag == rewtags.SUBJ)}

Table 5.35 shows the result of the comparison of the new baseline against the addition of Inspire1. Although there is a slight improvement with respect to the baseline, adding this feature to the final feature set worsens the final score (cf. table 5.36).

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Inspire1        66.48%
Difference:        0.04%

Table 5.35.: Evaluation of the feature Inspire1

Feature set       MUC-B3-F-score
Final set         73.06%
+ Inspire1        73.02%
Difference:       -0.04%

Table 5.36.: The final set with Inspire1

Inspire2: This feature functions as a prefilter and discards all markable pairs in which one markable has the grammatical function of a copula predicate (i.e. the rewtag PRED). For instance, in the sentence (Peter)m1 ist (ein Bauer)m2, the markable m2 is predicative and non-referring; it rather describes Peter's property of being a farmer.

a) {(m1h.rewtag == rewtags.PRED)||(m2h.rewtag == rewtags.PRED)}

Inspire2 returns TRUE in the case of m1 or m2 being a copula predicate (and thereby discards the markable pair) and FALSE otherwise. Table 5.37 shows the result of the comparison of the new baseline against the addition of Inspire2. In contrast to the baseline, the addition of Inspire2 to the final prefilter feature set worsens the score by about 0.23%.

Feature set       MUC-B3-F-score
New baseline      66.44%
+ Inspire2        66.58%
Difference:        0.14%

Table 5.37.: Evaluation of the feature Inspire2

Feature set       MUC-B3-F-score
Final set         73.06%
+ Inspire2        72.83%
Difference:       -0.23%

Table 5.38.: The final set with Inspire2

Inspire3: This clause-boundness feature is described in (Klenner and Ailloud, 2009) as a global feature and is reimplemented as a link feature in (Broscheit et al., 2010b). It is based on binding theory. The idea is that if both m1 and m2 are governed by the same verb (i.e. they are in the same subclause), none of them is a reflexive pronoun or a possessive pronoun and neither is the apposition of the other, then m1 and m2 are disreferent:

a) {(m1h.stcnum==m2h.stcnum)&&
   (m1h.f0!=PRF)&&(m2h.f0!=PRF)&&
   ((m1h.f0==POS∼)==0)&&((m2h.f0==POS∼)==0)&&
   (m1h.rewtag != rewtags.APP)&&(m2h.rewtag != rewtags.APP)}

The feature Inspire3 returns TRUE (i.e.
discards the markable pair) if m1 and m2 are in the same sentence (checking for a shared governor via (m1h.rewpos==m2h.rewpos) returns zero values, so the sentence number is used as an approximation), neither is a reflexive or possessive pronoun and neither is an apposition. Table 5.39 shows the result of comparing the new baseline against the addition of Inspire3.

Feature set     MUC-B3-f-score
New baseline    66.44%
+ Inspire3      66.44%
Difference:     0.0%

Table 5.39.: Evaluation of the feature Inspire3

Chapter 5. Implementation of the features

Inspire4: This feature concerning the first/second person is proposed by Broscheit et al. (2010b). It checks whether both m1 and m2 are first or second person (i.e. non-third person). This feature might be an approximation of a dialog:

a) {((m1h.f4==f4.first)||(m1h.f4==f4.second)) &&((m2h.f4==f4.first)||(m2h.f4==f4.second))}

This feature returns TRUE in the case that both m1 and m2 are first or second person. It returns FALSE if m1 or m2 is third person. Table 5.40 shows the result of comparing the new baseline against the addition of Inspire4.

Feature set     MUC-B3-f-score
New baseline    66.44%
+ Inspire4      66.42%
Difference:     -0.02%

Table 5.40.: Evaluation of the feature Inspire4

The selection of the final features

Although all features inspired by the approaches in (2.5) are linguistically motivated, their performance on the final feature set is poor. Inspire1 and Inspire2 show a positive effect on the new baseline, but the dependencies on other features in the final feature set eliminate this slight improvement. Therefore, no feature from this group is selected for the final feature set.

CHAPTER 6
Evaluation of the implemented link features

This chapter presents the results of choosing the best-performing features among those proposed in the previous chapter. In (6.1), the final link features are presented and described. In (6.2), the modified prefilter feature set is presented and the features are described.
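The link and prefilter features discussed so far are boolean predicates over attributes of the markable heads. As a rough illustration of how such definitions evaluate (using Inspire2 and Inspire4 from the previous section as examples), here is a minimal Python sketch; the dict-based markable representation and attribute values are hypothetical stand-ins for SUCRE's relational word table, not its actual implementation:

```python
# Hypothetical stand-in for SUCRE's word-table attributes:
# 'rewtag' = grammatical function, 'f4' = person of the head word.

def inspire2(m1h: dict, m2h: dict) -> bool:
    """Prefilter: TRUE (discard the pair) if either head is a copula
    predicate (rewtag PRED), since predicative NPs are non-referring."""
    return m1h["rewtag"] == "PRED" or m2h["rewtag"] == "PRED"

def inspire4(m1h: dict, m2h: dict) -> bool:
    """TRUE iff both markables are first or second person, as a rough
    approximation of a dialog (after Broscheit et al., 2010b)."""
    return (m1h["f4"] in ("first", "second")
            and m2h["f4"] in ("first", "second"))

# "(Peter)m1 ist (ein Bauer)m2": m2 is a copula predicate, so the
# pair is filtered out before classification.
pair_discarded = inspire2({"rewtag": "SUBJ", "f4": "third"},
                          {"rewtag": "PRED", "f4": "third"})
```

In SUCRE itself, such predicates are expressed in the feature definition language and evaluated against the relational database; the sketch only mirrors their truth conditions.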
As the improvements of the feature set include several modifications of some features, the clearest way to show the performance of each improvement is to group them into several improvement steps. These and their contributions to the final score are presented in (6.3). The final scores are presented in (6.4). Here, all four evaluation measures introduced in (2.6) are used to show the performance gain of the final feature set (with or without further modifications of the prefilters and dataset) in comparison to the new baseline (cf. (4.2)) and the SemEval-2010 baseline. The final scores are also compared to the official scores of SemEval-2010 in the setting German, closed, gold. In (6.5), the final features are checked for their contribution to the final score. As there are still dependencies between the features, the series of results after each feature addition is not always strictly monotonically increasing. However, in the reversed addition order, the increase is almost weakly monotonic, indicating that no feature worsens the final score. In (6.6), two additional evaluations are done: one for a dissonance in the final feature set and one for checking how the three sentence distance features contribute if added to the final feature set.

6.1. The final link feature set

1. {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)&&(editdist(m1h,m2h)<3)&&
   ((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||eswitch(m1h,e)))||
   (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||eswitch(m2h,e))))&&
   ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))&&
   (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)&&
   ((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}
⇒ The heads of m1 and m2 substring match but do not exact string match, and their edit distance is less than 3. Moreover, the head of m1 starts with the head of m2 and ends with the suffix “s”, “es” or “e”, or vice versa. m2 does not start with an indefinite article or its head is a proper name.
m1 does not start with the quantifier “kein” or “jede”. Neither m1 nor m2 is a pronoun.

2. {seqmatch(m1h,m2h)&&
   ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))&&
   (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)}
⇒ The heads of m1 and m2 exact string match, m2 does not start with an indefinite article or its head is a proper name, and m1 does not start with the quantifier “kein” or “jede”.

3. {((m1b.txtpos<=m2b.txtpos)&&(m2e.txtpos<=m1e.txtpos))||((m2b.txtpos<=m1b.txtpos)&&(m1e.txtpos<=m2e.txtpos))}
⇒ m1 includes m2 or vice versa.

4. {(m1b.txtpos<=m2b.txtpos)&&(m1e.txtpos<=m2e.txtpos)&&(m1e.txtpos>=m2b.txtpos)}
⇒ m1 precedes m2 but they overlap.

5. {(m2b.txtpos<=m1b.txtpos)&&(m2e.txtpos<=m1e.txtpos)&&(m2e.txtpos>=m1b.txtpos)}
⇒ m2 precedes m1 but they overlap.

6. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)&&(seqmatch(m1h,m2a)||(seqmatch(m2h,m1a)))}
⇒ Both markables are proper names and the head of one markable is contained in the other.

7. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NN)}
⇒ m1 is a proper name and m2 is a common noun.

8. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NE)}
⇒ m1 is a common noun and m2 is a proper name.

9. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NN)}
⇒ Both markables are common nouns.

10. {(m1h.rewtag == rewtags.SUBJ) && (m2h.rewtag == rewtags.SUBJ)}
⇒ Both markables are subjects.

11. {(m1h.rewtag != rewtags.SUBJ) && (m2h.rewtag != rewtags.SUBJ)}
⇒ Neither m1 nor m2 is a subject.

12. {m1h.f0==f0.PDS}
⇒ m1 is a substituting demonstrative pronoun.

13. {m1h.f0==f0.PIS}
⇒ m1 is a substituting indefinite pronoun.

14. {(m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)}
⇒ m1 is a personal pronoun but does not exact string match with “es”.

15. {(m1h.f0==f0.PPOSS)||(m1h.f0==f0.PPOSAT)}
⇒ m1 is a substituting or attributive possessive pronoun.

16. {m1h.f0==f0.PRF}
⇒ m1 is a reflexive pronoun.

17.
{(((m2h.f0==f0.PRF)||(m2h.f0==f0.PPOS∼)||(m2h.f0==f0.PREL∼))&&(m1h.f0==f0.PRELS))||(m1h.f0!=f0.PRELS)}
⇒ If m1 is a substituting relative pronoun, then m2 is a reflexive pronoun, a possessive pronoun or also a relative pronoun.

18. {m2h.f0==f0.PDS}
⇒ m2 is a substituting demonstrative pronoun.

19. {m2h.f0==f0.PIS}
⇒ m2 is a substituting indefinite pronoun.

20. {(m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)}
⇒ m2 is a personal pronoun but does not exact string match with “es”.

21. {(m2h.f0==f0.PPOSS)||(m2h.f0==f0.PPOSAT)}
⇒ m2 is a substituting or attributive possessive pronoun.

22. {m2h.f0==f0.PRF}
⇒ m2 is a reflexive pronoun.

23. {(m2h.f0==f0.PRELAT)||(m2h.f0==f0.PRELS)}
⇒ m2 is a substituting or attributive relative pronoun.

24. {(m1h.f1==m2h.f1)&&(m1h.f1!=f1.unknown)}
⇒ Both markables have the same number, which is not unknown.

25. {(m1h.f2==m2h.f2)&&(m1h.f2!=f2.unknown)}
⇒ Both markables have the same gender, which is not unknown.

6.2. The final prefilter feature set

1. {seqmatchlc(m1h,Mark) || seqmatchlc(m1h,Meter) || seqmatchlc(m1h,Prozent) || seqmatchlc(m1h,Mittwoch) || seqmatchlc(m1h,Sekunden) || seqmatchlc(m1h,Juli) || seqmatchlc(m1h,Milliarden) || seqmatchlc(m1h,Dollar) || seqmatchlc(m1h,Jahr)}
⇒ A link is filtered out if the head of m1 exact string matches with units, currencies, month names, weekdays and the like.

2. {(m1h.f2==f2.male)&&(m2h.f2==f2.neutral)}
⇒ A link is filtered out if the gender of m1 is “male” and the one of m2 is “neuter”.

3. {(m1h.f2==f2.neutral)&&(m2h.f2==f2.male)}
⇒ A link is filtered out if the gender of m1 is “neuter” and the one of m2 is “male”.

4. {(m1h.f2==f2.female)&&(m2h.f2==f2.neutral)&&(seqmatch(m2h,Mädchen)==0)}
⇒ A link is filtered out if the gender of m1 is “female”, the one of m2 is “neuter” and m2 does not exact string match with “Mädchen”.

5.
{(m1h.f2==f2.neutral)&&(m2h.f2==f2.female)&&(seqmatch(m1h,Mädchen)==0)}
⇒ A link is filtered out if the gender of m1 is “neuter”, the one of m2 is “female” and m1 does not exact string match with “Mädchen”.

6. {(m1h.f1==f1.both_ihr)&&((m2h.f2==f2.male)||(m2h.f2==f2.neutral))&&(m2h.f1==f1.singular)}
⇒ A link is filtered out if the number of m1 is “both_ihr”, the one of m2 is “singular” and its gender is “male” or “neuter”.

7. {(m2h.f1==f1.both_ihr)&&((m1h.f2==f2.male)||(m1h.f2==f2.neutral))&&(m1h.f1==f1.singular)}
⇒ A link is filtered out if the number of m2 is “both_ihr”, the one of m1 is “singular” and its gender is “male” or “neuter”.

8. {(m1h.f1==f1.singular)&&(m2h.f1==f1.plural)}
⇒ A link is filtered out if the number of m1 is “singular” and the one of m2 is “plural”.

9. {(m1h.f1==f1.plural)&&(m2h.f1==f1.singular)}
⇒ A link is filtered out if the number of m1 is “plural” and the one of m2 is “singular”.

10. {(abs(m2b.stcnum-m1b.stcnum)>2)&&((m1h.f0==f0.P∼)||(m2h.f0==f0.P∼))}
⇒ A link is filtered out if the markables are more than two sentences apart from each other and at least one of them is a pronoun.

11. {(abs(m2b.stcnum-m1b.stcnum)>0)&&((m1h.f0==f0.PRF)||(m2h.f0==f0.PRF)||(m1h.f0==f0.PRELS)||(m2h.f0==f0.PRELS)||(m1h.f0==f0.PRELAT)||(m2h.f0==f0.PRELAT))}
⇒ A link is filtered out if the markables are not in the same sentence and one of the markables is a reflexive or a relative pronoun.

12. {(m2b.f0==f0.PIS)||(m2b.f0==f0.PIAT)||(m2b.f0==f0.PIDAT)}
⇒ A link is filtered out if m2 is a kind of indefinite pronoun.

13. {(m1h.f2==f2.female)&&(m2h.f2==f2.male)}
⇒ A link is filtered out if the gender of m1 is “female” and the one of m2 is “male”.

14. {(m1h.f2==f2.male)&&(m2h.f2==f2.female)}
⇒ A link is filtered out if the gender of m1 is “male” and the one of m2 is “female”.

15.
{(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)&&(m1h.f0==f0.P∼)}
⇒ A link is filtered out if the genders of m1 and m2 are different but not unknown and m1 is a pronoun.

6.3. Evaluation of improvement steps

As the final feature set not only contains newly added features but also lacks some original features and contains modified ones, the clearest way of showing the improvements of each basic step (i.e. feature deletion, feature insertion, feature modification and feature merging) without going into too much detail is to create groups of such basic steps. Subsequently, these groups are called improvement steps.

No.  Description                                                   Involved final features
     Feature engineering
1    Indefiniteness of the second markable                         1; 2
2    Wrong assignment of a relative pronoun in m1                  17
3    Problems with substring-matches                               1
4    “Es” (“it”) as expletive pronoun in German                    14; 20
5    Problems with the alias-feature                               -
6    First markable begins with kein/jede                          1; 2
7    A common, possibly appositive proper name                     6
8    Simplification of the original pronoun features               12 - 23
9    Deletion of features concerning equality in case and person   -
     Modification of word table, number attribute and prefilters
10   Disagreement in gender and number                             Prefilters: 2 - 7
11   Units, currencies, month names, weekdays and the like         Prefilters: 1

Table 6.1.: The steps from the new baseline to the final feature set

In table 6.1, every improvement step is listed with its number (i.e. its position in the change from the new baseline to the final feature set) and the final features that are involved in this step. The order in which the improvement steps are arranged is irrelevant. It is based on the order of groups in chapters 4 and 5, but separates the improvement steps modifying prefilter features. The improvement steps no. 1 - 9 are based on pure feature engineering, whereas improvement steps no.
10 and 11 address the modification of the prefilter feature set, the annotation in the word table and the modification of the number attribute (i.e. the insertion of a new value). The basic steps in the respective improvement steps are:

1. Insertion of ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE)) in substring and exact string matching features.

2. Insertion of feature no. 17. Deletion of the features:
   (m1h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))
   (m1h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))

3. Modification of the substring matching feature:
   strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)&&(editdist(m1h,m2h)<3)&&
   ((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||eswitch(m1h,e)))||
   (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||eswitch(m2h,e))))
   Insertion of the requirement of non-pronominality:
   ((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)

4. Modification of the pronoun features concerning personal pronouns in m1 and m2:
   (m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)
   (m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)

5. Removal of the alias-feature from the original feature set.

6. Insertion of (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0) in the string matching features.

7. Modification of the named entity feature (m1h.f0==f0.NE)&&(m2h.f0==f0.NE) with the condition that the head of one markable has to be contained in the other markable:
   (seqmatch(m1h,m2a)||(seqmatch(m2h,m1a)))

8. Deletion of the features concerning attributive demonstrative pronouns (PDAT) and attributive indefinite pronouns (PIDAT), as they are not annotated as markables in the dataset. Deletion of the number condition, as a disagreement is excluded by the prefilters anyway:
   ((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))
   Merging of the remaining pronoun features of the same class and markable: e.g. (m1h.f0==f0.PPOSS) and (m1h.f0==f0.PPOSAT) ⇒ (m1h.f0==f0.PPOSS)||(m1h.f0==f0.PPOSAT)

9.
Deletion of the last two features in the original feature set:
   (m1h.f3==m2h.f3)&&(m1h.f3!=f3.unknown) (equality in case)
   (m1h.f4==m2h.f4)&&(m1h.f4!=f4.unknown) (equality in person)
   as they do not contribute to a better score and the person value of two coreferent markables can in fact be different (e.g. in the case of a shift from direct speech to indirect speech or vice versa).

10. Inclusion of the features Agree1 and Agree2: Introduction of the new number value both_ihr, which describes the number value of a possessive pronoun starting with ihr. Annotation of every possessive pronoun starting with ihr with the number value both_ihr. Insertion of the prefilter features no. 2 - 7 concerning the disagreement with the gender neuter and the number value both_ihr.

11. Introduction of the prefilter feature no. 1 concerning units, month names and the like.

The performance of each improvement step in turn is listed in table 6.2. Each step shows an improvement. The best improvements are achieved by improvement step no. 1, indefiniteness, and no. 3, substring match, underlining the importance of checking the definiteness of m2 and the substring match in coreference resolution. The improvement steps no. 10 and 11 provide a performance gain of about 1.1%. Thus, the quality of the annotation as well as appropriate prefilter features are significant for the performance of a coreference resolver. Sometimes, the improvement is very slight (as for the removal of the alias-feature). But by removing features that do not perform well, the resulting feature set becomes smaller and thus clearer. Moreover, it reduces the runtime of training the decision tree classifier significantly.

Improvement step        MUC-C  MUC-P   MUC-R   MUC-F1  B3-P    B3-R    B3-F1   MUC-B3-F1
New baseline            2390   0.4608  0.7251  0.5635  0.7367  0.8980  0.8094  0.664425
+ Indefiniteness        2339   0.4920  0.7096  0.5811  0.7779  0.8948  0.8322  0.684370
+ Relative pronoun      2335   0.4929  0.7084  0.5814  0.7802  0.8950  0.8337  0.685011
+ Substring match       2222   0.5431  0.6742  0.6016  0.8359  0.8870  0.8607  0.708185
+ es-pronoun            2194   0.5539  0.6657  0.6047  0.8489  0.8856  0.8669  0.712397
+ alias-removal         2190   0.5550  0.6644  0.6048  0.8498  0.8854  0.8673  0.712634
+ keine/jede with m1    2189   0.5578  0.6641  0.6064  0.8516  0.8849  0.8679  0.713943
+ Proper name           2229   0.5589  0.6763  0.6120  0.8477  0.8887  0.8677  0.717789
+ Simplification        2235   0.5600  0.6781  0.6134  0.8487  0.8905  0.8691  0.719219
+ Deletion case/pers.   2226   0.5621  0.6754  0.6136  0.8517  0.8887  0.8698  0.719554
+ Disagreement          2248   0.5714  0.6820  0.6219  0.8545  0.8893  0.8716  0.725828
+ Units                 2243   0.5821  0.6805  0.6275  0.8604  0.8889  0.8744  0.730662

Table 6.2.: Performance of the improvement steps

6.4. The final scores

This section addresses the question of how well the final feature set performs compared to the new baseline and to the SemEval-2010 baseline for SUCRE. This is shown in table 6.3. A comparison between the final scores of SUCRE and the results of SemEval-2010 in German, closed, gold (i.e. the participants TANL-1, UBIU and SUCRE) is provided in table 6.4. With respect to the performance differences, this section provides information about all four evaluation scores: MUC, B3, CEAF and BLANC. The columns in table 6.3 are:

• Evaluation score: all kinds of evaluation scores that are available in SUCRE
• New baseline: the scores of the new baseline gained by removing the first three features from the original feature set (cf. (4.2))
• SemEval-2010: the scores of SUCRE in German, closed, gold, extracted from the literature
• Feature engineering: the scores achieved by just modifying the link feature set
• Including modification of word table, number attribute and prefilters: the final scores including all 11 improvement steps

The focus on the evaluation scores MUC and B3 is indicated by the highlighted lines of the respective f-scores.
Evaluation score             New baseline  SemEval-2010  Feature engineering  Incl. modifications
MUC-correct                  2390          2439          2226                 2243
MUC-Precision                0.460767      0.481         0.562121             0.582144
MUC-Recall                   0.725121      0.74          0.675364             0.680522
MUC-f-score                  0.56348       0.584         0.613561             0.6275
B3-all                       13446         13446         13446                13446
B3-Precision                 0.73673       0.736         0.851721             0.860368
B3-Recall                    0.898049      0.904         0.888692             0.888931
B3-f-score                   0.80943       0.811         0.869814             0.874416
CEAFM-all                    13446         -             13446                13446
CEAFM-Precision              0.718132      0.729         0.795329             0.80299
CEAFM-Recall                 0.718132      0.729         0.795329             0.80299
CEAFM-f-score                0.718132      0.729         0.795329             0.80299
CEAFE-correct                7139.44       -             8277.64              8395.35
CEAFE-Precision              0.864444      -             0.872617             0.875154
CEAFE-Recall                 0.703393      -             0.815532             0.827128
CEAFE-f-score                0.775647      -             0.843109             0.850464
BLANC-Attraction-f-score     0.315814      -             0.416529             0.426256
BLANC-Repulsion-f-score      0.987133      -             0.993164             0.993411
BLANC-Precision              0.611977      0.618         0.704375             0.713553
BLANC-Recall                 0.742423      0.782         0.70532              0.706246
BLANC-f-score                0.670918      0.664         0.704848             0.709881
MUC-B3-f-score               0.664425      0.6790        0.719554             0.730662
RAND-accuracy                0.974741      -             0.986487             0.986972
Number of link features      37            -             25                   25

Table 6.3.: The final scores

MUC: The MUC-correct value, the absolute number of correct coreference links, is largest for SemEval-2010. Although this value is not given in the literature, it can be predicted by the following formula, as it correlates with the MUC-Recall:

MUC-correct_SemEval = (MUC-Recall_SemEval / MUC-Recall_New baseline) · MUC-correct_New baseline = (0.74 / 0.725121) · 2390 ≈ 2439    (6.1)

After feature engineering, the MUC-correct value is about 213 links smaller. By modifying prefilters, word table and number attribute, it slightly rises by about 17 links. However, the precision of MUC rises by about 10% compared with SemEval-2010, such that the MUC-f-score of 62.75% is about 4.35% higher than in SemEval-2010.
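Equation (6.1) and the reported f-scores can be checked with a few lines of arithmetic. A small sketch using the figures from tables 6.2 and 6.3 (all values are taken from the text; the helper function is generic):

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Equation (6.1): predict SemEval-2010's MUC-correct value from its recall.
muc_correct_semeval = (0.74 / 0.725121) * 2390   # ≈ 2439

# Final configuration (table 6.3): the MUC f-score from precision and recall,
# and the MUC-B3 f-score as the harmonic mean of the MUC and B3 f-scores.
muc_f = f_score(0.582144, 0.680522)              # ≈ 0.6275
muc_b3_f = f_score(muc_f, 0.874416)              # ≈ 0.7307
```

The same computation reproduces the MUC-B3 column of table 6.2 from the respective MUC and B3 f-scores.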
The reason for the decrease in recall might be that most features in the final feature set are based on linguistic analyses of false positives, i.e. they try to constrain a vote for coreference.

B3: A similar picture emerges with B3. The recall of 90.4% in SemEval-2010 slightly decreases to 88.9% after feature engineering and the improvement steps no. 10 and 11. However, the precision increases by about 12.4%, up to 86%. Therefore, the final B3-f-score is about 6.3% better than in SemEval-2010.

MUC-B3-f-score: The harmonic mean of MUC-f-score and B3-f-score, which is used throughout this study as an appropriate trade-off between MUC and B3, shows the same improvement as MUC and B3: the SemEval-2010 baseline of 67.9% (66.4% in the new baseline) is outperformed by the final system configuration, which achieves 73%. This means an improvement of 5.1% compared to SemEval-2010 and even 6.6% compared to the new baseline.

CEAF: Depending on the similarity measure of the one-to-one entity alignment, one can distinguish between mention-based CEAF (CEAFM) and entity-based CEAF (CEAFE) (see (2.6.3)). The most common version of CEAF is the mention-based CEAF, for which precision and recall are identical in the case of true markables. In CEAFM, there is a clear improvement from 72.9% in SemEval-2010 (71.8% in the new baseline) up to 80.3%. In SemEval-2010, there are no results for CEAFE. Thus, the final results are only compared to the new baseline. Here, there is an improvement in both precision and recall. The precision increases by 1.1% and the recall even increases by 12.4%. So, the CEAFE-f-score increases by 7.5%.

BLANC: The BLANC-Attraction-f-score (the BLANC-score for coreference links, see F1c in table 2.16 in (2.6.4)) can only be compared to the new baseline. Here, there is a clear improvement in the final scores of about 11%.
The BLANC-Repulsion-f-score (the BLANC-score for disreference links, see F1d in table 2.16 in (2.6.4)) of the final scores shows a slight improvement of about 0.6%. With respect to the BLANC score, the recall of the final scores is much smaller (about 7.6%) than in SemEval-2010. However, the precision is significantly better (about 9.5%). Thus, at 71%, the overall BLANC score of the final system is about 4.6% better than in SemEval-2010.

Number of features: The number of features clearly decreases by removing non-performing features or merging two similar features together (see improvement step no. 8). This results in a faster training step, in a clearer feature set and in the prevention of too many dependencies between the link features.

                     CEAF               MUC                B3                 BLANC
                     R     P     F1     R     P     F1     R     P     F1     R     P     F1
SemEval-2010 German, closed × gold
SUCRE                72.9  72.9  72.9   74.4  48.1  58.4   90.4  73.6  81.1   78.2  61.8  66.4
TANL-1               77.7  77.7  77.7   16.4  60.6  25.9   77.2  96.7  85.9   54.4  75.1  57.4
UBIU                 67.4  68.9  68.2   22.1  21.7  21.9   73.7  77.9  75.7   60.0  77.2  64.5
The final scores (rounded)
SUCRE                80.3  80.3  80.3   68.1  58.2  62.8   88.9  86.0  87.4   70.6  71.4  71.0

Table 6.4.: SemEval-2010 Results - German, closed, gold vs. Final scores

Table 6.4 provides the comparison of the final configuration with the SemEval-2010 participants in German, closed, gold. The involved evaluation scores with recall (R), precision (P) and f-score (F1) are: CEAFM, MUC, B3 and BLANC. The participants are SUCRE, TANL-1 and UBIU (for more details on SemEval-2010 see (2.7)). With respect to the SemEval-2010 competition, the best results in MUC and BLANC are achieved by SUCRE, whereas it is outperformed by TANL-1 in B3 and CEAF. In general, TANL-1 shows greater precision but lower recall than SUCRE. The final scores now outperform every participant in each evaluation score with respect to f-score:

CEAF: Here, the final configuration outperforms the winner TANL-1 by 2.6%.
As the CEAFM-score is identical for recall and precision, the same improvements hold for recall and precision.

MUC: In SemEval-2010, SUCRE achieved the best results in MUC and now outperforms itself by 4.4%. However, the final configuration shows a significant decrease in the MUC-Recall. So, SUCRE2010 is better in recall and TANL-1 still outperforms the final system in precision.

B3: The winner concerning B3, TANL-1, is outperformed by the final system by 1.5%. As with MUC, TANL-1 still has higher precision, whereas SUCRE2010 has higher recall.

BLANC: SUCRE performed best in BLANC in SemEval-2010. The final system of SUCRE outperforms itself by 4.6%. As with MUC and B3, the final system is outperformed in recall by SUCRE2010 and in precision by TANL-1.

6.5. The performance of each feature in the final feature set

In tables 6.5 and 6.6, the final features are checked for their contribution in the given order. As there are still dependencies between the features, the addition of one feature might worsen the score obtained so far, as is the case with features no. 1 - 17 in table 6.5. But since in the reversed addition order the increase is weakly monotonic except for a slight decrease at feature no. 9 (cf. features no. 25 - 9 in table 6.6), which is not present in table 6.5 (cf. features no. 1 - 9), one can conclude that no feature worsens the final score.

6.6. Additional evaluations

m2 may be an indefinite pronoun? One might be confused by the fact that the final feature set needs feature no. 19, which checks for m2 being an indefinite pronoun:

19. m2h.f0==f0.PIS

although there is the prefilter feature no. 12, which discards all markable pairs in which m2 is any kind of indefinite pronoun:

12. {(m2b.f0==f0.PIS)||(m2b.f0==f0.PIAT)||(m2b.f0==f0.PIDAT)}

However, if vector feature no. 19 is removed, the MUC-B3-score decreases (cf. table 6.7).
Feature set      MUC-B3-f-score
Final set        73.06%
- (m2 == PIS)    73.00%
Difference:      -0.06%

Table 6.7.: The performance without vector feature no. 19

Features  MUC-C  MUC-P   MUC-R   MUC-F1  B3-P    B3-R    B3-F1   MUC-B3-F1
1 - 1     175    0.7292  0.0531  0.0990  0.9928  0.7622  0.8623  0.177580
1 - 2     1158   0.5212  0.3513  0.4197  0.9249  0.8158  0.8669  0.565604
1 - 3     1158   0.5212  0.3513  0.4197  0.9249  0.8158  0.8669  0.565604
1 - 4     1158   0.5212  0.3513  0.4197  0.9249  0.8158  0.8669  0.565604
1 - 5     1158   0.5212  0.3513  0.4197  0.9249  0.8158  0.8669  0.565604
1 - 6     1197   0.5245  0.3632  0.4292  0.9213  0.8182  0.8667  0.574085
1 - 7     1457   0.4834  0.4421  0.4618  0.8766  0.8381  0.8569  0.600162
1 - 8     1456   0.4896  0.4417  0.4644  0.8809  0.8385  0.8592  0.602940
1 - 9     1449   0.5254  0.4396  0.4787  0.8994  0.8384  0.8678  0.617034
1 - 10    1449   0.5254  0.4396  0.4787  0.8994  0.8384  0.8678  0.617034
1 - 11    1356   0.5394  0.4114  0.4668  0.9142  0.8289  0.8695  0.607450
1 - 12    1356   0.5394  0.4114  0.4668  0.9142  0.8289  0.8695  0.607450
1 - 13    1356   0.5394  0.4114  0.4668  0.9142  0.8289  0.8695  0.607450
1 - 14    1355   0.5394  0.4111  0.4666  0.9143  0.8288  0.8695  0.607291
1 - 15    1449   0.5223  0.4396  0.4774  0.8972  0.8379  0.8666  0.615662
1 - 16    1449   0.5223  0.4396  0.4774  0.8972  0.8379  0.8666  0.615662
1 - 17    1227   0.5298  0.3723  0.4373  0.9205  0.8197  0.8672  0.581387
1 - 18    1227   0.5284  0.3723  0.4368  0.9205  0.8198  0.8672  0.580987
1 - 19    1230   0.5286  0.3732  0.4375  0.9203  0.8200  0.8673  0.581600
1 - 20    1355   0.5268  0.4111  0.4618  0.9101  0.8259  0.8660  0.602396
1 - 21    1674   0.5589  0.5079  0.5322  0.8924  0.8470  0.8691  0.660137
1 - 22    1788   0.5488  0.5425  0.5456  0.8790  0.8548  0.8667  0.669663
1 - 23    2077   0.5739  0.6302  0.6007  0.8692  0.8756  0.8724  0.711503
1 - 24    2128   0.5737  0.6456  0.6076  0.8647  0.8793  0.8719  0.716129
1 - 25    2243   0.5821  0.6805  0.6275  0.8604  0.8889  0.8744  0.730662

Table 6.5.: Cumulative performance of the final feature set

A possible reason for this might be the following slight difference: the prefilter feature refers to the first word in the markable and the final link feature to
the head word. Usually, substituting indefinite pronouns occur without a determiner, but there are still examples where an indefinite pronoun is preceded by an adverb or a definite article, like in so mancherm2, die anderenm2 and die allermeistenm2. However, if the prefilter feature is modified to (m2h.f0==f0.PIS), the MUC-B3-f-score decreases to 72.99%.

The final feature set and the three sentence distance features

Three features have been removed in (4.2):
1. {abs(m2b.stcnum-m1b.stcnum)==0}
2. {abs(m2b.stcnum-m1b.stcnum)==1}
3. {abs(m2b.stcnum-m1b.stcnum)>1}

If these features are re-added to the final feature set, the result decreases enormously (cf. table 6.8):

Features  MUC-C  MUC-P   MUC-R   MUC-F1  B3-P    B3-R    B3-F1   MUC-B3-F1
25 - 25   0      0.0     0.0     nan     1.0     0.7549  0.8603  -
25 - 24   0      0.0     0.0     nan     1.0     0.7549  0.8603  -
25 - 23   318    0.8051  0.0965  0.1723  0.9972  0.7775  0.8738  0.287855
25 - 22   318    0.8051  0.0965  0.1723  0.9972  0.7775  0.8738  0.287855
25 - 21   353    0.7879  0.1071  0.1886  0.9956  0.7783  0.8737  0.310187
25 - 20   681    0.6637  0.2066  0.3151  0.9785  0.7956  0.8776  0.463743
25 - 19   681    0.6637  0.2066  0.3151  0.9785  0.7956  0.8776  0.463743
25 - 18   706    0.6586  0.2142  0.3233  0.9756  0.7978  0.8778  0.472507
25 - 17   705    0.6589  0.2139  0.3230  0.9760  0.7978  0.8780  0.472204
25 - 16   721    0.6369  0.2188  0.3257  0.9741  0.7995  0.8782  0.475124
25 - 15   873    0.6200  0.2649  0.3712  0.9527  0.8024  0.8711  0.520547
25 - 14   899    0.6635  0.2728  0.3866  0.9544  0.8036  0.8726  0.535788
25 - 13   910    0.6647  0.2761  0.3901  0.9538  0.8047  0.8729  0.539260
25 - 12   918    0.6643  0.2785  0.3925  0.9523  0.8048  0.8724  0.541383
25 - 11   1173   0.6528  0.3559  0.4606  0.9344  0.8218  0.8745  0.603412
25 - 10   1236   0.6581  0.3750  0.4778  0.9349  0.8246  0.8763  0.618392
25 - 9    1234   0.6592  0.3744  0.4776  0.9354  0.8247  0.8766  0.618278
25 - 8    1234   0.6592  0.3744  0.4776  0.9354  0.8247  0.8766  0.618278
25 - 7    1234   0.6592  0.3744  0.4776  0.9354  0.8247  0.8766  0.618278
25 - 6    1749   0.7081  0.5306  0.6067  0.9237  0.8558  0.8885  0.721002
25 - 5    1749   0.7081  0.5306  0.6067  0.9237  0.8560  0.8886  0.721046
25 - 4    1749   0.7081  0.5306  0.6067  0.9237  0.8560  0.8886  0.721046
25 - 3    1750   0.7108  0.5309  0.6079  0.9251  0.8576  0.8901  0.722389
25 - 2    2158   0.5789  0.6547  0.6145  0.8644  0.8814  0.8728  0.721198
25 - 1    2243   0.5821  0.6805  0.6275  0.8604  0.8889  0.8744  0.730662

Table 6.6.: Reversed cumulative performance of the final feature set

Feature set           MUC-B3-f-score
Final set             73.06%
+ sentence distance   64.02%
Difference:           -9.04%

Table 6.8.: The performance with the final features and sentence distance

CHAPTER 7
Summary and conclusions

7.1. Summary

This diploma thesis intended to improve the performance of SUCRE's coreference resolution for the German language with the use of a more linguistic background. To this end, misclassified markable pairs (i.e. links) were considered in their context to determine which linguistic phenomenon is responsible for the coreference or disreference and how this phenomenon could be modeled as a link feature, in order to provide a feature set that is able to classify the respective markable pairs correctly afterwards. Based on the architecture of the mention-pair model used in SUCRE, this task just focuses on the resulting feature vectors, i.e. the input for the pairwise classifier. Thus, one issue which has to be kept in mind is that this work only modifies the classifier's result rather than the overall result, i.e. the output of the clustering step. Even if the classification is improved, the improvement might not carry over to the clustering (see Combining classification and clustering in (2.3.1)).

Chapter 2 first outlined the notions of coreference and coreference resolution. Two markables are coreferent if they refer to the same real-world entity. The coreference resolution over a set of markables yields a coreference partition in which every cluster constitutes an equivalence class. In an end-to-end coreference system, the input is raw text.
Thus, the markables still have to be detected. This is a hard task, as there are difficulties like nested markables. In (2.3), three models based on supervised machine learning are presented. The mention-pair model uses a pairwise classifier which checks for two markables whether they are coreferent or not. Afterwards, a clustering step uses these pairwise decisions to create a final coreference partition. This model is used in SUCRE. However, one drawback of this model is that it does not provide a perspective over the entire cluster and thereby allows two disreferent markables to end up in the same cluster. This issue is addressed by the entity-mention model. Here, a markable is compared with a cluster of preceding markables, which ensures that all markables in a cluster are compatible with each other. Another problem of the mention-pair model is that the candidate antecedents are considered independently of each other, so there is no chance of comparing one candidate against another. This flaw is addressed by the ranking model. A training instance of the ranking model is a triple of markables: the respective markable, its true candidate antecedent and a false one. Apart from the supervised approaches, Cardie and Wagstaff (1999) present an unsupervised clustering method for coreference resolution (cf. (2.4)). Advantages of a pure clustering approach are that there is no need for labeled training data and that it includes local as well as global constraints. In contrast to the models using a pairwise classifier, whose input is a feature vector corresponding to a markable pair, Cardie and Wagstaff (1999) represent each markable as a feature vector comprising 11 markable features. As a distance measure, Cardie and Wagstaff (1999) use a combination of feature weights and incompatibility values. Coreference resolution has been researched predominantly for English. In (2.5), five approaches for
Summary and conclusions German coreference resolution were presented. Hartrumpf (2001) (cf. (2.5.1)) applies a hybrid approach that combines “syntactico-semantic” rules with corpus statistics. He uses 18 coreference rules that “license possible coreference”. For all licensed markable pairs, all possible partitions are created, however pruned if certain conditions are violated. The second approach is from Strube et al. (2002) (cf. (2.5.2)). They show the necessity of a more complex string matching feature. When they implemented two features for the minimum edit distance between m1 and m2 , they improved the performance for non-pronouns (i.e. definite noun phrases and proper names). Versley (2006) implements hard and weighted soft constraints, whose weights are estimated with a maximum entropy model. He focusses on the feature design of proper names and definite noun phrases. He uses several external sources like hypo-/hypernymy information from GermaNet for what he calls coreference bridging. In (2.5.4), the approach of Klenner and Ailloud (2009) is shown. They use a memory-based pairwise classifier and a modification of a Zero-One Integer Linear Programming system (ILP). This ILP framework is used as a clustering method that enables the use of global constraints like clause-boundedness (see binding theory). In addition, they show that the first solution of their algorithm (“Balas-First”) is very often the best and thus “optimization [. . . ] is not needed”. Broscheit et al. (2010b) try to create a coreference system that is freely avaible for further research in coreference resolution. Therefore, they extend the coreference system BART for the use of German data. Beside the feature set from Klenner and Ailloud (2009), they implement some features concerning the first/second person or quoted speech. The performance is best with a maximum entropy classifier and a separation for pronouns and non-pronouns (“split”). 
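The minimum edit distance feature of Strube et al. (2002), which also underlies the editdist/2 predicate in appendix B, is the standard Levenshtein distance. A minimal sketch (the function name is illustrative, not SUCRE's implementation):

```python
def edit_distance(a, b):
    # dynamic-programming Levenshtein distance: the number of insertions,
    # deletions and substitutions needed to turn string a into string b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(edit_distance("Hund", "Hundes"))  # 2, cf. editdist(Hund,Hundes) in appendix B
```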
In (2.6), the four most common evaluation scores (i.e. MUC, B3, CEAF and BLANC) are described. The MUC score focusses on coreference links, disregarding disreference links and singletons. It counts each link that occurs both in the predicted partition and in the true partition; dividing this count by the number of links in the true partition or in the predicted partition yields the recall or the precision of MUC. Their harmonic mean is the f-score of MUC. The B3 score, on the other hand, disregards links and measures the overlap of predicted and true clusters. For each markable, the number of common markables in the respective cluster is divided by the number of markables in the respective true or predicted cluster. This results in recall and precision, whose harmonic mean again is the f-score. One drawback of B3 is that clusters may be used several times for an alignment of predicted and true clusters. CEAF solves this by allowing only a one-to-one alignment between clusters, chosen according to a similarity measure (Luo, 2005). Depending on the measure used, one can distinguish between CEAFM and CEAFE. The most complex evaluation score is the BiLateral Assessment of Noun-Phrase Coreference (BLANC). BLANC considers all coreference and disreference links in a partition. For each link class, precision and recall are computed by dividing the number of right links by the number of predicted links or by the number of true links (cf. table 2.15). The final BLANC score is the arithmetic mean of the f-scores for both link classes. The SemEval-2010 competition in coreference resolution is sketched in (2.7). It provides analyses for six languages including German and English. Based on the properties closed/open (concerning the use of external sources) and gold/regular (concerning the use of gold vs. automatic annotation), there are four evaluation settings. Here, SUCRE performs best in closed × regular for German, English and Italian.
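The MUC and B3 definitions above can be restated compactly in Python. The following sketch (clusters as Python sets, illustrative function names; it assumes at least one non-singleton cluster on each side to avoid division by zero) follows the textbook definitions and is not part of SUCRE:

```python
def muc_parts(S, clusters):
    # number of pieces cluster S is split into by the other partition;
    # mentions not covered by any cluster count as singletons
    covered, parts = set(), 0
    for c in clusters:
        if S & c:
            parts += 1
            covered |= S & c
    return parts + len(S - covered)

def muc(gold, pred):
    # link-based MUC score from the cluster partitions
    rec = sum(len(S) - muc_parts(S, pred) for S in gold) / sum(len(S) - 1 for S in gold)
    prc = sum(len(S) - muc_parts(S, gold) for S in pred) / sum(len(S) - 1 for S in pred)
    return prc, rec, 2 * prc * rec / (prc + rec)

def b_cubed(gold, pred):
    # mention-based B3 score: per-mention overlap of gold and predicted cluster;
    # a mention absent from the other partition counts as its own singleton
    g = {m: c for c in gold for m in c}
    p = {m: c for c in pred for m in c}
    gm, pm = set(g), set(p)
    rec = sum(len(g[m] & p.get(m, {m})) / len(g[m]) for m in gm) / len(gm)
    prc = sum(len(p[m] & g.get(m, {m})) / len(p[m]) for m in pm) / len(pm)
    return prc, rec, 2 * prc * rec / (prc + rec)

print(muc([{1, 2, 3}], [{1, 2}, {3, 4}]))  # (0.5, 0.5, 0.5)
```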
Chapter 3 presented the SUCRE system. For coreference resolution, it uses a mention-pair model. For the feature definition, there are the relational database model and the related regular feature definition language; feature engineering and classification are clearly separated. The output of a preprocessing step is the relational database model, containing at least the word table, the markable table and the link table, which are connected to each other via foreign keys. The regular feature definition language makes it simple to express features and to access the different levels of the relational database model. After creating links and, from them, feature vectors, a classifier (e.g. a decision tree classifier) is trained. The classification results (i.e. the pairwise decisions) are used for best-first clustering in order to arrive at the final coreference chains (i.e. the final partition). In (3.3), the interactive visualization of the feature space with Self Organizing Maps (SOMs) was presented. Presenting high-dimensional coreference data in a low-dimensional space enables the user to better understand the distribution of coreference data in the feature space. By exploring areas of the feature space with gray nodes (i.e. nodes that are assigned both coreference and disreference links), one can gain new insights for feature engineering. Moreover, SOMs help to annotate a larger amount of data faster. Kobdani and Schütze (2010b) show the multi-lingual aspect of SUCRE by implementing identical and universal features that are applicable to several languages; the languages considered are Dutch, German, Italian and Spanish. In chapter 4, the linguistic analysis of false positives and false negatives took place. The first issue was a dissonance between the MUC-B3-f-score of SemEval-2010 and that of the initial configuration. This was mitigated by removing three features concerning the distance in terms of sentences.
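Best-first clustering, as used in SUCRE after classification, links each markable to its best-scoring preceding candidate whose pairwise score exceeds a threshold. The following sketch uses a union-find structure over mentions; the score function, the threshold and all names are hypothetical placeholders, not SUCRE's actual classifier output:

```python
def best_first_clustering(mentions, score, threshold=0.5):
    # union-find over mentions; each mention picks at most one antecedent
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for j, m2 in enumerate(mentions):
        candidates = [(score(m1, m2), i) for i, m1 in enumerate(mentions[:j])]
        if candidates:
            best_score, i = max(candidates)
            if best_score >= threshold:          # link m2 to its best antecedent
                parent[find(m2)] = find(mentions[i])

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())

# toy pairwise scores (hypothetical classifier confidences)
scores = {("Peter", "er"): 0.9, ("Peter", "sie"): 0.1, ("Maria", "sie"): 0.8,
          ("er", "sie"): 0.2}
mentions = ["Peter", "er", "Maria", "sie"]
print(best_first_clustering(mentions, lambda a, b: scores.get((a, b), 0.0)))
# [['Peter', 'er'], ['Maria', 'sie']]
```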
Within the analysis of the false positives, several groups of most frequent link errors were found. One issue was the definiteness of the anaphor: an exact string match outweighed an existing feature for indefinite anaphors. A solution was to exclude indefinite m2 within the string matching features. Another problem was the wrong assignment of relative pronouns, for instance m1 as a relative pronoun being assigned to a succeeding markable with a grammatical function that is not permitted, or m2 as a relative pronoun being related to an antecedent which is not directly adjacent, as is usually the case. Sometimes, the right antecedent of an anaphor lay between the predicted antecedent and m2. This problem cannot be solved, as there is no way to check the context of a markable (i.e. whether there are compatible markables closer to m2): the link features are defined on two markables only and do not provide the set of markables between m1 and m2. A further group addressed the case where a reflexive pronoun is related to a non-subject in the same sentence. A solution would be to extend the feature concerning reflexive pronouns by a check for the subject role of the other markable; however, this feature does not perform well and has thus been ignored in the final feature set. The most interesting issue was link errors concerning substring matches. For compound words, two new predicates have been introduced; however, the combination of compound words and inflectional suffixes cannot be handled yet. The German personal pronoun es is very often non-referring, yet it always has the status of a markable. A way of handling this was to check whether a given personal pronoun differs from es. Common nouns like Mark (a former German currency), Prozent, Meter or Sekunden usually do not corefer with other markables that string match.
A simple link feature that checks for such a keyword and returns TRUE if the markables' heads are different does not perform well enough. A better solution was to code this as a prefilter feature that excludes all links containing such keywords. The alias feature in the initial feature set produces a few false positives but no true positives; removing this feature slightly improves the overall results. If a markable m1 starts with the quantifier kein or jede, the succeeding markable m2 is restricted to a small set of possible NP-forms (e.g. possessive pronouns or reflexive pronouns); this most often occurs in the case of a string match. Modifying the string matching features with the condition described above improves the results. Another problem among the false positives was disagreement in grammatical attributes like number, gender or person, which was traced back to the lack of gold annotation for attributive pronouns like ihr in the original SemEval-2010 dataset. A re-annotation using heuristics could partially solve this problem. The analysis of the false negatives made it obvious that the false negative output of SUCRE was misleading: not the coreference links that were misclassified as disreferent were printed out, but links that arise from regarding the final cluster. In the case of reflexive pronouns, one has to take into account complex syntactic structures, for example with control verbs. Another issue was that markables whose heads are common nouns may corefer if the common nouns are semantically related. This cannot be captured by SUCRE, as there are no external information sources like GermaNet. A last problem concerned common proper names that do not constitute the head of a markable. Here, a check whether a markable head that is a proper name occurs anywhere in the other markable improves the final score.
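Two of the fixes described above can be mimicked in plain Python. The function names are illustrative and the keyword list is only the sample from the text; the actual features are written in SUCRE's feature definition language:

```python
UNIT_KEYWORDS = {"mark", "prozent", "meter", "sekunden"}  # units, currencies etc.

def unit_prefilter(head1, head2):
    # prefilter: drop every link in which either head is a unit/currency noun,
    # since such nouns usually do not corefer even under a string match
    return head1.lower() in UNIT_KEYWORDS or head2.lower() in UNIT_KEYWORDS

def name_match_anywhere(head, head_pos, other_words):
    # a proper-name head (STTS tag NE) coreferring with a markable that
    # contains the same name anywhere, not necessarily as its head
    return head_pos == "NE" and head in other_words

print(unit_prefilter("Prozent", "Prozent"))   # True: link is filtered out
print(name_match_anywhere("Schumacher", "NE",
                          ["der", "Weltmeister", "Michael", "Schumacher"]))  # True
```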
In chapter 5, the features proposed in chapter 4 and in some of the German approaches in (2.5) were implemented and evaluated against the new baseline. Sometimes this evaluation was misleading, as the results contradicted the evaluation on the final feature set. This conflict stems from the different features in the baseline feature set and in the final feature set, as different features mean different dependencies. The best performing features were selected for the final feature set.

7.2. Conclusion

The final scores: As a bottom line under the results in chapter 6, one can conclude that the linguistic analyses of the false positives/negatives as well as the further modifications of the feature set yield very good outcomes. The f-scores of all evaluation measures are increased in comparison to the new baseline (cf. (4.2)) and to the results for SUCRE in SemEval-2010. Moreover, whereas SUCRE only wins with respect to MUC and BLANC in SemEval-2010, the final system is now best for all four evaluation measures and thereby also outperforms TANL-1 in CEAF and B3. However, one problem is that the MUC recall decreases significantly (from 74% down to 68%). A possible reason for this deterioration is that most new features are based on the analysis of false positives and thereby try to create new constraints for coreference. On the other hand, the MUC precision increases by much more than the recall decreases, namely from 48.1% up to 58.2%. Similar trade-offs between precision and recall also occur for B3 and BLANC. The reason for this kind of improvement is that it is simpler to find restrictions for coreference resolution (i.e. indicators for disreference) than characteristics/indicators for coreference.

SUCRE architecture: The benefit of the architecture of SUCRE for this feature research was considerable.
By the use of the relational database model and the feature definition language, it was possible to implement fine-grained features that model certain linguistic patterns and thereby improve the quality of the feature set. However, the expressiveness of the definition language and also the set of attributes for words or markables could be extended further (see (7.3) for future work).

Fewer features: A further result of the feature engineering within this diploma thesis is the decrease in the number of features. The initial configuration contains 40 features, the new baseline 37 features and the final feature set 25 features. Keeping in mind that the feature values constitute the input for the decision tree classifier, the runtime of the training step increases sharply with the number of link features. Table 7.1 shows the runtime comparison for the initial configuration, the new baseline and the final feature set (measured on an IMS server ("koenigspinguin") with 4 x Dual-Core Opteron 8216, 2.4 GHz and 32 GByte memory). While the runtime of the test step is roughly constant, there is a significant difference between the 40 features of the initial feature set and the 25 features of the final feature set. Further investigations of the feature set will thus be more convenient, or rather: more research is possible in the same time.

            Initial configuration   New baseline          Final feature set
            40 features             37 features           25 features
Training    4093 s (≈ 68.2 min)     1953 s (≈ 32.6 min)   496 s (≈ 8.3 min)
Test        557 s (≈ 9.3 min)       541 s (≈ 9 min)       519 s (≈ 8.7 min)
Total       4650 s (= 77.5 min)     2494 s (≈ 41.6 min)   1015 s (≈ 16.9 min)

Table 7.1.: Runtime comparison

Moreover, Kobdani et al. (2010) argue that "removing redundant features often improves classification accuracy.
Smaller feature sets also can be more robust and cut down classification times." Furthermore, a smaller feature set gives a clearer overview of each feature and its contribution to the final score. As the link features have dependencies on one another, pruning features avoids too many dependencies.

Annotation quality: The quality of annotation in the TüBa-D/Z dataset used is not perfect (cf. (4.3.10)). By improving the annotation (e.g. for attributive possessive pronouns like ihr), the performance of the system improves considerably: with improvement step no. 10 (i.e. modifying the word table and the number attribute, see table 6.1), the performance gains 0.6%. One can conclude that the quality of annotation is important for a good performance.

7.3. Future work

Although the system's performance has been improved significantly, the result is certainly not optimal yet. There are several areas in which further work should be done; this section proposes four issues to be further improved or investigated. As this way of doing feature engineering has a great effect on the performance of SUCRE, feature engineering based on linguistic analysis should be continued. There are many linguistic phenomena that cannot be captured yet, due to the limited expressiveness of the feature definition language or to missing external knowledge, where its use is permitted at all (cf. the closed settings in SemEval-2010). Moreover, the SUCRE system shows some problems that have to be solved in order to get reliable data.

Pseudo language/Relational database model: The feature definition language and the relational database model are not flexible enough to capture all linguistic phenomena in German. In the following, some extensions are proposed and exemplified by linguistic problems that occurred in the course of this study.

• Hartrumpf (2001) uses a unification mechanism instead of the identity of feature values.
A drawback of SUCRE's identity check is how it deals with the feature value unknown. If one wants to check the agreement of two markables in terms of gender, the value unknown has to be excluded explicitly. For instance, the prefilter feature {(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)&&(m1h.f0==f0.P∼)} checks whether the genders of two markables mismatch. Such a mismatch must not include the value unknown, i.e. female mismatches male, but female does not mismatch unknown. If the equality predicate ==/2 and the inequality predicate !=/2 are redefined as unification checks, say =u /2 and !=u /2, such that unknown is unifiable with any other value, the resulting prefilter feature becomes: {(m1h.f2 !=u m2h.f2)&&(m1h.f0 =u f0.P∼)}.

• In the analysis of false positives, one insoluble problem was the case where m1 is compatible with m2 but precedes the right antecedent (cf. (4.3.3)). Due to the local perspective of link features, which is based on markable pairs, this problem remains unsolvable. A possible solution might be the extension from markable pairs to a triple containing m1, m2 and the set of markables M in between. A possible link feature could then use a markable m3 that is universally quantified over M, so that one can check for incompatibility between m2 and m3. This and other solutions for this problem of "relative proximity in context" can be researched in future work.

• As morphology is very predominant in German (unlike in English), a good way of matching strings is necessary. SUCRE provides the predicates seqmatch/2 and strmatch/2 in a case-sensitive and a case-insensitive version, which check for an exact string match or a substring match between two words. However, these predicates are not expressive enough for German morphology.
In order to capture valid compound words, the predicates bswitch/2 and eswitch/2 (case-sensitive and case-insensitive) have been introduced during this study for checking whether one string starts or ends with the other. Now one can vote for coreference with the markable pair <Schäferhundm1h, Hundm2h> and for disreference with the markable pair <Hundefängerm1h, Hundm2h>, although both markable pairs show a substring match. However, these two predicates are still not expressive enough. Consider the markable pair <Bürgerombudsmannsm1h, Ombudsmannm2h>. Here, the head of m1 is the compound word Bürgerombudsmann with the inflectional suffix s. This suffix prevents a positive response to the check whether the head of m1 ends with the head of m2 (i.e. eswitchlc(m1h,m2h)). A possible solution for this might be a string concatenation predicate +c /2 which concatenates two strings to a new one that functions as predicate input. This way, the markable pair <Bürgerombudsmannsm1h, Ombudsmannm2h> could be captured by the expression eswitchlc(m1h,m2h +c s). An alternative solution could be the general introduction of a lemmatized version of each word in the word table.

• Although the markable pair <(derm1b Rennfahrerm1h Michael Schumacher, der sieben mal Weltmeister wurdem1e), (derm2b Weltmeisterm2h Michael Schumacher, der 1969 geboren wurdem2e)> obviously contains two coreferent markables, SUCRE cannot handle this, as the markables' heads do not match. There might be two ways of solving this drawback. On the one hand, it becomes obvious that three "pointers" (i.e. first word, head word and last word) are not enough for large noun phrases, so an extension of the possible "pointers" in a markable could yield a solution. For instance, if the markable head is a common noun and the succeeding word is a proper name, it should be pointed to by a name-word-pointer m1n; otherwise m1n is identical to the head m1h.
Thereby, a check for an exact string match between m1n and m2n would yield a positive response, whereas a simple head match check would deny coreference. On the other hand, a more flexible solution is the introduction of an existentially bound variable, say ∃X, i.e. a variable that corresponds to a string or word object and is bound within the respective feature. A possible solution for the problem sketched above might be a feature like this: {seqmatch(X,m1a)&&seqmatch(X,m2a)&&(X.f0==NE)}. This feature checks whether there is a common word X in both markables and whether this common word is a proper name. Bound variables could also be used in the context of lists.

• A further extension of the pseudo language is the introduction of lists. This enables more readable features in cases where several options have to be checked. For instance, consider valid case suffixes for German nouns. The newly introduced substring feature checks for three suffixes: (eswitch(m1h,s)||eswitch(m1h,es)||eswitch(m1h,e)). If the feature definition language is extended with bound variables, a list structure and a member predicate member/2, this check can be simplified to: (eswitch(m1h,X)&&member(X,[s,es,e])). Another task for which lists are helpful is the check for a control verb (cf. example (36d) in (4.4.1)). For instance, given the markable pair <Peterm1, sichm2>, where Peter constitutes the object of a verb V, one can check whether there is a dependency relation between V and the main verb governing sich. If this is true, a check whether V is in a list of object control verbs [bitten, überreden, . . . ] could indicate a confident coreference between Peter and sich. The prefilter feature concerning units, month names and the like could also be simplified enormously. An alternative to the final prefilter feature no. 1 could be: seqmatchlc(m1h,X)&&member(X,[Mark, Meter, Prozent, Sekunden, ...]).
Moreover, this way, the list of keywords can be augmented more easily. A possible list could also be expressed by m1a.f0, i.e. the list of part-of-speech tags corresponding to each word in m1. Then one can check whether a markable contains a proper name anywhere, rather than just checking the head for a proper name.

• Another way of making a feature set clearer is to use macros, which could be defined in a separate file. This way, complex features become readable, typing errors (e.g. missing parentheses) can be prevented and thus creating complex features would be encouraged.

• A drawback of the relational database model is that there is no way to check the context of a markable, for instance the words before or after it. Broscheit et al. (2010b) implement a feature that checks whether a markable (or both markables) is (are) inside quoted speech. A possible way of checking such a situation could be to examine, within a window of n words before and after the markable, whether there is a quotation mark. The benefit of this feature is an appropriate check for person match: if there is a shift from direct speech to indirect speech or vice versa, a person mismatch does not indicate disreference. For implementing such a feature, the relational database model has to provide information about the context of a markable, i.e. which n tokens are before and after the markable.

Problems in the annotation of the dataset: One issue that caused trouble was the annotation of the TüBa-D/Z dataset used. One case was the markable annotation of non-referring markables like the es-pronoun (cf. (4.3.6)), which contradicts the idea of true markables. As there are no clues for deciding whether an es-pronoun is referring or not (given the local perspective and the current annotations), a revised annotation addressing the question of pronoun reference may help. However, the most problematic case was the annotation of attributive possessive pronouns like sein∼ or ihr∼.
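A rough Python analogue of two of the pseudo-language extensions proposed in the list above, the unification-style treatment of unknown and a suffix-tolerant version of eswitch/2, might look as follows (all names and the suffix list are illustrative assumptions, not part of SUCRE):

```python
UNKNOWN = "unknown"

def mismatch_u(a, b):
    # unification-style inequality: 'unknown' unifies with any value,
    # so it never produces a mismatch (female != male, but female =u unknown)
    return a != b and UNKNOWN not in (a, b)

def eswitch_suffix(head1, head2, suffixes=("", "s", "es", "e")):
    # compound-tail match tolerant of inflectional suffixes:
    # 'Bürgerombudsmanns' ends with 'Ombudsmann' plus the inflectional 's'
    h1, h2 = head1.lower(), head2.lower()
    return any(h1.endswith(h2 + s) for s in suffixes)

print(mismatch_u("female", "male"))      # True
print(mismatch_u("female", "unknown"))   # False
print(eswitch_suffix("Bürgerombudsmanns", "Ombudsmann"))  # True
print(eswitch_suffix("Hundefänger", "Hund"))              # False: no compound tail
```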
Here, every grammatical attribute was set to unknown, as there is no gold annotation for them in the original SemEval-2010 task dataset (cf. (4.3.10)). Moreover, the annotation of attributive pronouns in SemEval-2010 grammatically agrees with the corresponding NP-head rather than with the antecedent. Thus, an annotation of this kind would be useless, if not counterproductive, whenever the attributes of the corresponding NP-head and of the antecedent contradict each other. One way of solving this problem is to use a heuristic like: the antecedent of ihr∼ can only be singular/female or plural. This works fine for the possessive pronoun ihr∼ but not for the pronoun sein∼, for which the antecedent has to be singular/(male or neuter) (cf. feature Agree3 in (5.1.8)). Future work could investigate this issue or find further solutions for the annotation problem.

Use of external knowledge sources: As described in (4.4.2), it is sometimes important to know the semantic relation between two common nouns in order to decide whether they could be coreferent or not. This requires external knowledge, for instance an ontological knowledge base such as GermaNet. Given a markable pair <dem Tierm1, dem Hundm2>, there is no surface-linguistic basis for regarding m1 and m2 as coreferent; however, a feature that returns TRUE for a check of hyponymy or synonymy could provide evidence for coreference.

Inside the SUCRE system: Apart from the relational database model and the feature definition language, there are further issues concerning the SUCRE system in general.

• Klenner and Ailloud (2009) show that a good clustering algorithm is important for coreference resolution with a mention-pair model. They use an ILP (Integer Linear Programming) framework that makes it possible to use global constraints such as clause-boundedness (cf. (2.5.4)), which is not possible with best-first clustering.
Thus, a further improvement of SUCRE might be the use of such an ILP framework or any other clustering method that allows the inclusion of global constraints.

• The MUC-B3 result of SUCRE in SemEval-2010 (German, closed, gold) is 67.9%. However, the initial configuration performs worse (namely 61.78%). By removing the first three features concerning the markable distance in terms of sentences, the system achieves 66.44%; this was defined as the new baseline from which the feature research started. By adding the three features to the final feature set, the score of 73.06% collapses down to 64.02% (cf. (6.6)). As the distance between two markables is a very important feature, it is not plausible why removing these features improves the performance. This problem has to be solved in order to get reliable information and possibly an even better result than the one achieved within this study.

• A misleading output was detected within the analysis of the false negatives concerning the reflexive pronoun sich (cf. (4.4.1)). The output should show only the links that are actually misclassified rather than all possible links that come up by regarding a cluster of markables. So, the output has to be moved in front of the clustering step. Then a more precise analysis of false negatives is possible, yielding useful statistics, e.g. about the distances in terms of sentences, which might be important for tuning constants in features.

• Another issue related to this output problem is that there is no way to evaluate the pure classification step. Although the input of the classifier (i.e. the feature vectors) is modified within the feature research in this study, the result is evaluated by scoring the partition that emerges after best-first clustering (e.g. by MUC or B3 ).
A simpler way of scoring the classification quality would be to count the numbers of true positives (TP), false positives (FP) and false negatives (FN) and to compute precision (TP/(TP+FP)), recall (TP/(TP+FN)) and the f-score. However, for different feature sets the sums (TP + FP + FN + TN) are unequal. So the output has to be revised in order to get reliable numbers for TP, FP, . . . and to be able to score the classification without the aspect of a clustering step in between.

APPENDIX A
The Stuttgart-Tübingen tag set (STTS)

The part-of-speech tag set which is used in the TüBa-D/Z corpus, as listed in (STTS, 2011).

Adjectives:
  ADJA     attributive adjective                          [das] große [Haus]
  ADJD     adverbial/predicative adjective                [er fährt] schnell; [er ist] schnell
Adverbs:
  ADV      adverb                                         schon, bald, doch
  PAV      pronominal adverb                              dafür, dabei, deswegen
Adpositions:
  APPR     preposition; left part of a circumposition     in [der Stadt]
  APPRART  fused preposition and determiner               im [Haus], zur [Sache]
  APPO     postposition                                   [ihm] zufolge
  APZR     right part of a circumposition                 [von jetzt] an
Determiners:
  ART      (in)definite article                           der, die, das, ein, eine
Cardinal number:
  CARD     cardinal number                                zwei [Männer]
Foreign material:
  FM       foreign material                               "A big fish"
Interjection:
  ITJ      interjection                                   mhm, ach, tja
Conjunctions:
  KOUI     infinitival subjunction                        um [zu leben]
  KOUS     finite subjunction                             weil, dass
  KON      parataxis/conjunction                          und, oder
  KOKOM    comparative conjunction                        als, wie
Common noun:
  NN       common noun                                    Tisch, Herr
Proper name:
  NE       proper name                                    Hans, Hamburg
Pronouns:
  PDS      substituting demonstrative pronoun             dieser, jener
  PDAT     attributive demonstrative pronoun              jener [Mensch]
  PIS      substituting indefinite pronoun                keiner, man
  PIAT     attributive indefinite pronoun without determiner   kein [Mensch]
  PIDAT    attributive indefinite pronoun with determiner      [die] beiden [Brüder]
  PPER     personal pronoun                               ich, er, ihm
  PPOSS    substituting possessive pronoun                meins, deiner
  PPOSAT   attributive possessive pronoun                 mein [Buch], deine [Mutter]
  PRELS    substituting relative pronoun                  [der Hund ,] der
  PRELAT   attributive relative pronoun                   [der Mann ,] dessen [Hund]
  PRF      reflexive pronoun                              sich, einander
  PWS      substituting interrogative pronoun             wer, was
  PWAT     attributive interrogative pronoun              welche [Farbe]
  PWAV     adverbial interrogative or relative pronoun    warum, wo
Particles:
  PTKZU    "zu"-particle before infinitive                zu [gehen]
  PTKNEG   negation particle                              nicht
  PTKVZ    verbal particle                                [er kommt] an
  PTKANT   answer particle                                ja, nein
  PTKA     particle with adjectives/adverbs               am [schönsten]
Truncation:
  TRUNC    truncation                                     An- [und Abreise]
Verbs:
  VVFIN    finite full verb                               [du] gehst
  VVIMP    imperative full verb                           komm [!]
  VVINF    infinite full verb                             gehen, ankommen
  VVIZU    infinite full verb with "zu"-particle          anzukommen
  VVPP     full verb as past participle                   gegangen
  VAFIN    finite auxiliary verb                          [du] bist
  VAIMP    imperative auxiliary verb                      sei [ruhig !]
  VAINF    infinite auxiliary verb                        werden, sein
  VAPP     auxiliary verb as past participle              gewesen
  VMFIN    finite modal verb                              dürfen
  VMINF    infinitival modal verb                         wollen
  VMPP     modal verb as past participle                  gekonnt

APPENDIX B
The pseudo language for SUCRE's link feature definition

Functions and operations for link features that are available for the TüBa-D/Z corpus, as listed on (IMSWikipedia, 2011):

B.1. Markable keywords

m1: The first markable m1 (usually the antecedent)
m1b: The first word (the beginning) of markable m1 (e.g. the determiner)
m1h: The head word of markable m1 (e.g.
the noun)
m1e: The last word (the ending) of markable m1 (usually the head or a post-nominal adjunct)
m2: The second markable m2 (usually the anaphor)
m2b: The first word (the beginning) of markable m2 (e.g. the determiner)
m2h: The head word of markable m2 (e.g. the noun)
m2e: The last word (the ending) of markable m2 (usually the head or a post-nominal adjunct)

Two special markable keywords:
m1a: All words of markable m1 (e.g. <determinerm1b, adjective, nounm1h/e>) ⇒ the only way to get access to words other than m1b, m1h or m1e (e.g. a pre-nominal adjective)
m2a: All words of markable m2 (e.g. <determinerm2b, adjective, nounm2h/e>) ⇒ the only way to get access to words other than m2b, m2h or m2e (e.g. a pre-nominal adjective)

B.2. Attributes

Attributes of the markable keywords m1/m1b/m1e/m1h/m2/m2b/m2e/m2h:
Part of speech: determined by a tagger; here, STTS (appendix A) (e.g. m1h.f0)
Number: the grammatical attribute number: unknown, singular, plural or both (e.g. m1h.f1)
Gender: the grammatical attribute gender: unknown, neuter, female or male (e.g. m1h.f2)
Case: the grammatical attribute case: unknown, nominative, genitive, dative or accusative (e.g. m1h.f3)
Person: the grammatical attribute person: unknown, first, second or third (e.g. m1h.f4)
Dependency relation: the relation of a markable to its mother in a dependency tree (e.g. m1h.rewtag)
Token number: the token-ID of the respective word in the markable (e.g. m1h.txtpos)
Sentence number: the sentence-ID of the markable (e.g. m1h.stcnum)
Paragraph number: the paragraph-ID of the markable (e.g. m1h.prfnum)
Document number: the document-ID of the markable (e.g. m1h.docnum)

B.3. Arithmetic operations

Given the integers A and B, these operations return an integer:
Addition: A + B
Subtraction: A − B
Multiplication: A ∗ B
Division: A/B

B.4.
Arithmetic predicates

Given the integers A and B, these operations return a boolean expression (i.e. TRUE/FALSE resp. 1/0):
Equality: A == B
Inequality: A != B
Greater: A > B
Less: A < B
Greater or equal: A >= B
Less or equal: A <= B

B.5. Boolean operations

Given the boolean expressions A and B, these operations return a boolean expression:
Equivalence: A == B
Inequality: A != B
Disjunction: A||B
Conjunction: A&&B
Boolean constants: 0, 1

They have the following semantics:

A  B  |  A==B  A!=B  A||B  A&&B
0  0  |   1     0     0     0
0  1  |   0     1     1     0
1  0  |   0     1     1     0
1  1  |   1     0     1     1

Table B.1.: A value table for the defined boolean operators

Further boolean operators can be expressed in terms of the operators above:
• The negation ¬A can be expressed as A == 0.
• The implication A → B can be expressed as ¬A ∨ B (i.e. (A == 0)||B).

As an illustration, a simple number-agreement link feature can be composed from these building blocks as m1h.f1 == m2h.f1, comparing the number attribute (B.2) of the two head words with the equality predicate (B.4).

B.6. Functions

Functions in the SUCRE link feature definition language return integers or boolean expressions (integers with the value 1 or 0).
Absolute value: abs(A) returns the absolute value of A (e.g. abs(−5) = 5)
Maximum value: max(A,B) returns the maximum value of A and B (e.g. max(1, 3) = 3)
Minimum value: min(A,B) returns the minimum value of A and B (e.g. min(1, 3) = 1)
Exact string matching: seqmatch(str1,str2) returns TRUE if str1 and str2 are identical, otherwise FALSE (e.g. seqmatch(Hund,Hund) returns TRUE)
Exact string matching, case-insensitive: seqmatchlc(str1,str2) returns TRUE if str1 and str2 are identical, independent of upper/lower case, otherwise FALSE (e.g. seqmatchlc(Ich,ich) returns TRUE)
Substring matching: strmatch(str1,str2) returns TRUE if str1 is in str2 or vice versa, otherwise FALSE (e.g. strmatch(Schäferhund,hund) returns TRUE)
Substring matching, case-insensitive: strmatchlc(str1,str2) returns TRUE if str1 is in str2 or vice versa, independent of upper/lower case, otherwise FALSE (e.g. strmatchlc(Schäferhund,Hund) returns TRUE)
Levenshtein distance: editdist(str1,str2) returns the edit distance between str1 and str2 (e.g. editdist(Hund,Hundes) returns 2)

APPENDIX C

A python script for computing the BLANC-score

#!/usr/bin/env python
# Patrick Ziering
# Date: 10.02.2011

# "pairs" returns a set of non-reflexive unordered pairs for a given set
def pairs(in_set):
    return set([tuple(sorted([i,j])) for i in in_set for j in in_set if i!=j])

# The given example
markables = set(range(1,10)+["A","B","C"])
GOLD  = [set(range(1,6)),set([6,7]),set([8,9,"A","B","C"])]
SYS_a = [set(range(1,6)),set([6,7,8,9,"A","B","C"])]
SYS_b = [set([6,7]),set(range(1,6)+[8,9,"A","B","C"])]
SYS_c = [markables]
SYS_d = [set([i]) for i in markables]

# Definition of coreference/disreference link sets
c_links_true = set([pair for entity in GOLD for pair in pairs(entity)])
c_links_pred = set([pair for entity in SYS_a for pair in pairs(entity)])
d_links_true = pairs(markables) - c_links_true
d_links_pred = pairs(markables) - c_links_pred

# Combinations of decisions rc, rd, wc, wd:
rc = len(c_links_pred & c_links_true)
rd = len(d_links_pred & d_links_true)
wc = len(c_links_pred - c_links_true)
wd = len(d_links_pred - d_links_true)

# Creating the BLANC-scores
Pc,Rc = float(rc)/(rc+wc),float(rc)/(rc+wd)
F1c = (2*Pc*Rc)/(Pc+Rc)
print "Pc: %f\nRc: %f\nF1c: %f"%(Pc,Rc,F1c)
Pd,Rd = float(rd)/(rd+wd),float(rd)/(rd+wc)
F1d = (2*Pd*Rd)/(Pd+Rd)
print "Pd: %f\nRd: %f\nF1d: %f"%(Pd,Rd,F1d)
print "BLANC: %f"%((F1c+F1d)/2)
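The script above scores only SYS_a against GOLD. As an independent cross-check (not part of the original appendix), the same four decision counts can be recomputed in Python 3; the sketch below uses frozenset pairs so that the mixed integer/string markables need no total order:

```python
from itertools import combinations

def links(partition):
    # All unordered coreference links induced by a partition (a list of entity sets).
    return {frozenset(p)
            for entity in partition
            for p in combinations(sorted(entity, key=str), 2)}

# The example from the script above: 12 markables, hence C(12,2) = 66 pairs.
markables = set(range(1, 10)) | {"A", "B", "C"}
GOLD  = [set(range(1, 6)), {6, 7}, {8, 9, "A", "B", "C"}]
SYS_a = [set(range(1, 6)), {6, 7, 8, 9, "A", "B", "C"}]

all_pairs = {frozenset(p) for p in combinations(sorted(markables, key=str), 2)}
c_true, c_pred = links(GOLD), links(SYS_a)
d_true, d_pred = all_pairs - c_true, all_pairs - c_pred

# The four decision counts: right/wrong coreference and disreference links.
rc, wc = len(c_pred & c_true), len(c_pred - c_true)
rd, wd = len(d_pred & d_true), len(d_pred - d_true)

# Link-level precision/recall on both link types, averaged into BLANC.
Pc, Rc = rc / (rc + wc), rc / (rc + wd)
Pd, Rd = rd / (rd + wd), rd / (rd + wc)
F1c = 2 * Pc * Rc / (Pc + Rc)
F1d = 2 * Pd * Rd / (Pd + Rd)
blanc = (F1c + F1d) / 2

print(rc, wc, rd, wd)   # 21 10 35 0
print(round(blanc, 4))  # 0.8413
```

The counts can also be verified combinatorially: GOLD induces 10 + 1 + 10 = 21 coreference links, SYS_a induces 10 + 21 = 31, and every gold link is among the predicted ones, so rc = 21, wc = 10 and wd = 0.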
APPENDIX D

Upper and lower bounds / evaluation results in (Versley, 2006)

                              Pmax   Rmax   Pmin   Rmin   Perp   Prec   Recl
always allow non-resolution
  head identity              100.0   54.4    0.0    0.0   1.89   62.5   38.5
  same head                  100.0   76.9    0.0    0.0   1.98   58.3   40.5
  uniq_name                  100.0   74.3    0.0    0.0   1.88   66.8   58.4
force resolution
  all                         27.0   98.7    0.0    0.0  23.68    1.2    4.9
  4gram                       31.1   76.6   13.3   37.5   2.28   26.3   54.7
  head identity               52.1   54.4   32.1   47.1   1.68   58.2   50.5
  same_head                   49.0   76.9   33.6   59.0   1.65   51.6   69.4
  +agr_num                    52.1   76.5   36.3   60.4   1.62   56.0   69.7
  +comp_mod                   56.4   71.4   38.2   57.7   1.57   62.1   64.8
  uniq_name                   57.1   74.3   40.5   61.6   1.57   62.0   68.6
  +hard_seg(8)                64.9   68.7   43.8   59.0   1.61   67.8   63.2
  +loose_seg(8)               62.8   71.1   43.0   59.8   1.58   66.6   65.8
include coreferent bridging
  no filter                   62.3   92.5   14.3   61.6   1.42   62.0   68.6
  +gwn_only                   62.3   92.5   14.3   61.6   1.28   62.0   68.6
  filter_ne                   61.7   90.1   17.1   61.6   1.68   62.0   68.6
  +gwn only                   61.7   90.1   17.1   61.6   1.31   62.0   68.6
  unique_mod                  60.7   86.3   21.2   61.6   1.51   62.0   68.6
  +segment                    60.6   85.6   21.4   61.6   1.49   62.0   68.6
  +num                        60.6   85.6   21.4   61.6   1.49   62.0   68.6
  +gwn                        59.8   83.0   21.7   61.6   1.28   61.7   69.2
  +syn_role                   59.8   83.0   21.7   61.6   1.27   61.9   69.5
  NE_semdist                  59.8   83.0   21.7   61.6   1.27   61.9   69.7
  +pred_arg                   59.8   83.0   21.7   61.6   1.26   61.9   70.0

Table D.1.: Upper and lower bounds / evaluation results in (Versley, 2006)

APPENDIX E

All link errors from Chapter 4

E.1. False positives

E.1.1. The second markable is indefinite

(57) a. Erst in den 70er Jahren entstanden Übersetzungen und Inszenierungen , die jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden . Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m1 , auch weiterhin ( ein Geheimtip )m2 . D. N.
(ID: 60x61); (Calc-Prob:52)

b. Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen ( der Inszenierung )m1 wichtige Glanzpunkte .[. . . ] Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit ( einer Langhoff-Inszenierung )m2 seine Arbeit aufnehmen . (ID: 218x250); (Calc-Prob:52)

c. Das Bühnenbild von Peter Schubert erzielt mit großen Stahlkäfigen zwar Wirkung , die lachsfarbene Wandbespannung erscheint aber für die Atmosphäre ( des Stückes )m1 fast wieder zu schick .[. . . ] Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen der Inszenierung wichtige Glanzpunkte . Diese wenigen atmosphärischen Momente lassen ( ein dichtes , interessantes Stück )m2 erahnen , das aber in dieser Fassung weit unter dem Möglichen inszeniert scheint . (ID: 208x221); (Calc-Prob:52)

d. Gott guckt uns nicht zu , ( der )m1 hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn seiner Grotske “Wunderworte “erklären .[. . . ] Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit seiner Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ( ein einträgliches Geschäft )m2 hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . (ID: 74x92); (Calc-Prob:51)

e.
Das gefiel dem Hund so gut , daß ( er )m1 unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ( ein Bekannter des Hundehalters )m2 versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . Erst die Feuerwehr konnte beide durch das Fenster befreien . (ID: 332x336); (Calc-Prob:56)

f. In letzter Zeit kümmern sich die Besetzer allerdings nicht mehr sonderlich darum , ob ( eine Wohnung )m1 bewohnt ist oder nicht .[. . . ] Die Rechtslage , die so entstanden ist , spricht so ziemlich allem Hohn , was sich Juristen je ausgedacht haben : Bricht jemand in ( eine Wohnung )m2 ein , wird er in Polen , wie überall auf der Welt , mit bis zu mehreren Jahren Gefängnis bestraft . (ID: 447x458); (Calc-Prob:83)

g. Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute nicht in ( ihre Wohnung )m1 .[. . . ] Die sonderbare Art und Weise der Hausbesetzung kommt noch aus Zeiten , als aufgrund bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von wilden Mietern besetzt wurden . In letzter Zeit kümmern sich die Besetzer allerdings nicht mehr sonderlich darum , ob ( eine Wohnung )m2 bewohnt ist oder nicht . (ID: 427x447); (Calc-Prob:83)

h. Bei der Polizei erfuhr ( die alte Dame )m1 , daß es sich bei ihrem Fall nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne .[. . . ] Sie hat dabei noch Glück gehabt . ( Eine andere alte Dame , der gleiches widerfuhr )m2 , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose . (ID: 389x432); (Calc-Prob:83)

i. Doch hätte ( die )m1 nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können . Denn das darf man nur , wenn man ( eine Ersatzwohnung )m2 ... Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . (ID: 505x508); (Calc-Prob:52)

j.
Die sonderbare Art und Weise der Hausbesetzung kommt noch aus Zeiten , als aufgrund bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von ( wilden Mietern )m1 besetzt wurden .[. . . ] Inzwischen klagt dieser beim Obersten Gerichtshof , dessen Richter vorsichtshalber auch gleich in Urlaub gefahren sind . Einer von ihnen erklärte einer Zeitung schon mal anonym , es sei durchaus rechtens , wenn man ( wilde Mieter )m2 auf eigene Faust rauswerfe . (ID: 442x492); (Calc-Prob:52)

k. ( Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde )m1 , kann bis heute nicht in ihre Wohnung .[. . . ] Bleibt er und klaut dabei auch noch die Wohnung , bleibt er ungeschoren bis ans Ende seiner Tage . Daß Einbruch auch strafbar ist , wenn der Einbrecher nicht mit einem Sack auf dem Rücken und einer Maske vor dem Gesicht das Weite sucht , ist ( eine Erkenntnis )m2 , die auch nach Ansicht des polnischen Bürgerombudsmanns die Auffassungsgabe der polnischen Polizei bei weitem übersteigt . (ID: 425x477); (Calc-Prob:52)

l. Warschau ( ( taz )m1 ) - Helena G. hatte Pech , daß es ausgerechnet sie traf .[. . . ] Als sie zurückkam , war sie ihre Wohnung los . ( Eine sogenannte wilde Mieterin )m2 hatte sich dort eingenistet , die Schlösser ausgewechselt , und so war Helena G. draußen und die Neue drin . (ID: 368x381); (Calc-Prob:52)

m. ( Dieses )m1 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen .[. . . ] Der Exmissionstitel ist allerdings vergilbt , weil die städtischen Behörden , eigentlich zuständig , Helena zu ihrem Recht zu verhelfen , es ablehnten einzugreifen . Begründung : Nach polnischem Mietrecht dürfe man ( einen Mieter )m2 nur aus der Wohnung entfernen , wenn man in der Lage sei , ihm Ersatzraum zur Verfügung zu stellen . (ID: 400x412); (Calc-Prob:51)

n. » Leichtfertig « ist , nach Eberhard Diepgens Ansicht , mit den Informationen über die IOC-Mitglieder umgegangen worden .
So leichtfertig wie er ( das )m1 dahersagt , so freudig wurde es vom Aufsichtsrat der Marketing GmbH aufgenommen und dem Geschäftsführer ( ein Strick daraus )m2 gedreht . Ein Kopf mußte rollen , damit sich die anderen aus der Schlinge ziehen konnten . (ID: 643x648); (Calc-Prob:51)

o. Mit einer Mahnwache vor der Innenverwaltung am Fehrbelliner Platz machten etwa zehn Leute gestern vormittag auf die Lage ( der Flüchtlinge aus dem Kriegsgebiet )m1 aufmerksam .[. . . ] In einer Petition forderten sie Innensenator Dieter Heckelmann ( CDU ) auf , den Visumszwang für Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien in Berlin aufzuheben . » Heckelmann kann sehr wohl entscheiden , ( Flüchtlinge , die mit dem Flugzeug kommen )m2 , unbürokratisch nach Berlin einreisen zu lassen « , so Christoph Koch , Dozent an der FU . (ID: 720x733); (Calc-Prob:83)

p. Senat soll Visumzwang für ( Kriegsflüchtlinge )m1 aufheben[. . . ] Mit einer Mahnwache vor der Innenverwaltung am Fehrbelliner Platz machten etwa zehn Leute gestern vormittag auf die Lage der Flüchtlinge aus dem Kriegsgebiet aufmerksam . In einer Petition forderten sie Innensenator Dieter Heckelmann ( CDU ) auf , den Visumszwang für ( Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien )m2 in Berlin aufzuheben . (ID: 702x727); (Calc-Prob:83)

q. » Schmerzgekrümmt « reagierte Daimler Benz auf den gestrigen taz-Bericht über die Konflikte um die geplante U-Bahn-Linie 3 . Man sei nicht gegen die U-Bahn , man würde sie im Gegenteil begrüßen , wenn ( man )m1 einen schnellen Ausbau der Linie garantiert bekomme , sagte ( eine Unternehmenssprecherin )m2 . Man wolle jedoch keine Vertagung des U-Bahn-Baus bis ins nächste Jahrtausend und auch keine Vorhalteröhre , wie es der Senat vorschlage . (ID: 869x872); (Calc-Prob:50)

r. Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie
( eine neue Wohnung )m1 schaffen würden .[. . . ] Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard Schwaetzer ( FDP ) selbst mit ihren circa 15.000 Mark monatlichen Bruttoeinkommen und ihrem nicht unerheblichen Steuersatz . Die Ministerin besaß bis Anfang 1991 ( eine Wohnung in der Bonner Riemannstraße )m2 in Bonn , mit deren Erwerb sie bis zu tausend Mark Steuern im Monat sparen konnte . (ID: 1092x1103); (Calc-Prob:83)

s. Zwischen 60 und 70 Prozent der Vermieter , schätzt Hanka Fiedler , wollen nur den Mieter herausbekommen , um ( die Wohnung , die dann im Preis steigt )m1 , besser verkaufen zu können .[. . . ] Und dies alles wird vom Steuerzahler noch bezuschußt . Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie ( eine neue Wohnung )m2 schaffen würden . (ID: 1067x1092); (Calc-Prob:83)

t. Eine leere Eigentumswohnung bringt hingegen unverändert zwischen 4.000 und 5.000 Mark - ( den Mieter )m1 herauszuklagen , lohnt sich da schon .[. . . ] Und anders geht es auch kaum . » ( Einen vertragestreuen Mieter )m2 kriegen Sie heutzutage nur über eine Eigenbedarfskündigung raus « , heißt es in Hausbesitzerkreisen . (ID: 1078x1081); (Calc-Prob:83)

u. Was mit ( diesen Mietern )m1 geschieht , kann man in West-Berlin ablesen .[. . . ] Wenn die Kündigung nicht zieht , macht der Vermieter eben nächtlichen Telefonterror , schickt Bauarbeiter in Haus , kündigt Mieterhöhungen an oder veranstaltet allwöchentliche Besichtigungstouren von potentiellen Käufern durch die Wohnung . Außerdem , erzählt Frau Fiedler , gebe es häufig westdeutsche Hausbesitzer , die ( renitenten Mietern )m2 drohten , nach Berlin zu ziehen und dann eben Eigenbedarf anzumelden .
(ID: 985x1042); (Calc-Prob:83)

v. Nur zehn Prozent der umgewandelten Wohnungen sind an die dort wohnenden Mieter verkauft worden , ein Drittel der Mieter hat ( die Wohnung )m1 verlassen müssen , viele andere haben mit Kündigungsprozessen zu kämpfen .[. . . ] Viele Mieter ziehen entweder vorzeitig entnervt aus oder lassen sich auf einen Vergleich ein . Denn Vermieter , die ( eine Wohnung )m2 freibekommen wollen , lassen sich einiges einfallen , um die Mieter herauszuekeln , berichtet Frau Fiedler . (ID: 992x1022); (Calc-Prob:83)

w. Nur zehn Prozent der umgewandelten Wohnungen sind an die dort wohnenden Mieter verkauft worden , ( ein Drittel der Mieter )m1 hat die Wohnung verlassen müssen , viele andere haben mit Kündigungsprozessen zu kämpfen .[. . . ] Denn Eigentumswohnungen werden , so der Ring Deutscher Makler in seiner neuesten Bilanz , hauptsächlich für eigene Wohnzwecke und weniger stark von Anlegern gekauft . ( Ein Drittel der Mieter )m2 muß gehen (ID: 991x1007); (Calc-Prob:83)

x. Nun steht es still - und alle Welt wundert sich , daß ( es )m1 sich nicht von der Stelle bewegt hat . Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ( ein Wunder )m2 , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn es nur die konkrete Zusage gab , die erste IWF-Kredittranche von einer Milliarde Dollar im August zu überweisen . (ID: 8196x8204); (Calc-Prob:50)

y. Sieben der 24 Mannschaften werden nach ( dieser Saison )m1 ihre regionalen Oberligen verstärken .[. . . ] Eben .
Wir stehen vor ( einer großen Saison )m2 . (ID: 8706x8726); (Calc-Prob:83)

z. Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von ( dem )m1 , was gewesen ist , sollten wir erhalten « .[. . . ] Berlin sei schließlich der einzige Ort in der Welt , wo » die Historie an jeder Ecke noch atmet « . ( Eine Feststellung , der David Cornell wohl auch zustimmen würde )m2 . (ID: 1443x1453); (Calc-Prob:51)

E.1.2. Wrong assignment of a relative pronoun

(58) a. Der neben Garcia Lorca bedeutendste spanische Dramatiker des 20. Jahrhunderts wurde für das deutschsprachige Theater spät entdeckt . Erst in den 70er Jahren entstanden Übersetzungen und ( Inszenierungen )m1 , ( die )m2 jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden . Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt Valle-Inclan , der Exzentriker der Moderne , auch weiterhin ein Geheimtip . (ID: 52x54); (Calc-Prob:51)

b. Aus seinem umfänglichen dichterischen Schaffen ragen vor allem die von ihm kreierten esperpentos heraus , Schauerpossen , die die von Leidenschaft und Gewalt deformierte Gesellschaft wie durch einen Zerrspiegel betrachten . Zu ( diesem Genre )m1 gehören neben den Wunderworten ( ( die )m2 im Original als Tragikomödie untertitelt ist ) auch die Dramen Glanz der Boheme und die Trilogie Karneval der Krieger . Es sind sperrige , sprachgewaltige Grotesken , die Mystik und Mythen karikieren und eine erhebliche Fortschrittsskepsis ausdrücken . (ID: 32x33); (Calc-Prob:51)

c. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen . Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im ( Deutschen Theater )m1 , mit ( der )m2 übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen .
Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen . (ID: 242x243); (Calc-Prob:51)

d. Bonn ( dpa ) - Unter dem Motto “Volkswagen für Volksvertreter “hat der Jenaer SPD-Bundestagsabgeordnete Christoph Matschie eine Umrüstung der Fahrbereitschaft des Bundestages gefordert . Die knapp 100 Autos umfassende Flotte bestehe fast ausschließlich aus BMW- und ( Mercedes-Limousinen )m1 , ( die )m2 durch umweltverträglichere Fahrzeuge ersetzt werden sollten . Um ein Signal für umweltbewußtes Handeln zu setzen , sollten die Abgeordneten auf den Diesel-Golf umsteigen , der 5,5 Liter Kraftstoff auf 100 Kilometer benötige . (ID: 307x309); (Calc-Prob:51)

e. An diesem Zustand hat sich seither nichts geändert . Bei der Polizei erfuhr die alte Dame , daß ( es )m1 sich bei ihrem Fall nicht um ein Vergehen handele , ( welches )m2 von Amts wegen verfolgt werden könne . Helena begab sich zu Gericht . (ID: 390x394); (Calc-Prob:53)

f. Denn das darf man nur , wenn man eine Ersatzwohnung ... Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende ( kommunaler Sozialmieter )m1 , ( die )m2 ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . Die Stadt hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben . (ID: 512x514); (Calc-Prob:51)

g. Die Realität noch viel grauenerregender , als es sich der wildeste Hammerwerfer in seinen perversesten Träumen ersinnen könnte ? Die Briten Vyv Simson und Andrew Jennings haben ihre Version vom olympischen Märchen unter dem Titel Geld , Macht und ( Doping )m1 auch schon als Buch veröffentlicht , ( was )m2 dem IOC-Präsidenten Juan Antonio Samaranch wenig gefallen hat . Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß er die ihm anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .
(ID: 552x554); (Calc-Prob:51)

h. Der Aufsichtsrat der Olympia Marketing GmbH ernannte ihn am Donnerstag abend überraschend zum Geschäftsführer der Firma . ( Er )m1 löst auf dem Posten Nikolaus Fuchs ab , ( der )m2 wegen der Intimdatei über IOC-Mitglieder seinen Hut nehmen mußte . Fuchs ist geschäftsführender Gesellschafter der Bossard Consultants , die die Datenmasken als » Spielmaterial « für die Olympia GmbH gefertigt hatte . (ID: 787x790); (Calc-Prob:53)

i. Kürzungspläne des Senats , etwa die angedachte und schließlich verworfene Schließung der Deutschen Oper , würden nur das Angebot mindern und wirkten sich » schädigend auf den Tourismus aus « . Abseits aller kulturellen Angebote hält Berlin für Busch-Petersen , ( der )m1 bis 1989 » hinter der Mauer ausgehalten hat « , einen gewaltigen Trumpf in der Hand , wie er erst kürzlich wieder feststellen konnte : ( Ein schwedischer Gast , den er durch den Ostteil führte )m2 , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen . Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von dem , was gewesen ist , sollten wir erhalten « . (ID: 1430x1439); (Calc-Prob:53)

j. Kürzungspläne des Senats , etwa die angedachte und schließlich verworfene Schließung der Deutschen Oper , würden nur das Angebot mindern und wirkten sich » schädigend auf den Tourismus aus « . Abseits aller kulturellen Angebote hält Berlin für Busch-Petersen , ( der )m1 bis 1989 » hinter der Mauer ausgehalten hat « , ( einen gewaltigen Trumpf )m2 in der Hand , wie er erst kürzlich wieder feststellen konnte : Ein schwedischer Gast , den er durch den Ostteil führte , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen . Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von dem , was gewesen ist , sollten wir erhalten « .
(ID: 1430x1433); (Calc-Prob:53)

k. Sabine Schröder , Sprecherin der Hotel- und Gaststätteninnung , glaubt , daß die Mauer » ein Selbstläufer « gewesen und in den vergangenen zwei Jahren zu wenig in die Werbung für Berlin investiert worden sei . Hoffnung setzt ( sie )m1 auf die geplante » Tourismus GmbH « , ( die )m2 noch in diesem Jahr eingerichtet werden soll . Hauptstütze wird mit 60 bis 70 Prozent die Privatwirtschaft sein - den Rest trägt der Senat . (ID: 1308x1309); (Calc-Prob:51)

l. Mit dem routinierten Blick auf die Uhr , » es ist 1.30 h und der Flug zum nächsten Festival startet bereits in 5 Stunden « - energiegeladen , unaufhörlich gute Laune verbreitend , selbst wenn seine Witze kaum Eckkneipenniveau erreichen , lädt Paquito in der Tradition der Afro-Cuban-Entertainment-Schule seines Altmeisters Dizzy Gillespie zum Abend der offenen Tür : pure Kommunikation der wirksamen Art . Im Gepäck hat er den 22jährigen Pianisten Ed Simon , ( der )m1 ( einem verliebten Jungen )m2 gleicht und mit flüchtigen Seitenblicken auf seinen Mentor den Club zum brodeln bringt . » Ich bin sehr jung und glaube , daß ich mich noch selbst finden muß « , sagt der introvertierte Tastenromantiker venezuelanischer Herkunft , der zugleich Mitglied von Bobby Watson’s Post-Motown Bop Band Horizon ist , Herbie Mann begleitet und mit Kevin Eubanks oder der M-Base Gruppe um Greg Osby funkt , » je unterschiedlicher die Musik ist , die ich mache , desto offener werde ich « . (ID: 1563x1564); (Calc-Prob:53)

m. Mit dem routinierten Blick auf die Uhr , » es ist 1.30 h und der Flug zum nächsten Festival startet bereits in 5 Stunden « - energiegeladen , unaufhörlich gute Laune verbreitend , selbst wenn seine Witze kaum Eckkneipenniveau erreichen , lädt Paquito in der Tradition der Afro-Cuban-Entertainment-Schule seines Altmeisters Dizzy Gillespie zum Abend der offenen Tür : pure Kommunikation der wirksamen Art .
Im Gepäck hat ( er )m1 den 22jährigen Pianisten Ed Simon , ( der )m2 einem verliebten Jungen gleicht und mit flüchtigen Seitenblicken auf seinen Mentor den Club zum brodeln bringt . » Ich bin sehr jung und glaube , daß ich mich noch selbst finden muß « , sagt der introvertierte Tastenromantiker venezuelanischer Herkunft , der zugleich Mitglied von Bobby Watson’s Post-Motown Bop Band Horizon ist , Herbie Mann begleitet und mit Kevin Eubanks oder der M-Base Gruppe um Greg Osby funkt , » je unterschiedlicher die Musik ist , die ich mache , desto offener werde ich « . (ID: 1561x1563); (Calc-Prob:53)

n. In erster Instanz hatte das Amtsgericht Kreuzberg das Räumungsbegehren im März abgewiesen . In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut dem ein Untermieter , ( dem )m1 eine Wohnung von ( einem gewerblichen Zwischenmieter )m2 vermietet wurde , vollen Kündigungsschutz genießt . Das Amtsgericht erkannte an , daß diese Rechtsauffassung für sämtliche gewerbliche Untermietsverhältnisse , die Wohnmietverhältnisse sind , gelte . (ID: 11545x11547); (Calc-Prob:53)

o. In erster Instanz hatte das Amtsgericht Kreuzberg das Räumungsbegehren im März abgewiesen . In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut ( dem )m1 ein Untermieter , dem ( eine Wohnung )m2 von einem gewerblichen Zwischenmieter vermietet wurde , vollen Kündigungsschutz genießt . Das Amtsgericht erkannte an , daß diese Rechtsauffassung für sämtliche gewerbliche Untermietsverhältnisse , die Wohnmietverhältnisse sind , gelte .
In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut ( dem )m1 ( ein Untermieter , dem eine Wohnung von einem gewerblichen Zwischenmieter vermietet wurde )m2 , vollen Kündigungsschutz genießt . Das Amtsgericht erkannte an , daß diese Rechtsauffassung für sämtliche gewerbliche Untermietsverhältnisse , die Wohnmietverhältnisse sind , gelte . (ID: 11544x11548); (Calc-Prob:52) Und in der Tat läßt sich schwer vorstellen , wie das bisherige Land Brandenburg als umgebender Rand einer wachsenden Metropole eine eigenständige Existenz weiterführen könnte . Auch für die Stadt dürfte eine Konstruktion , ( die )m1 ( eine einheitliche Planung für den Stadtkern , den Stadtrand , die nähere und die weitere Umgebung )m2 möglich macht , im wesentlichen Vorteile haben . Überhaupt nicht bedacht wurde bisher das Binnenverhältnis , das sich bei einer einfachen Zusammenlegung der beiden bisherigen Bundesländer zwischen der Metropole Berlin und dem neuen Bundesland Brandenburg-Berlin ergeben würde . (ID: 11728x11732); (Calc-Prob:53) Ein » Zweckverband Berlin und Umland « ist auf die Dauer in Sachen Verkehr , Energieversorgung , Wohnungsbau und Wirtschaftsplanung in jedem Fall erforderlich . Wenn die 17 äußeren Bezirke des bisherigen Berlins nicht zu Berlin , sondern mit Potsdam , Nauen , Oranienburg , Bernau , Strausberg , Königs Wusterhausen , Zossen und anderen zum Umland zählten und entsprechenden Einfluß hätten , wäre das den Interessen aller Bürgerinnen und Bürger bestimmt dienlicher als die heutige Konstruktion , ( die )m1 gegenüber den Gemeinden des Umlandes nur auf ( ein Diktat der Metropole )m2 hinauslaufen würde . Wenn die Ausdehnung der Stadtgebiete und die Zentralisierung der Verwaltungen Kennzeichen des Fortschritts sind , warum sind dann Offenbach und Hanau , Rüsselsheim und Eschborn , Kronberg und Oberursel noch nicht längst in Frankfurt eingemeindet ? 
(ID: 12076x12080); (Calc-Prob:52)

s. Es gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes nicht herausstellt . Allein schon vom Blick auf die Landkarte , in ( der )m1 die große Stadt Berlin mitten im Land Brandenburg liegt , macht überdeutlich , daß es in Zukunft ( eine sehr enge wirtschaftliche , verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden Länder )m2 geben wird . Dies ist auch aus alternativer Sicht wünschenswert . (ID: 12130x12138); (Calc-Prob:53)

E.1.3. Relative proximity in context

(59) a. Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn ( seiner )m1 Grotske “Wunderworte “erklären . Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit ( seiner )m2 Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . (ID: 80x85); (Calc-Prob:83)

b. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu ( seinen )m1 Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen . Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich ( sein )m2 Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen .
(ID: 232x245); (Calc-Prob:53)

c. Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen .[. . . ] Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . So kann die Groteske nicht zu ( ihrer )m2 Wirkung kommen , kann nichts von der Form ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen . (ID: 156x179); (Calc-Prob:53)

d. In der Inszenierung vn Armin Holz in den Kammerspielen des Deutschen Theaters scheint dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden . Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , ( sich )m2 erst einmal sinnhaft vorzustellen . Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . (ID: 156x163); (Calc-Prob:52)

e. Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit seiner Bewohner nicht verloren haben .
Als ( die Witwe Juana la Reina )m1 plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte ( sich )m2 so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . Die Totenklage der trauernden Familie mischt sich dann auch schnell mit munteren Jubelgesängen . (ID: 88x93); (Calc-Prob:51) Das gefiel dem Hund so gut , daß ( er )m1 unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . 153 Appendix E. All link errors from Chapter 4 Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde ( er )m2 gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . Erst die Feuerwehr konnte beide durch das Fenster befreien . (ID: 332x338); (Calc-Prob:83) 154 g. Hamburg ( ap ) - Ein zwei Jahre alter Schäferhund namens “Prinz “hat im Hamburger Stadtteil Altona eine Wohnung besetzt . ( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag ( sein )m2 zukünftiges Heim gezeigt . Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . (ID: 325x328); (Calc-Prob:51) h. Bei der Polizei erfuhr die alte Dame , daß es sich bei ( ihrem Fall )m1 nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne .[. . . ] Da die Stadt keinen habe , dürfe sie das Urteil auch nicht exekutieren . Helena , ( deren Fall )m2 inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute nicht in ihre Wohnung . (ID: 393x423); (Calc-Prob:83) i. Daß Einbruch auch strafbar ist , wenn der Einbrecher nicht mit einem Sack auf dem Rücken und einer Maske vor dem Gesicht das Weite sucht , ist eine Erkenntnis , die auch nach Ansicht des polnischen Bürgerombudsmanns die Auffassungsgabe der polnischen Polizei bei weitem übersteigt . 
Inzwischen klagt ( dieser )m1 beim Obersten Gerichtshof , ( dessen )m2 Richter vorsichtshalber auch gleich in Urlaub gefahren sind . Einer von ihnen erklärte einer Zeitung schon mal anonym , es sei durchaus rechtens , wenn man wilde Mieter auf eigene Faust rauswerfe . (ID: 484x485); (Calc-Prob:53)

j. ( Sie )m1 hat dabei noch Glück gehabt .[. . . ] Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose . Dann starb ( sie )m2 dort . (ID: 428x435); (Calc-Prob:53)

k. Da die Stadt keinen habe , dürfe ( sie )m1 das Urteil auch nicht exekutieren .[. . . ] Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute nicht in ihre Wohnung . ( Sie )m2 hat dabei noch Glück gehabt . (ID: 420x428); (Calc-Prob:53)

l. Bei der Polizei erfuhr die alte Dame , daß es sich bei ihrem Fall nicht um ( ein Vergehen handele , welches von Amts wegen verfolgt werden könne )m1 .[. . . ] Helena begab sich zu Gericht . ( Dieses )m2 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen . (ID: 396x400); (Calc-Prob:52)

m. Doch hätte die nicht gezahlt , hätte ( Helena G. )m1 sie auch nicht rauswerfen können .[. . . ] Denn das darf man nur , wenn man eine Ersatzwohnung ... Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ( ihre )m2 Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . (ID: 506x515); (Calc-Prob:51)

n. Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ( ihnen )m1 deshalb nicht .[. . . ] Die Stadt hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben . Jemanden vor die Tür setzen dürfen ( die )m2 allerdings auch nicht . (ID: 518x523); (Calc-Prob:50)

o.
Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in faschistischer Uniform rumgekaspert ist und ( seine )m1 Briefe mit “es grüßt mit erhobenem Arm “unterschrieben hat . Und die Geschichte mit dem inzwischen verblichenen Adidas-Chef Horst Dassler , der als erster erkannt hatte , daß man umso mehr Turnschuhe verkauft , je größer der Einfluß auf das IOC ist , und er deshalb ( seinen )m2 Spezl Juan 1980 an die Spitze des Vereins boxte ? Na bittschön , Vollbeschäftigung in Herzogenaurach , und irgendwelche Schuhe müssen die Sportler ja anziehen ! (ID: 574x583); (Calc-Prob:53)

p. Mit ( ihrem )m1 neuen Geschäftsführer steht die privatwirtschaftliche Marketing GmbH nunmehr gleichgewichtig neben der öffentlich-rechtlichen Olympia GmbH . An finanzieller Potenz und Aktionsradius wird sie ( ihr )m2 bald überlegen sein . Damit vollzieht sich , was Geschäftsführer Nawrocki immer wollte : Die Unternehmerschaft wird zunehmend das Sagen haben , die öffentliche Hand hat allenfalls für die notwendige Infrastruktur zu sorgen . (ID: 673x680); (Calc-Prob:53)

q. Der Fuchs hat seine Schuldigkeit getan , ( der Fuchs )m1 kann gehn .[. . . ] » Leichtfertig « ist , nach Eberhard Diepgens Ansicht , mit den Informationen über die IOC-Mitglieder umgegangen worden . So leichtfertig wie ( er )m2 das dahersagt , so freudig wurde es vom Aufsichtsrat der Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht . (ID: 638x642); (Calc-Prob:52)

r. Daß ( Nawrocki )m1 von dieser bigotten Inszenierung profitiert , ist weder sein Verdienst noch von ihm gewollt . Mit ( ihrem )m2 neuen Geschäftsführer steht die privatwirtschaftliche Marketing GmbH nunmehr gleichgewichtig neben der öffentlich-rechtlichen Olympia GmbH . An finanzieller Potenz und Aktionsradius wird sie ihr bald überlegen sein . (ID: 668x673); (Calc-Prob:51)

s.
Doch ( der )m1 ist schon längst eingetreten , denn die Olympia-Gerontokraten werden kaum verzeihen , daß öffentlich wurde , worauf eine jede Bewerbungsstrategie fußt : daß sie korrumpierbar sind .[. . . ] Daß » sexuelle Neigungen « zur Zielpalette der » persönlichen Ansprache « gehören , mag dem die Krone aufsetzen , doch lenkt dieser Umstand eher von der Normalität dieser Bestechlichkeit ab . Daß Nawrocki von dieser bigotten Inszenierung profitiert , ist weder ( sein Verdienst )m2 noch von ihm gewollt . (ID: 657x671); (Calc-Prob:50)

t. ( Der Aufsichtsrat der Olympia Marketing GmbH )m1 ernannte ihn am Donnerstag abend überraschend zum Geschäftsführer der Firma . Er löst auf dem Posten Nikolaus Fuchs ab , der wegen der Intimdatei über IOC-Mitglieder ( seinen )m2 Hut nehmen mußte . Fuchs ist geschäftsführender Gesellschafter der Bossard Consultants , die die Datenmasken als » Spielmaterial « für die Olympia GmbH gefertigt hatte . (ID: 782x793); (Calc-Prob:51)

u. Die Alternativplanung des Senats sei teurer und schwieriger . ( Die AL-Abgeordnete Michaele Schreyer )m1 erklärte , daß ( sich )m2 das Land Berlin damals nicht die entsprechenden Rechte gesichert habe , sei eine bewußte Entscheidung des damaligen Bürgermeisters Momper gewesen . OBERBAUMBRÜCKE (ID: 881x882); (Calc-Prob:51)

v. » So ein Käufer weiß ja , worauf ( er )m1 sich einläßt « , begründet dies Nagel . Kündigt der Vermieter und nutzt nachher die Wohnung nicht selbst , muß ( er )m2 nachweisen , daß die Kündigung nicht mißbräuchlich war - bisher liegt die Beweislast beim Mieter . Dann wird eventuell Schadensersatz fällig . (ID: 1149x1154); (Calc-Prob:83)

w. Nur die FDP und ( ihre )m1 Ministerin wehren sich mit Händen und Füßen dagegen . Der Mieterschutz , so verkündeten Irmgard Schwaetzer und ( ihre )m2 FDP-Kollegin Sabine Leuthheuser-Schnarrenberger aus dem Justizressort , reiche aus . Das sieht man in Berlin anders . (ID: 1129x1136); (Calc-Prob:83)

x. In der Folge steigen auch ( die Transportkosten )m1 um das 16- bis 22fache .[. . . ] Nur die von Monopolen geprägte Industriestruktur wird so statisch bleiben wie sie ist , und damit auch das Preisdiktat , das lediglich von den Planbehörden direkt zu den Monopolbetrieben verschoben wurde . Das Schwindelgefühl könnte bei den G-7-Herren nach dem Absteigen vom Gipfelkarussell zurückkehren , wenn ( sie )m2 weitere Fakten fest in den Blick nehmen - wie die Zunahme der Tauschgeschäfte auf 60 bis 70 Prozent des Geschäftsvolumens aller Betriebe oder das um elf Prozent sinkende Bruttosozialprodukt . (ID: 8325x8338); (Calc-Prob:52)

y. Daß sich die Nachfolgerepubliken der UdSSR bereits im Oktober zügig auf eine Neudefinition der Rubelzone und die Aufteilung der Altschulden einigen werden , dürfte von den russischen Realitäten weit abgehobenes IWF-Wunschdenken sein . Und Stufe drei des Plans , in ( der )m1 endlich der sechs Milliarden Dollar teure Rubel-Stabilisierungsfonds zum Einsatz kommen soll , ist qua Programm auf den St. Nimmerleinstag verschoben : Voraussetzung sei , so Camdessus , daß die wirtschaftliche Entwicklung ( sich )m2 stabilisiere . Die jedoch schlingert auf Abwärtskurs . (ID: 8295x8305); (Calc-Prob:52)

z. Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat . Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß ( eine neue Einsicht )m1 dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m2 mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte .
Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn es nur die konkrete Zusage gab , die erste IWF-Kredittranche von einer Milliarde Dollar im August zu überweisen . (ID: 8205x8206); (Calc-Prob:52)

E.1.4. Reflexive pronouns and non-subjects (60)

a. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen . Man hatte ( sich )m1 am Ende ( einer erfolgreichen Spielzeit )m2 von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen . (ID: 239x240); (Calc-Prob:51)

b. Und die Behörden reagieren darauf genau so indolent wie im Fall Helena G. Die Rechtslage , die so entstanden ist , spricht ( so ziemlich allem )m1 Hohn , was ( sich )m2 Juristen je ausgedacht haben : Bricht jemand in eine Wohnung ein , wird er in Polen , wie überall auf der Welt , mit bis zu mehreren Jahren Gefängnis bestraft . (ID: 452x455); (Calc-Prob:53)

c. Uaaaaaah ! Die Realität noch viel grauenerregender , als ( es )m1 ( sich )m2 der wildeste Hammerwerfer in seinen perversesten Träumen ersinnen könnte ? Die Briten Vyv Simson und Andrew Jennings haben ihre Version vom olympischen Märchen unter dem Titel Geld , Macht und Doping auch schon als Buch veröffentlicht , was dem IOC-Präsidenten Juan Antonio Samaranch wenig gefallen hat . (ID: 541x542); (Calc-Prob:52)

d. Die Herren der Ringe , ARD , Do. , 23 Uhr Haben nun also Edwin Kleins Bitterer Sieg mit einem wohligen Gruseln , aber doch in der festen Überzeugung studiert , es handele ( sich )m1 um ( einen Roman )m2 , mithin um Fiktion ! Und was müssen wir nun mitbekommen ? (ID: 535x536); (Calc-Prob:51)

e. » Heckelmann kann sehr wohl entscheiden , Flüchtlinge , die mit dem Flugzeug kommen , unbürokratisch nach Berlin einreisen zu lassen « , so Christoph Koch , Dozent an der FU . Der Innensenator solle ( sich )m1 bei der Innenministerkonferenz für ( eine Aufhebung des Visumszwangs an den deutschen Grenzen )m2 einsetzen . Außerdem fordert die Initiative einen sofortigen Abschiebestopp . (ID: 738x742); (Calc-Prob:51)

f. Die Alternativplanung des Senats sei teurer und schwieriger . Die AL-Abgeordnete Michaele Schreyer erklärte , daß ( sich )m1 das Land Berlin damals nicht die entsprechenden Rechte gesichert habe , sei ( eine bewußte Entscheidung des damaligen Bürgermeisters Momper )m2 gewesen . OBERBAUMBRÜCKE (ID: 882x886); (Calc-Prob:51)

g. So verteilte die russische Delegation eine Prognose über die Entwicklung der russischen Wirtschaft . Beim Haushaltsdefizit orientierten ( sich )m1 die amtlichen Statistiker noch an der Gaidar-Camdessus-Vereinbarung : Es soll ( sich )m2 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen . Gegenwärtig jedoch liegt es bei 17 Prozent . (ID: 8248x8252); (Calc-Prob:83)

h. Das Münchner Gipfelkarussell hat sich mit hoher Geschwindigkeit gedreht . Nun steht es still - und alle Welt wundert ( sich )m1 , daß es ( sich )m2 nicht von der Stelle bewegt hat . Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . (ID: 8195x8197); (Calc-Prob:83)

i. Es wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt . Der Kern , um ( den )m1 das Karussell ( sich )m2 drehte , löst sich somit auf . Hat es den Münchner Rummel überhaupt gegeben ? (ID: 8376x8378); (Calc-Prob:52)

j. Allein der Weg dahin ist weit und mühsam und dauert ganz offensichtlich länger als jene “wenigen Wochen “, die Camdessus für die Regelung der Eigentums- und Investitionsfragen vorsieht . Daß ( sich )m1 die Nachfolgerepubliken der UdSSR bereits im Oktober zügig auf ( eine Neudefinition der Rubelzone )m2 und die Aufteilung der Altschulden einigen werden , dürfte von den russischen Realitäten weit abgehobenes IWF-Wunschdenken sein . Und Stufe drei des Plans , in der endlich der sechs Milliarden Dollar teure Rubel-Stabilisierungsfonds zum Einsatz kommen soll , ist qua Programm auf den St. Nimmerleinstag verschoben : Voraussetzung sei , so Camdessus , daß die wirtschaftliche Entwicklung sich stabilisiere . (ID: 8284x8289); (Calc-Prob:51)

k. Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat . Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m1 mit ( einem IWF-Standardprogramm )m2 die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn es nur die konkrete Zusage gab , die erste IWF-Kredittranche von einer Milliarde Dollar im August zu überweisen . (ID: 8206x8207); (Calc-Prob:51)

l. Offenbar schien es nicht im Interesse Radio Bremens zu sein , deutlich zu machen , warum die Lesben / Frauen so zahlreich ihren Unmut kundtaten . Die Sendung mutete sich als eine hysterische Stimmungsmache an , in der ( sich )m1 Frau Roggendorf im Vorfeld der ganzen Affäre verbal zu verteidigen suchte und die Lesben / Frauen gleich mit Buttersäure um ( sich )m2 warfen , um ihr Dasein als “kleine radikale Minderheit “unter Beweis zu stellen . Der benannte Buttersäureanschlag war eine Reaktion auf eine Veranstaltung unter der Leitung von Egbert Richter im “Ambrosia “, “Sexualität und Wohngemeinschaft “. (ID: 2717x2725); (Calc-Prob:83)

m. Offenbar schien es nicht im Interesse Radio Bremens zu sein , deutlich zu machen , warum die Lesben / Frauen so zahlreich ihren Unmut kundtaten . Die Sendung mutete ( sich )m1 als eine hysterische Stimmungsmache an , in der ( sich )m2 Frau Roggendorf im Vorfeld der ganzen Affäre verbal zu verteidigen suchte und die Lesben / Frauen gleich mit Buttersäure um sich warfen , um ihr Dasein als “kleine radikale Minderheit “unter Beweis zu stellen . Der benannte Buttersäureanschlag war eine Reaktion auf eine Veranstaltung unter der Leitung von Egbert Richter im “Ambrosia “, “Sexualität und Wohngemeinschaft “. (ID: 2715x2717); (Calc-Prob:83)

n. Sie alle versuchen ihren eigenen Weg zu gehen , aber am Schluß reißt der Strudel der Maueröffnung ihnen den Boden unter den Füßen weg . Es gibt keine DDR mehr , in ( der )m1 man ( sich )m2 einrichten , oder für die man sich engagieren kann . Der schmerzhafte Entschluß zur Flucht , der Vertrauensbruch mit denen , die blieben , hat plötzlich keinen Sinn mehr . (ID: 1841x1842); (Calc-Prob:52)

E.1.5. Problems with substring-matches (61)

a. Das gefiel ( dem Hund )m1 so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ .[. . . ] Erst die Feuerwehr konnte beide durch das Fenster befreien . Herrchen wollte ( den Hundefänger )m2 holen . (ID: 331x345); (Calc-Prob:52)

b. Das gefiel ( dem Hund )m1 so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter ( des Hundehalters )m2 versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . Erst die Feuerwehr konnte beide durch das Fenster befreien . (ID: 331x335); (Calc-Prob:52)

c. Um so mehr , als man das Absurde an dieser Praxis noch auf die Spitze treiben kann . Im Fall Helena G. verurteilte das Gericht ( die wilde Mieterin )m1 zur Zahlung von ( Miete )m2 . Doch hätte die nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können . (ID: 502x503); (Calc-Prob:52)

d. In ( letzter Zeit )m1 kümmern sich die Besetzer allerdings nicht mehr sonderlich darum , ob eine Wohnung bewohnt ist oder nicht .[. . . ] Inzwischen klagt dieser beim Obersten Gerichtshof , dessen Richter vorsichtshalber auch gleich in Urlaub gefahren sind . Einer von ihnen erklärte ( einer Zeitung )m2 schon mal anonym , es sei durchaus rechtens , wenn man wilde Mieter auf eigene Faust rauswerfe . (ID: 444x490); (Calc-Prob:52)

e. Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit ( erhobenem Arm )m1 “unterschrieben hat .[. . . ] Schließlich stehen sie im Dienst einer großen und gerechten Sache , dem Bankkonto des IOC . Also , wenn wir die Briten richtig verstanden haben wollen , handelt es sich bei Juan und seinen 94 Komplizen aus ( dem Lausanner Marmorpalast )m2 um die korrupteste Bande auf Gottes Erdboden . (ID: 577x623); (Calc-Prob:52)

f. Das kathartische Schauspiel wurde inszeniert , um , wie es so schön heißt , weiteren Schaden von ( der Bewerbung )m1 abzuwenden . Doch der ist schon längst eingetreten , denn die Olympia-Gerontokraten werden kaum verzeihen , daß öffentlich wurde , worauf ( eine jede Bewerbungsstrategie )m2 fußt : daß sie korrumpierbar sind . Daß » sexuelle Neigungen « zur Zielpalette der » persönlichen Ansprache « gehören , mag dem die Krone aufsetzen , doch lenkt dieser Umstand eher von der Normalität dieser Bestechlichkeit ab . (ID: 656x659); (Calc-Prob:52)

g. ( Senat )m1 soll Visumzwang für Kriegsflüchtlinge aufheben[. . . ] Mit einer Mahnwache vor der Innenverwaltung am Fehrbelliner Platz machten etwa zehn Leute gestern vormittag auf die Lage der Flüchtlinge aus dem Kriegsgebiet aufmerksam . In einer Petition forderten sie ( Innensenator Dieter Heckelmann ( CDU ) )m2 auf , den Visumszwang für Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien in Berlin aufzuheben . (ID: 701x724); (Calc-Prob:52)

h. Allerdings sind im Wirtschaftsplan des Jahres 1992 der Olympia GmbH unter dem Haushaltstitel » Agenturleistungen « für » ( Bewerbungsstrategie )m1 « 1,9 Millionen Mark veranschlagt .[. . . ] An der Marketing GmbH sind neun namhafte Unternehmen beteiligt . Nawrocki , der immer privat organisierten Olympischen Spielen das Wort geredet hat , sieht durch seine Ernennung das unternehmerische Engagement bei ( der Berliner Bewerbung )m2 aufgewertet . (ID: 828x854); (Calc-Prob:52)

i. Wie Nawrocki gestern erklärte , erhält Fuchs keine Abfindung , da ( sein Vertrag )m1 regulär am 15. August ausläuft .[. . . ] Nawrocki selbst erhält für seinen Doppeljob kein zusätzliches Salär . ( Sein Geschäftsführervertrag mit der Marketing GmbH )m2 ist unbefristet . (ID: 814x836); (Calc-Prob:52)

j. Der Aufsichtsrat der Olympia Marketing GmbH ernannte ihn am Donnerstag abend überraschend zum ( Geschäftsführer der Firma )m1 .[. . . ] Fuchs ist geschäftsführender Gesellschafter der Bossard Consultants , die die Datenmasken als » Spielmaterial « für die Olympia GmbH gefertigt hatte . Wie der Aufsichtsratsvorsitzende Peter Weichhardt erklärte , sei man mit ( dem Geschäftsführerwechsel )m2 der Auffassung des Regierenden Bürgermeisters Diepgen gefolgt , daß mit den Intimdaten » leichtfertig « umgegangen worden sei . (ID: 786x802); (Calc-Prob:52)

k. ( Der Aufsichtsrat der Olympia Marketing GmbH )m1 ernannte ihn am Donnerstag abend überraschend zum Geschäftsführer der Firma .[. . . ] Fuchs ist geschäftsführender Gesellschafter der Bossard Consultants , die die Datenmasken als » Spielmaterial « für die Olympia GmbH gefertigt hatte . Wie ( der Aufsichtsratsvorsitzende Peter Weichhardt )m2 erklärte , sei man mit dem Geschäftsführerwechsel der Auffassung des Regierenden Bürgermeisters Diepgen gefolgt , daß mit den Intimdaten » leichtfertig « umgegangen worden sei . (ID: 782x801); (Calc-Prob:52)

l. Der IWF hält standardprogrammgemäß den Blick fest geheftet auf das Haushaltsdefizit , die Stabilisierung der Währung und ( freie Preise )m1 .[. . . ] Unregelmäßige Lieferungen von Bauteilen und Rohstoffen , so die russischen Statistiker , werden die Exporte Rußlands um 17 bis 22 Prozent drücken . ( Die Energiepreise )m2 werden nicht - wie der IWF fordert - freigegeben , aber um das 30fache erhöht . (ID: 8273x8321); (Calc-Prob:52)

m. Sobald der Schwindel ihrer rasanten Karussellfahrt nachläßt und der verschwommene Blick auf die Welt wieder klarer wird , werden auch die Kanten ( der Vereinbarung zwischen dem russischen Premierminister Jegor Gaidar und dem IWF-Exekutiv-Direktor Michel Camdessus )m1 Kontur gewinnen .[. . . ] So verteilte die russische Delegation eine Prognose über die Entwicklung der russischen Wirtschaft . Beim Haushaltsdefizit orientierten sich die amtlichen Statistiker noch an ( der Gaidar-Camdessus-Vereinbarung )m2 : Es soll sich 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen . (ID: 8240x8250); (Calc-Prob:52)

n. Das Münchner Gipfelkarussell hat sich mit ( hoher Geschwindigkeit )m1 gedreht .[. . . ] Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat . Bei ( der hohen Drehgeschwindigkeit )m2 - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . (ID: 8192x8199); (Calc-Prob:52)

o. Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei ( 1,98 m )m1 .[. . . ] Der Keniate Yobes Ondieki lief über 5.000 Meter in 13:03,58 Minuten eine neue Jahresweltbestleistung . Damit war er über fünf Sekunden schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in ( seinem )m2 deutschen Rekordlauf am 6. Juni in Sevilla . (ID: 8433x8441); (Calc-Prob:53)

p. Erstmals seit ( langer Zeit )m1 schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei 1,98 m .[. . . ] Mit 10,06 Sekunden gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde das Nachsehen . Mit ( der gleichen knappen Zeitspanne )m2 unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde 20,10 Sekunden benötigte . (ID: 8427x8466); (Calc-Prob:52)

q. Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei ( 1,98 m )m1 .[. . . ] Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit 22,18 Sekunden gegen ihre Landsfrau Juliet Cuthbert durch . Dagegen mußte sich im 100-Meter-Sprint ( der Männer )m2 überraschend der Olympia-Zweite Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser Saison geschlagen geben . (ID: 8433x8456); (Calc-Prob:52)

r. Der Satz der 27jährigen ist der weiteste Sprung ( einer Frau )m1 in diesem Jahr .[. . . ] Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über 200 Meter . Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit 22,18 Sekunden gegen ( ihre Landsfrau Juliet Cuthbert )m2 durch . (ID: 8407x8454); (Calc-Prob:52)

s. Der Satz der 27jährigen ist der weiteste Sprung einer Frau in ( diesem Jahr )m1 .[. . . ] Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über 200 Meter . Die große Verliererin ( des WM-Jahres )m2 aus Jamaika setzte sich mit 22,18 Sekunden gegen ihre Landsfrau Juliet Cuthbert durch . (ID: 8409x8448); (Calc-Prob:52)

t. Dazu hat die Mutter eines dreijährigen Sohnes ( ihren Deutschen Rekord )m1 eingestellt .[. . . ] Der Keniate Yobes Ondieki lief über 5.000 Meter in 13:03,58 Minuten eine neue Jahresweltbestleistung . Damit war er über fünf Sekunden schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in ( seinem deutschen Rekordlauf am 6. Juni in Sevilla )m2 . (ID: 8413x8444); (Calc-Prob:52)

u. In der ersten Runde des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main trifft ( die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf )m1 auf Außenseiter Neuseeland . Das Eröffnungsspiel ( der Mannschaftsweltmeisterschaft der Damen mit K.O.-System )m2 bestreitet Anke Huber am Montag um 11.30 Uhr auf dem Centre Court . PRESS-SCHLAG (ID: 8587x8591); (Calc-Prob:52)

v.
Das Internationale Olympische Komitee und das Nationale Olympische Komitee Jugoslawiens haben sich darauf geeinigt , daß die Sportler aus Serbien und Montenegro unter der Bezeichnung “Mannschaft der Freundschaft “sowie der Olympia-Fahne und der olympischen Hymne an ( den Start )m1 gehen .[. . . ] Der 800-Meter-Läufer hatte geklagt , er sei beim Ausscheidungslauf von einem Mitläufer getreten worden und deshalb so lahm gewesen . Auch dem Weltmeister im 10.000-Meter-Lauf , Moses Tanui , ist im nachhinein ( der Olympia-Start )m2 genehmigt worden . (ID: 8502x8544); (Calc-Prob:52)

w. Die Idee , in Irakisch-Kurdistan Bildungsarbeit zu leisten , entstand im ( vergangenen Jahr )m1 , als ein Asta-Hilfskonvoi nach dem Ende des Golfkriegs in das Kriegsgebiet gereist war .[. . . ] Auch die Berliner Landesstelle für Entwicklungszusammenarbeit der Senatsverwaltung für Wirtschaft signalisierte Bereitschaft , 30.000 Mark zu übernehmen , sofern die Restfinanzierung gesichert sei . Das Kurdistan-Komitee hofft nun , das restliche Geld möglichst schnell zusammenzubekommen , damit der Schulbetrieb auch tatsächlich zum ( Winterhalbjahr )m2 aufgenommen werden kann . (ID: 12356x12419); (Calc-Prob:52)

x. » Wir werden hierbleiben und unser Dorf und ( unsere Häuser )m1 wiederaufbauen . «[. . . ] Alle drei Dörfer sind auf einer Straße zu erreichen und bereits wieder an die Wasserversorgung angeschlossen . Ein Teil ( der Wohnhäuser )m2 wurde bereits wiederaufgebaut . (ID: 12293x12385); (Calc-Prob:52)

y. Aber diesmal ist es keine Koketterie , ( keine » Flucht nach vorn )m1 « .[. . . ] Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl es nicht ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle so oder ähnlich erlebt haben . Die Tränen in ( der bewegend einfach gefilmten Fluchtszene )m2 sind so echt wie die Dokumentaraufnahmen , die dazwischengeschnitten werden . (ID: 1865x1877); (Calc-Prob:52)

z.
In ( keiner anderen Stadt unseres Kontinents )m1 muß die Feuerwehr so oft raus wie an der Spree .[. . . ] D.N.T.T. gibt sich redlich Mühe , das Publikum zu schockieren . Das sei “die Aufgabe des Theaters in der rauhen und reizüberfluteten Welt von 1992 “- so ein Mitglied des Ensembles gegenüber ( der Stadtzeitschrift ’ Zitty ‘ , die das Spektakel mit veranstaltet )m2 . (ID: 6316x6407); (Calc-Prob:52)

E.1.6. "Es" ("it") as expletive pronoun in German (62)

a. So leichtfertig wie er das dahersagt , so freudig wurde ( es )m1 vom Aufsichtsrat der Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht .[. . . ] Ein Kopf mußte rollen , damit sich die anderen aus der Schlinge ziehen konnten . Das kathartische Schauspiel wurde inszeniert , um , wie ( es )m2 so schön heißt , weiteren Schaden von der Bewerbung abzuwenden . (ID: 644x654); (Calc-Prob:83)

b. Damit vollzieht sich , was Geschäftsführer Nawrocki immer wollte : Die Unternehmerschaft wird zunehmend das Sagen haben , die öffentliche Hand hat allenfalls für die notwendige Infrastruktur zu sorgen . ( Das Beispiel Fuchs )m1 zeigt , daß Unternehmer auch mit ihresgleichen nicht zimperlich umgehen , wenn ( es )m2 ihren Interessen entspricht . Nawrocki hat sich bislang für olympische Verhältnisse erstaunlich gut gehalten , doch nun sitzt er auf zwei Schleuderstühlen . (ID: 688x690); (Calc-Prob:52)

c. Und anders geht ( es )m1 auch kaum . » Einen vertragestreuen Mieter kriegen Sie heutzutage nur über eine Eigenbedarfskündigung raus « , heißt ( es )m2 in Hausbesitzerkreisen . Und dies alles wird vom Steuerzahler noch bezuschußt . (ID: 1080x1084); (Calc-Prob:83)

d. Außerdem , erzählt Frau Fiedler , gebe ( es )m1 häufig westdeutsche Hausbesitzer , die renitenten Mietern drohten , nach Berlin zu ziehen und dann eben Eigenbedarf anzumelden .[. . . ] Vor allem bei Mietern , die gerade eine Mieterhöhung oder eine Modernisierung verweigert haben , ist das der Fall . Hanka Fiedler : » Wenn ( es )m2 schriftliche Unterlagen über solche Rechtsstreitigkeiten gibt , hat der Mieter vor Gericht gute Karten nachzuweisen , daß der Eigenbedarf nur vorgeschoben ist . « (ID: 1039x1051); (Calc-Prob:83)

e. Nun steht es still - und alle Welt wundert sich , daß ( es )m1 sich nicht von der Stelle bewegt hat .[. . . ] Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G-7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn ( es )m2 nur die konkrete Zusage gab , die erste IWF-Kredittranche von einer Milliarde Dollar im August zu überweisen . (ID: 8196x8227); (Calc-Prob:83)

f. ( Es )m1 wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt .[. . . ] Der Kern , um den das Karussell sich drehte , löst sich somit auf . Hat ( es )m2 den Münchner Rummel überhaupt gegeben ? (ID: 8372x8381); (Calc-Prob:53)

g. Besonders beeindruckt habe sie ( das Selbstbewußtsein )m1 und der Lebenswille der kurdischen Bevölkerung .[. . . ] » Einen derartigen Grad an Zerstörung kann man sich überhaupt nicht vorstellen . ( Es )m2 ist der helle Wahnsinn . « (ID: 12394x12400); (Calc-Prob:52)

h. Mit all diesen obskuren Äußerungen fällt Frau Rogendorf nicht nur den Lesben / Frauen auf den Wecker , die ( ihr )m1 Leben selbstbestimmt und -bewußt leben , sondern auch all den Frauen / Lesben , die sich für die Abschaffung des § 218 einsetzen .
Wenn “eine Gesellschaft sich nur über das Kind entwickeln kann “, wie ( es )m2 Frau Roggendorf behauptet , ist zu fragen , wie die Gesellschaft mit den ungewollten Kindern klarkommt , für die wir Frau Roggendorf hiermit dankbar die Adoptionsurkunde ausstellen wolllen . Amen . (ID: 2775x2787); (Calc-Prob:50)

i. Aber diesmal ist ( es )m1 keine Koketterie , keine » Flucht nach vorn « . Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl ( es )m2 nicht ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle so oder ähnlich erlebt haben . Die Tränen in der bewegend einfach gefilmten Fluchtszene sind so echt wie die Dokumentaraufnahmen , die dazwischengeschnitten werden . (ID: 1863x1871); (Calc-Prob:83)

j. Der Regisseur bezeichnet ihn als Film von der Straße für die Straße , und ( es )m1 ist ein Kompliment für ihn , wenn man die Darsteller » unbeholfen « nennt . Aber diesmal ist ( es )m2 keine Koketterie , keine » Flucht nach vorn « . Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl es nicht ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle so oder ähnlich erlebt haben . (ID: 1859x1863); (Calc-Prob:83)

k. ( Es )m1 gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes nicht herausstellt . Allein schon vom Blick auf die Landkarte , in der die große Stadt Berlin mitten im Land Brandenburg liegt , macht überdeutlich , daß ( es )m2 in Zukunft eine sehr enge wirtschaftliche , verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden Länder geben wird . Dies ist auch aus alternativer Sicht wünschenswert . (ID: 12125x12135); (Calc-Prob:53)

l. Weit und breit gibt ( es )m1 keine Bildungsmöglichkeit , ein Desaster nicht nur für die Kinder , sondern auch für die kurdische Kultur und Sprache . Gäbe ( es )m2 in Kani Balav eine Schule , würde dort Ende September das neue Schuljahr beginnen .
Um diese Utopie zu verwirklichen , reiste die Berlinerin Ulrike Hoffmann vier Wochen durch die irakisch-kurdischen Berge . (ID: 12325x12331); (Calc-Prob:83)

m. Am 14. 7. um 20 Uhr im Tempodrom , In den Zelten , Tiergarten Umsonst und draußen ist ( es )m1 immer voll , da braucht ( es )m2 keine Werbung . Deshalb hier nur ein paar Anmerkungen zu Rico Rodriguez , weil er mehr ist als nur ein Musiker , der auch noch ganz gut ins diesjährige Konzept der » Heimatklänge « paßt . (ID: 12700x12701); (Calc-Prob:83)

n. Aber so geht ( es )m1 nicht : Das Urteil des Verwaltungsgerichtes Bremen hat auf den Wangen der Herren Wilhelm und Jachmann rote Streifen hinterlassen .[. . . ] Man wird jetzt laut überlegen müssen , ob dieses Amt nicht langsam aufgelöst werden sollte . Nicht nur , daß ( es )m2 kaum noch lohnende Feindbilder gibt . (ID: 13419x13427); (Calc-Prob:83)

o. Noch wesentlich mehr Wohnungen wird ( es )m1 nach dem Urteil pro Jahr treffen , meint Hartmann Vetter , Geschäftsführer des Berliner Mietervereins .[. . . ] Zum einen liegen Tausende von Anträgen auf Umwandlung auf Halde . Inzwischen können auch Wohnungen in Ost-Berlin umgewandelt und verkauft werden , soweit ( es )m2 die ungeklärten Eigentumsverhältnisse zulassen , so daß Vetter mit bis zu 10.000 Umwandlungen im Jahr rechnet . (ID: 964x975); (Calc-Prob:83)

p. Damit wurde eine laufende Umwandlungswelle gestoppt : Über 85.000 Westberliner Altbauwohnungen waren bis zu diesem Zeitpunkt umgewandelt worden , circa 5.000 bis 6.000 waren ( es )m1 im Jahr . Noch wesentlich mehr Wohnungen wird ( es )m2 nach dem Urteil pro Jahr treffen , meint Hartmann Vetter , Geschäftsführer des Berliner Mietervereins . Zum einen liegen Tausende von Anträgen auf Umwandlung auf Halde . (ID: 961x964); (Calc-Prob:83)

q. In zwei Jahren werde ( es )m1 tausend Filme über diese historische Umbruchszeit geben , und viel professionellere dazu .[. . .
] Aber der Laienfilmer Worm scharte seine Freunde , Kollegen und Stammgäste um sich , drehte mit einfachsten Mitteln trotzdem seinen Film und behielt recht . Bis heute gibt ( es )m2 keine vergleichbare Auseinandersetzung , zumindest nicht in Spielfilmform . (ID: 1789x1801); (Calc-Prob:83)

r. Wieviel Hoffnungen , Wünsche , Träume , Utopien gab ( es )m1 ?[. . . ] Als Mario Worm , ein Ostberliner Kneipenwirt , schon 1989 mit der Idee , das alles in einem Film festzuhalten , hausieren ging , riet man ihm ab . In zwei Jahren werde ( es )m2 tausend Filme über diese historische Umbruchszeit geben , und viel professionellere dazu . (ID: 1783x1789); (Calc-Prob:83)

s. Sicher , » Keine Gewalt « ist ein Laienfilm , auf S-VHS gedreht , mit schlechtem Ton , technisch und dramturgisch oft unzulänglich . Der Regisseur bezeichnet ihn als Film von der Straße für die Straße , und ( es )m1 ist ( ein Kompliment für ihn )m2 , wenn man die Darsteller » unbeholfen « nennt . Aber diesmal ist es keine Koketterie , keine » Flucht nach vorn « . (ID: 1859x1861); (Calc-Prob:50)

t. In dem Leitantrag des Vorstandes , der zur Zeit erarbeitet wird , wird ( es )m1 Forderungen zur Bekämfung der Fluchtursachen sowie nach eine Einwanderungsgesetz und einer Beschleunigung des Asylverfahrens geben . Der innenpolitische Sprecher der CDU , Ralf Borttscheller , nannte ( es )m2 gestern positiv , wenn sich Wedemeier von der “starren Haltung “der SPD absetze . Borttscheller . (ID: 1999x2008); (Calc-Prob:83)

u. Dabei gab ( es )m1 laut Vorstandsmitglied Heiner Erling eine eindeutige Mehrheit für den Erhalt des Artikels 16 in seiner jetzigen Form . In dem Leitantrag des Vorstandes , der zur Zeit erarbeitet wird , wird ( es )m2 Forderungen zur Bekämfung der Fluchtursachen sowie nach eine Einwanderungsgesetz und einer Beschleunigung des Asylverfahrens geben .
Der innenpolitische Sprecher der CDU , Ralf Borttscheller , nannte es gestern positiv , wenn sich Wedemeier von der “starren Haltung “der SPD absetze . (ID: 1988x1999); (Calc-Prob:83)

v. Jetzt sitzt sie auf einem der Holzstühle , wippt das Kind auf den Knien . “Daß ( es )m1 einen Wickelraum gibt , ist ( ein Gerücht )m2 . “ Sie habe ihn jedenfalls noch nicht gesehen . (ID: 3813x3815); (Calc-Prob:50)

E.1.7. Units, currencies, month names, weekdays and the like

(63) a. Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard Schwaetzer ( FDP ) selbst mit ( ihren circa 15.000 Mark monatlichen Bruttoeinkommen )m1 und ihrem nicht unerheblichen Steuersatz . Die Ministerin besaß bis Anfang 1991 eine Wohnung in der Bonner Riemannstraße in Bonn , mit deren Erwerb sie ( bis zu tausend Mark Steuern )m2 im Monat sparen konnte . Auf großstädtische Verhältnisse umgerechnet nehmen sich solche Summen noch ganz anders aus . (ID: 1097x1108); (Calc-Prob:83)

b. Eine leere Eigentumswohnung bringt hingegen unverändert ( zwischen 4.000 und 5.000 Mark )m1 - den Mieter herauszuklagen , lohnt sich da schon .[. . . ] Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie eine neue Wohnung schaffen würden . Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard Schwaetzer ( FDP ) selbst mit ( ihren circa 15.000 Mark monatlichen Bruttoeinkommen )m2 und ihrem nicht unerheblichen Steuersatz . (ID: 1077x1097); (Calc-Prob:83)

c. Eine vermietete Eigenumswohnung kostet , so Makler Bendzko , ( zwischen 2.700 und 3.000 Mark den Quadratmeter )m1 .[. . . ] Diese Preise werden nach dem Urteil mindestens stagnieren , wenn nicht sinken , schätzt Bendzko . Eine leere Eigentumswohnung bringt hingegen unverändert ( zwischen 4.000 und 5.000 Mark )m2 - den Mieter herauszuklagen , lohnt sich da schon . (ID: 1072x1077); (Calc-Prob:83)

d.
Denn das macht Hunderttausende von ( Mark )m1 aus . Eine vermietete Eigenumswohnung kostet , so Makler Bendzko , ( zwischen 2.700 und 3.000 Mark den Quadratmeter )m2 . Diese Preise werden nach dem Urteil mindestens stagnieren , wenn nicht sinken , schätzt Bendzko . (ID: 1068x1072); (Calc-Prob:83)

e. ( Nur zehn Prozent der umgewandelten Wohnungen )m1 sind an die dort wohnenden Mieter verkauft worden , ein Drittel der Mieter hat die Wohnung verlassen müssen , viele andere haben mit Kündigungsprozessen zu kämpfen .[. . . ] Denn das ist häufig der Fall . ( Zwischen 60 und 70 Prozent der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den Mieter herausbekommen , um die Wohnung , die dann im Preis steigt , besser verkaufen zu können . (ID: 988x1062); (Calc-Prob:83)

f. Aber schon bei der Inflationsrate , die nach Gaidars Plänen bis Ende des Jahres von monatlich 20 auf ( höchstens zehn Prozent )m1 sinken soll , steigen die Statistiker aus dem gemeinsamen Boot mit Gaidar aus . Sie errechneten ( 30 bis 35 Prozent monatliche Inflationsrate )m2 . Der IWF hält standardprogrammgemäß den Blick fest geheftet auf das Haushaltsdefizit , die Stabilisierung der Währung und freie Preise . (ID: 8261x8267); (Calc-Prob:83)

g. Gegenwärtig jedoch liegt es bei ( 17 Prozent )m1 . Aber schon bei der Inflationsrate , die nach Gaidars Plänen bis Ende des Jahres von monatlich 20 auf ( höchstens zehn Prozent )m2 sinken soll , steigen die Statistiker aus dem gemeinsamen Boot mit Gaidar aus . Sie errechneten 30 bis 35 Prozent monatliche Inflationsrate . (ID: 8255x8261); (Calc-Prob:83)

h. Das Gipfelkarussell ist abgebaut / Sein Kern , das Hilfsprogramm für Rußland , verschwindet zusehends / Der aufgeschobene Schuldendienst hat Rußland bis heute ( 2,5 Milliarden Dollar )m1 gekostet[. . . ] Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat .
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes ( 24 Milliarden Dollar )m2 schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den Kohls und Bushs klarmachen konnte . (ID: 8188x8200); (Calc-Prob:83)

i. Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über ( 400 Meter Hürden )m1 hinnehmen .[. . . ] Seine 48,18 Sekunden reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen . Über ( 800 Meter )m2 setzte sich der Kenianer William Tanui mit 1:43,62 Minuten gegen den Weltjahresbesten Johnny Gray ( USA , 1:44,19 ) durch . (ID: 8476x8481); (Calc-Prob:83)

j. Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde ( 20,10 Sekunden )m1 benötigte .[. . . ] Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über 400 Meter Hürden hinnehmen . ( Seine 48,18 Sekunden )m2 reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen . (ID: 8473x8478); (Calc-Prob:83)

k. Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über ( 200 Meter )m1 .[. . . ] Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde 20,10 Sekunden benötigte . Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über ( 400 Meter Hürden )m2 hinnehmen . (ID: 8447x8476); (Calc-Prob:83)

l. Mit ( 10,06 Sekunden )m1 gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde das Nachsehen . Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde ( 20,10 Sekunden )m2 benötigte .
Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über 400 Meter Hürden hinnehmen . (ID: 8461x8473); (Calc-Prob:83)

m. Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit ( 22,18 Sekunden )m1 gegen ihre Landsfrau Juliet Cuthbert durch .[. . . ] Dagegen mußte sich im 100-Meter-Sprint der Männer überraschend der Olympia-Zweite Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser Saison geschlagen geben . Mit ( 10,06 Sekunden )m2 gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde das Nachsehen . (ID: 8452x8461); (Calc-Prob:83)

n. Damit war er ( über fünf Sekunden )m1 schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in seinem deutschen Rekordlauf am 6. Juni in Sevilla .[. . . ] Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über 200 Meter . Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit ( 22,18 Sekunden )m2 gegen ihre Landsfrau Juliet Cuthbert durch . (ID: 8439x8452); (Calc-Prob:83)

o. Der Keniate Yobes Ondieki lief über ( 5.000 Meter )m1 in 13:03,58 Minuten eine neue Jahresweltbestleistung .[. . . ] Damit war er über fünf Sekunden schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in seinem deutschen Rekordlauf am 6. Juni in Sevilla . Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über ( 200 Meter )m2 . (ID: 8435x8447); (Calc-Prob:83)

p. Lausanne ( dpa ) - 17 Tage vor Eröffnung des Olympiaspektakels in Barcelona ist Heike Drechsler bereits in weltbester Flugform : Mit ( 7,48 Meter )m1 segelte sie beim Grand-Prix-Meeting der Leichtathleten am Mittwoch abend in Lausanne nur um vier Zentimeter am Weltrekord von Galina Tschistjakowa ( 7,52 ) vorbei .[. . . ] Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei 1,98 m .
Der Keniate Yobes Ondieki lief über ( 5.000 Meter )m2 in 13:03,58 Minuten eine neue Jahresweltbestleistung . (ID: 8396x8435); (Calc-Prob:83)

q. Heike Hochsprung-Henkel wurde geschlagen , Heike Drechsler mit neuer Weltjahresbestleistung nach Barcelona Lausanne ( dpa ) - 17 Tage vor Eröffnung des Olympiaspektakels in Barcelona ist Heike Drechsler bereits in weltbester Flugform : Mit ( 7,48 Meter )m1 segelte sie beim Grand-Prix-Meeting der Leichtathleten am Mittwoch abend in Lausanne nur um ( vier Zentimeter )m2 am Weltrekord von Galina Tschistjakowa ( 7,52 ) vorbei . Der Satz der 27jährigen ist der weiteste Sprung einer Frau in diesem Jahr . (ID: 8396x8402); (Calc-Prob:52)

r. Die Endrunde besteht aus zwei Spielen , die am ( 12. und 19. Juli )m1 ausgetragen werden .[. . . ] TENNIS In der ersten Runde des Federation Cup vom 12. bis ( 19. Juli )m2 in Frankfurt / Main trifft die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf auf Außenseiter Neuseeland . (ID: 8566x8581); (Calc-Prob:83)

s. In Bonn werden die Stimmen immer lauter , die ein Ende der Berliner Olympia-Bewerbung für ( das Jahr 2000 )m1 voraussagen .[. . . ] WECHSEL Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFR-Nationalspieler wurde vom FC St. Pauli Hamburg zunächst für ( ein Jahr )m2 an Vorwärts Steyr ausgeliehen . (ID: 8506x8574); (Calc-Prob:83)

t.¹ Nachdem der Olympiasieger von Seoul , Paul Ereng , sich in der vergangenen Woche nicht für Barcelona hatte qualifizieren können , hat am ( Mittwoch )m1 eine Sportkommission in Nairobi entschieden , daß Ereng doch mit darf .[. . . ] Flamengo Rio de Janeiro hat sich als erstes Team für die Endrunde um die brasilianische Fußball-Meisterschaft qualifiziert . Sie besiegte am ( Mittwoch )m2 die Burschen aus Santos mit 3:1 . (ID: 8533x8553); (Calc-Prob:83)

u.
Wer an die jüngsten Abstiegsdramen in ( der Ersten Liga )m1 denkt , der weiß , was für ein süßes Versprechen das Gedränge um Platz 17 birgt .[. . . ] Das Beste am ganzen Programm aber bleibt seine epische Anlage : Bis über finale Siege und Niederlagen entschieden ist , werden 49.680 Minuten gespielt sein . Welcher Erstligist kann da noch singen “( Niemals Zweite Liga )m2 “? (ID: 8710x8724); (Calc-Prob:83)

v. In dieser Hinsicht ist ( die Liga )m1 doch wieder zweigeteilt , diesmal allerdings im besten , weil dramaturgisch spannendsten Sinne : Wer nicht um den Aufstieg spielt , der spielt gegen den Abstieg .[. . . ] Sieben der 24 Mannschaften werden nach dieser Saison ihre regionalen Oberligen verstärken . Wer an die jüngsten Abstiegsdramen in ( der Ersten Liga )m2 denkt , der weiß , was für ein süßes Versprechen das Gedränge um Platz 17 birgt . (ID: 8698x8710); (Calc-Prob:83)

¹ This example does not show a clear disreference as both events could have happened on the same Wednesday (“Mittwoch”).

E.1.8. Problems with the alias feature

(64) a. Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFR-Nationalspieler wurde vom ( FC St. Pauli Hamburg )m1 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen .[. . . ] TENNIS In der ersten Runde ( des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main )m2 trifft die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf auf Außenseiter Neuseeland . (ID: 8573x8583); (Calc-Prob:50)

b. WECHSEL Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : ( Der frühere CSFR-Nationalspieler )m1 wurde vom ( FC St. Pauli Hamburg )m2 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen . Leihgebühr : 100.000 Mark . (ID: 8572x8573); (Calc-Prob:50)

c. Vorweg : ( Sieben spannende Seiten Widmungen an die Menschen , die Torsten Schmidt als Junkies knipste )m1 :[. . . ]
im Schlachthof ; Buch : “Ich bin einmalig , und daß ich lebe , das freut mich . Menschen in der Drogenszene “, Rasch und Röhring Verlag , ( 29.80 DM )m2 (ID: 2927x2954); (Calc-Prob:50)

d. Naheliegende Frage : Warum wehrt sie sich dann so vehement gegen ( ein generelles Verbot für Zigarettenreklame )m1 ?[. . . ] Warum wird überhaupt noch geworben , wenn es so nutzlos ist ? Acht von zwölf Gesundheitsministern ( der EG )m2 haben sich im November des Vorjahres für ein generelles Verbot von Zigarettenreklame ausgesprochen , vier ( die Minister der Bundesrepublik , Großbritanniens , Dänemarks und der Niederlande ) waren dagegen . (ID: 7826x7828); (Calc-Prob:50)

e. So konnte den Käufern ein imaginärer Wert des Unternehmes von ( drei Millionen Mark )m1 vorgespiegelt werden , der sich bei näherem Hinsehen auf im Grunde unverkäufliche Altbestände bezog .[. . . ] Noch kurioser sind die Vorgänge um den Theaterverlag “Henschel-Schauspiel “, neben dem Leipziger “Reclam-Verlag “, der mittlerweile an den Alteigentümer , das heißt die Reclam GmbH & Co KG im schwäbischen Ditzingen , zurückgegeben wurde , war dies das einzige ostdeutsche Editionshaus , in dem Mitarbeiter und Autoren gleich Anfang 1990 die Geschicke selbst in die Hand nahmen , um das Überleben des kritisch engagierten Projekts zu sichern . Nach dem Vorbild des Verlags der Autoren gründeten 68 Bühnenautoren , Übersetzer und Mitarbeiter eine GmbH , in der der Henschel-Buchverlag als ehemaliges Mutterhaus lediglich mit einer Sacheinlage vertreten war , nämlich den alten Textbüchern im damaligen Wert von ( 60.000 DM )m2 . (ID: 9919x10017); (Calc-Prob:50)

f. Die Räumung , die am Mittwoch abend gegen 19 Uhr erfolgte , wurde von AL , ( PDS )m1 und dem BUND verurteilt .[. . . ] Eine Öffnung der Oberbaumbrücke für den Autoverkehr werde die Klimabelastung verschärfen - insbesondere den Sommersmog .
Nach der Räumung am Mittwoch war es zu ( zwei Demonstrationen gekommen , die sich gegen die bisherige Planung des Senats richteten , die Oberbaumbrücke für den Autoverkehr zu öffnen )m2 - statt Individualverkehr soll die Tram fahren . (ID: 11404x11464); (Calc-Prob:50)

E.1.9. First markable begins with “kein“

(65) a. » ( Keine Gewalt )m1 « war ein Slogan der großen Novemberdemonstrationen .[. . . ] Im puren , physischen Sinn war es auch eine weitgehend » gewaltlose « Revolution . ( Die andere Gewalt , die Gewalt der Geschichte , des Alltags , unserer Gefühle und Vorurteile )m2 , entlud sich dagegen , und sie entlädt sich immer noch . (ID: 1881x1892); (Calc-Prob:83)

b. Der schmerzhafte Entschluß zur Flucht , der Vertrauensbruch mit denen , die blieben , hat plötzlich ( keinen Sinn mehr )m1 .[. . . ] » Keine Gewalt « war ein Slogan der großen Novemberdemonstrationen . Im ( puren , physischen Sinn )m2 war es auch eine weitgehend » gewaltlose « Revolution . (ID: 1850x1884); (Calc-Prob:83)

c. “Druckräume sind ( keine Lösung )m1 “[. . . ] Dadurch werden dann Schulhöfe und Spielplätze wieder frei , und vielleicht findet sich dann auch der Wohnraum . ( Eine wirklich billige und bequeme Lösung für den Senat )m2 . (ID: 2800x2838); (Calc-Prob:83)

d. Zum Vergleich werden 30 Kinder aus der Region Plön ( Schleswig-Holstein ) untersucht , in der es ( keine Atomanlagen )m1 gibt .[. . . ] Parallel dazu wird nach Ablagerungen langlebiger Radionukleide in Mensch und Natur gesucht . Bisher sind aufgrund der amtlichen Messungen keine besonderen radioaktiven Belastungen aus ( den Atomanlagen in Krümmel und Geesthacht )m2 bekannt .
(ID: 3493x3508); (Calc-Prob:83)

e. Weil die Sachbearbeiterin Lindner - nach Rücksprache mit der Geschäftsleitung - dem homosexuellen Buchhändler und Konzertveranstalter Hasso Müller-Kittnau ( 39 ) und seinem Lebensgefährten in Saarbrücken ( keine Wohnung )m1 vermieten wollte , geriet die noble Weltfirma unter Beschuß : Der Schwulenverband in Deutschland ( SVD ) forderte am Mittwoch Lesben und Schwule auf , bei Vertragsabschlüssen mit Versicherungen den “Diskriminierungsfall “bei der Allianz zu berücksichtigen . Auf Nachfrage erklärte gestern die Sachbearbeiterin Lindner , daß Müller-Kittnau und seinem Freund ( die Wohnung in Saarbrücken )m2 nicht verweigert worden sei , weil es sich um ein homosexuelles Paar gehandelt habe - “obgleich in dieser Wohnanlage sehr konservative Mieter wohnen “. Vielmehr habe Müller-Kittnau von der Anmietung der umgehend neu zu belegenden Wohnung Abstand genommen , weil er erst zu einem späteren Zeitpunkt habe einziehen wollen . (ID: 5614x5633); (Calc-Prob:83)

f. ( Keine Allianz fürs Leben )m1 [. . . ] Versicherungskonzern diskriminiert Schwule / Homosexuelles Paar als Mieter abgelehnt Frankfurt / Main ( taz ) - Bei ( der Allianz Grundstücks AG in Karlsruhe )m2 ist die Belegschaft verunsichert . (ID: 5594x5602); (Calc-Prob:83)

g. ( Keine Zahlen über rassistische Straftaten )m1 [. . . ] Von Bernd Siegler Nürnberg ( taz ) - Während das Bundeskriminalamt ( BKA ) angesichts der Vielzahl von rassistisch motivierten Straftaten betont , daß es “keinen Grund zur Entwarnung “gebe , weigert sich die Bundesregierung hartnäckig , ( monatliche Zahlen über solche Straftaten und daraus resultierende Ermittlungsverfahren )m2 zu veröffentlichen . (ID: 6526x6544); (Calc-Prob:83)

h. Der Text am Rand erklärt dem Betrachter , daß Prostituierte ( kein Recht auf Klage )m1 haben , wenn ein Freier sie um die Bezahlung für erbrachte Dienstleistungen prellt .[. . . ] Sie halten sich an der Hand oder im Arm und könnten genausogut Hans und Sabine von gegenüber sein .
Dazu gibt es knappe Sprüche , die aus dem Mund eines Werbetexters stammen könnten : » Er hat ( ein Recht auf seine Lust )m2 und Sie Lust auf ihr Recht « oder : » Er ist potent und Sie kompetent « . (ID: 13240x13265); (Calc-Prob:83)

E.1.10. Disagreement in gender and number

(66) a. Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen .[. . . ] Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . So kann die Groteske nicht zu ( ihrer )m2 Wirkung kommen , kann nichts von der Form ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen . (ID: 156x179); (Calc-Prob:53)

b. Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen . Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im ( Deutschen Theater )m1 , mit ( der )m2 übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen . (ID: 242x243); (Calc-Prob:51)

c. Der zweieinhalbstündige Theaterabend in den Kammerspielen des Deutschen Theaters blieb dann auch entsprechend unentschieden .
( Viele Gedanken )m1 , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt , wie zum Beispiel die Inzest-Anspielung zwischen Pedro und ( seiner )m2 Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) . Ramon del Valle-Inclans Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als ein weiterer Gag sein kann . (ID: 188x193); (Calc-Prob:51)

d. Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit seiner Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte sich ( so mancher )m1 in ( ihrer )m2 Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte . Die Totenklage der trauernden Familie mischt sich dann auch schnell mit munteren Jubelgesängen . (ID: 94x95); (Calc-Prob:51)

e. Da die Stadt keinen habe , dürfe sie das Urteil auch nicht exekutieren . Helena , ( deren Fall )m1 inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute nicht in ( ihre )m2 Wohnung . Sie hat dabei noch Glück gehabt . (ID: 423x426); (Calc-Prob:51)

f. Der Exmissionstitel ist allerdings vergilbt , weil die städtischen Behörden , eigentlich zuständig , Helena zu ( ihrem )m1 Recht zu verhelfen , es ablehnten einzugreifen . Begründung : Nach polnischem Mietrecht dürfe man einen Mieter nur aus der Wohnung entfernen , wenn man in der Lage sei , ( ihm )m2 Ersatzraum zur Verfügung zu stellen . Da die Stadt keinen habe , dürfe sie das Urteil auch nicht exekutieren .
(ID: 406x416); (Calc-Prob:50)

g. Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß er die ihm anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein Wirtschaftsunternehmen aufgebaut hat , kreiden ihm ( die Briten )m1 an . Auch ( seine )m2 dunkelbraune Vergangenheit in Francos Unrechtsstaat paßt ihnen nicht . Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem Arm “unterschrieben hat . (ID: 564x565); (Calc-Prob:51)

h. Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß er die ihm anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an . Auch ( seine )m1 dunkelbraune Vergangenheit in Francos Unrechtsstaat paßt ( ihnen )m2 nicht . Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem Arm “unterschrieben hat . (ID: 565x569); (Calc-Prob:50)

i. So leichtfertig wie er ( das )m1 dahersagt , so freudig wurde es vom Aufsichtsrat der Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht . Ein Kopf mußte rollen , damit sich ( die anderen )m2 aus der Schlinge ziehen konnten . Das kathartische Schauspiel wurde inszeniert , um , wie es so schön heißt , weiteren Schaden von der Bewerbung abzuwenden . (ID: 643x651); (Calc-Prob:52)

j. Denn ( Vermieter , die eine Wohnung freibekommen wollen )m1 , lassen sich einiges einfallen , um die Mieter herauszuekeln , berichtet Frau Fiedler .[. . . ] Denn das ist häufig der Fall . Zwischen 60 und 70 Prozent ( der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den Mieter herausbekommen , um die Wohnung , die dann im Preis steigt , besser verkaufen zu können .
(ID: 1023x1061); (Calc-Prob:83)

k. Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit 22,18 Sekunden gegen ( ihre )m1 Landsfrau Juliet Cuthbert durch .[. . . ] Dagegen mußte sich im 100-Meter-Sprint der Männer überraschend der Olympia-Zweite Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser Saison geschlagen geben . Mit 10,06 Sekunden gab ( ihm )m2 der Nigerianer Olapade Adeniken um 1/100-Sekunde das Nachsehen . (ID: 8453x8462); (Calc-Prob:50)

l. In Bonn werden die Stimmen immer lauter , die ein Ende der Berliner Olympia-Bewerbung für das Jahr 2000 voraussagen . “( Berlin 2000 )m1 ist für ( mich )m2 eigentlich schon tot “, meinte der sportpolitische Sprecher der SPD-Bundestagsfraktion , Wilhelm Schmidt , nach der peinlichen Affäre um die Daten-Sammlung über Mitglieder des IOC . Schmidt behauptet , daß ihm konkrete Informationen über den bevorstehenden Rücktritt von Nikolaus Fuchs vorliegen würden . (ID: 8509x8510); (Calc-Prob:51)

m. Mit all diesen obskuren Äußerungen fällt Frau Rogendorf nicht nur den Lesben / Frauen auf den Wecker , die ( ihr )m1 Leben selbstbestimmt und -bewußt leben , sondern auch all den Frauen / Lesben , die sich für die Abschaffung des § 218 einsetzen . Wenn “eine Gesellschaft sich nur über das Kind entwickeln kann “, wie es Frau Roggendorf behauptet , ist zu fragen , wie die Gesellschaft mit den ungewollten Kindern klarkommt , für die ( wir )m2 Frau Roggendorf hiermit dankbar die Adoptionsurkunde ausstellen wolllen . Amen . (ID: 2775x2791); (Calc-Prob:50)

E.2. False negatives

E.2.1. Reflexive pronouns with non-subjects or considerable sentence distance

(67) a. Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß er die ( ihm )m1 anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .[. . .
] Der Spanier-Hansl kümmert sich wenigstens . Um ( sich )m2 und auch um seine Kumpels . (ID: 558x595); (Calc-Prob:0)

b. Immer mehr Menschen “verschwinden “oder werden von “offenkundig staatlich geduldeten Todesschwadronen “ermordet , sagte Deile . Er appellierte an ( Kinkel )m1 , ( sich )m2 für die politischen Häftlinge einzusetzen . SEITE 8 (ID: 5030x5031); (Calc-Prob:7)

c. Auch beim Nato-Verbündeten Türkei , in die ( Außenminister Klaus Kinkel )m1 in der kommenden Woche reisen wird , werde unvermindert gefoltert .[. . . ] Immer mehr Menschen “verschwinden “oder werden von “offenkundig staatlich geduldeten Todesschwadronen “ermordet , sagte Deile . Er appellierte an Kinkel , ( sich )m2 für die politischen Häftlinge einzusetzen . (ID: 5023x5031); (Calc-Prob:0)

d. Zwar hatte ( er )m1 zu den 30 bis 40 Jugendlichen gehört , die in der Novembernacht “Neger aufklatschen “wollten . Nur erinnern konnte er ( sich )m2 gestern nicht . Kamp behandelte ihn wie schon andere Zeugen während der Verhandlung : Er schickte ihn unter Polizeibewachung zum Strafsitzen und Nachdenken in ein kleines Kabuff . (ID: 5138x5144); (Calc-Prob:0)

e. ( Hawkins )m1 bietet solides Rhythm & Blues-Entertainment und hat vor allem nicht den Kontakt zur Realität verloren . Daß der ganze Voodoo-Zauber heutzutage kein kleines Kind mehr zum Schwitzen bringt , hat er gut erkannt , stellt sich deshalb konsequent neben ( sich selbst )m2 und gibt dem teilweise mitgealterten Publikum eine gute Zeit . Wenn er in einem quietschbunten Plüschanzug auf die Bühne kommt und den mit einem Totenschädel verzierten Stock auf die Bretter knallt , hat er stets ein Augenzwinkern zur Hand , das gleichsam Screaming Jay Hawkins durch Screaming Jay Hawkins selbst kommentiert . (ID: 12673x12682); (Calc-Prob:0)

f. 20 Millionen für ( drei ältere rockmusizierende Herrschaften )m1 [. . . ] Nichts hätte konsequenter sein können .
Die drei rockmusizierenden älteren Herrschaften von Genesis verbanden ( sich )m2 mit den autofabrizierenden älteren Herren von Volkswagen , um fürderhin für gegenseitige Belebung des Geschäfts zu sorgen . (ID: 12555x12610); (Calc-Prob:0)

g. Was soll man von einer Band halten , die sich nach einem italienischen Glasbläsermeister benennt , aber ( sich )m1 konsequent der Vermatschung aller verfügbaren jamaikanischen Musiken widmet ?[. . . ] Kann man nur großartig finden , wenn sie Messer Banzani heißen . Die sechs Herren aus Leipzig haben den Eisernen Vorhang anscheinend recht durchlässig erlebt , sonst könnten sie wohl kaum heute bereits so versiert Ska , Reggae , Dub und Ragga adaptieren , ohne ( sich )m2 lächerlich zu machen . (ID: 12562x12576); (Calc-Prob:0)

E.2.2. Semantic Relations between the markables

(68) a. Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen ( der Inszenierung )m1 wichtige Glanzpunkte . Diese wenigen atmosphärischen Momente lassen ein dichtes , interessantes Stück erahnen , das aber in ( dieser Fassung )m2 weit unter dem Möglichen inszeniert scheint . Gegen Ende zerfaselte der Spannungsbogen immer deutlicher , die Schauspieler agierten zusehends einfallsloser , weil dem Regisseur anscheinend die Einfälle ausgegangen waren . (ID: 218x223); (Calc-Prob:10)

b. Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m1 .[. . . ] Erst die Feuerwehr konnte beide durch das Fenster befreien . ( Herrchen )m2 wollte den Hundefänger holen . (ID: 340x344); (Calc-Prob:4)

c. Als ein Bekannter ( des Hundehalters )m1 versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers .[. . . ] Erst die Feuerwehr konnte beide durch das Fenster befreien . ( Herrchen )m2 wollte den Hundefänger holen . (ID: 335x344); (Calc-Prob:4)

d.
( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag sein zukünftiges Heim gezeigt .[. . . ] Erst die Feuerwehr konnte beide durch das Fenster befreien . ( Herrchen )m2 wollte den Hundefänger holen . (ID: 325x344); (Calc-Prob:7)

e. Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter ( des Hundehalters )m1 versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m2 . Erst die Feuerwehr konnte beide durch das Fenster befreien . (ID: 335x340); (Calc-Prob:10)

f. Der 24jährige Besitzer hatte dem Tier am Vortag ( sein zukünftiges Heim )m1 gezeigt .[. . . ] Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter des Hundehalters versuchte , ( die Wohnung )m2 zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . (ID: 329x337); (Calc-Prob:5)

g. Edgar ist neu in ( der Stadt )m1 , aber schon jetzt Stammgast in rund 200 Kneipen und Cafes von Altona bis Pöseldorf .[. . . ] Doch Bromeck hat Größeres im Sinn , die Vorinvestitionen von “weit über 100000 Mark “sollen sich bald lohnen : Edgar auf der Karte ist Platzhalter für potente Kunden , die derzeit eifrig akquiriert werden . Denn wie in Kopenhagen etwa der Bier-Riese Tuborg seine Image-Werbung längst vom dortigen Edgar-auf-der-Karte-Vorbild Gocard drucken und vertreiben läßt , sollen auch in ( Hamburgs )m2 Kneipen demnächst Werbepostkarten von hiesigen Unternehmen zur Gratis-Mitnahme einladen . (ID: 4822x4893); (Calc-Prob:3)

h. Edgar ist neu in ( der Stadt )m1 , aber schon jetzt Stammgast in rund 200 Kneipen und Cafes von Altona bis Pöseldorf .[. . . ] Es meldet sich eine Werbeagentur “Bromeck “. “Die Idee mit den Gratispostkarten stammt aus Kopenhagen “, erklärt der bereits werbegewiefte studierte Jurist Christian Meckenstock , der mit seiner Kollegin Nana Bromberg eigens eine Agentur gründete , um die Kampagne Edgar auf der Karte von einem Büro im PPS-Bunker aus in ( Hamburg )m2 und auf Sylt zu starten . (ID: 4822x4853); (Calc-Prob:3)

i. Daraufhin weigerte sich ( Daimler-Benz )m1 , ihn nach Abschluß seiner Ausbildung als Schlosser zu übernehmen - der Artikel sei nämlich ein “Bekenntnis zur Gewalt “. Es sei zu befürchten , daß der junge Mann in bestimmten Situationen auch im Betrieb Gewalt befürworten werde , argumentierte ( das Unternehmen )m2 . Das Bundesarbeitsgericht teilte den Standpunkt und wies die Klage auf Einstellung ab . (ID: 6010x6022); (Calc-Prob:7)

j. So konnte ( den Käufern )m1 ein imaginärer Wert des Unternehmes von drei Millionen Mark vorgespiegelt werden , der sich bei näherem Hinsehen auf im Grunde unverkäufliche Altbestände bezog .[. . . ] Schließlich brauchten die bayerischen Jungunternehmer auch nur für zehn Arbeitsplätze Beschäftigungsgarantien und Finanzierungszusagen für ganze 18 Monate abzugeben ( bei trickreich vollzogenem rückwirkendem Verkauf zum 1. Januar 1992 sogar nur für zwölf Monate ) , während man von der Belegschaft , die den Verlag übernehmen wollte , Finanzierungsgarantien für vier bis fünf Jahre verlangt hatte . Aber selbst mit diesen 18 Monaten scheinen ( die neuen Eigentümer )m2 überfordert zu sein , denn ihr heimatliches Kalender-Unternehmen wirft längst nicht soviel ab , wie Volk und Welt momentan an jährlichen Defiziten einfährt . (ID: 9917x9950); (Calc-Prob:9)

k. Frieda Mermet heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer Bergkaff ist sie . Beim Walser-Ensemble , das die Briefe an ( die » Liebe Frau Mermet )m1 « auf die Bühne bringt , hockt ( die Holde )m2 tatsächlich im Dichterolymp . Ein goldener Bilderrahmen , ganz Gelsenkirchener Barock , schwebt über der Szene . (ID: 13029x13033); (Calc-Prob:9)

l. In ( der Inszenierung von Armin Holz in den Kammerspielen des Deutschen Theaters )m1 scheint dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden .[. . . ] Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina und Corinna Harfouch als Mari-Gaila verliehen der Inszenierung wichtige Glanzpunkte . Diese wenigen atmosphärischen Momente lassen ein dichtes , interessantes Stück erahnen , das aber in ( dieser Fassung )m2 weit unter dem Möglichen inszeniert scheint . (ID: 153x223); (Calc-Prob:10)

m. So kann ( die Groteske )m1 nicht zu ihrer Wirkung kommen , kann nichts von der Form ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen .[. . . ] Ramon del Valle-Inclans Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als ein weiterer Gag sein kann . Das Bühnenbild von Peter Schubert erzielt mit großen Stahlkäfigen zwar Wirkung , die lachsfarbene Wandbespannung erscheint aber für die Atmosphäre ( des Stückes )m2 fast wieder zu schick . (ID: 178x208); (Calc-Prob:4)

n. Der Bürgersmann Pedro beweint zudem noch seine eheliche Ehre , die ( seine Angetraute )m1 gerade mit dem sittenlosen Gaukler Septimo verspielt .[. . . ] Als habe er von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen . Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , ( seine Frau Mari-Gaila )m2 und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann .
(ID: 135x173); (Calc-Prob:17)

o. Der 24jährige Besitzer hatte ( dem Tier )m1 am Vortag sein zukünftiges Heim gezeigt . Das gefiel ( dem Hund )m2 so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers . (ID: 326x331); (Calc-Prob:5)

p. Hamburg ( ap ) - ( Ein zwei Jahre alter Schäferhund namens “Prinz )m1 “hat im Hamburger Stadtteil Altona eine Wohnung besetzt . Der 24jährige Besitzer hatte ( dem Tier )m2 am Vortag sein zukünftiges Heim gezeigt . Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ . (ID: 322x326); (Calc-Prob:3)

q. Aufgrund des gleichen Paragraphen gibt es in ( Warschau )m1 inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb nicht . ( Die Stadt )m2 hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben . Jemanden vor die Tür setzen dürfen die allerdings auch nicht . (ID: 511x519); (Calc-Prob:5)

r. Das anfängliche Entsetzen , das viele beim Lesen ( der düsteren Texte von Garcia Marquez )m1 empfinden , haben die Schauspieler und Schauspielerinnen rasch überwinden .[. . . ] Natürlich gebe es in den Texten reichlich Todesbilder , meint Cristina Tetzner , eine der SchauspielerInnen , “aber sie sind auch gespickt mit Bildern der Sinnlichkeit “. Und ihr Szenen-Partner Henner Schneider entdeckte in ( den Geschichten )m2 trotz der ihm sehr fremden Denkweise sogar humorvolle Stellen . (ID: 2174x2190); (Calc-Prob:0)

E.2.3. Both markables contain a common, possibly appositive proper name

(69) a. ( Ramon Valle-Inclan )m1 [. . . ] Erst in den 70er Jahren entstanden Übersetzungen und Inszenierungen , die jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden . Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension ) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m2 , auch weiterhin ein Geheimtip . (ID: 1x60); (Calc-Prob:13)

b. ( Ramon Valle-Inclan )m1 ( Der Spanier Ramon Maria Valle-Inclan ( 1866 - )m2 1939 ) war schon äußerlich ein Bürgerschreck : langer Bart , schwarze Kleidung und ein amputierter Arm . Nach eigenen Angaben entstammte er einer alten Adelsfamilie . (ID: 1x2); (Calc-Prob:27)

c. Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von ( der überzeichneten Zuhälterfigur Septimo ( Ulrich )m1 Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann .[. . . ] Musik : Irmin Schmidt . Mit Corinna Harfouch , Bernd Stempel , ( Ulrich Haß )m2 , Margit Bendokat , Elsa Grube-Deister , Ulrike Krumbiegel , Horst Lebinsky , Markus Gertken u. a. (ID: 171x267); (Calc-Prob:0)

d. Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im Deutschen Theater , mit der übrigens ( Ignaz Kirchner )m1 ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen . ( Kirchner )m2 wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen . Er wird wissen , warum . (ID: 244x248); (Calc-Prob:17)

e. Es meldet sich ( eine Werbeagentur “Bromeck )m1 “.[. . . ] Der Wahlhamburger Kevin Bauer entwarf das Edgar-Logo : Der Mann mit der Pfeife und dem Balken über den Augen . Rund 200000 ihrer , im weitesten Sinne , Kunstpostkarten hat ( Bromeck )m2 in den ersten drei Wochen der Aktion unters Volk gebracht . (ID: 4840x4867); (Calc-Prob:3)

f. “Die Idee mit den Gratispostkarten stammt aus Kopenhagen “, erklärt ( der bereits werbegewiefte studierte Jurist Christian Meckenstock )m1 , der mit seiner Kollegin Nana Bromberg eigens eine Agentur gründete , um die Kampagne Edgar auf der Karte von einem Büro im PPS-Bunker aus in Hamburg und auf Sylt zu starten . Der Name Edgar , großzügig abgeleitet vom Begriff der “Advertising Card “, ist , alte Werbeweisheit , “als Figur ausbaufähig “, ( so Meckenstock )m2 . Der Wahlhamburger Kevin Bauer entwarf das Edgar-Logo : Der Mann mit der Pfeife und dem Balken über den Augen . (ID: 4844x4858); (Calc-Prob:12)

g. Im November 1990 sollen sie ( den Angolaner Amadeu Antonio )m1 so zusammengeschlagen haben , daß der 28jährige starb .[. . . ] Sie wehrte sich beredt gegen die Verteidiger , die ihrer Darstellung nicht glauben wollten . Durch ihre Schilderung sind die Vorgänge , die ( Amadeu Antonio )m2 das Leben kosteten , noch weniger klar als vorher . (ID: 5116x5211); (Calc-Prob:5)

h. Die Fässer sollen zur Konditionierung aus ( dem Atommüllager Gorleben )m1 in eine Lagerhalle nach Duisburg-Wanheim transportiert werden .[. . . ] Keiner will die Mol-Fässer haben . Gleichzeitig versuchte die Essener Gesellschaft für Nuklearservice ( GNS ) die 1.000 Atommüllfässer aus ( Gorleben )m2 zur Verarbeitung nach Duisburg zu bringen , konnte der Gewerbeaufsicht aber nie die erforderlichen Papiere präsentieren . (ID: 6075x6104); (Calc-Prob:6)

i. Die fehlende Nachwirkung Bleis dürfte ihren Grund darin haben , daß ( der Schriftsteller Blei )m1 keinen wiedererkennbaren Stil , keinen durchgängigen Erzählton besitzt : weder den ironisch-epischen eines Thomas Mann , den essayistisch-analytischen eines Robert Musil noch den skeptisch-melancholischen eines Joseph Roth .[. . . ] Ein Autor , der sich , wenn auch in parodierender Absicht , so eng an die stilistischen und thematischen Vorgaben anderer Autoren hält , läuft Gefahr , sich selbst zu verlieren . Eine Gefahr , der ( Blei )m2 ironisch ins Auge sah : “Der Blei “, schreibt Blei im Bestiarium , “ist ein Süßwasserfisch , der sich geschmeidig in allen frischen Wassern tummelt und seinen Namen mhd. bli , ahd. blio = licht , klar von der außerordentlich glatten und dünnen Haut trägt , durch welche die jeweilige Nahrung mit ihrer Farbe deutlich sichtbar wird . (ID: 9103x9159); (Calc-Prob:19)

j. So enthält das Bestiarium einen Exkurs - eine andere Lieblingsform Bleis - zur politischen Romantik , der sich als wortgetreue Wiedergabe eines Kapitels aus der Politischen Romantik von Carl Schmitt herausstellt . Bleis Essay über ( den Poeten , Maler und Giftmörder Thomas Griffith Wainewright ( )m1 1794-1852 ) hingegen ist eine parodierende Paraphrase von Oscar Wildes Lobeshymne auf ( Wainewright )m2 in Pen , Pencil and Poison . Das Oeuvre Bleis ist voll von literarischen Längs- und Querbezügen , die - mit oder ohne Autorennennung - in die eigene literarische Produktion mit eingearbeitet wurden . (ID: 9137x9140); (Calc-Prob:6)

k. Daran ist dann offensichtlich auch der Versuch von Schmidt-Braul gescheitert , ( den Luftfahrtunternehmer Dornier )m1 für die Übernahme zu gewinnen . ( Silvius Dornier )m2 wußte aus seinen zähen Verhandlungen mit Daimler-Benz , denen er einen Großteil seiner Aktien verkauft hatte , daß auch bei der Treuhand etwas rauszuschlagen wäre . Wenn sie schon die teure Immobilie einsackte , sollte sie wenigstens noch den Verlag entschulden und etwas Geld für die Anschubfinanzierung locker machen . (ID: 9868x9870); (Calc-Prob:7)

l. Als mir Anfang des Jahres ( Martin Flug )m1 sein Manuskript Treuhand-Poker - Die Mechanismen des Ausverkaufs auf den Verlagstisch legte , schien mir manches recht überzogen , und ich bat ihn , mir die haarsträubendsten Geschichten mit Dokumenten zu belegen , da ich wenig Lust verspürte , gleich nach Erscheinen verklagt zu werden .[. . . ] Doch in jedem einzelnen Fall konnte er mich von der Sauberkeit seiner Recherche überzeugen , und die Tatsache , daß bis heute - drei Monate nach der Erstauslieferung - keine Einstweiligen Verfügungen bei uns herniedergegangen sind , scheinen ihm zusätzlich recht zu geben . Heute würde ich wahrscheinlich nicht mehr so skeptisch fragen , denn das , was ich in den letzten Wochen in meiner unmittelbaren Umgebung , der Ostberliner Verlagsszene , erlebt habe , stellt ( Flugs )m2 Report noch um einiges in den Schatten . (ID: 9647x9680); (Calc-Prob:9)

m. “Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich Bundeskanzler Helmut Kohl auf seiner Pressekonferenz mit ( dem russischen Präsidenten Boris Jelzin )m1 am Mittwoch in München verteidigen . Er hatte ( seinen neuen “Duzfreund “Boris )m2 nämlich aus Versehen als sowjetischen Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen . WAFFENSCHMUGGEL IM KLEINEN RAHMEN (ID: 10091x10097); (Calc-Prob:8)

n. Gurke des Tages : ( Helmut Kohl )m1 “Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich ( Bundeskanzler Helmut Kohl )m2 auf seiner Pressekonferenz mit dem russischen Präsidenten Boris Jelzin am Mittwoch in München verteidigen . Er hatte seinen neuen “Duzfreund “Boris nämlich aus Versehen als sowjetischen Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen . (ID: 10084x10089); (Calc-Prob:27)

o.
Da hat er das wildgemusterte Hemd und die Schirmmütze » von irgendwoher , wo es Elefanten gibt « , abgelegt , den Blues Max auf den Bügel gehängt und genießt erst mal als ( Max Lässer )m1 ein kühles Pils .[. . . ] Seit drei Jahren füllen die beiden in der Schweiz als » Blues Max « große Säle , erzählen ihre mal melancholischen , mal ironischen , auch traurigen Geschichten von Onkel Hermann , der einfach gar nie den Blues hat , vom armen Kasimir Benz oder von dem Mann , der seinen Himmel auf Erden nicht mehr finden konnte . » Manchmal weinen die Leut’ , odr ! « weiß ( der Blues Max )m2 . (ID: 12946x12965); (Calc-Prob:13)

p. ( Frieda Mermet )m1 heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer Bergkaff ist sie . Beim Walser-Ensemble , das die Briefe an ( die » Liebe Frau Mermet )m2 « auf die Bühne bringt , hockt die Holde tatsächlich im Dichterolymp . Ein goldener Bilderrahmen , ganz Gelsenkirchener Barock , schwebt über der Szene . (ID: 13022x13029); (Calc-Prob:12)

q. ( Westend )m1 Be Thy Name[. . . ] Kaum gehn die Bauarbeiten mal voran im Kulturzentrum Walle , da haben sich seine glücklichen Alten schon einen Namen für das Balg ausgesucht , weil ’s ja “Kulturwerkstatt für ArbeitnehmerInnen “nicht im Ernst heißen kann . Der Trägerverein , also u.a. DGB und Kultursenatorium , haben sich nach langem Ringen und einem Wettbewerb zum ( Namen “Westend )m2 “entschlossen . (ID: 13383x13401); (Calc-Prob:6)

r. Wunderworte , 1920 geschrieben und in Deutschland zuvor erst zweimal aufgeführt , ist ein sogenanntes “Esperpento “( zu deutsch : Schauerposse ) ( des Spaniers Ramon del Valle-Inclan )m1 , eine Tragi-Komödie , ein deformiertes Zerrbild der Wirklichkeit .[. . . ] Viele Gedanken , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt , wie zum Beispiel die Inzest-Anspielung zwischen Pedro und seiner Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) . ( Ramon del Valle-Inclans )m2 Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als ein weiterer Gag sein kann . (ID: 140x196); (Calc-Prob:6)

s. ( Der Bürgersmann Pedro )m1 beweint zudem noch seine eheliche Ehre , die seine Angetraute gerade mit dem sittenlosen Gaukler Septimo verspielt .[. . . ] Als habe er von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen . Bernd Stempel als ( Pedro Gailo )m2 muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . (ID: 130x164); (Calc-Prob:12)

t. Und während ( Bruder Pedro )m1 und Schwester Marica noch darüber feilschen , wer den “Karren “nun an welchen Tagen bekommen soll , ist es für Pedros Frau Mari-Gaila schon ausgemacht , daß sie mit Laureano auf und davon gehen wird .[. . . ] Als habe er von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst einmal sinnhaft vorzustellen . Bernd Stempel als ( Pedro Gailo )m2 muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann . (ID: 113x164); (Calc-Prob:12)

u. Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt ( der spanische Dichter Ramon del Valle-Inclan )m1 einen seiner Helden gleich zu Beginn seiner Groteske “Wunderworte “erklären .[. . . ] Der Bürgersmann Pedro beweint zudem noch seine eheliche Ehre , die seine Angetraute gerade mit dem sittenlosen Gaukler Septimo verspielt . Wunderworte , 1920 geschrieben und in Deutschland zuvor erst zweimal aufgeführt , ist ein sogenanntes “Esperpento “( zu deutsch : Schauerposse ) ( des Spaniers Ramon del Valle-Inclan )m2 , eine Tragi-Komödie , ein deformiertes Zerrbild der Wirklichkeit . (ID: 77x140); (Calc-Prob:9)

v. Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ( ihrem irren Sohn Laureano )m1 ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte .[. . . ] Die Totenklage der trauernden Familie mischt sich dann auch schnell mit munteren Jubelgesängen . Denn soviel ist klar : Wer ( Laureano )m2 versorgt , besitzt im armen Galizien allemal eine einträgliche Geldquelle . (ID: 91x108); (Calc-Prob:5)

w. Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen Frömmigkeit seiner Bewohner nicht verloren haben . Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ( ihrem irren Sohn Laureano )m1 ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ( ihren irren Laureano )m2 von Jahrmarkt zu Jahrmarkt geschoben hatte . Die Totenklage der trauernden Familie mischt sich dann auch schnell mit munteren Jubelgesängen .
(ID: 91x100); (Calc-Prob:5)

x. Einmal ins ( Ritz )m1 Paris ( dpa ) - Einer amerikanischen Touristin sind im ( berühmten Hotel “Ritz “an der Place Vendome )m2 in Paris Schmuckstücke im Schätzwert von 2,5 Millionen Franc ( knapp 750.000 Mark ) gestohlen worden . Es soll sich um drei Diebe gehandelt haben . (ID: 346x351); (Calc-Prob:12)

y. Als ( Mario Worm , ein Ostberliner Kneipenwirt )m1 , schon 1989 mit der Idee , das alles in einem Film festzuhalten , hausieren ging , riet man ihm ab .[. . . ] In zwei Jahren werde es tausend Filme über diese historische Umbruchszeit geben , und viel professionellere dazu . Aber ( der Laienfilmer Worm )m2 scharte seine Freunde , Kollegen und Stammgäste um sich , drehte mit einfachsten Mitteln trotzdem seinen Film und behielt recht . (ID: 1784x1792); (Calc-Prob:35)

z. ( Der innenpolitische Sprecher der CDU , Ralf Borttscheller )m1 , nannte es gestern positiv , wenn sich Wedemeier von der “starren Haltung “der SPD absetze . ( Borttscheller )m2 . “Das ehrt ihn . “ (ID: 2007x2013); (Calc-Prob:12)
186 Bibliography SfS Universität Tübingen. TüBa-D/Z Release 6, Last checked 30. December 2010. URL http:// www.sfs.uni-tuebingen.de/tuebadz.shtml. Wojciech Skut and Thorsten Brants. A maximum-entropy partial parser for unrestricted text. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 143–151, Montreal, Quebec, 1998. Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544, 2001. Veselin Stoyanov, Nathan Gilbert, Claire Cardie, and Ellen Riloff. Conundrums in noun phrase coreference resolution: Making sense of the State-of-the-Art. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009), pages 656–664, Suntec, Singapore, August 2009. Association for Computational Linguistics. Michael Strube, Stefan Rapp, and Christoph Müller. The influence of minimum edit distance on reference resolution. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, EMNLP ’02, pages 312–319, Philadelphia, PA, USA, 2002. Association for Computational Linguistics. STTS. The stuttgart-tübingen-tag-set, Last checked 02. March 2011. URL http://www.ims. uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html. Roland Stuckardt. Three algorithms for competence-oriented anaphor resolution. In Proceedings of the 5th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2004), pages 157–163, Sao Miguel/Azores, 2004. Yannick Versley. Parser evaluation across text types. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT 2005), Barcelona, Spain, 2005. Yannick Versley. A constraint-based approach to noun phrase coreference resolution in German newspaper text. 
In Proceedings of the Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS), Constance, Germany, 2006. Yannick Versley, Simone Paolo Ponzetto, Massimo Poesio, Vladimir Eidelman, Alan Jern, Jason Smith, Xiaofeng Yang, and Alessandro Moschitti. Bart: A modular toolkit for coreference resolution. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, may 2008. European Language Resources Association (ELRA). Renata Vieira and Massimo Poesio. An empirically based system for processing definite descriptions. Comput. Linguist., 26:539–593, December 2000. Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. A model-theoretic coreference scoring scheme. In Proceedings of the 6th conference on Message understanding, MUC6 ’95, pages 45–52, Columbia, Maryland, 1995. Association for Computational Linguistics. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21:168–173, January 1974. Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. ACE 2005 Multilingual Training Corpus. Linguistic Data Consortium, Philadelphia, 2006. Ralph Weischedel, Sameer Pradhan, Lance Ramshaw, Martha Palmer, Nianwen Xue, Mitchell Marcus, Ann Taylor, Craig Greenberg, Eduard Hovy, Robert Belvin, and Ann Houston. OntoNotes Release 2.0. Linguistic Data Consortium, Philadelphia, 2008. 187 Bibliography Xiaofeng Yang, Guodong Zhou, Jian Su, and Chew Lim Tan. Coreference resolution using competition learning approach. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 176–183, Sapporo, Japan, 2003. Association for Computational Linguistics. Desislava Zhekova and Sandra Kübler. Ubiu: A language-independent system for coreference resolution. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 96–99, Los Angeles, California, 2010. 
Association for Computational Linguistics. 188