Feature Engineering for Coreference Resolution in German

Transcription

Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung
Azenbergstraße 12
D-70174 Stuttgart
Diploma Thesis
Feature Engineering
for Coreference Resolution in German
Improving the link feature set of SUCRE for German by using a more
linguistic background
Submitted by:
Patrick Ziering
Commenced: October 1, 2010
Completed: March 31, 2011
Diploma Thesis No. 97
Supervisor: Prof. Dr. Hinrich Schütze, Hamidreza Kobdani
Examiner: Prof. Dr. Hinrich Schütze
Declaration
I declare that this thesis was composed by myself and that the work contained herein is my own, except
where explicitly stated otherwise in the text.
Stuttgart, March 31, 2011
Patrick Ziering
Abstract
This diploma thesis concerns the link feature engineering based on linguistic analysis for the coreference
resolution part in the SUCRE system. The architecture of SUCRE’s coreference resolution is divided
into two steps: classification and clustering. The feature research provided in this thesis modifies the
input for the classifier (a decision tree classifier) and thereby indirectly takes effect on the clustering
step that results in a coreference partition. The feature research includes two parts: linguistic analysis of
misclassifications and implementation of new link features that model the linguistic phenomena detected
in the analysis. Among others, the linguistic issues concern the indefiniteness of the anaphor, the right
antecedent for a relative pronoun, German morphology (i.e. compound words and inflected nouns),
non-coreferring nouns like units, currencies and the like, quantified noun phrases, semantic relatedness
or appositive proper names. After the implementation of the new link features and selecting the ones
that perform best, the final feature set is evaluated. Here, a clear improvement shows up. Considering
the MUC-B3 -F-score (the harmonic mean of the F-scores of MUC and B3 as a trade-off of advantages
and disadvantages of both), the performance increases from 67.9% in SemEval-2010 up to 73.0% with
the final configuration. In detail, MUC-precision increases about 10%, but MUC-recall decreases about
6%. With respect to B3 , precision increases about 12.4% and recall decreases about 1.6%. This lack of
increase of recall arises from the fact that most new implemented features are based on the analysis of
false positives (rather than false negatives). However, these results show a great success that confirms
this research method. There are some drawbacks in the expressiveness of the feature definition language
used in SUCRE. By extending its expressiveness, further linguistic phenomena can be modeled in future
and thereby further improvements of SUCRE’s performance can be achieved.
Table of Contents
1 Introduction
3
2 The coreference resolution task and its progress in German
2.1 What is coreference resolution? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.1 Coreference vs. Anaphora . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.2 NLP tasks which use coreference resolution . . . . . . . . . . . . . . . . . . .
2.2 Detection of markables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Supervised coreference models based on machine learning . . . . . . . . . . . . . . . .
2.3.1 Mention-Pair Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Entity-Mention Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.3 Ranking Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Unsupervised coreference resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 Coreference resolution for German - approaches in the past 10 years . . . . . . . . . . .
2.5.1 Coreference Resolution with Syntactico-Semantic Rules and Corpus Statistics (Hartrumpf, 2001) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5.2 The Influence of Minimum Edit Distance on Reference Resolution - (Strube
et al., 2002) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5.3 A Constraint-based Approach to Noun Phrase Coreference Resolution in German
Newspaper Text - (Versley, 2006) . . . . . . . . . . . . . . . . . . . . . . . . .
2.5.4 Optimization in Coreference Resolution Is Not Needed: A Nearly-Optimal Algorithm with Intensional Constraints - (Klenner and Ailloud, 2009) . . . . . . .
2.5.5 Extending BART to provide a Coreference Resolution System for German (Broscheit et al., 2010b) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6 Evaluation scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6.1 MUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6.2 B3 (B-Cubed) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6.3 CEAF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6.4 BLANC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6.5 A comparative example (borrowed from (Luo, 2005)) . . . . . . . . . . . . . . .
2.7 SemEval-2010 Task 1: Coreference Resolution in Multiple Languages . . . . . . . . . .
30
33
33
35
36
36
38
44
3 The SUCRE system
3.1 The project . . . . . . . . . . . . . . .
3.2 The architecture . . . . . . . . . . . . .
3.2.1 Preprocessing . . . . . . . . . .
3.2.2 Features in SUCRE . . . . . . .
3.2.3 The Relational Database Model
3.2.4 Coreference Resolution . . . . .
3.3 Visualization with SOMs . . . . . . . .
3.3.1 Self Organizing Map . . . . . .
49
49
50
51
52
52
54
55
55
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
6
7
7
8
8
11
11
12
16
16
19
23
26
VII
Table of Contents
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
57
59
61
62
4 Linguistic error analysis
4.1 The initial configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 The initial results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.2 The link features in the prefilter . . . . . . . . . . . . . . . . . . . . .
4.1.3 The link features for the feature vectors . . . . . . . . . . . . . . . . .
4.1.4 The performance of the 40 features . . . . . . . . . . . . . . . . . . .
4.2 One problem with distance features . . . . . . . . . . . . . . . . . . . . . . .
4.3 Error analysis in false positives . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 The second markable is indefinite . . . . . . . . . . . . . . . . . . . .
4.3.2 Wrong assignment of a relative pronoun . . . . . . . . . . . . . . . . .
4.3.3 Relative proximity in context . . . . . . . . . . . . . . . . . . . . . . .
4.3.4 Reflexive pronouns and non-subjects . . . . . . . . . . . . . . . . . . .
4.3.5 Problems with substring-matches . . . . . . . . . . . . . . . . . . . .
4.3.6 “Es“(“it“) as expletive pronoun in German . . . . . . . . . . . . . . .
4.3.7 Units, currencies, month names, weekdays and the like . . . . . . . . .
4.3.8 Problems with the alias feature . . . . . . . . . . . . . . . . . . . .
4.3.9 First markable begins with “kein“ . . . . . . . . . . . . . . . . . . . .
4.3.10 Disagreement in gender and number . . . . . . . . . . . . . . . . . . .
4.4 Error analysis in false negatives . . . . . . . . . . . . . . . . . . . . . . . . .
4.4.1 Reflexive pronouns with non-subjects or considerable sentence distance
4.4.2 Semantic Relations between the markables . . . . . . . . . . . . . . .
4.4.3 Both markables contain a common, possibly appositive proper name . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
63
64
64
65
65
68
71
71
72
74
77
79
80
82
84
84
85
86
89
89
90
92
5 Implementation of the features
5.1 Features for False Positives . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 The second markable is indefinite . . . . . . . . . . . . . . . . . .
5.1.2 Wrong assignment of a relative pronoun . . . . . . . . . . . . . . .
5.1.3 Reflexive pronouns and non-subjects . . . . . . . . . . . . . . . . .
5.1.4 Problems with substring-matches . . . . . . . . . . . . . . . . . .
5.1.5 “Es“(“it“) as expletive pronoun in German . . . . . . . . . . . . .
5.1.6 Units, currencies, month names, weekdays and the like . . . . . . .
5.1.7 First markable begins with “kein“ . . . . . . . . . . . . . . . . . .
5.1.8 Disagreement in gender and number . . . . . . . . . . . . . . . . .
5.2 Features for False Negatives . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 Both markables contain a common, possibly appositive proper name
5.3 Features from inspirations of German approaches in (2.5) . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
95
95
95
98
100
101
104
106
107
108
111
111
112
6 Evaluation of the implemented link features
6.1 The final link feature set . . . . . . . . . . . . . . . .
6.2 The final prefilter feature set . . . . . . . . . . . . . .
6.3 Evaluation of improvement steps . . . . . . . . . . . .
6.4 The final scores . . . . . . . . . . . . . . . . . . . . .
6.5 The performance of each feature in the final feature set
6.6 Additional evaluations . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
115
115
117
118
120
123
123
3.4
3.5
3.6
3.3.2 Application of SOMs in coreference resolution
The multi-lingual aspect in SUCRE . . . . . . . . . .
Evaluation results in SemEval-2010 . . . . . . . . . .
The dataset for German coreference resolution . . . . .
7 Summary and conclusions
VIII
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
127
7.1
7.2
7.3
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A The Stuttgart-Tübingen tag set (STTS)
B The pseudo language for SUCRE’s link feature definition
B.1 Markable keywords . . . . . . . . . . . . . . . . . . . . . .
B.2 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . .
B.3 Arithmetic operations . . . . . . . . . . . . . . . . . . . . .
B.4 Arithmetic predicates . . . . . . . . . . . . . . . . . . . . .
B.5 Boolean operations . . . . . . . . . . . . . . . . . . . . . .
B.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . .
135
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
137
137
137
138
138
138
139
C A python script for computing the BLANC-score
141
D Upper and lower bounds / evaluation results in (Versley, 2006)
143
E All link errors from Chapter 4
E.1 False positives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
E.1.1 The second markable is indefinite . . . . . . . . . . . . . . . . . . . .
E.1.2 Wrong assignment of a relative pronoun . . . . . . . . . . . . . . . . .
E.1.3 Relative proximity in context . . . . . . . . . . . . . . . . . . . . . . .
E.1.4 Reflexive pronouns and non-subjects . . . . . . . . . . . . . . . . . . .
E.1.5 Problems with substring-matches . . . . . . . . . . . . . . . . . . . .
E.1.6 “Es“(“it“) as expletive pronoun in German . . . . . . . . . . . . . . .
E.1.7 Units, currencies, month names, weekdays and the like . . . . . . . . .
E.1.8 Problems with the alias feature . . . . . . . . . . . . . . . . . . . . . .
E.1.9 First markable begins with “kein“ . . . . . . . . . . . . . . . . . . . .
E.1.10 Disagreement in gender and number . . . . . . . . . . . . . . . . . . .
E.2 False negatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
E.2.1 Reflexive pronouns with non-subjects or considerable sentence distance
E.2.2 Semantic Relations between the markables . . . . . . . . . . . . . . .
E.2.3 Both markables contain a common, possibly appositive proper name . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
145
145
145
149
152
157
159
163
166
169
170
172
174
174
175
178
The Bell numbers B(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Link feature set used in (Soon et al., 2001) . . . . . . . . . . . . . . . . . . . . . . . . .
Example of a feature vector in (Soon et al., 2001) . . . . . . . . . . . . . . . . . . . . .
Weights and incompatiblef -values for each feature used in (Cardie and Wagstaff, 1999)
F-measure results for the clusterer and some baselines on the MUC-6 datasets . . . . . .
Coreference resolution results in (Hartrumpf, 2001) . . . . . . . . . . . . . . . . . . . .
The initial feature set by (Strube et al., 2002) . . . . . . . . . . . . . . . . . . . . . . .
The first evaluation in (Strube et al., 2002) . . . . . . . . . . . . . . . . . . . . . . . .
Revision of the feature set in (Strube et al., 2002) . . . . . . . . . . . . . . . . . . . . .
6
9
9
14
15
19
21
21
22
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
List of Tables
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
IX
List of Tables
X
2.10
2.11
2.12
2.13
2.14
2.15
2.16
2.17
2.18
2.19
2.20
2.21
2.22
2.23
The second evaluation in (Strube et al., 2002) . . . . . . . . . . . . . . . . . . . . .
Performance comparison of (Strube et al., 2002) and (Versley, 2006) . . . . . . . . .
The global ILP constraints in (Klenner and Ailloud, 2009) . . . . . . . . . . . . . .
CEAF-Results in (Klenner and Ailloud, 2009) . . . . . . . . . . . . . . . . . . . . .
Evaluation results of Broscheit et al. (2010b) . . . . . . . . . . . . . . . . . . . . . .
The BLANC confusion matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The BLANC scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Comparison of evaluation metrics (Luo, 2005) . . . . . . . . . . . . . . . . . . . . .
Corpora used for each langauge in SemEval-2010 . . . . . . . . . . . . . . . . . . .
The training, development and test set of TüBa-D/Z in SemEval-2010 . . . . . . . .
Comparison of architectures of BART, SUCRE, TANL-1 and UBIU in SemEval-2010
The baseline scores for German and English in SemEval-2010 . . . . . . . . . . . .
Official results of SemEval-2010 for German . . . . . . . . . . . . . . . . . . . . .
closed vs. open in SemEval-2010 . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
22
26
29
30
32
37
37
44
45
46
47
47
48
48
3.1
3.2
3.3
3.4
3.5
3.6
Word Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Markable Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Link Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Three features used by Burkovski et al. (2011) . . . . . . . . . . . . . . . . . . . .
Results of SUCRE and the best competitor system, TANL-1, in SemEval-2010 Task 1
The training and test set of TüBa-D/Z in this study . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
53
53
53
57
61
62
4.1
4.2
4.3
4.4
4.5
4.6
4.7
Confusion matrix for classification judgements . . . . . . .
Initial results of SUCRE . . . . . . . . . . . . . . . . . . .
The usage of MUC-B3 . . . . . . . . . . . . . . . . . . . .
Cumulative performance of the 40 original features . . . . .
Reversed cumulative performance of the 40 original features
Results of the new baseline . . . . . . . . . . . . . . . . . .
Table of the possessive pronouns . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
63
64
65
69
70
71
87
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
5.10
5.11
5.12
5.13
5.14
5.15
5.16
5.17
5.18
5.19
5.20
5.21
Evaluation of the feature Indef1 .
Evaluation of the feature Relpron1
The final set without Relpron1 . .
Evaluation of the feature Reflex1 .
The final set with Reflex1 . . . .
Evaluation of the feature Substr1 .
Substr2 with compound words . .
The final set with Substr6 . . . .
Evaluation of the feature Es1 . . .
Evaluation of the feature Es2 . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
96
96
97
97
98
99
99
99
100
100
100
101
102
102
102
103
104
104
104
105
105
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
List of Tables
5.22
5.23
5.24
5.25
5.26
5.27
5.28
5.29
5.30
5.31
5.32
5.33
5.34
5.35
5.36
5.37
5.38
5.39
5.40
Evaluation of the feature Es3 . .
The final set without Es3 . . . .
Evaluation of the feature Unit1 .
Evaluation of the feature Unit2 .
Evaluation of the feature Kein1
Evaluation of the feature Kein2
Evaluation of the feature Agree1
The final set without Agree1 . .
The final set with Agree3 . . . .
Evaluation of the feature Proper1
Evaluation of the feature Proper2
Evaluation of the feature Inspire1
The final set with Inspire1 . . .
The final set with Inspire2 . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
.
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
106
106
106
107
108
108
109
109
110
110
110
111
112
113
113
113
113
113
114
6.1
6.2
6.3
6.4
6.7
6.5
6.6
6.8
The steps from the new baseline to the final feature set . . . . .
Performance of the improvement steps . . . . . . . . . . . . . .
The final scores . . . . . . . . . . . . . . . . . . . . . . . . . .
SemEval-2010 Results - German, closed, gold vs. Final scores
The performance without vector feature no. 19 . . . . . . . . .
Cumulative performance of the final feature set . . . . . . . . .
Reversed cumulative performance of the final feature set . . . .
The performance with the final features and sentence distance .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
118
120
121
122
123
124
125
125
7.1
Runtime comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
B.1 A value table for the defined boolean operators . . . . . . . . . . . . . . . . . . . . . . 139
D.1 Upper and lower bounds / evaluation results in (Versley, 2006) . . . . . . . . . . . . . . 143
1
List of Tables
2
CHAPTER 1
Introduction
Denoting any kind of entity, one could use different expressions and descriptions. Persons could be
expressed by their full name or title (e.g. Bundeskanzlerin Dr. Angela Merkel), only by their last name
(e.g. Merkel) or in a less formal context by their first name (e.g. Angela). It is also possible to describe
people or things by their properties or functions. In the case of Angela Merkel, a definite description like
die Bundeskanzlerin is also as possible as any kind of pronouns like sie, sich or ihre.
These expressions that could refer to an entity are called a markable. So one can say that every markable referring to the same entity is coreferent to each other and disreferent to any other markable referring
to another entity. That means that Angela Merkel is coreferent to die Bundeskanzlerin but disreferent to
Barack Obama.
This concept of coreference is an equivalence relation. Every group (i.e. cluster, entity) of coreferent
markables constitutes an equivalence class. A pair of markables can be regarded as a link that connects
either two coreferent markables or two disreferent markables. Thus, any markable pair represents a
coreference link or a disreference link.
Creating such clusters of coreferent markables is the task of coreference resolution. A possible
partition of coreference resolution might be: {{Bundeskanzlerin Dr. Angela Merkel, Merkel, die Bundeskanzlerin, sie, . . . },{Barack Obama, er, sein, . . . }, . . . }.
In the course of text comprehension the human mind makes coreference resolution on the fly within a
conversation or while reading a text. Doing this task automatically reveals several problems.
One way of performing coreference resolution automatically is using machine learning methods. Here,
one possibility is to use a classifier which labels every given markable pair with a probability of being
a coreference link. Afterwards markables connected as coreferent with a high probability can be put
together using an appropriate clustering method and the coreference resolution is done.
But such a resolution system brings along various issues. There is a lot of knowledge needed: Partof-Speech-tags (e.g. to identify special kinds of pronouns), grammatical information like gender or
number and semantic information like semantic class (to exclude two incompatible markables as being
coreferent).
Such a knowledge as well as combinations of given information can be modeled as a link feature
between two markables. A lot of those features describe the agreement or disagreement between the
connected markables with respect to a specific atomic feature. For example, information about gender
can produce a link feature like Both markables have the same gender. But one counter-example that this
feature can be used to indicate disreference by mismatch might be the fact that sometimes the grammatical gender of a markable is mixed up with its natural gender. So Mädchen (the girl, neuter) can also be
referred to by female pronouns like sie (she, female). Thus, it is not advisable to exclude two markables
as being coreferent only because they have a different gender. This problem already shows one kind of
difficulty in modeling coreference in natural language with the use of such features.
By using an ordered set of link features a link can be represented as a feature vector where each
component corresponds to the value of a link feature, for example boolean values like (TRUE/FALSE
3
Chapter 1. Introduction
(i.e. 1 or 0)) for a link feature like Both markables have the same number. There are also numeric values
conceivable for a link feature like the sentence distance between the two markables or the edit distance
between the markables’ heads (i.e. the number of predefined string operations for transforming one head
into the other).
The SUCRE system (german: Semi-Überwachte-Ko-Referenz-Erkennung) described in chapter 3 uses
such link features to train a classifier with the resulting feature vectors.
The main aim of this diploma thesis is to improve the performance of SUCRE’s coreference system
for German by link feature engineering. Thereby, the focus is set to the linguistic background of misclassified links. Therefore, these markable pairs (i.e. the links) are considered in their context to see what
linguistic phenomenon is responsible for the disreference (or coreference) of the respective markables.
Afterwards, the goal is to model this phenomenon in a link feature in order to provide a feature set that
is able to rightly classify the links and thereby improve SUCRE’s performance.
The basis for this improvement is the evaluation of the resulting partition of equivalence classes. For
this evaluation, SUCRE uses the four main evaluation measures described in (2.6).
Before analyzing the misclassifications (false positives and false negatives) of the markable links in
SUCRE in chapter 4, chapter 2 presents a brief overview of the coreference resolution task in general,
the three main coreference models in supervised machine learning methods, an unsupervised coreference
approach, the progress for German coreference resolution in the past 10 years and a short survey over the
main evaluation scores used for coreference resolution.
Chapter 3 presents the SUCRE project with its parts and goals. The goal in chapter 5 is to implement
some features in a specific regular definition language defined for SUCRE to model the linguistic phenomena detected in chapter 4. Chapter 6 evaluates the feature engineering that is done in the chapters 4
and 5 and gives an answer to the question, whether the performance of SUCRE increases with the new
link feature set. Chapter 7 finally summarizes the previous chapters and the results and gives an outline
for future work.
4
CHAPTER 2
The coreference resolution task and its progress in German
In this chapter the coreference resolution task is presented. First, the definitions for coreference and
coreference resolution are sketched briefly. One difficult task with end-to-end-coreferent systems (i.e.
systems that take a raw text as input and return the final coreference partition) is the markable detection. This is described in (2.2). In (2.3), the most common coreference models for supervised machine
learning approaches are outlined. Thereafter, (2.4) presents an unsupervised method treating coreference
resolution as a pure clustering task. In (2.5), the progress in German coreference resolution in the past
10 years and its the state of the art are presented. In (2.6) the four most commonly used evaluation scores
(i.e. MUC, B3 , CEAF and BLANC) are introduced and exemplified with an illustrating example. In (2.7)
the competition in coreference resolution, SemEval-2010, is briefly outlined.
2.1. What is coreference resolution?
If people talk about things they have different possibilities for denoting them. This is the result of variety
in natural language, which human speakers exploit to avoid repetitions (Cardie and Wagstaff, 1999). In
the literature, such expressions which can refer to a real-world entity are called markables, mentions,
REs (referring expressions) or CEs (coreference elements). Subsequently these expressions will be
called markable, regardless of the term used in the cited literature.
If two markables refer to the same entity, they are called coreferent, otherwise disreferent. Examples
of coreferent markables are given in (1):
(1)
a.
b.
c.
Angela Merkel ⇔ die Bundeskanzlerin
Barack Obama ⇔ der US-Präsident
eine alte Frau ⇔ sie
Examples of disreferent markables are given in (2):
(2)
a.
b.
c.
der Mann ⇔ sie
die Frauen ⇔ er
Frau Müller ⇔ Herr Müller
There are different kinds of markables. They can be definite or even indefinite noun phrases, proper
names, appositives, any kinds of pronouns and so on. The coreference between two markables can be
seen as an equivalence relation. That means that every markable is coreferent with itself (reflexivity). If
markable a is coreferent with markable b, then markable b is coreferent with markable a (symmetry) and
if the markables a, b as well as b, c are coreferent, then a, c are also coreferent (transitivity). So, clusters
of coreferent markables which are disreferent to all markables outside this cluster are equivalence classes.
5
Chapter 2. The coreference resolution task and its progress in German
To create a partition of equivalence classes over the set of all markables in a context is called coreference resolution. From a more local perspective, coreference resolution is the task of determining
whether two expressions in natural language (denoting an entity) refer to the same entity in the world
(Soon et al., 2001).
While human listeners have just little trouble to do this job of assigning each markable to the appropriate entity, it is a very tough challenge to an NLP system (Cardie and Wagstaff, 1999).
Entities might be expressed only once in the text and thus constitute a singleton cluster. Such a cluster
does not contain coreferent markables. Other entities are expressed several times. These multi-markable
entities contain coreferent markables.
The search space for the right partition is very huge as the number of different partitions, given n markables, equals the Bell number B(n), also called Exponential numbers (Bell, 1934; Hartrumpf, 2001).
Table 2.1 shows some examples of number of markables n and the resulting number of different
possible partitions B(n):
Number of markables n
1
2
3
4
5
10
15
20
25
Number of partitions B(n)
1
2
5
15
32
115,975
≈ 1.38 × 109
≈ 5.17 × 1013
≈ 4.64 × 1018
Table 2.1.: The Bell numbers B(n)
In this thesis, the focus lies on identity coreference resolution of any kind of noun phrase. Other
possibilities are part-whole or similar complex semantic relations rather than identity and the coreference
of situations expressed by clauses rather than just noun phrases (Hartrumpf, 2001).
Much work has been done in the past in this area and the related area of anaphora resolution (2.1.1)
but “most of the work on supervised coreference resolution has been developed for English [. . . ] due to
the availability of large corpora such as ACE” (Broscheit et al., 2010b). In (2.5) the development of and
some approaches about the coreference resolution in German are presented.
2.1.1. Coreference vs. Anaphora
As coreference resolution has been often confused with anaphora resolution (Elango, 2005), a clear
distinction has to be made. Two markables are coreferent, if they refer to the same entity. This can also
be true for a markable m1 that is the anaphoric antecedent to markable m2 . This might be the reason
for the frequent confusion of both tasks. But a markable A is said to be the anaphoric antecedent of
markable B if and only if it is required for getting the meaning of B (Elango, 2005) . This relation
((A, B) ∈ R ⇔ A is the anaphoric antecedent to B) is neither reflexive, nor symmetric, nor transitive,
thus being-the-anaphoric-antecedent-of is no equivalence relation. No partition into equivalence classes
is possible.
Coreferential links could be anaphoric relations (3a) but there are coreferential links where the first
markable is not required for the interpretation of the second markable (3b). Some anaphora relations such
as bound anaphora are not coreferent (3c) (Elango, 2005). Another kind of anaphoric relation which is
not coreferent is that of bridging, where the relation of antecedent and anaphora is that of meronomy or
holonomy, like in (3d), where den Raum and die Tür are in this anaphoric relation.
6
2.2. Detection of markables
(3)
a.
b.
c.
d.
Der Manni sieht sichi im Spiegel.
Ein alter Manni schläft auf einer Parkbank. Am Morgen wachte der Manni auf.
Jeder Hund hat seinen Tag.
Der Junge trat in den Raum. Die Tür schloss sich automatisch.
2.1.2. NLP tasks which use coreference resolution
Many natural language processing (NLP) applications require coreference resolution (Cardie and Wagstaff,
1999). Numerous NLP tasks detect attributes, actions and relations between entities. For this purpose
all information about a given entity has to be discovered. Therefore, a first step to this is to group all
markables referring to a given entity together. Thus, coreference resolution is an important requirement
to such tasks like textual entailment and information extraction (Bengtson and Roth, 2008).
It also has applications in areas like question answering, machine translation, automatic summarization and named entity extraction (Elango, 2005).
2.2. Detection of markables
“The ultimate goal for a coreference system is to process unannotated text.” (Bengtson and Roth, 2008)
Such a system is called end-to-end. Developing such a resolver requires the detection of markables. But
there are problems concerning markable detection, for example markables are often nested, their boundaries mismatch with the gold standard, they are missed or additional markables are detected (Bengtson
and Roth, 2008).
In order to get the input for a coreference classifier, the markables in the text have to be extracted. To
obtain these, some preprocessing steps have to be taken. Soon et al. (2001) propose the following NLP
modules to be used as shown in figure (2.1).
Figure 2.1.: NLP modules for markable detection in (Soon et al., 2001)
The mentioned steps are tokenization, sentence segmentation, morphological processing, POS tagging,
NP identification, NER, nested NP extraction and determination of semantic class. The result of these
steps are well-defined boundaries of the markables and information about the markables which are used
for subsequent feature generation. Soon et al. (2001) used a POS tagger, an NP identifier and a named
entity recognizer all based on the Hidden Markov Model (HMM). NPs and named entites are merged in
such a way that if an NP overlaps with a named entity, their boundaries will be adjusted so that the NP
subsumes the named entity.
The nested NP extraction module determines nested noun phrases for each noun phrase identified so
far. It divides nested NPs into two groups:
7
1. Nested NPs from possessive NPs:
(e.g. {{his}N P long-range stategy}N P , {{Eastern’s}N P parent}N P )
2. Nested NPs that are modifier nouns or prenominals:
(e.g. {{wage}N P reductions}N P , {{Union}N P representatives}N P )
Finally, the set of markables is the union of the set of extracted NPs, named entities and nested NPs.
For non-named-entities, the semantic class will be determined (Soon et al., 2001).
2.3. Supervised coreference models based on machine
learning
There are three most common coreference models that have been developed for implementing coreference resolution as a supervised machine learning task.
2.3.1. Mention-Pair Model
Issues with mention-pair models
This model is based on a binary classifier which determines whether two markables are coreferent or
not (Ng, 2010). Although this approach is very popular, it has some disadvantages. As the coreference
relation is an equivalence relation, it is transitive (i.e. (coref (A, B) ∧ coref (B, C)) ⇒ coref (A, C)).
This property cannot be modeled as it is possible to classify A and B as coreferent and B and C as
coreferent but A and C as disreferent. Therefore it is necessary to do a separate clustering step in order
to achieve a final coreference partition (Ng, 2010).
A second issue that arises is the task of generating training instances. A training instance is a pair
of markables and the corresponding class (i.e. coreferent/disreferent). A simple approach for this may
be generating all possible pairs of markables within a training document. But this method yields an
extremely unbalanced class distribution as most markable pairs are not coreferent. Thus, there have to
be some training instance creation methods for the reduction of class skewness (Ng, 2010).
Training instance creation models
A possible way of reducing class skewness in the training instances has been proposed by Soon et al.
(2001): for a given markable mk a positive instance is created between mk and its closest preceding
antecedent, mj . Negative instances are created between mk and each markable that succeeds mj . An
example given by Soon et al. (2001): if we have a context with six markables in the sequential order:
A0 , · · · , A1 , a, b, B1 , A2 where A0 , A1 and A2 are coreferent with each other and A1 is the closest
preceding antecedent of A2 , then Soon et al. (2001) create a positive training instance of the markable
pair <A1 ,A2 > (rather than of <A0 ,A2 >) and negative instances of the markable pairs <a,A2 >, <b,A2 >
and <B1 ,A2 >.
Ng and Cardie (2002) modifies this method: if the anaphor mk is non-pronominal, then a positive
training instance is created between mk and its closest non-pronominal antecedent. By this modification,
a coreference link between a pronominal markable m1 and a non-pronominal markable m2 is impossible. The reason for this is that it is not easy for a machine learner to learn from an instance where
the antecedent of a non-pronominal markable is a pronoun (Ng, 2010). Another possibility for reducing
class skewness is the usage of a prefilter (i.e. a hard constraint), that filters out markable pairs that are
very unlikely to be classified as coreferent because of obvious incompatibility like in gender or number
agreement.
8
2.3. Supervised coreference models based on machine learning
One possible representation for a training instance is a feature vector as used by Soon et al. (2001).
Here, each dimension corresponds to a feature. Table 2.2 shows the feature set proposed by Soon et al.
(2001).
Feature (abbreviated in (Soon et al., 2001))
Sentence distance (DIST)
Pronominality of the antecedent (I_PRONOUN)
Pronominality of the anaphor (J_PRONOUN)
String match (STR_MATCH)
Definiteness of the anaphor (DEF_NP)
Demonstrative anaphor (DEM_NP)
Number agreement (NUMBER)
Semantic Class Agreement (SEMCLASS)
Gender Agreement (GENDER)
Both-Proper-Names (PROPER_NAME)
Alias (ALIAS)
Appositive anaphor (APPOSITIVE)
Description
The distance between m1 and m2 in terms of sentences
Returns TRUE if that m1 is a pronoun
Returns TRUE if m2 is a pronoun
Returns TRUE if m1 and m2 string matches
Returns TRUE if m2 starts with the
Returns TRUE if m2 starts with this, . . .
Returns TRUE if m1 and m2 agree in number
Returns TRUE if m1 and m2 are in the same class
Returns TRUE if m1 and m2 agree in gender
Returns TRUE if both markables are proper names
Returns TRUE if m1 is the alias of m2 or vice versa
Returns TRUE if m2 is in apposition to m1
Table 2.2.: Link feature set used in (Soon et al., 2001)
Given the excerpt “. . . Frank Newman, 50, vice chairman and . . . ”, Soon et al. (2001) illustrate their
feature set with the feature vector corresponding to the markables Frank Newman and vice chairman,
shown in table 2.3:
Feature
DIST
I_PRONOUN
J_RONOUN
STR_MATCH
DEF_NP
DEM_NP
NUMBER
SEMCLASS
GENDER
PROPER_NAME
ALIAS
APPOSITIVE
Value
0
+
1
1
+
Comments
m1 and m2 are in the same sentence
m1 is not a pronoun
m2 is not a pronoun
m1 and m2 do not match
m2 is not a definite noun phrase
m2 is not a demonstrative noun phrase
m1 and m2 are both singular
m1 and m2 are both persons
m1 and m2 are both males
Only m1 is a proper name
m2 is not an alias of m1
m2 is in apposition to m1
Table 2.3.: Example of a feature vector in (Soon et al., 2001)
Coreference classifiers
After creating a training set, a learning algorithm can be trained. The most popular algorithms for
this task are the decision tree induction systems (e.g. C5, (Quinlan, 1993)). Alternatives to this are
rule learners (e.g. RIPPER (Cohen, 1995)) and memory-based learners (e.g. TiMBL, (Daelemans
et al., 2003)), beside statistical learners such as maximum entropy models (Berger et al., 1996), voted
perceptrons (Freund and Schapire, 1999) and support vector machines (Joachims, 1999).
9
Clustering algorithms
After training a classifier and applying a test sample to it, the classifier’s decisions have to be coordinated
and transformed into a coreference partition (Ng, 2010). The two most common coreference clustering
algorithms are closest-first clustering (Soon et al., 2001) and best-first clustering (Ng and Cardie, 2002).
For a given markable mk the closest-first clustering chooses the markable that is closest to and preceding mk and classified as coreferent to it. If there is no suitable antecedent for mk , the coreference
chain ends.
In order to improve the precision of closest-first clustering, best-first clustering chooses that markable
as antecedent to mk that is most probably the antecedent and coreferent with it.
One problem concerning those two clustering algorithms is that they are too greedy: “clusters are
formed based on a small subset of pairwise decisions made by the model” (Ng, 2010). Furthermore,
coreference classifications are preferred to disreference classifications: consider the markables m1 , m2
and m3 occurring in this order. If m2 is chosen as closest (or best) preceding antecedent to m3 and m1
as closest (or best) preceding antecedent to m2 , then all three markables are assigned to the same cluster
regardless of a possible disreference between m1 and m3 .
There are several algorithms addressing these problems. For example, the correlation clustering
(Bansal et al., 2002) creates a partition with respect to as many classification decisions as possible.
Graph partitioning algorithms (Nicolae and Nicolae, 2006) are used in a weighted graph. In this graph,
each vertex corresponds to a markable and an edge has a weight which corresponds to the coreference
probability. The Dempster-Shafer rule (Dempster, 2008) combines both coreference and disreference
classification decisions for creating a partition (Ng, 2010).
Although there are a lot of coreference clustering algorithms, only few researchers tried to compare
their effectiveness. Ng and Cardie (2002) account for best-first clustering outperforming closest-first
clustering, while Nicolae and Nicolae (2006) show that their minimum-cut-based graph partitioning
algorithm performs better than best-first clustering.
Combining classification and clustering
One problem that occurs when using classification and clustering as two separated steps is that they
are trained independently from each other. If the classification is improved, it might not take effect in
clustering-level accuracy (Ng, 2010), that means “overall performance on the coreference task might not
improve”.
Mccallum and Wellner (2004) and Finley and Joachims (2005) remove the classification and treat
coreference resolution as a supervised clustering task. A similarity metric is learned in order to maximize
the clustering accuracy (Ng, 2010).
The flaw of the mention-pair model
In contrast to the entity-mention model (2.3.2) or the ranking model (2.3.3), the mention-pair model has
some weak points.
The first problem that Ng (2010) mentions is that every candidate antecedent for a markable is examined independently of each other. The model does not assess how good one candidate antecedent
relates to a markable compared against other candidates. There is no way for finding the most probable
candidate antecedent among all candidates.
The second problem is the insufficiency of expressiveness. The information that is only contained in
the two markables may not be enough for the decision whether the two markables are coreferent or not.
In particular, if one markable is a pronoun or any other noun phrase that “lacks descriptive information
such as gender” (e.g. “Clinton”) (Ng, 2010).
10
2.3. Supervised coreference models based on machine learning
2.3.2. Entity-Mention Model
The advantage of the entity-mention model
The entity-mention model attacks the second problem of the mention-pair model (i.e. the lack of expressiveness). Mccallum and Wellner (2003) use the following example for presenting the mentionpair-model’s shortcoming: assume there are three markables Mr Clintonm1 , Clintonm2 and shem3 . With
respect to proximity and no mismatch in atomic features like gender or number, m2 and m3 are classified
as coreferent. Concerning an exact string matching feature, m1 and m2 are classified as coreferent. But
now, due to transitivity, the markables Mr Clintonm1 and shem3 will end up in the same cluster. The
reason for this is the independence of markable-pair classifications (Ng, 2010).
If the model knows that Mr Clintonm1 and Clintonm2 are considered to be coreferent, it will not
classify the markable pair (m2 , m3 ) to be coreferent. This is the basic idea of an entity-mention model.
It can classify whether a markable mk is coreferent with a cluster Cj of markables mj1 . . . mjn that
precede mk .
The training instances
A training instance of the entity-mention model corresponds to a markable mk and a cluster Cj of
markables preceding mk . It is labeled positive in the case that mk should be added to Cj and
negative otherwise. Such an instance can be represented by so-called cluster-level features. These
features can be regarded as a combination of a link feature (i.e. features that are used in the mention-pair
model) and a quantifier (e.g. all, most, any). For example the link feature Both markables have the
same gender can be combined with the quantifier all to create a cluster-level feature that has the value
YES if mk agrees with all markables mji in Cj with respect to gender; otherwise its value is NO. These
cluster-level features increase the expressiveness over the mention-pair model.
2.3.3. Ranking Model
The advantage of the ranking model
Alternative titles for this kind of model are tournament model or twin-candidate model. The entitymention model solves the mention-pair model’s problem of lacking expressiveness but does not address
the comparison of one candidate antecedent to others. This problem is attacked by the ranking model. It
enables to determine which candidate antecedent is the most probable. Ranking embodies a more natural
coreference resolution than classification does since all preceding candidate antecedents are considered
simultaneously. A markable is resolved to that candidate antecedent that has the highest rank (Ng, 2010).
The training and testing of a ranking model
A training instance for the ranking model corresponds to a markable mk and two candidate antecedents
mi and mj , one of which is an antecedent to mk and the other is not (Ng, 2010). Its label indicates
whether the first or second markable fits better as the antecedent to mk .
In the testing step, given a markable mk , each pair of candidate antecedents to mk are applied to the
ranking model. The final antecedent for mk is the candidate that was classified most of the time as better
for mk .
Variants of ranking models
Thanks to the advance of machine learning, all candidate antecedents can be considered simultaneously.
This so-called mention-ranker consistently outperforms the mention-pair model although both models
have the same expressiveness (Ng, 2010).
11
Rahman and Ng (2009) propose a cluster-ranking model for exploiting cluster-level features in order to increase expressiveness. This model ranks clusters Cji of markables mjik that precedes a given
markable ml in order to resolve ml . Such kinds of ranking models address both shortcomings of the
mention-pair model: (1) independent considerations of candidate antecedents and (2) insufficient expressiveness due to only using link features (Ng, 2010).
2.4. Unsupervised coreference resolution
The result of the coreference resolution task is a partition of coreference sets. Thus, it is obvious to assume a clustering algorithm that puts all coreferent markables into one cluster creating the final partition
without using a pairwise classifier before.
In the subsequent section, this method is exemplified by the approach of Cardie and Wagstaff (1999) “Noun Phrase Coreference as Clustering”, that is focussed on the resolution of base (i.e. simplex) noun
phrases.
Advantages of the clustering approach
In contrast to other learning and non-learning approaches, the clustering method has several advantages:
• Clustering is unsupervised. Thus, there is no need for annotated training data.
• The method is domain-independent.
• The method enables to flexibly combine local constraints with global constraints.
Local constraints: These constraints are only used within one closed markable pair.
Global constraints: They also regard the correlation to other markables in its environment (e.g.
the cluster).
Instance representation and feature set
Every markable (rather than every markable pair) in the corpus is represented as a feature vector. For
this representation Cardie and Wagstaff (1999) extract markables and their feature values automatically.
They use 11 features (i.e. local constraints), each corresponding to a dimension in the search space:
1. Words: The words in the markable are treated as a feature.
2. Head word: The last word in a markable is regarded as the head noun (cf. right-hand head rule).
3. Position: The numeric ID of the markable: all markables are enumerated from the beginning of
the text.
4. Pronoun type: Pronouns get a specific type (e.g. POSSESSIVE); other markables get None.
5. Article: The article is Indefinite, Definite or None.
6. Appositive: This binary feature returns TRUE in the case that the markable is an apposition and
FALSE otherwise.
7. Number: The number value of the markable can be singular or plural.
8. Proper name: This binary feature returns TRUE in the case that the markable is a proper name
and FALSE otherwise.
9. Semantic class: The semantic class of the markable is based on WordNet (e.g. Human, Company,
. . . ).
12
10. Gender: The grammatical gender of the markable can have the values Masculine, Feminine,
Neuter or Either.
11. Animacy: Markables with the semantic class Human or Animal are animated, all others are
inanimated.
Given the sample text:
John Simon, Chief Financial Officer of Prime Corp. since 1986,
saw his pay jump 20%, to $1.3 million, as the 37-year-old also
became the financial-services company’s president.
Cardie and Wagstaff (1999) extract 11 markables and their features. The resulting feature vectors (each
dimension as a column; the first two dimensions are combined) are shown in figure 2.2.
Words/HeadH
Position
Article
Appositive
Number
1
2
Pronoun
Type
None
None
John SimonH
Chief Financial
OfficerH
Prime Corp.H
1986H
hisH
payH
20%H
$1.3 millionH
the 37-year-oldH
the financial-services
companyH
presidentH
Semantic
Class
human
human
Gender
Animacy
sing
sing
Proper
Name
TRUE
FALSE
None
None
FALSE
FALSE
masc
either
anim
anim
3
4
5
6
7
8
9
10
None
None
poss
None
None
None
None
None
None
None
None
None
None
None
def
def
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
sing
plural
sing
sing
plural
plural
sing
sing
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
company
number
human
payment
percent
money
human
company
neuter
neuter
masc
neuter
neuter
neuter
either
neuter
inanim
inanim
anim
inanim
inanim
inanim
anim
inanim
11
None
None
FALSE
sing
FALSE
human
either
anim
Figure 2.2.: Some feature vectors used in (Cardie and Wagstaff, 1999)
The distance measure
The basic idea of clustering coreferent markables is that they are somehow similar to each other, i.e.
they have a small distance to each other in the search space. The clusterer may create a cluster with all
markables that are close to each other (e.g. that are within a coreference radius r).
Cardie and Wagstaff (1999) define the distance between two markables m1 and m2 as shown in formula 2.1.
X
distance(m1 , m2 ) =
wf · incompatiblef (m1 , m2 )
(2.1)
f ∈F
Here, F is the set of features described above. The function incompatiblef returns a value between 0
and 1 denoting the incompatibility of m1 and m2 with respect to feature f . The weight wf corresponds to
the importance of the feature. It ranges between −∞ and +∞. The weight +∞ is used when impossible
markable pairs have to be filtered out (e.g. in a number/proper_name/animacy-mismatch). On the
other hand, the weight −∞ indicates certain coreference between the respective markables (e.g. in an
appositive-match). In the case of both addends (i.e. . . .+(−∞)+. . .+∞+. . .), the positive addend
13
(i.e. ∞) indicating certain disreference has more importance and overlays the negative one. Thus, the
distance-value becomes ∞.
Another remarkable weight is the coreference radius r. This weight indicates a preference for regarding m1 and m2 as disreference. But it might be clustered together anyway, “if there is enough other
evidence that they are similar”.
Table 2.4 shows for each feature f its weight wf and its incompatiblef -value.
Feature f
Words
Head Noun
Position
Pronoun
Article
Words–Substring
Appositive
Number
Proper Name
Semantic Class
Gender
Animacy
Weight wf
10.0
1.0
5.0
r
r
−∞
−∞
∞
∞
∞
∞
∞
incompatiblef -value
(# of mismatching words) / (# of words in the longer NP)
1 if the head nouns differ; else 0
(difference in position) / (maximum difference in document)
1 if NPi is a pronoun and NPj is not; else 0
1 if NPj is indefinite and not appositive; else 0
1 if NPi subsumes (entirely includes as a substring) NPj
1 if NPj is appositive and NPi is its immediate predecessor; else 0
1 if they do not match in number; else 0
1 if both are proper names, but mismatch on every word; else 0
1 if they do not match in class; else 0
1 if they do not match in gender (allows either to match masc or
fem); else 0
1 if they do not match in animacy; else 0
Table 2.4.: Weights and incompatiblef -values for each feature used in (Cardie and Wagstaff, 1999)
The clustering algorithm
The clustering algorithm combines the global constrains and the local constraints and creates a partition
of coreference clusters. It is given in algorithm 1 and 2.
Algorithm 1 The clustering algorithm
Coreference_Clustering(mn ,mn−1 ,...,m1 )
1: Let r be the coreference radius
2: Each markable mi belongs to its own cluster ci : ci ← {mi }
3: for mj := mn to m1 do
4:
for mi := mj−1 to m1 do
5:
d ← distance(mi , mj )
6:
ci ← cluster_of (mi )
7:
cj ← cluster_of (mj )
8:
if d < r ∧ All_m0 s_compatible(ci , cj ) then
9:
cj ← ci ∪ cj
10:
end if
11:
end for
12: end for
First, every markable constitutes its own cluster. Each markable is compared with every preceding
markable. If the distance between two markables mi and mj is less than the coreference radius, their
clusters are checked for merging (cf. algorithm 2): if the distance between every markable in the cluster
of mi and every markable in the cluster of mj is unlike ∞, then the clusters are merged.
14
Algorithm 2 All_m’s_compatible
All_m’s_compatible(ci ,cj )
1: for mα ∈ ci do
2:
for mβ ∈ cj do
3:
if distance(mα , mβ ) = ∞ then
4:
return FALSE
5:
end if
6:
end for
7: end for
8: return TRUE
This way, the clustering algorithm automatically includes the transitive closure (i.e. distance(mi , mj ) <
r ∧ distance(mj , mk ) < r ∧ All_m0 s_compatible(ci , cj ) ⇒ cluster_of (mi )=cluster_of (mj )=
cluster_of (mk )). Certainly, it might be the fact that distance(mi , mk ) ≥ r and thus, mi and mk
would not have been considered coreferent within the local perspective but as long as the distance is less
than ∞, they can be added to the same cluster. So, the clustering algorithm uses a global perspective by
creating a coreference chain.
One problem concerning this algorithm is its greed. Every markable is linked with every compatible
preceding markable within the radius r. To solve this problem, Cardie and Wagstaff (1999) propose the
following modifications:
1. For every markable mj , the algorithm stops, when the first compatible antecedent mi is found.
2. For every markable mj , the algorithm ranks all possible candidate antecedents and chooses only
the best one.
3. The algorithm ranks the coreference links and proceeds in the ranked order.
Results of the clustering approach in (Cardie and Wagstaff, 1999)
Cardie and Wagstaff (1999) use two different corpora in two different variants for the evaluation: the
“dry-run”-MUC-6-coreference-corpus and the “formal-evaluation”-MUC-6-coreference-corpus with 30
documents each. The variants are “official” and “adjusted”. In the first variant, all possible markables are
considered. Cardie and Wagstaff (1999) only extract “base markables” (i.e. simplex markables, without
appositions or the like) and thus, the recall is too low. In the other variant, they adjust the results on the
pure resolution of “base markables”.
As evaluation score, they use MUC (Vilain et al., 1995) (cf. (2.6.1)). The coreference radius r is set to
4. In table 2.5, the MUC-f-measures for all four settings are shown. The clustering results are compared
to three baseline systems: (1) all markables are coreferent with each other; (2) two markables corefer,
if they match in at least one word; (3) two markables corefer, if they match in their head word. The
clustering algorithm outperforms all baseline system.
Algorithm
Clustering
All One Class
Match Any Word
Match Head Noun
Dryrun Data Set
Official Adjusted
52.8
64.9
44.8
50.2
44.1
52.8
46.5
56.9
Formal Run Data Set
Official
Adjusted
53.6
63.5
41.5
45.7
41.3
48.8
45.7
54.9
Table 2.5.: F-measure results for the clusterer and some baselines on the MUC-6 datasets
15
2.5. Coreference resolution for German - approaches in the
past 10 years
Most work about supervised coreference resolution has been done for English like in (Soon et al., 2001),
(Ng and Cardie, 2002) or (Yang et al., 2003). One reason for preferring English was the availability
of coreferentially annotated corpora like ACE (Walker et al., 2006) and OntoNotes (Weischedel et al.,
2008). Given suitable German corpora like TüBa-D/Z (Hinrichs et al., 2005b) it becomes possible to
research the German coreference resolution.
Broscheit et al. (2010b) mention some increasing efforts to the development of a robust coreference
resolution system for German in the past years as in (Stuckardt, 2004), (Schiehlen, 2004), (Kouchnir,
2004) and (Hinrichs et al., 2005a), in particular for anaphora resolution. Versley (2006) researched
names and definite noun phrases. The full coreference resolution task has been worked on by Hartrumpf
(2001), who uses a collection from the German newspaper Süddeutsche Zeitung, that has been annotated
according to MUC guidelines, Strube et al. (2002), who use a corpus containing 242 short German
texts about sights, historic events and persons in Heidelberg, Klenner and Ailloud (2009) and Broscheit
et al. (2010b), who use TüBa-D/Z. As this diploma thesis focusses on the full coreference resolution, this
section only regards the last five sources in detail and will subsequently present the respective approaches.
2.5.1. Coreference Resolution with Syntactico-Semantic Rules and
Corpus Statistics - (Hartrumpf, 2001)
Hartrumpf (2001) presents a hybrid approach (“CORUDIS - COreference RUles with DIsambiguation
Statistics”) that combines “syntactico-semantic” rules with statistics that are derived from an annotated
corpus. He uses the full MUC coreference task (Hirschman and Chinchor, 1997) in this hybrid approach.
As reason for using syntactico-semantic rules Hartrumpf (2001) mentions the exploitation of traditional
linguistic knowledge. One vote for using corpus statistics is the disambiguation of many alternatives that
would emerge with a purely rule-based approach.
Coreference rules
The syntactico-semantic rules define whether two markables are able to corefer. Thus, they “license
possible coreference”. The rules can be language-dependent or universal. Hartrumpf (2001) focusses on
a rule adaption for German.
Figure 2.3 shows an example of a rule Hartrumpf (2001) used for German coreference resolution. Each
rule has a unique name (id) (e.g. ident.n perspro for the identity between a noun and a personal
pronoun) and a premise (pre.), which is a conjunction of several constraints. Some of these concern
one constituent (i.e. a markable) and are called constituent constraints (c-constraints), whereas some
others concern both markables and are called interconstituent constraints (ic-constraints). The first two
constraints in figure 2.3 are c-constraints, whereas the other six ones are ic-constraints.
These constraints use among others the feature CAT that describes the syntactic category (e.g. n
for noun or perspro for personal pronoun) and the features NUM, PERS and GEND that describe the
grammatical attributes number, person and gender. The last feature used in figure 2.3 is called entity
and describes the “semantic classification comprising the semantic sort [. . . ] and semantic Boolean
features” (Hartrumpf, 2001).
Beside these features, there are several predicates used in the coreference rules. Two of them are used
in figure 2.3: =/2 and c-command/2. The first takes two values and returns TRUE in the case that they
are unifiable. The second also takes two arguments and returns TRUE, if the first argument c-commands
(cf. Government and Binding Theory) the second. The function not negates the value returned by a
predicate (e.g. TRUE→FALSE).
16
2.5. Coreference resolution for German - approaches in the past 10 years
id
pre.
desc.
exam.
ident.n perspro
(c1 cat) n
(c2 cat) perspro
(= (c1 num) (c2 num))
(= (c1 pers) (c2 pers))
(= (c1 gend) (c2 gend))
(= (c1 entity) (c2 entity))
(not (c-command c1 c2))
(not (c-command c2 c1))
same gender - anaphoric
Der Mann liest [das Buch]i . Er versteht [es]i nicht.
Figure 2.3.: A coreferene rule used by Hartrumpf (2001)
So, the coreference rule ident.n perspro licenses coreference between a noun headed markable
and a personal pronoun if they are unifiable with respect to number, person, gender and entity and none
of the markables c-command each other.
Coreference annotated corpus
Hartrumpf (2001) uses a collection from the German newspaper Süddeutsche Zeitung, that is annotated
with coreference information according to MUC guidelines adapted from English to German by inserting
SGML tags into the corpus.
An example of this is given in figure 2.4 decoding the sentence “Das Mädchen liest die Zeitung;
danach geht sie mit ihr ins Büro”:
<s><coref id=”125t129”><w>Das</w><w>Mädchen </w></coref>
<w>liest</w>
<coref id=”143t147”><w>die</w> <w>Zeitung</w></coref>
<w>;</w> <w>danach</w> <w>geht</w>
<coref ref=”125t129” type=”ident” ><w>sie</w></coref>
<w>mit</w>
<coref ref=”143t147” type=”ident” ><w>ihr</w></coref>
<w>ins</w> <w>Büro</w> <w>.</w></s>
Figure 2.4.: Coreference annotation in (Hartrumpf, 2001)
The algorithm for the coreference resolution system CORUDIS
The basis for the algorithm is constituted by three kinds of objects:
• All possible anaphors (i.e. all detected markables)
• All candidate antecedents for a markable mj (i.e. all markables preceding mj )
• All coreference rules (Hartrumpf (2001) uses 18 rules)
1. Markable detection and feature extraction:
Each sentence is parsed independently. If it fails, a chunk parser is used instead. In this case, no
full parse is available and thus, predicates like c-command/2 are ignored.
17
2. Collection of all possible coreference rule activations:
All rule premises are tested on all markable pairs, assuming that c1 precedes c2. As the rules
have disjoint premises, for each markable pair, there is at most one coreference rule activated.
3. Selection of one antecedent candidate for each anaphor:
a) All possible and licensed partitions are created incrementally:
The start point is a singleton anaphor. For this singleton cluster, each licensed antecedent
candidate is added separately to that cluster in order to get an extended partition. This process
stops when all possible anaphors have been investigated.
b) As the number of possible partitions given a non-tiny number of markables (cf. table 2.1) is
very huge, the partitions have to be filtered at an early stage in generation. Hartrumpf (2001)
mentions four criteria for this pruning step:
sentence distance The distance between two markables measured in sentences must be
below the limit for the respective coreference rule
paragraph distance The distance between two markables measured in paragraphs must
be below the limit for the respective rule. Usually, pronouns are at most 2 paragraphs
apart from its antecedent, whereas there is no limit for proper names.
semantic compatibility All markables in a cluster have to bear compatible semantics
(e.g. the entity feature).
partition scoring Alternatives with a low score will be discarded.
c) Hartrumpf (2001) describes the score for a partition in the following way: it is the sum of all
estimated probabilites for merging a currently investigated anaphor m with one antecedent
candidate out of C = hc1 , c2 , . . . , ci , . . . , ck i, where the index i indicate the distance to
m. Each coreference between m and ci is licensed by a rule ri . These three items can be
represented in a coreference alternative (m, ci , ri ).
In order to weight those coreference alternatives, they have to be transformed into a more
abstract version that can be compared with those in the corpus. This transformation from the
triple (m, ci , ri ) to a type-based representation is done by an abstraction function a:
a(m, ci , ri ) := (i, ri ) = ai (∈ A).
(2.2)
Consider A to be a list of abstracted coreference alternatives for a possible anaphor m.
Then, the probability that ai is the closest correct antecedent for m can be estimated by the
relative frequency (formula (2.3)), where f (i, A) is the absolute frequency of ai winning as
the closest correct antecedent in the context of abstracted coreference alternatives A:
f (i, A)
rf (i, A) := Pk
l=1 f (l, A)
(2.3)
For sparseness problems a backed-off estimation can be used: if for a context A, there are no
statistical values available, then the context gets scaled-down one by one until the frequency
becomes positive. If this is not the case at all, all candidates receive equal scores.
Finally, the (possibly backed-off) estimation rf b (i, A), where b indicates the number of backoffs (starting with b = 0), is used as estimation for the probability that ci is the closest correct
antecedent for m given the antecedent candidates C:
p(i | C) ≈ rf b (i, A)
18
(2.4)
The evaluation of CORUDIS
The evaluation is done by using a 12-fold cross-validation for 502 anaphors (Hartrumpf, 2001). Table
2.6 shows precision, recall and the harmonic f-measure for predicted coreference links. Three different
methods have been evaluated: (1) the full coreference task including the markable detection, (2) the
“markable-relative” method, that only uses successfully identified markables and (3) the baseline model:
selection of the closest licensed candidate that fulfills the aforementioned distance and compatibility
constraints.
method
evaluation results
(1) coreference (incl. markable identification)
(2) markable-relative coreference evaluation
(3) baseline: always closest candidate
precision
0.82
0.82
0.42
recall
0.55
0.76
0.46
f-measure
0.66
0.79
0.44
Table 2.6.: Coreference resolution results in (Hartrumpf, 2001)
As there have not been any German evaluation results for the MUC coreference task yet, a comparison
to other approaches has not been possible. But, as Hartrumpf (2001) argued, the presented results are
competitive compared with the f-measure (≈ 60%) for English in MUC-7.
Conclusion
This paper shows one of the first approaches to German coreference resolution. Nevertheless, Hartrumpf
(2001) achieves impressive results. There are few similarities between Hartrumpf (2001)’s architecture
and the one used for SUCRE (3.2). He implements coreference rules that discard markable pairs that
cannot represent coreference links. This and the features for pruning the possibilities of partitions can be
recovered in a way in the prefilters in SUCRE. Instead of using pure identity of feature values, Hartrumpf
(2001) implements a unification check. This could be a good way of solving some complications in
SUCRE that come up with the feature value unknown.
Hartrumpf (2001) uses some complex features, for instance the semantic sort, the extension type feature (it returns 0 in the case of an individual, 1 in the case of a set and 2 in the case of a set of a set)
or other complex features based on “extensional and intensional layer features like CARD (cardinality)”
(Hartrumpf, 2001). For implementing such features in SUCRE, there is need for some external information sources. Hartrumpf (2001) also uses some distance features, for instance the distance between the
respective markables in terms of sentences.
2.5.2. The Influence of Minimum Edit Distance on Reference Resolution (Strube et al., 2002)
Strube et al. (2002) use a coreference system that adjusts the algorithm from (Soon et al., 2001) for German data. They present some experiments of coreference resolution based on all anaphoric expressions
including definite noun phrases, proper names and personal, possessive and demonstrative pronouns.
They evaluated the performance of a given feature set on different types of NP-forms (e.g. pronouns,
proper names, definite NPs, . . . ). By adding two further features based on edit distance, Strube et al.
(2002) outperforms their first attempt with a significant improvement.
The data used in (Strube et al., 2002)
Strube et al. (2002) use a corpus containing 242 short German texts about sights, historic events and
persons in Heidelberg. The corpus has a total number of 36,924 tokens and the texts have an average
19
length of 151 tokens.
In the first part of the annotation, automatic POS-tagger and NP-Chunker have been used. The POStagging of the texts was done by using TnT (Brants, 2000). Afterwards, the markables have been detected
by the NP-Chunker Chunkie (Skut and Brants, 1998). The markables have been labeled with several
attributes like NP-form using TnT.
In the second part, the annotation is corrected manually and the coreference information as well as
further features like semantic class are annotated.
In the third part, Strube et al. (2002) create a suitable input to a machine learning algorithm by combining each anaphor with all potential antecedents. Afterwards, all pairs are discarded, if they fall into
one of the groups described below:
• The second markable is an indefinite noun phrase.
• One markable is embedded into the other one.
• Both markables have different semantic class annotations (given that none of the expressions is a
pronoun).
• Either markable is not annotated with 3rd person singular or plural.
• Both markables have different agreement values (given that the anaphor is a pronoun, as German
allows cases where a non-pronominal anaphor disagrees in grammatical gender)
After the filtering step, each pair consisting of an anaphor mj and its closest antecedent mi is labeled
as positive instance, whereas each pair of anaphor and non-antecedent that is closer to it than its closest
antecedent is labeled as negative instance. The other markable pairs (i.e. anaphors and non-antecedents
or true antecedents being further apart from each other than the closest antecedent) were not considered
at all (cf. (Soon et al., 2001)).
This results in 242 texts with 72,093 valid instances of markable pairs.
The initial feature set
Table 2.7 shows the initial feature set used by Strube et al. (2002). They divide their features into three
groups: (1) one feature on the document level with the respective document number, (2) four features for
antecedent and anaphor each, that check for the grammatical function, the form of the noun phrase, the
agreement attributes like person, gender or number and the semantic class like human, concrete object
and abstract object. There is need for the last feature since in German, the gender and the semantic class
do not always agree as it is the case in English (objects can be annotated as male or female in German).
This feature achieves the same as the gender feature in English.
20
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
Document level features
doc_id
document number (1 . . . 250)
NP-level features
ante_gram_func
grammatical function of antecedent (subject, object, other)
ante_npform
form of antecedent (definite NP, indefinite NP, personal pronoun,
demonstrative pronoun, possessive pronoun, proper name)
ante_agree
agreement attributes for person, gender, number
ante_semanticclass semantic class of antecedent (human, concrete object, abstract object)
ana_gram_func
grammatical function of anaphor (subject, object, other)
ana_npform
form of anaphor (definite NP, indefinite NP, personal pronoun, demonstrative pronoun, possessive pronoun, proper name)
ana_agree
agreement attributes for person, gender, number
ana_semanticclass semantic class of anaphor (human, concrete object, abstract object)
Coreference-level features
wdist
distance between anaphor and antecedent in words (1 . . . n)
ddist
distance between anaphor and antecedent in sentences (0, 1, > 1)
mdist
distance between anaphor and antecedent in markables (1 . . . n)
syn_par
anaphor and antecedent have the same grammatical function (yes, no)
string_ident
anaphor and antecedent consist of identical strings (yes, no)
substring_match
one string contains the other (yes, no)
Table 2.7.: The initial feature set by (Strube et al., 2002)
The first evaluation
As classifier, Strube et al. (2002) use a C5.0 decision tree “with standard settings for pre and post pruning”. Features that have discrete values, are used in a binary way (e.g. ante_npform, here, a binary
feature can ask Is ante_npform in {PPER,PPOS,PDS}?).
They used a 10-fold cross validation and achieved a f-measure of 59.97% (precision ≈ 88.6%, recall ≈
45.32%). As this result is not satisfying, Strube et al. (2002) investigated the performance of the features
and figured out that the feature no. 7, ana_npform, is the most important one. In the next step, they split
the entire dataset into subsets that only contain markable pairs with anaphors that belong to a particular
form. The classifier is trained on each of those data sets. As table 2.8 shows, the worst performance is
achieved by definite noun phrases (defNP) and demonstrative pronouns (PDS) with about 15% f-measure.
The results for proper names (NE) are moderate with 65.14%. Only the personal pronouns (PPER) and
possessive pronouns (PPOS) perform well with an f-measure of 82.79% and 84.94%.
Considering the three problematic data sets, the demonstrative pronouns can be ignored as they appeared in only 0.87% of the positive cases, whereas definite noun phrases occur in 38.19% and the proper
names in 31.05% of the positive cases.
defNP
NE
PDS
PPER
PPOS
all
Precision
87.34%
90.83%
25.00%
88.12%
82.69%
88.60%
Recall
8.71%
50.78%
11.11%
78.07%
87.31%
45.32%
F-measure
15.84%
65.14%
15.38%
82.79%
84.94%
59.97%
Table 2.8.: The first evaluation in (Strube et al., 2002)
21
Revision of the inital feature set
As a first step in the revision, Strube et al. (2002) figure out why the dataset of the definite noun phrases
performs that bad. One reason, they suppose, is that the algorithm is based on surface features and
has no access to world knowledge. Another point is, that the string-based features like no. 14 and 15
(string_ident and substring_match) have a high precision but a low recall.
Thus, they try to find a way of improving the recall but not losing too much precision. Considering
some examples like <“Philips”,“Kurfürst Philip”>, <“vier Schülern”,“die Schüler”>, <“die alte Universität”,“der alten Universität”> or <“diese hervorragende Bibliothek”,“dieser Bibliothek”>, they attempt
to weaken the string-based features. As they prefer cheap features (without the need for a syntactic
analysis), they decide to use the minimum edit distance (MED) (Wagner and Fischer, 1974). It computes the similarity between two strings by the minimum number of edit operations (insertion, deletion,
substitution) that are necessary for transforming one string into the other.
They append two further features concerning MED to their initial feature set. One feature for each
editing direction. Although both directions have the same number of edit operations, they have different
values in the case that antecedent and anaphor have a different length. The new features are given in table
2.9, where m represents the length of the antecedent and n the length of the anaphor. (i + d + s) is the
total number of edit operations.
16.
17.
New coreference-level features
ante_med minimum edit distance to anaphor
ante_med = 100 · m−(i+d+s)
m
ana_med minimum edit distance to antecedent
ana_med = 100 · n−(i+d+s)
n
Table 2.9.: Revision of the feature set in (Strube et al., 2002)
The second evaluation
The second evaluation reveals some significant improvements. These are shown in parentheses in table
2.10. The overall f-measure gains about 8%. This improvement can be deduced to the much better
performance of the data sets defNP and NE with improvements in the f-measure about 18% and 11%.
For the demonstrative pronoun, there is no change and for the other two pronominal forms, there is a
slight deterioration.
defNP
NE
PDS
PPER
PPOS
all
Precision
69.26%
90.77%
25.00%
85.81%
82.11%
84.96%
Recall
22.47%
65.68%
11.11%
77.78%
87.31%
56.65%
F-measure
33.94% (+18.1%)
76.22% (+11.08%)
15.38% (±0.0%)
81.60% (−1.19%)
84.63% (−0.31%)
67.98% (+8.01%)
Table 2.10.: The second evaluation in (Strube et al., 2002)
22
Conclusion
In contrast to (Hartrumpf, 2001), the approach of Strube et al. (2002) show many similarities to SUCRE.
Both are guided by (Soon et al., 2001), i.e. they constitute a mention-pair model (cf. (2.3.1)). As corpus,
Strube et al. (2002) use a self-annotated corpus of German texts including the annotation of semanticclasses. SUCRE on the other hand does not provide semantic class annotation within the TüBa-D/Z
corpus. As pre-filters, Strube et al. (2002) use several features that have some disadvantages, when
applied on TüBa-D/Z: no markable pairs are considered in which m2 is indefinite. Although there is a
clear vote for disreference in this case, there are some coreference links with an indefinite anaphor in
TüBa-D/Z, for instance the markable pair <Robert Musil, eines Robert Musils> (cf. (4.3.1)). Moreover,
Strube et al. (2002) discard all markable pairs in which a markable is not in third person and thus ignoring
all direct speeches. Furthermore, they argue that a disagreement in gender with a pronominal anaphor
indicates disreference. Considering a pair like <ein Mädchen, sie>, this restriction cannot hold.
By adding semantic class annotion to SUCRE’s German dataset, the system could adapt Strube et al.
(2002)’s filter for semantic class disagreement.
The feature set used in (Strube et al., 2002) show several similarities with the one used in SUCRE (cf.
(4.1.3)). There are features for checking the NP form (e.g. whether the markable’s part-of-speech tag is
a pronoun, a named entity or a common noun) or for checking the syntactic function of the markables.
Strube et al. (2002) use three features for the distance between m1 and m2 , one in terms of words (or
tokens), one in terms of sentences and one in terms of markables. As will be shown in (4.2), the distance
features lead to problems in SUCRE.
Strube et al. (2002) show the necessity of a more complex string matching feature (e.g. by the use of
minimum edit distance). In the feature development provided by this diploma thesis, this insight will be
confirmed.
2.5.3. A Constraint-based Approach to Noun Phrase Coreference
Resolution in German Newspaper Text - (Versley, 2006)
General remarks
Versley (2006) presents a state of the art system that is based on hard constraints (i.e. constraints that
filter out impossible candidate antecedents) and soft constraints (indicators that, if cumulated, can vote
against a candidate being the right antecedent). Those constraints are weighted and the weights are
estimated with a maximum entropy model. Versley (2006) uses this system to get new insights in the
impact of “commonly used resolution constraints” and to explore new constraints that might support “the
resolution of non-same-head anaphoric definite descriptions”.
His exploration is based on the question, why there are so many differences in the resolution of
pronominal and non-pronominal anaphora with respect to accuracy. Is it due to the minor complexity of
pronominal resolution or does non-pronominal resolution contain "different kinds of non-anaphoricity"?
One reason for this difference is that some definite descriptions are unique in the context and thus not
anaphoric in the sense of anaphoricity (cf. (2.1.1)). However it is possible that these markables corefer if
they are mentioned several times. Using some heuristics for determining whether a markable is unique,
the resolution results can be improved (Versley, 2006).
Therefore, Versley (2006) focusses on coreference resolution of proper names and definite noun
phrases. He works on the TüBa-D/Z corpus (Hinrichs et al., 2005b).
The table of results
In order to see the impact of hard constraints, Versley (2006) uses upper and lower bounds for precision
and recall (Pmax , Rmax , Pmin , Rmin ) for each variant, based on the dataset that comes up with the
filtering by the hard constraint. In addition to the values for precision and recall, the system also returns
the perplexity of the classifier decisions.
23
The complete result table of (Versley, 2006) is given in appendix D. Subsequently, results about
diverse steps are mentioned sporadically.
The weights of the soft constraints
After filtering impossible candidates using the hard constraints, the remaining candidates are ranked with
the use of weighted soft constraints. Therefore, each candidate y is represented as a vector of numerical
features f (y), where f represents a function that maps the candidate to its feature vector representation.
These vectors are multiplied with the vector of feature weights w in a dot product in euclidean space,
h·, ·i, to get a final score for a candidate. Now, the task is to choose the candidate with the largest score
out of the candidate set Y (see formula 2.5):
ŷ
=
P̂ (y)
:=
arg maxhw, f (y)i
y∈Y
ehw,f (y)i
P hw,f (y0 )i
e
(2.5)
(2.6)
y 0 ∈Y
“To choose the constraint weights”, he interprets the score as a loglinear model (see formula 2.6). For
more details on this loglinear model, see (Versley, 2006).
The soft constraints and their performance
In knowledge-poor approaches like in (Strube et al., 2002) (cf. (2.5.2)), the coreference of nouns and
proper names are determined by string or substring match.
Versley (2006) uses both knowledge poor and knowledge rich (i.e. using semantic classes and relations) constraints for the resolution of nominals.
Due to German compounding and morphology, Versley (2006) considers two markables having the
same head if they share at least one letter-4-gram. This leads to an upper bound recall of 76.5%.
In order to raise upper bound precision, he adds a constraint for checking the number agreement, that
yields a precision of 52,1% (which is the same as for identical head).
Additionally, Versley (2006) uses some heuristics due to Vieira and Poesio (2000) for filtering out
markable pairs that share the same concept but have different modifiers (das blaue Auto vs. das rote
Auto). This improves the upper bound precision to 56.4%.
As proper names are different to common nouns (e.g. they uniquely refer to an entity), they will be
treated differently. They only match, if they both share the same name and not only the same common
noun (e.g. Bundeskanzler Helmut Kohl vs. Bundeskanzler Gerhard Schröder). Proper names occur very
often “with modifiers that are indicative for uniqueness” and thus, modifiers are disregarded in the case
of a proper name.
Apart from ranking candidates by their sentence distance, it is possible to ignore them, if they are two
far apart from the anaphor, using an appropriate hard constraint, which improves the overall precision
by more than 5%, up to total of 67.8%. Another possibility is used by Vieira and Poesio (2000) (“a
loose segmentation”). Here, an antecedent is considered, if it is within a window of a certain number of
sentences or it has been mentioned several times. This yields a similar precision but a smaller loss of
recall (Versley, 2006).
Besides the resolution of same-head cases, Versley (2006) involves the so-called coreference bridging
(Vieira and Poesio, 2000) that enables matching markables with different heads (cf. example (4)). The
entities mentioned in m1 and m2 are again mentioned in m3 and m4 but this time, they are neither
pronominal, nor in a same-head relation but in a “semantically poorer form” (e.g. the female pedestrian
vs. the woman) or synonymous (the car vs. the automobile).
24
(4)
a.
b.
Lebensgefährliche Körperverletzungen hat sich [eine 88jährige Fußgängerin]m1 bei einem
Zusammenstoß mit [einem Pkw]m2 zugezogen.
[Die Frau]m3 hatte [das Auto]m4 beim Überqueren der Waller Heerstraße offensichtlich
übersehen.
In the coreference bridging, the grammatical gender may differ as it is the case with m2 (D ER Pkw)
and m4 (DAS Auto). A further problem is given with unique descriptions, that only corefer “if they are
repeated verbatim” (Vieira and Poesio, 2000). But this restriction might be loosened by the discourse
context or world knowledge as it is shown in example (5). Here, m5 and m6 are unique descriptions and
not verbatim identical but corefer with respect to world knowledge.
(5)
a.
b.
[Nikolaus W. Schües]m5 bleibt Präsident der Hamburger Handelskammer.
[Der Geschäftsführer der Reederei “F. Laeisz”]m6 wurde gestern für drei Jahre wiedergewählt.
There is no chance for treating the case exemplified in (5) but semantic relatedness as in (4) can be
handled in the following way:
Versley (2006) classifies all markables into five semantic classes (PERSON, ORGANIZATION, EVENT,
TEMPORAL-ENTITY and OTHERS) and involves features for semantic class match. Additionally, he
enables a “more fine-grained lexical knowledge” with the use of a graph distance measure for the hypo/hypernymy in the GermaNet graph together with a “hard recency limit of 4 sentences, number agreement” and some heuristics for unique descriptions like in (Vieira and Poesio, 2000).
Another feature, that involves an approximation of the information status, is one that checks for the
syntactic role. Discourse-given referents usually occur sentence-initially (i.e. in the canonical subject
position). Thus, by approximating the theme (rather than the rheme) in a sentence, one can conclude that
the subject is more likely to be discourse-old than objects or preposition phrases.
At last, Versley (2006) uses a statistical model for the selectional preferences for verbs based on “the
intuition that it should be possible to exchange two coreferent descriptions against each other” in the
same context: 11 million sentences are parsed with a PCFG parser (Versley, 2005) in order to get the
pairs <subject,verb> and <object,verb>. Afterwards, models for both relations are trained with the use
of the soft clustering system LSC1 , based on the EM-algorithm.
Versley (2006) computes the logarithm q of the probability of how well the anaphor fits in the contexts
of the antecedent and vice versa. If the antecedent is likely occurring in the context of the anaphor or vice
versa, then q is near or even above zero, whereas if one markable does not fit in the other’s context, then q
is negative. Example (6) shows a negative and a positive instance. In (6a), the noun Arbeiterwohlfahrt as
a subject for the verb entlassen cannot be exchanged by the noun Mark as a subject for the verb fliessen.
This leads to a negative q-value of −5.9. On the other hand, the noun Siegerin as an object of the verb
disqualifizieren can be exchanged by a person name as subject of the verb landen. This leads to a positive
q-value of +1.0.
(6)
a.
b.
<ArbeiterwohlfahrtSU BJ , entlassen>
<MarkSU BJ , fliessen>
q = −5.9
<SiegerinOBJ , disqualifizieren>
<PERSONSU BJ , landen>
q = +1.0
The results compared to (Strube et al., 2002)
The final version of Versley (2006)’s coreference system has 70% overall recall and 61.0% overall
precision. The complete results are listed in table D.1 in appendix D.
1 http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/LSC.html
25
Table 2.11 shows a comparison of the performance of (Strube et al., 2002) (cf. table 2.10 in (2.5.2))
and (Versley, 2006) with respect to proper names and definite noun phrases.
Proper names
Definite noun phrases
f-score
(Strube et al., 2002) (Versley, 2006)
76.22%
87.5%
33.94%
46.9%
Table 2.11.: Performance comparison of (Strube et al., 2002) and (Versley, 2006)
So, Versley (2006) outperforms Strube et al. (2002) in the categories proper names and definite noun
phrases.
Conclusion
Among the approaches presented in this diploma thesis, (Versley, 2006) firstly uses the TüBa-D/Z corpus
for German coreference resolution. He implements hard and weighted soft constraints that are reminiscent of the prefilter features and the vector features used in SUCRE (cf. (4.1.2),(4.1.3)).
Versley (2006) focusses on the feature design of proper names and definite noun phrases and thereby
disregarding pronouns, whereas within this diploma thesis, SUCRE performs a full coreference resolution.
Here again, the substring matching feature is more specified than just having an exact string match or
a substring match. Versley (2006) checks for a common 4-gram between the heads of m1 and m2 . So,
he also realized that a simple string matching feature is not expressive enough.
One interesting soft constraint, Versley (2006) implements, is the checking for noun phrases with same
concept but different modifiers (das blaue Auto vs. das rote Auto) using a heuristic of Vieira and Poesio
(2000). This extension could also raise the performance of SUCRE, although the tripartite markable word
access (i.e. first word, head word, last word) might be insufficient (e.g. for the prenominal adjectives).
Versley (2006) uses a very impressive implementation of the so-called “coreference bridging”, including semantic classes and semantic relations by applying a distance measure for the hyponomy relation in
GermaNet. This implemention can also be useful for detecting semantic relations in SUCRE (cf. (4.4.2)).
By following Versley (2006), as discourse-given entities usually occur in the canonical subject position, an approximation of the information status can be done by checking whether m2 is a subject. This
feature will be addressed again in chapter 5.
2.5.4. Optimization in Coreference Resolution Is Not Needed: A
Nearly-Optimal Algorithm with Intensional Constraints - (Klenner
and Ailloud, 2009)
The architecture
Klenner and Ailloud (2009)’s architecture contains two parts. The first part constitutes the memory-based
pairwise classifier, TiMBL, (= Tilburg Memory-Based Learner), (Daelemans et al., 2003), whose output
is the input for the second part, a modification of a Zero-One Integer Linear Programming system (ILP)
based on the Balas Algorithm (Balas et al., 1965). This system is used as clustering method for searching
for a global optimum of consistency of the resulting coreference partition.
The corpus
The data is extracted from the TüBa-D/Z corpus (Hinrichs et al., 2005b). They use a corpus version with
about 1,100 German newspaper texts and about 25,000 sentences. In their statistics, Klenner and Ailloud
26
(2009) mention 13,818 anaphoric relations, 1,031 cataphoric relations (that is the case when the anaphor
precedes the antecedent) and 12,752 true coreferent markable pairs. Considering the pronouns, there
are 3,295 relative pronouns, 8,929 personal pronouns, 2,987 reflexive pronouns and 3,021 possessive
pronouns.
The feature set
For the use of the pairwise classifier TiMBL, Klenner and Ailloud (2009) use the following features:
• The distance in terms of sentences and markables
• The part of speech tag of the markables’ heads
• The grammatical functions (i.e. subject/object/. . . )
• The agreement in grammatical functions (e.g. <subject + subject>, <object + object>, . . . )
• String match between the markables’ heads
• Which markable (if any) is a pronoun?
• The word form, given that the part of speech tag of a markable is a pronoun
• The salience of non-pronominal markables
• The semantic class of the markables’ heads
Figure 2.5.: Feature set used in (Klenner and Ailloud, 2009)
Training instance creation
As the dataset contain several long texts, the question comes up, how to create training samples (i.e.
coreference and disreference links). Since, for example two personal pronouns er, one at the beginning
and one at the end of a text, do not indicate coreference, given that there are not enough further mentions
of that referent inbetween, building “a long chain of coreference ’renewals’ that lead somehow from the
first” er to the second.
Thus, the instance generation algorithm of Klenner and Ailloud (2009) uses a 3-sentence window
within which the candidate pairs are generated.
The need for global constraints
A main advantage of the ILP approach is that “prescriptive linguistic knowledge” can be modeled to
global constraints over a set of possible solutions. ILP finds the optimal result out of this set taking the
global constraints into account. One such global constraint is transitivity (cf. 2.1, 2.3.2). The following
example (a translation of an example given by Klenner and Ailloud (2009)) illustrates the need for the
transitivity constraint, in particular when other pairwise restrictions like the binding theory are involved:
(7)
Erm1 erzählt ihmm2 , dass erm3 ihnm4 zutiefst bewundert.
Based on the binding theory, m1 and m2 as well as m3 and m4 are disreferent. That can be modeled as a feature for a pairwise classifier (e.g. it checks for a violation of a binding constraint). One
weak point of the pairwise classifier is its local perspective. It cannot control the consistency of three
or more markables’ coreference state. So, the classifier can rightly predict m3 and m4 to be disreferent but it might falsely predict both (m1 and m3 ) and (m1 and m4 ) to be coreferent, since there
27
is no binding constraint preventing that step. Thus, keeping in mind the symmetric nature of coreference (i.e. coref (m1 , m3 ) ⇔ coref (m3 , m1 )), there is need for the global constraint transitivity:
(coref (m3 , m1 ) ∧ coref (m1 , m4 )) ⇒ coref (m3 , m4 ). But coref (m3 , m4 ) violates the binding constraint. Therefore, it is not possible that both m3 and m4 are coreferent with one and the same third
markable.
A brief once-over in ILP
The basic idea of ILP is to minimize “a weighted linear function” (the so-called objective function) of
variables xi :
F (x1 , . . . , xn ) = w1 x1 + . . . + wn xn
(2.7)
In the case of Zero-One-ILP, the variables xi are binary (i.e. they can only be instantiated by 0 or 1).
Balas’ approach (Balas et al., 1965) organizes the addends in F in a sorted way according to the weights
wi . This way, two basic principles can be followed by minimizing the objective function:
• A solution with as few 1’s as possible is preferred.
• If a global constraint makes a variable xi getting 1, then the index i should be very small.
The algorithm uses a depth-first strategy in the search tree and checks for any constraint violation “of
the branches partially explored”. Additionally, it balances the minimal cost min so far and the cheapest
solution yielded by following the current branch. If setting all variables xm for m > i to 0 returns a
cheaper solution than min, it is usefull to follow that branch, otherwise, the algorithm uses backtracking
for finding other solutions.
This leads to an exponential run time complexity in the worst case.
The constraint-based ILP model of Klenner and Ailloud (2009)
Within the ILP framework, the probabilities for the markable pairs, that are returned by TiMBL, are used
as weights. Klenner and Ailloud (2009) define the classification costs in formula 2.8, where |negij | is
the number of negative samples that are similar to a markable pair hi, ji according to TiMBL’s metric,
whereas |posij | is the respective number for positive examples:
wij =
|negij |
|negij ∪ posij |
(2.8)
If there are few or even no negative instances, the cost wij is small, but if there are much more negative
instances than positive ones, the cost gets high.
Now, Klenner and Ailloud (2009) propose an objective function (fomula 2.9), where O0.5 is the set
containing the pairs hi, ji with a weight less or equal to 0.5 and cij represents the binary variable of
setting the markable pair hi, ji (i < j) to coreferent, whereas cji is the complement of cij (i.e. the
boolean expression for the disreference of hi, ji). The weights for the disreference variables are the
complement to 1.0 (i.e. (1 − wij )). This enables the penalization of setting cji to 1, when wij ≤ 0.5.
min :
X
wij · cij + (1 − wij ) · cji
(2.9)
hi,ji∈O0.5
The global constraints
Given the notions cij and cji , Klenner and Ailloud (2009) can describe some global constraints in terms
of equations of sums of cij (cf. table 2.12).
28
#
1
2
3
4
Constraint
Exclusivity
Clause bound
NP bound
Transitivity
5
BEC
Equation
∀hi, ji ∈ O0.5 (cij + cji = 1)
∀i, j(clause_bound(i, j) ⇒ cji )
∀i, j(np_bound(i, j) ⇒ cji )
∀i, j, k(i<j<k) (cij + cjk ≤ cik + 1)
∀i, j, k(i<j<k) (cik + cjk ≤ cij + 1)
∀i, j, k(i<j<k) (cij + cik ≤ cjk + 1)
∀j(pos(j) ∈ {P
PP OSAT, P RELS}
⇒
cij ≥ 1)
Description
A pair hi, ji is either co- or disreferent.
A pair hi, ji is disreferent if clause_bound
A pair hi, ji is disreferent if np_bound
(coref (i, j) ∧ coref (j, k)) ⇒ coref (i, k)
(coref (i, k) ∧ coref (k, j)) ⇒ coref (i, j)
(coref (j, i) ∧ coref (i, k)) ⇒ coref (j, k)
Boundness enforcement constraint: Every
rel. or poss. pronoun has an antecedent
i
Table 2.12.: The global ILP constraints in (Klenner and Ailloud, 2009)
The exclusivity constraint enforces that any markable pair has to be either coreferent or disreferent. If
it is not coreferent, it has to be disreferent and vice versa.
For the case of binding theory, Klenner and Ailloud (2009) introduce two new predicates: clause_bound
and np_bound. If two markables occur in the same subclause (see m3 and m4 in example (7)) and none
of them is a reflexive pronoun, a possessive pronoun or is an apposition of the other, they are clause
bound and thus disreferent. If two markables occur in the same noun phrase (e.g. ihri Autoj ), they are
np bound and also disreferent.
As shown before, the transitivity can be used for some shortcomings concerning the local perspective
of a pairwise classifier. As symmetry cannot be modeled in terms of cxy (since cyx = ¬cxy ), the second
global constraint, transitivity, is split up in three equations. One problem comes up by treating this
n!
constraint extensional, as one have to check it over the whole candidate set. This results in 3!(n−3)!
·3
equations, given n markables.
The so-called boundness enforcement constraint forces an anaphor that constitutes a relative or possessive pronoun to be coreferent with at least one antecedent.
Klenner and Ailloud (2009) figured out that in the case of a generic ILP model (Klenner and Ailloud,
2009; Althaus et al., 2004) most of the constraints in table 2.12 can be used intensionally and thus run
time complexity can be reduced. For instance, one can treat transitivity intensionally by only maintaining evolving clusters: if the algorithm tries to add a new markable to an existing cluster, it has to be
compatible with all members. Compatibility can be defined in terms of grammatical attributes and the
binding theory: two markables are compatible, if they agree in several grammatical attributes like person,
number or gender and if they are neither clause bound nor np bound.
Thus, instead of checking all markable pairs, Klenner and Ailloud (2009) check the markable pairs
only “on demand”.
Optimization is not needed
The experiments concerning the optimization described by Klenner and Ailloud (2009) are skipped, as
they do not provide relevant information for this diploma thesis. In short, Klenner and Ailloud (2009)
compare the result of the first iteration of Balas’ algorithm (i.e. “Balas-First”) with the following results
in the optimization step with respect to the objective function in formula 2.9. They figured out “that in
more than 90% of all cases, Balas-First already constitutes the optimal solution” , which means “that
the time-consuming search for a less expensive solution ended without further success” (Klenner and
Ailloud, 2009). Moreover, the F1 -score of the optimal solution was even slightly worse than the F1 score of Balast-First (about 0.086%).
Therefore, Klenner and Ailloud (2009) argue that there is no need for optimization in coreference
resolution using ILP. For more details on their argumentation and on their experiments leading them to
this controversial conclusion, see Klenner and Ailloud (2009).
29
Evaluation
For the evaluation, Klenner and Ailloud (2009) use a 5-fold cross-validation with two variants: all markables vs. true markables. The results are computed for the evaluation score CEAF (cf. (2.6.3)) and
compared with the TiMBL baseline, obtained by adding all markables appearing in a predicted coreference link to the same cluster. Table 2.13 shows the results. The improvement in terms of F1 -score in
the variant all markables is 2.4% and in the variant true markables, it is even 7.43%. Here, Balas-First
obviously outperforms the results of the sole pairwise classifier TiMBL.
Precision
Recall
F-measure
all markables
TiMBL B-First
66.52
72.05
57.76
58.00
61.83
64.27
true markables
TiMBL B-First
73.81
84.10
69.28
74.31
71.47
78.90
Table 2.13.: CEAF-Results in (Klenner and Ailloud, 2009)
Conclusion
The approach by Klenner and Ailloud (2009) impressively show the necessity of a good clustering
method, when using a mention-pair model. As pairwise classifier, they apply the memory-based classifier TiMBL. Afterwards, they use an ILP framework for creating coreference chains that are in line with
some global constraints which cannot be regarded in clustering methods like the best-first clustering,
that SUCRE uses. For instance, the global constraints clause_bound cannot be captured by best-first
clustering: given the markables m1 , m2 and m3 , where m2 and m3 are in the same subclause, and the
classifier’s output: <(m3 ,m2 ),0.2>, <(m3 ,m1 ),0.6> and <(m2 ,m1 ),0.7>. Then, best-first clustering
would chain m3 and m1 and then m2 and m1 . Transitively m2 and m3 end up in the same cluster and
thereby violate a clause bound constraint.
Therefore, a further improvement in SUCRE might be instead of best-first clustering the use of an ILP
framework or any other clustering method providing the inclusion of global constraints.
2.5.5. Extending BART to provide a Coreference Resolution System for
German - (Broscheit et al., 2010b)
In contrast to coreference systems that concern English and those systems of Hartrumpf (2001) (2.5.1),
Strube et al. (2002) (2.5.2), Versley (2006) (2.5.3), Klenner and Ailloud (2009) (2.5.4) and others,
Broscheit et al. (2010b) provide a freely available system that enables researcher to explore new coreference techniques for German.
The architecture of BART
The architecture is based on the toolkit BART (Versley et al., 2008), that was originally implemented
as a “modularized version” of (Versley, 2006) and others. It combines these state-of-the-art approaches
with features based on syntax and semantics. Thus, its design is very modular and enables to separately
do feature engineering with the use of different knowledge sources and improve coreference resolution
as a machine learning problem.
Broscheit et al. (2010b) extend BART for the use of German coreference resolution.
30
The corpus
For the extension of BART for coreference resolution in German, there is need for a German dataset.
For this reason, Broscheit et al. (2010b) use version 4 of TüBa-D/Z corpus (Hinrichs et al., 2005b). This
version of TüBa-D/Z comprises 32,945 sentences with 144,942 markables. These markables constitute
52,386 coreference links and 14,073 clusters (Broscheit et al., 2010b).
The classifier
Broscheit et al. (2010b) use a pairwise classifier whose input are feature vectors of markable pairs as
proposed by Soon et al. (2001). They apply several methods for classifiying markable pairs. Beside J48,
they apply an implementation of the C4.5 decision tree learning algorithm, a Maximum entropy classifier
and an architecture that contains a separate classifier for pronouns and non-pronouns (“split”).
Training instance creation
In the preprocessing step, they convert the corpus into the MMAX2 data format (Müller and Strube,
2006), that is used in BART. The markables and their grammatical attributes are extracted by using the
information given in the parse trees (i.e. minimal and maximal noun projections, number, gender, person,
semantic class, . . . ). Markables whose grammatical function is among the ones below are excluded from
the final markable set:
• Appositions and additional parts of a name (e.g. doctoral degree):
[Ute Wedemeier]m1 ,[stellvertretende Vorsitzende der AWO]m2 becomes one markable with respective spans.
• Expressions constituting predicates in a copula construction. Those have the dependency label
PRED:
[John]m1 ist [ein Bauer]m2
• NPs that are governed by the comparative or predicative conjunction als:
[Peter]m1 arbeitet [als Bauarbeiter]m2
• Vorfeld-es and other non-referring es-pronouns
[Ich]m1 finde [es]m2 schade, . . .
Pronouns as it for English and es for German are very often non-referring.
Broscheit et al. (2010b) create the feature vectors as described in Soon et al. (2001).
The feature set
Broscheit et al. (2010b) reimplement the feature set used in (Klenner and Ailloud, 2009) (cf. figure 2.5)
with distance, part-of-speech, grammatical function and string matching. Features for binding theory,
that had been implemented as ILP constraints in (Klenner and Ailloud, 2009), are reimplemented as
features for the binary classifier. Additionally, they use the semantic class approach, proposed by Versley
(2006) (cf. (2.5.3)).
Broscheit et al. (2010b) use three methods for determining the semantic class: (1) they look up the
semantic class in an appropriate lexicon like GermaNet; (2) in case of a proper name, the markables
are checked for honorifics (e.g. Dr. phil.), organizational suffixes (e.g. GmbH) and the like. Finally, a
gazetteer lookup is done (Broscheit et al., 2010b). (3) At the end, the markables are checked for morphological patterns like acronyms (e.g. CDU) or binnen-I gender-neutral forms (e.g. SchneiderInnen).
Broscheit et al. (2010b) add some further features for the coreference in German:
1. 1/2 person: returns TRUE, if both markables are first or second person, and FALSE otherwise.
31
2. Speech: returns TRUE, if both markables are inside quoted speech, and FALSE otherwise.
3. Node distance: returns the number of clause and PP nodes along the path between the markables
in the parse tree
4. Partial match: returns TRUE, if there is a substring match between the markable’s heads, and
FALSE otherwise.
5. GermaNet relation: returns the relation between the markables in GermaNet (i.e. NOT_RELATED,
SIGNIFICANTLY_RELATED or STRONGLY_RELATED) (for more details, see (Broscheit et al.,
2010b)).
The evaluation results
In the creating of the final partition, Broscheit et al. (2010b) use closest-first clustering of the instances
classified as coreferent. They apply the first 1100 documents from the TüBa-D/Z corpus and evaluate
using 5-fold cross validation.
The evaluation results of Broscheit et al. (2010b) are given in table (2.14):
Feature set
Best baseline (MaxEnt “split”)
+ 1/2 Person
+ Node distance
+ Partial match
+ GermanNet relation
+ all features
+ Klenner and Ailloud (2009)
MUC scorer
P
R
F1
75.6 80.8 78.1
76.2 80.9 78.4
75.7 80.9 78.2
77.8 81.3 79.5
76.4 80.6 78.5
78.4 82.2 80.2
-
P
63.2
63.6
63.3
64.4
63.0
66.3
69.3
CEAF
R
67.0
67.4
67.1
68.3
66.8
70.3
73.8
F1
65.0
65.4
65.1
66.3
64.8
68.3
71.5
Table 2.14.: Evaluation results of Broscheit et al. (2010b)
This table shows that given the best baseline feature set using Maximum entropy classifier separated
by markable class (“split”) the additional features increase the performance but nevertheless the system
does not outperform the one developed by Klenner and Ailloud (2009) (cf. table 2.13).
Conclusion
Among SUCRE and four other systems, BART was participant on the SemEval-2010 coreference resolution competition (cf. (2.7)). It cannot be compared with SUCRE as it uses external information
sources like GermaNet and gazetteer lookups, whereas SUCRE only performs “closed”. It is based on
the TüBa-D/Z corpus and also uses semantic class information. The best performance is done with the
maximum entropy classifier (rather than the decision tree method used in SUCRE). A possible prefilter
used in (Broscheit et al., 2010b) is the check for predicates in a copula construction. Such a feature will
be checked in chapter 5.
Moreover, Broscheit et al. (2010b) mention an easy way for filtering non-referring es-pronouns from
the markable set. However, this is not applicable in SUCRE and thus, based on this lack of annotation
for es-pronouns, there will be false positives as described in (4.3.6).
Broscheit et al. (2010b) implements the binding theory mentioned in (Klenner and Ailloud, 2009)(see
2.5.4) as a feature for the pairwise classifier. Such a feature based on the clause-bound property described
in (Klenner and Ailloud, 2009) will be checked in chapter 5.
Furthermore, the feature that checks, whether both markables are in first or second person, will also
be implemented in chapter 5.
32
2.6. Evaluation scores
The feature which checks, whether the respective markable is in quoted speech, cannot be implemented
in SUCRE as the relational database model described in (3.2.3) does not provide information about the
context of a markable (i.e. what tokens (e.g. quotation marks) are before or after).
There is a trend towards using several scores for evaluating a coreference system. This has the advantage
of working against the bias inherent in a particular score (Ng, 2010). None of the scores discussed below
is fully adequate and their measures are not commensurate (Recasens and Hovy, 2010).
Assume the following example (borrowed from (Recasens and Hovy, 2010)) of the output of a coreference system (figure 2.6) and the corresponding gold partition in (figure 2.7). Recasens and Hovy (2010)
treat the term entity as a set of coreferent markables.
Figure 2.6.: Example of a system partition
Figure 2.7.: Example of a gold partition
Given 14 true markables m1 , . . . , m14 , the coreference system returns singleton entities for m1 , m2 ,
m3 , m8 , m10 , m11 and m13 . It predicts the markables m4 to be coreferent with m6 and m5 to be
coreferent with m12 . Moreover, the markables m7 , m9 and m14 occur in a three-markable entity.
If this output is compared with the gold partition, there are some obvious errors: some links are missed
and others are wrongly predicted. For example: the entity S9 misses the markable m14 , that is present
in the corresponding true entity G11 . On the other hand, the predicted entity S10 contains this wrong
assigned markable m14 . The disreferent markables m4 and m6 (cf. G4 and G5 ) are erroneously linked
in S8 .
The issue of evaluating the performance of such a coreference system is the uncertainty of how comparing the true set of entities (figure 2.7) with the predicted set of entities (figure 2.6). The following
questions about evaluation lead to different evaluation measures (Recasens and Hovy, 2010):
• Shall the measure focus on the number of correct coreference links?
• Shall the measure use each equivalence class (entity) as the unit of evaluation?
• Are singleton entities rewarded the same way as it is done for multi-markable entities?
2.6.1. MUC
The official MUC-scoring (Message Understanding Conference) algorithm is developed by Vilain et al.
(1995). It is the oldest and most widely used measure. MUC was defined within the MUC-6 and MUC-7
evaluation tasks on coreference resolution (Recasens and Hovy, 2010). Its f-score is the harmonic mean
33
of precision and recall which are based on the identifcation of unique coreference links (Stoyanov et al.,
2009):
F1M U C =
2 · PM U C · R M U C
PM U C + RM U C
(2.10)
MUC is based on the idea that the minimum number of links that is necessary for setting the predicted
set of entities or the true set of entities (i.e. the sum of numbers of arcs in the spanning trees of the set’s
implicit equivalence graphs (Vilain et al., 1995)) is the total number of markables minus the number of
entities:
minimum number of links (system) = pl =
X
(|Si | − 1)
(2.11)
(|Gi | − 1)
(2.12)
Si
minimum number of links (gold) = tl =
X
Gi
where Si and Gi refer to the predicted and true entities (cf. figure 2.6, 2.7).
Based on these minimum numbers of links, pl and tl , MUC-score counts each necessary coreference
link that occurs both in the predicted set of entities and in the true set of entities (i.e. the number of
common links). “To obtain recall, this number is divided by the minimum number of links required to
specify” the gold partition. “To obtain precision, it is divided by the minimum number of links required
to specify” the predicted partition (Recasens and Hovy, 2010).
Although the minimum number of needed links is constant, “there are combinatorially many [...]
spanning trees for a given equivalence class” (Vilain et al., 1995). Thus, the notion of common links is
not unique. One way of computing recall is proposed by Vilain et al. (1995): for each true entity Gi the
number of missing links missed(Gi ) is computed as in equation (2.13):
missed(Gi ) = |p(Gi )| − 1
(2.13)
where p(Gi ) is the system’s partition of Gi (i.e. the set of predicted entities covering all markables in
Gi ).
The number of correct links in Gi , correct(Gi ) (cf. 2.12), is computed by |Gi | − 1.
The recall for Gi is computed as stated in (2.14):
RM U C (Gi ) =
correct(Gi ) − missed(Gi )
correct(Gi )
Summing over all true entities, the overall MUC recall is given in (2.15)
P
(|Gi | − |p(Gi )|)
Gi
RM U C =
tl
(2.14)
(2.15)
For scoring precision, Vilain et al. (1995) compute the same score but switch their “notion of where
the base sets come from”, i.e. they use Si instead of Gi in all equations:
34
missed(Si )
=
PM U C (Si )
=
PM U C
=
|p(Si )| − 1
correct(Si ) − missed(Si )
correct(Si )
P
(|Si | − |p(Si )|)
Si
pl
(2.16)
(2.17)
(2.18)
This means, that MUC-precision (PM U C ) counts the number of links that has to be removed in order
to get a graph in which no disreferent markables are connected, whereas MUC-recall (RM U C ) counts
the minimum number of links that has to be added to ensure that all markables referring to a given entity
are connected in the graph (Bengtson and Roth, 2008).
The MUC-score has two weak points. Since it is based on the minimum number of missing resp.
wrong links, the result is often counterintuitve: if one classifies a markable into the wrong entity, it will
be penalized with one error in PM U C and RM U C . On the other hand, if one falsely merges two entities,
it only counts as one error in RM U C , although it is further away from the real answer than the first case
(Recasens and Hovy, 2010). This problem results in a too lenient penalization with systems that return
“overmerged entities” (i.e. equivalence classes with too many markables).
The second problem arises based on the fact that the MUC-score only considers coreference links. The
score does not reward singleton clusters that are correctly identified, because there are no coreference
links in these clusters (Ng, 2010). If one adds a singleton entity to the predicted set of entities, the
MUC-score does not show any effect unless the added markable is misclassified in a multi-markable
entity.
2.6.2. B3 (B-Cubed)
The B3 (B-Cubed) f-measure (Bagga and Baldwin, 1998) scales the overlap of predicted clusters and
true clusters. It is the harmonic mean of precision (P) and recall (R):
PB 3
=
RB 3
=
1 X X cm
(
(
))
N
pm
d∈D m∈d
1 X X cm
(
( ))
N
tm
(2.19)
(2.20)
d∈D m∈d
F1B 3
=
2 · PB 3 · R B 3
PB 3 + R B 3
(2.21)
where cm is the number of markables both appearing in the predicted and in the true cluster of m; pm
is the number of markables in the predicted cluster of m and tm is the number of markables in the true
cluster of m. The documents d are out of a document set D and N represents the number of markables
in D.
The B3 f-measure is able to measure the effect of singleton entities and penalizes the clustering of too
many markables in the same entity. It gives more weight to the splitting or merging of larger entities.
Moreover, B3 gives equal weights to all types of entities and markables. (Bengtson and Roth, 2008)
Stoyanov et al. (2009) mention complications using B3 : it presumes that the gold standard and the
coreference system response are clusterings over the same set of markables. But this is not absolutely
true in the case when the system uses a markable detector (2.2) as it is the case with an end-to-end
coreference system. Stoyanov et al. (2009) propose to tag each markable m with twin(m) if it appears
both in the predicted partition and in the gold standard. Untagged markables are regarded as twinless.
3
They suggest two ways of using B3 with twinless markables. One way is called Ball
: here, all markables
are retained but for twinless extracted markables, the precision fraction is p1m and the recall fraction is
1
3
tm . The other way is called B0 : it rejects all twinless extracted markables but penalizes recall by setting
the corresponding recall fraction to 0.
Back to true markables, Recasens and Hovy (2010) describe the following shortcoming of B3 : the
score “squeezes up” too high in the case there are many singletons. As it rapidly approaches 100%, there
is little numerical space for comparing clusterings. Considering the high amount of singleton entities,
this issue becomes more substantial: in the Spanish AnCora-Es corpus, about 86% of all markables are
singletons and in the English ACE-2004 corpus, the proportion is about 61%.
35
2.6.3. CEAF
The CEAF (Constrained Entity-Alignment F-measure) algorithm was developed by Luo (2005). He
“considers that B3 can give counterintuitive results due to the fact that an entity can be used more than
once when aligning the entities” in the predicted set of entities and the true set of entities (Recasens and
Hovy, 2010). In example (2.6.5) the B3 -recall for the system’s output (c) (figure 2.11) is 100%, although
the true entities have not been found. On the other hand, in the predicted partition (d) (figure 2.12) the
precision is 100% although there are wrong predicted entities.
In CEAF, as Luo (2005) argues, there is the best one-to-one mapping between the entities of both
partitions. Every predicted cluster is mapped on at most one true entity. The best of such an alignment is
the one that maximizes a given similarity measure.
Depending on such a similarity measure, Luo (2005) distinguishes between the mention-based CEAF
(CEAF-M) and the entity-based CEAF (CEAF-E). CEAF-M is the most widely used CEAF-score. It
uses the φ3 similarity function employed by Luo (2005).
In the case of true markables, the precision PCEAFM and the recall RCEAFM are identical. They
correspond to the number of common markables between every two aligned entities divided by the total
number of markables.
Let φ be the alignment function that maps each predicted cluster Si on the most similar true entity and
let N be the total number of markables. Then precision, recall and thereby f-score is defined as:
PCEAFM = RCEAFM = F1CEAFM =
X |Si ∩ φ(Si )|
Si
N
(2.22)
Recasens and Hovy (2010) mention that CEAF lacks a good dealing with singleton entities as B3
does. This can be seen in the fact that the B3 and CEAF results are higher than MUC with corpora that
contain singleton markables. Another problem occurs with CEAF-E: in this way of alignment, correct
coreference links might be ignored if the entity finds no corresponding entity in the true set of entities
(Recasens and Hovy, 2010). A third problem is that all entites have equal weights, regardlessly of the
number of markables they contain. This results in an equal penalization for a wrong entity composed of
two small entities or composed of a small and a large entity (Recasens and Hovy, 2010).
2.6.4. BLANC
The “BiLateral Assessment of Noun-Phrase Coreference” is a variation of the rand index (Rand, 1971)
created to suit the coreference task addressing some observed shortcomings to obtain a fine granularity
that allows a better discrimination between coreference systems (Recasens and Hovy, 2010). It rewards
both coreference and disreference links by averaging the corresponding F-measures. It gives weight on
singletons (the main problem with MUC) and does not inflate the score with the singleton’s presence as
it is the case with B3 and CEAF (Recasens and Hovy, 2010).
BLANC is based on the rand index, that divides the sum of all coreferent (N11 ) and disreferent (N00 )
links that come up both in the predicted set of entities and in the true set of entities (i.e. N00 + N11 ) by
the number of all coreferent and disreferent links (i.e. the constant N (N2−1) , where N is the total number
of markables).
Rand =
N00 + N11
N (N − 1)/2
(2.23)
BLANC modifies this approach “such that every decision of coreferentiality is assigned equal importance” (Recasens and Hovy, 2010). This way, it addresses the disequilibria between coreferent markables
and singletons. In contrast to other evaluation measures which have to compare partitions with different
numbers of clusters (B3 ) or different numbers of coreference links (MUC), BLANC uses the fact that the
36
number of coreference links together with the number of disreference links constitute a constant value
across predicted set of entities and the true set of entities (Recasens and Hovy, 2010).
There are two kinds of “decisions” that best describe the intuition of BLANC:
• Coreference decision:
1. Coreference link (c): if the markable pair contains coreferent markables
2. Disreference link (d): if the markable pair contains disreferent markables
• Correctness decision:
3. Right link (r): if the markable pair is coreferent or disreferent both in the predicted set of
entities and the true set of entities
4. Wrong link (w): if the markable pair is coreferent in the predicted set of entities and disreferent in the true set of entities or vice versa
These two decisions can be combined to a judged coreference system output of a markable pair, that resembles a binary classifier’s output (true-positive, true-negative, . . . ): rc, wd, wc, rd. Table (2.15) shows
the BLANC confusion matrix containing these combinations. L corresponds to the constant number of
markable pairs (i.e. coreference + disreference links):
L=
N (N − 1)
= rc + wc + rd + wd
2
(2.24)
Predicted set of entities
True set
of entities
Sums
Coreference
Disreference
Coreference
rc
wc
rc + wc
Disreference
wd
rd
wd + rd
Sums
rc + wd
wc + rd
L
Table 2.15.: The BLANC confusion matrix
Considering the big amount of singletons the BLANC score is bilateral: the precision, recall and
f-measure is computed separately for coreference links and disreference links. Finally, the average of
both (i.e. the arithmetic mean) is the final score. Thus, neither coreference links nor disreference links
contribute more than 50% to the final score (Recasens and Hovy, 2010).
The formulas for the BLANC score are given in table (2.16):
Score
PBLAN C
RBLAN C
F1BLAN C
Coreference
rc
Pc =
rc + wc
Rc =
F1c =
rc
rc + wd
2 · Pc · R c
Pc + R c
Disreference
rd
Pd =
rd + wd
Rd =
F1d =
rd
rd + wc
2 · P d · Rd
Pd + Rd
PBLAN C =
Pc + Pd
2
RBLAN C =
Rc + Rd
2
F1BLAN C =
F1c + F1d
2
Table 2.16.: The BLANC scores
37
In some baseline partitions like in the case when the predicted set of entities or the true set of entities
contains only singletons or only a single entity, the denominators of Pc /Rd or Pd /Rc are zero and thus
these scores are undefined. For this reason, there are some special variations:
• If the predicted set of entities contains only a single entity and
– the true set of entities also contains a single entity ⇒ F1BLAN C = 100%
– the true set of entities contains only singletons ⇒ F1BLAN C = 0%
– the true set of entities contains links of both types ⇒ Pd , Rd , F1d = 0%
• If the predicted set of entities contains only singletons and
– the true set of entities also contains only singletons ⇒ F1BLAN C = 100%
– the true set of entities contains a single entity ⇒ F1BLAN C = 0%
– the true set of entities contains links of both types ⇒ Pc , Rc , F1c = 0%
• If the true set of entities contains both coreference and disreference links and
– the predicted set of entities contains no right coreference link (rc = 0) ⇒ Pc , Rc , F1c = 0%
– the predicted set of entities contains no right disreference link (rd = 0) ⇒ Pd , Rd , F1d = 0%
• If the predicted set of entities contains both coreference and disreference links and
– the true set of entities contains a single entity ⇒ F1BLAN C = Pc , Rc , F1c
– the true set of entities contains only singletons ⇒ F1BLAN C = Pd , Rd , F1d
There is still one weak point in BLANC that comes up when there are partitions near such baselines
sketched above. Assume that all links in the true set of entities are disreferent but one. The predicted
set of entities contains only disreferent links. Given a large set of markables, this should result in a good
score. The issue is that BLANC uses equal importance for the two types of links and thus the single
coreference link in the true set of entities gets equal weight as the disreferent ones. This leads to a too
strict penalization. One way of solving this problem is the introduction of a weighted BLANC-score
with a parameter α:
BLAN Cα = α · Fc + (1 − α) · Fd
(2.25)
In the default version of BLANC, the α parameter would be 0.5. By increasing α, the weight for
coreference links gets larger, by decreasing α (i.e. increasing (1 − α)) the weight for disreference links
gets larger. For the case above, using α = 0.1 would relax the severity.
2.6.5. A comparative example (borrowed from (Luo, 2005))
In the following section, an example of a true partition and four system partitions will be presented.
The coreference-link-based metric, M U C and the cluster-based metrics B 3 and CEAF as well as the
coreference/disreference link averaged score (BLAN C) are used to measure the performance of each
system.
38
Figure 2.8.: True partition
Figure 2.9.: System partition (a)
Figure 2.10.: System partition (b)
Figure 2.11.: System partition (c)
Figure 2.12.: System partition (d)
The comparison of the evaluation metrics when applied on the system outputs in figures (2.9) up to
(2.12) with the true entities in figure (2.8) is given in table (2.17).
MUC
The MUC-score considers the missing or wrong coreference links. In system partition (a) and system
partition (b) there are 9 links in common with the true partition. One further link is added and no link is
missing. Thus, the measures will be the following for system partition (a):
P
PM U C
=
RM U C
=
F1M U C
=
(|Si | − |p(Si )|)
Si
pl
P
(|Gi | − |p(Gi )|)
Gi
tl
PM U C + RM U C
=
(5 − 1) + (7 − 2)
9
=
= 90%
10
10
(5 − 1) + (2 − 1) + (5 − 1)
9
= = 100%
9
9
2 · 0.9 · 1
=
≈ 94.7%
0.9 + 1
=
(2.26)
(2.27)
(2.28)
39
and for system partition (b):
P
(|Si | − |p(Si )|)
(10 − 2) + (2 − 1)
9
Si
PM U C =
=
=
= 90%
pl
10
10
P
(|Gi | − |p(Gi )|)
(5 − 1) + (2 − 1) + (5 − 1)
9
Gi
=
= = 100%
RM U C =
tl
9
9
2 · 0.9 · 1
F1M U C =
=
≈ 94.7%
PM U C + RM U C
0.9 + 1
(2.29)
(2.30)
(2.31)
In system partition (c) all markables refer to one entity. There are also 9 links in common with the
true partition but two further links are added and again no link is missing:
P
(|Si | − |p(Si )|)
PM U C
=
RM U C
=
F1M U C
=
Si
pl
P
(|Gi | − |p(Gi )|)
Gi
=
(12 − 3)
9
=
≈ 82%
11
11
(2.32)
(5 − 1) + (2 − 1) + (5 − 1)
9
= = 100%
9
9
2 · 0.82 · 1
≈ 90.0%
=
0.82 + 1
=
tl
PM U C + RM U C
(2.33)
(2.34)
Here, it becomes apparent that overmerged clusters are underpenalized. In system partition (d) all
markables are singletons, that refer to unique entities. There are no links in common, no further links
added and all true coreference links are missing. In this case, all measures (precision, recall, f-score) are
defined to be 0%.
P
(|Si | − |p(Si )|)
PM U C
=
RM U C
=
F1M U C
=
Si
pl
P
(|Gi | − |p(Gi )|)
Gi
tl
=
=
(1 − 1) + . . . + (1 − 1)
= 0%
0
(5 − 5) + (2 − 2) + (5 − 5)
= 0%
9
2 · PM U C · RM U C
= 0%
P M U C + RM U C
(2.35)
(2.36)
(2.37)
B3 (B-Cubed)
The B3 -score counts the overlapped markables for a given entity and returns the average over all markables. Thus, in contrast to MUC, the coreference links are ignored. Subsequently, the true entity containing (1, . . . , 5) is called Gold1 , the true entity containing 6 and 7 is called Gold2 and the one containing
the other markables (8, . . . , C) is called Gold3 .
In system partition (a), the true entites Gold2 and Gold3 are merged. This has no effect on the B3 ’s
recall but on its precision:
PB 3
=
RB 3
=
1 X X cm
1
5
2
5
(
(
)) =
5· +2· +5·
=
N
pm
12
5
7
7
d∈D m∈d
1 X X cm
1
5
2
5
(
( )) =
5· +2· +5·
=
N
tm
12
5
2
5
d∈D m∈d
40
1 64
·
≈ 76.19%
12 7
(2.38)
1 12
·
= 100%
12 1
(2.39)
F1B 3
2 · 0.7619 · 1
2 · PB 3 · RB 3
=
≈ 86.5%
PB 3 + R B 3
0.7619 + 1
=
(2.40)
In system partition (b), the true entites Gold1 and Gold3 are merged. Again, this merging has no
effect on the recall but on the precision. This time, the number of wrong assigned markables is greater.
Thus, the precision is smaller than in the case above:
PB 3
=
RB 3
=
1 X X cm
1
5
2
5
1
(
(
)) =
5·
+2· +5·
=
· 7 ≈ 58.33%
N
pm
12
10
2
10
12
d∈D m∈d
1 X X cm
1
5
2
5
1 12
(
( )) =
5· +2· +5·
=
·
= 100%
N
tm
12
5
2
5
12 1
(2.41)
(2.42)
d∈D m∈d
F1B 3
2 · 0.5833 · 1
2 · PB 3 · RB 3
=
≈ 73.7%
PB 3 + R B 3
0.5833 + 1
=
(2.43)
In system partition (c), every true entity (i.e. Gold1 , Gold2 and Gold3 ) is merged. Like in the cases
before, the recall remains steady on 100%. The precision is again smaller than in the cases above:
PB 3
=
RB 3
=
1
5
2
5
1 54
1 X X cm
)) =
5·
+2·
+5·
=
·
= 37.5% (2.44)
(
(
N
pm
12
12
12
12
12 12
d∈D m∈d
2
5
1 X X cm
1
5
1 12
(
( )) =
·
= 100%
(2.45)
5· +2· +5·
=
N
tm
12
5
2
5
12 1
d∈D m∈d
F1B 3
2 · PB 3 · R B 3
2 · 0.375 · 1
=
≈ 54.5%
PB 3 + RB 3
0.375 + 1
=
(2.46)
In system partition (d), there are no merged true entities. Quite the contrary - each predicted entity is
a singleton. In this case, the precision is on 100% and the recall decreases rapidly:
B3
P
=
RB 3
=
1 X X cm
1
1
1
1
5· +2· +5·
=
(
(
)) =
N
pm
12
1
1
1
d∈D m∈d
1 X X cm
1
1
1
1
(
( )) =
5· +2· +5·
=
N
tm
12
5
2
5
d∈D m∈d
F1B 3
=
1
· 12 = 100%
12
(2.47)
1
· 3 = 25%
12
(2.48)
2 · PB 3 · RB 3
2 · 1 · 0.25
=
= 40%
PB 3 + R B 3
1 + 0.25
(2.49)
To illustrate the effect of only singletons in the true set of entities, assume that figure (2.12) is the true
set of entities and figure (2.11) is the system’s output. The true set of entities and the system’s output are
completeley converse baselines that “correspond to very bad coreference resolution systems and, ideally,
should be given low scores on an adequate evaluation metric” (Kobdani and Schütze, 2010b) (i.e. every
markable is a singleton vs. every markable is coreferent with each other). Nonetheless, the B3 recall is
100% and the f-score achieves about 15,4%:
PB 3
=
RB 3
=
1 X X cm
1
1
1
(
(
)) =
12 ·
=
· 1 = 0.0833%
N
pm
12
12
12
d∈D m∈d
1 X X cm
1
1
1
(
( )) =
12 ·
=
· 12 = 100%
N
tm
12
1
12
(2.50)
(2.51)
d∈D m∈d
F1B 3
=
2 · P B 3 · RB 3
2 · 0.0833 · 1
=
≈ 15.4%
PB 3 + RB 3
0.0833 + 1
(2.52)
41
CEAF
As CEAF uses a one-to-one alignment, there are predicted entities that have no corresponding true entity.
These entities are marked below as N0 .
In system partition (a), the entity containing (6, . . . , C) is most similar to the true entity Gold3 . Thus,
the two markables 6 and 7 are ignored. The CEAF-M value for precision and recall and thus also for the
F1 -score is:
X |Si ∩ φ(Si )| 5
0
5
10
=
+
+
=
≈ 83.33%
(2.53)
P/R/F1 CEAFM =
N
12 12 12
12
Si
In system partition (b), the entity containing (1, . . . , 5, 8, . . . , C) is most similar Gold1 or Gold3 .
The predicted cluster containing the two markables 6 and 7 is aligned to Gold2 and, since CEAF uses a
one-to-one alignment, Gold1 or Gold3 is ignored. The CEAF-M values are:
X |Si ∩ φ(Si )| 5
2
0
7
=
+
+
=
≈ 58.33%
(2.54)
P/R/F1 CEAFM =
N
12 12 12
12
Si
In system partition (c), all true entities are merged. Thus, only one true entity can be aligned with the
merged predicted entity. As this can be the true entities with most markables (i.e. Gold1 or Gold3 ), this
time, seven markables will be ignored:
X |Si ∩ φ(Si )| 5
0
0
5
P/R/F1 CEAFM =
=
+
+
=
≈ 41.7%
(2.55)
N
12 12 12
12
Si
In system partition (d), every predicted entity is a singleton cluster. This is most problematic for the
one-to-one alignment in CEAF as only three markables can be considered:
X |Si ∩ φ(Si )| 1
0
3
P/R/F1 CEAFM =
= 3·
+9·
=
= 25%
(2.56)
N
12
12
12
Si
BLANC
Since BLANC uses both coreference links and disreference links, one convenient way of computing the
values rc, rd, wc, wd is using a script. In appendix C, the python code for computing these values is
presented.
In system partition (a), Gold2 and Gold3 are merged. Therefore, all true coreference links are present
but further false coreference links are added. Thus, the recall of coreference links is 100% and the
precision is less. With respect to the disreference links, no true disreference link is a predicted coreference
link. Hence, the precision of disreference links is 100% and the recall is less because of the two merged
true entities:
Pc
=
Rc
=
F1c
=
Pd
42
=
rc
21
=
≈ 67.742%
rc + wc
21 + 10
rc
21
=
≈ 100%
rc + wd
21 + 0
2 · Pc · R c
2 · 0.67742 · 1.0
=
≈ 80.77%
Pc + Rc
0.67742 + 1.0
rd
35
=
≈ 100%
rd + wd
35 + 0
(2.57)
(2.58)
(2.59)
(2.60)
Rd
=
F1d
=
rd
35
=
= 77.78%
rd + wc
35 + 10
2 · Pd · Rd
2 · 1.0 · 0.7778
=
≈ 87.5%
Pd + R d
1.0 + 0.7778
(2.61)
(2.62)
0.8077 + 0.875
F1c + F1d
=
≈ 84.41%
(2.63)
2
2
System partition (b) shows the same effect as system partition (a) does: two true entities are merged
and so the recall of coreference links and the precision of disreference links is 100%, whereas the corresponding precision and recall is less:
BLAN C =
Pc
=
Rc
=
F1c
=
Pd
=
Rd
=
F1d
=
21
rc
=
≈ 45.65%
rc + wc
21 + 25
rc
21
=
≈ 100%
rc + wd
21 + 0
2 · 0.4565 · 1.0
2 · Pc · Rc
=
≈ 62.68%
Pc + R c
0.4565 + 1.0
rd
20
=
≈ 100%
rd + wd
20 + 0
20
rd
=
= 44.44%
rd + wc
20 + 25
2 · P d · Rd
2 · 1.0 · 0.4444
≈ 61.53%
=
Pd + Rd
1.0 + 0.4444
(2.64)
(2.65)
(2.66)
(2.67)
(2.68)
(2.69)
F1c + F1d
0.6268 + 0.6153
=
≈ 62.11%
(2.70)
2
2
In system partition (c), the predicted set of entities comprises only one single entity and the true
entity set contains both coreference and disreference links. Thus, as described in (2.6.4), Pd , Rd and F1d
get zero. As all markable pairs in the predicted set of entities constitute a coreference link, the recall of
coreference links is 100%:
rc
21
Pc =
=
≈ 31.82%
(2.71)
rc + wc
21 + 45
21
rc
=
≈ 100%
(2.72)
Rc =
rc + wd
21 + 0
2 · Pc · Rc
2 · 0.3182 · 1.0
F1c =
=
≈ 48.28%
(2.73)
Pc + R c
0.3182 + 1.0
BLAN C =
F1c + F1d
0.4828 + 0
=
= 24.14%
(2.74)
2
2
In system partition (d), the predicted set of entities consists of only singleton entities but the true
partition does also contain coreference links. In this special case, the scores Pc , Rc and F1c get zero. As
the predicted partition only comprises disreference links, the recall of disreference links is 100%:
BLAN C =
Pd
=
Rd
=
F1d
=
45
rd
=
≈ 68.18%
rd + wd
45 + 21
rd
45
=
= 100%
rd + wc
45 + 0
2 · P d · Rd
2 · 0.6818 · 1.0
=
≈ 81.08%
Pd + Rd
0.6818 + 1.0
(2.76)
F1c + F1d
0 + 0.8108
=
= 40.54%
2
2
(2.78)
BLAN C =
(2.75)
(2.77)
43
Comparison of the four evaluation metrics
System response
(a)
(b)
(c)
(d)
MUC-F1
94.7
94.7
90.0
0.0
B3 -F1
86.5
73.7
54.5
40.0
CEAF
83.3
58.3
41.7
25.0
BLANC
84.41
62.11
24.14
40.54
Table 2.17.: Comparison of evaluation metrics (Luo, 2005)
When evaluating system partitions (a) and (b), MUC scores 94.7%, indicating wrong coreference
links. It does not make a distinction what entities (i.e. coreference chains) are falsely connected. This
distinction is made by the other three scorers. As B3 counts the overlapped markables for the entity
containing a certain markable, the partitions (a) and (b) show no negative effect on the recall but on the
precision. The precision and thereby the f-score is smaller in (b) than in (a), as the number of falsely
assigned markables is greater. This bias is also discernable with CEAF and BLANC. CEAF uses a oneto-one alignment of entities and thereby ignores those alignments with the smaller markable overlap.
Generally, CEAF scores these partitions more critically than B3 . BLANC rewards both coreference and
disreference links. It scores (a) better than (b), as the number of false links is smaller in (a) than in (b).
So, MUC is the only scorer that is not able to give (b) more penalty (e.g. for merging more incompatible
markables) than (a).
The two system baseline partitions (c) and (d) (all in one cluster; all singletons) show the most divergent scores. As MUC only regards coreference links, a cluster comprising all markables has only two
wrong coreference links (compared to the gold partition). Therefore, MUC scores 90.0%. In the case
of only singletons, there are no coreference links and MUC does not work. Here, it is defined to be
0.0%. Other than MUC, B3 is able to evaluate these baselines in a more realistic manner: (c) is not evaluated very good and (d) gets a score greater than 0. As in the partitions (a) and (b), again, CEAF scores
more critically and give (c) and (d) scores that are about 13.0 − 15.0 smaller than with B3 . BLANC
scores these baselines different. As the disreference links outnumbers the coreference links, the balanced
BLANC (α = 0.5) gives more penalty to (c) than to (d), as there are more misclassified links in (c) as in
(d).
2.7. SemEval-2010 Task 1: Coreference Resolution in
Multiple Languages
The subsequent section gives a brief overview about the SemEval-2010 task 1, described in (Recasens
et al., 2010) with a slight focus on the participants for German, in particular SUCRE. All information
about the competition is extracted from (Recasens et al., 2010).
The main goal of SemEval-2010 task 1: “Coreference Resolution in Multiple Languages” was to
perform and evaluate the results of coreference resolution for six languages: Catalan, Dutch, English,
German, Italian and Spanish. There were four evaluation settings based on the properties closed/open
and gold-standard/regular. SemEval-2010 provides the four most commonly used evaluation scores
(cf. (2.6)): MUC, B3 , CEAF and BLANC. Questions that should be answered with SemEval’s results
are:
1. Is it possible to construct a coreferent resolver for several languages without a huge amount of
tuning?
44
2.7. SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
2. Is an optimal linguistic annotation a must for getting good coreference resolution results or is
automatically annotated information sufficient?
3. To what extent are the different evaluation scores similar and do these provide the same ranking?
The four evaluation settings that are used mean:
Closed: In this setting, the systems are only allowed to use the information given within the dataset.
Open: The participants are allowed to use external resources like Wikipedia or WordNet to improve
the preprocessing information used for the coreference resolution task.
Gold-standard: Here, the systems used information from the gold-standard annotation of grammatical attributes like lemma, part-of-speech, dependency relations or morphological features like
gender, number, case, . . . and additionally true markables.
Regular: The systems only used the information extracted from automatic predictors. For example,
for German, the lemmas were predicted by TreeTagger (Schmid, 1995), the part-of-speech and the
morphology by RFTagger (Schmid and Laws, 2008) and the dependency relations by MaltParser
(Hall and Nivre, 2008).
For this purpose of coreference resolution, results have been submitted by six participants: SUCRE
(Kobdani and Schütze, 2010a), RelaxCor, TANL-1 (Attardi et al., 2010), UBIU (Zhekova and Kübler,
2010), Corry-(B,C,M) and BART (Broscheit et al., 2010a). For each language a unique corpus with
different annotation is used (table 2.18).
Language
Catalan and Spanish
Dutch
English
German
Italian
Corpus
The AnCora corpora: a Catalan and Spanish treebank of 500k
words, source: newspapers
The KNACK-2002 corpus: 267 documents from the Flemish
weekly magazin “Knack”
The OntoNotes Release 2.0 corpus: 300k words from The Wall
Street Journal and 200k words from the TDT-4 collection
The TüBa-D/Z corpus: 794k words from the newspaper
“die tageszeitung (taz)”
The LiveMemories corpus: texts from the Italien Wikipedia,
blogs, news articles and the like
Table 2.18.: Corpora used for each langauge in SemEval-2010
The corpora have been transformed into a specific data format in order to get a common representation
across all six languages. One excerpt of the task dataset is given in figure 2.13.
This data format contains several columns with gold-standard and regular annotation with morphological, syntactic and semantic information:
• The first column corresponds to the token-ID.
• The second column is the word form of the token.
• Columns 3/4 correspond to the gold/automatic annotation of the lemma.
• Columns 5/6 correspond to the gold/automatic annotation of the part-of-speech-tag.
• Columns 7/8 correspond to the gold/automatic annotation of some morphological features like
case, number or gender.
45
1 Frau Frau Frau NN NN cas=n|num=sg|gend=fem cas=n|num=sg|gend=fem 3
3 SUBJ SUBJ _ _ _ _ (815
2 K. K. K. NE NE cas=n|num=sg|gend=fem cas=n|num=sg|gend=* 1 1 APP
APP _ _ _ _ 815)
3 hörte hören hören VVFIN VVFIN _ per=3|num=sg|temp=past|mood=ind 0 0
ROOT ROOT _ _ _ _ _
4 zu zu zu PTKVZ PTKVZ _ _ 3 3 AVZ AVZ _ _ _ _ _
5 . . . $. $. _ _ 4 4 -PUNCT--PUNCT-_ _ _ _ _
Figure 2.13.: A sentence in the original SemEval-2010 task dataset
• Columns 9/10 correspond to the gold/automatic annotation of the ID of the syntactic head (this is
0 in the case of a tree root).
• Columns 11/12 correspond to the gold/automatic annotation of the dependency relation to head
described in columns 9/10.
• Columns 13/14 correspond to the gold/automatic annotation of the named entity type in open-close
notation (if available).
• Columns 15/16 correspond to the gold/automatic annotation of the predicate semantic class (if
available).
• The last column corresponds to the coreference relation in the open-close notation. For example,
in figure 2.13, the markable 815 starts with token 1, “Frau”, and ends with token 2, “K.”.
Thereafter, the dataset is divided into training set, development set and test set. Table (2.19) shows the
sizes of these sets for the German corpus, TüBa-D/Z.
Training set
#documents #sentences
900
19,233
Development set
199
4,129
Test set
136
2,736
#tokens
331,614
#tokens
73,145
#tokens
50,287
Table 2.19.: The training, development and test set of TüBa-D/Z in SemEval-2010
The systems have different architectures and machine learning methods. Table 2.20 compares the
participants in terms of differences in architecture for German.
The second column indicates the system architectures. BART uses closest-first clustering whereas
SUCRE uses best-first clustering. Further description to SUCRE is given in chapter 3. One significant
difference between BART and the other mentioned systems for German is the usage of external resources.
BART uses GermaNet and gazetteers and the others do not use such resources at all.
As a first step in the evaluation, two baseline systems were analyzed with respect to the language (i.e.
the corpus). Table 2.21 shows these scores for German and English.
46
2.7. SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
BART
SUCRE
TANL-1
UBIU
System architecure
Closest-first model
Best-first clustering, relational
database model and Regular
feature definition language
Highest entity-mention similarity
Pairwise model
ML Methods
MaxEnt
Decision trees, Naive Bayes,
SVM and MaxEnt
External Resources
GermaNet & gazetteers
-
MaxEnt
-
MBL
-
Table 2.20.: Comparison of architectures of BART, SUCRE, TANL-1 and UBIU in SemEval-2010
CEAF
MUC
R
P
F1
R
P
F1
R
S INGLETONS : Every markable constitutes each a single entity
German 75.5 75.5 75.5
0.0
0.0
0.0
75.5
English 71.2 71.2 71.2
0.0
0.0
0.0
71.2
A LL - IN - ONE : All markables are grouped together into one cluster
German 8.2
8.2
8.2
100 24.8 39.7
100
English 10.5 10.5 10.5
100 29.2 45.2
100
B3
P
BLANC
P
F1
F1
R
100
100
86.0
83.2
50.0
50.0
49.4
49.2
49.7
49.6
2.4
3.5
4.7
6.7
50.0
50.0
0.6
0.8
1.1
1.6
Table 2.21.: The baseline scores for German and English in SemEval-2010
These show some limitations of the used evaluation scores. These have been further described in
(2.6). The differences in these baseline scores reveals differences in the distribution of the entities in the
respective corpus.
Kobdani and Schütze (2010b) describe this indication as follows: “the system tendency to incorrectly
generate larger clusters is penalized in B3 and CEAF-M metrics, and to incorrectly generate singleton
clusters is penalized in MUC metric”. This means, in the case of German and English, the corpora
TüBa-D/Z and OntoNotes turn out to be slightly different in the entities’ distribution: As in the S INGLE TON -baseline the scores are slightly better in German, TüBa-D/Z contains more singleton entities than
OntoNotes. On the other hand, in the case of the A LL - IN - ONE-baseline, the values for English are better
and thus OntoNotes contains more coreferent markables.
The baseline scores were hard to beat by the participating system. Table 2.22 shows the results of
SemEval-2010 for German. Here, SUCRE, TANL-1 and UBIU only participate in the closed setting,
whereas BART only participates in the open setting. Therefore, SUCRE can only be compared with
TANL-1 and UBIU.
It turns out that SUCRE performs best in closed × regular for the languages English, German and
Italian. Surprisingly, SUCRE did not outperform the values in the S INGLETON-baseline (cf. tables 2.21
and 2.22) for CEAF (72.9 vs. 75.5) and B3 (81.1 vs. 86.0). TANL-1 usually wins with respect to these
scores.
Considering the three posed questions above, the following results have been discovered:
1. With respect to the language, English is the one that has most participants with fifteen entries.
German comes second place with eight entries (cf. table 2.22). Less entries have Catalan/Spanish,
Italian and Dutch. English was the winner in ranking the overall results, followed by German on
the second place. Reasons for this ranking are the differences in the respective corpora (e.g. the
size; here, German has the largest corpus) and the fact that most systems are originally developed
for English.
47
R
German
closed × gold
SUCRE
72.9
TANL-1 77.7
UBIU
67.4
closed × regular
SUCRE
60.6
TANL-1 50.9
UBIU
39.4
open × gold
BART
67.1
open × regular
BART
61.4
CEAF
P
F1
72.9
77.7
68.9
R
B3
P
F1
R
58.4
25.9
21.9
90.4
77.2
73.7
73.6
96.7
77.9
81.1
85.9
75.7
78.2
54.4
60.0
61.8
75.1
77.2
66.4
57.4
64.5
35.0
31.5
11.4
40.9
15.4
10.4
69.1
47.2
41.2
60.1
54.9
53.7
64.3
50.7
46.6
52.7
50.2
50.2
59.3
63.0
54.4
53.6
44.7
48.0
70.5
40.1
51.1
85.3
64.4
73.4
65.5
61.0
62.8
61.4
36.1
45.5
75.3
58.3
65.7
55.9
60.3
57.3
R
MUC
P
F1
72.9
77.7
68.2
74.4
16.4
22.1
48.1
60.6
21.7
59.2
48.2
51.9
59.9
49.5
44.8
49.3
10.2
9.5
66.7
66.9
61.2
61.3
BLANC
P
F1
Table 2.22.: Official results of SemEval-2010 for German
2. As the gold-standard performed significantly better than regular, an optimal linguistic annotation
turns out to be necessary for good coreference resolution results. However, Recasens et al. (2010)
relativizes this insight as a direct effect on the markable detection. This falls rapidly in the regular
setting.
RelaxCor for English is the only participant that reveals a slight improvement by using external
resources (open). The corresponding values are given in table 2.23. Therefore, if SUCRE’s annotation is extended with several external resources like GermaNet, there also might be a slight
improvement.
3. The rankings of the participants differ with respect to the considered evaluation score. For example,
in German, closed × gold (cf. table 2.22), with respect to CEAF and B3 the ranking is: TANL1>SUCRE>UBIU, whereas with respect to MUC, SUCRE outperforms both: SUCRE>TANL1>UBIU and finally with respect to BLANC, TANL-1 is the weakest: SUCRE>UBIU>TANL-1.
In general, there is a correlation between CEAF and B3 but a lack of correlation between CEAF
and MUC in respect of recall. Therefore, the evaluation score has to be defined appropriately or
combined with others (cf. MUC-B3 -score in chapter 4).
R
English
closed × gold
RelaxCor 75.6
open × gold
RelaxCor 75.8
CEAF
P
F1
75.6
75.8
R
B3
P
F1
R
33.7
74.8
97.0
84.5
57.0
83.4
61.3
34.2
75.2
96.7
84.6
58.0
83.8
62.7
R
MUC
P
F1
75.6
21.9
72.4
75.8
22.6
70.5
Table 2.23.: closed vs. open in SemEval-2010
48
BLANC
P
F1
CHAPTER 3
The SUCRE system
The term SUCRE (on German SÜKRE) is an acronym for the German title Semi-Überwachte KoReferenz-Erkennung which is in English semi-supervised coreference resolution (Kessler, 2010). This
chapter is addressed to this coreference system and is organized as follows: in the first section, some
main facts about the SUCRE project are presented. The second section introduces the architecture of
SUCRE as described in (Kobdani and Schütze, 2010a). The third section shows a short overview of the
idea of Self Organizing Maps (SOMs) for the visualization of coreference features. In the fourth section,
the multi-lingual aspect of SUCRE is presented. Here, Kobdani and Schütze (2010b) show that the architecture of SUCRE providing a relational database model and a regular definition language is capable
of implementing features that can be used in several languages. The fifth section repeats and summarizes
the results of SUCRE in SemEval-2010, introduced in (2.7). Finally, the sixth section presents the dataset
extracted from the TüBa-D/Z corpus, that is used for German coreference research in SUCRE.
3.1. The project
The SUCRE project is financed by the German Research Foundation (Deutsche Forschungsgesellschaft
(DFG)). The project heads are Prof. Dr. Hinrich Schütze, Prof. Dr. Hans Kamp and Prof. Dr. Gunther
Heidemann. Two institutes of the University of Stuttgart1 take part in this project: the Institute for
Natural Language Processing2 (Institut für Maschinelle Sprachverarbeitung (IMS)) and the Institute for
Visualization and Interactive Systems3 (Institut für Visualisierung und Interaktive Systeme (VIS)).
The beginning of the project was in September 2009 and has a first duration of two years (Kessler,
2010).
One goal of the project described by Kessler (2010) is the progress in interactive visualization of
coreference features. As described in (3.3) the visualization with Self Organizing Maps simplifies the
semi-supervised annotation of large documents. Additionally, insights for new features can be drawn out
of the visualization results (cf. (3.3)).
Kessler (2010) presents the modules of the project (figure 3.1), where the visualization module has
been worked out by the Institute for Visualization and Interactive Systems, the feature extraction module
is part of both Institutes’ works and the remaining modules are tasks of the Institute for Natural Language
Processing.
In the subsequent chapter, the focus lies on the modules developed by the Institute for Natural Language Processing.
1 http://www.uni-stuttgart.de
2 http://www.ims.uni-stuttgart.de
3 http://www.vis.uni-stuttgart.de
49
Chapter 3. The SUCRE system
Figure 3.1.: The module architecture in the SUCRE project
3.2. The architecture
Beside the resolution of nouns and pronouns, SUCRE performs full coreference resolution. The unique
architecture of SUCRE reveals a new approach to feature engineering of coreference resolution with the
use of a relational database model and a regular feature defintion language (Kobdani and Schütze,
2010a).
SUCRE enables a flexible feature engineering by converting a raw text within a preprocessing step into
a relational database model. This model provides fast and flexible ways of implementing new features
(Kobdani et al., 2010). Feature engineering takes an important part in coreference resolution. Thus, a
system with which researchers can additionally implement features using a regular definition language
has a great advantage concerning effort. It is possible to extract features from the text as well as to import
external features (e.g. semantic relationships from an ontological information source like GermaNet).
The modular architecture provides a clear partition of data storage, feature engineering and machine
learning algorithms (Kobdani et al., 2010). As a result, SUCRE allows to use any externally available
classification method (Kobdani and Schütze, 2010a).
Figure 3.2.: The coreference architecture of SUCRE
Figure 3.2 shows the architecture of the full coreference resolution task that ends up in a coreference
partition (“Markable Chains”). The architecture can be divided into two main steps: preprocessing
(3.2.1) and coreference resolution (3.2.4). The latter comprises the “Pair Estimation” (figure 3.3) and the
“Chain Estimation” (the final Decoding-step).
Figure 3.3.: The pair estimation of SUCRE
50
3.2.1. Preprocessing
In the preprocessing step the text corpus is processed and transformed into the relational database model
(Kobdani et al., 2010). There are two kinds of preprocessing executions: prelabeled and unlabeled
(Kessler, 2010). In the first case, the system uses an annotated corpus and extracts all information out of
it. In the second case, all information is gained by using NLP tools like a tokenizer or a part-of-speech
tagger, as it has been the case in the regular evaluation settings provided in SemEval-2010 (cf. (2.7)). In
the course of the feature engineering provided in this diploma thesis, true markables with gold annotation
are used instead.
Preliminary text conversion
Here, the raw input text is transformed into a format in which tokens (i.e. words, punctuation marks, . . . )
and sentences are recognized and marked up.
• Tokenization
• Sentence boundary detection
Extracting atomic word features
Here, the atomic word features (cf. (3.2.2)) are extracted from the tokens identified in the previous step.
For instance:
• Part-of-speech tags
• Lemmas
• Grammatical/natural gender
• Grammatical number
• Parse information
Markable detection
In this step, the markables are identified based on the previous step (e.g. all noun phrases from the parse
information are regarded as potential markables (Kobdani and Schütze, 2010b)). For more details on the
issue of markable detection, see (2.2).
Extracting atomic markable features
After all markables are identified, for each markable its atomic markable features (cf. (3.2.2)) are extracted using the information from the previous steps (e.g. atomic word features).
• Named Entity
• Alias
• Syntaxtic role
• Semantic class
51
3.2.2. Features in SUCRE
There are two kinds of features available in SUCRE: atomic features and link features.
Atomic Features: SUCRE defines atomic features for words and markables. Atomic word features
are the position in the corpus, the numeric ID of document, paragraph or sentence. In addition
to that, atomic word features might be the part-of-speech tag (e.g. NN for common noun), the
grammatical or natural gender (i.e. male, female, neuter), the number (e.g. singular or plural),
the semantic class, the word type (e.g. in the case of a pronoun, the type of pronoun), the case
(nominative, genitive, dative or accusative) or the person (i.e. first, second, third).
Atomic markable features can be the number of words in the markable, named entity (i.e. whether
the markable is a proper name), alias (i.e. whether the markable constitutes an alias form e.g.
an acronym), the syntactic role (e.g. the subject or object) or the semantic class (e.g. person,
organization, event, . . . )(Kobdani and Schütze, 2010a).
Link Features: Link features are defined over a pair of markables m1 and m2 . In many cases, the most
important word in a markable is its head word (e.g. pronouns), but sometimes the head word is not
expressive enough for resolving a markable, for example in the markable pair (das Buchm1 , ein
Buchm2 ), the distinguishing feature is the beginning word, or with the markable pair (ein Student
aus Deutschlandm1 , ein Student aus Frankreichm2 ) it is the last word, that differs and indicates
disreference. In some cases, all words in a markable has to be considered. SUCRE’s feature
definition language uses keywords to specify the word selection of a markable: m1 and m2 refer
to the first and second markable, m1b and m2b, m1h and m2h and m1e and m2e refer to the first
word, the head word and the last word in the first and second markable. More details on markable
keywords are provided in appendix B, (B.1).
Some of the functions for each keyword are exact- and substring matching (case-sensitive and
case-insensitive), edit distance, alias, word relations, markable parse tree path and absolute value
(Kobdani and Schütze, 2010a). Some examples of link features (extracted from the feature set
given for feature engineering in German coreference resolution in section 4.1) are presented in
example (8):
(8)
a.
b.
c.
{abs(m2b.stcnum-m1b.stcnum)>1}
→ The distance between the two markables is bigger than one sentence
{alias(m1h,m2a)||alias(m1a,m2h)}
→ The head of one markable is an alias of the other markable
{(seqmatch(m1h,m2h))}
→ Exact string match of both markables’ heads
Further information about the feature definition language is given in appendix B.
3.2.3. The Relational Database Model
The result of the preprocessing step (3.2.1) is a relational database model. It is a common structure
that includes all data used for the coreference resolution, for instance: the text corpus, the results of the
preprocess, relations between textual entities, classification results and the like. As it is usually the case
in NLP, here, the values of attributes of textual entities and the relationship between those entites form the
base for features. Thus, the relational database model constitutes the “natural formalism for supporting
the definition and extraction of features” (Kobdani et al., 2010).
A minimal model for running the system consists of three tables: word table (table 3.1), markable
table (table 3.2) and link table (table 3.3).
In the word table, the Word-ID constitutes the word’s index (its numeric ID), i.e. the position of the
token in the corpus. This ID uniquely identifies the word. Foreign keys are Document-ID, Paragraph-ID
52
Word Table
Word-ID
Primary Key
Document-ID
Foreign Key
Paragraph-ID
Foreign Key
Sentence-ID
Foreign Key
Word-String
Attribute
Word-Feature-0
Attribute
Word-Feature-1
Attribute
..
..
.
.
Word-Feature-N
Attribute
Markable Table
Markable-ID
Primary Key
Begin-Word-ID
Foreign Key
End-Word-ID
Foreign Key
Head-Word-ID
Foreign Key
Markable-Feature-0
Attribute
Markable-Feature-1
Attribute
..
..
.
.
Markable-Feature-N
Attribute
Table 3.2.: Markable Table
Table 3.1.: Word Table
Link Table
Link-ID
Primary Key
First-Markable-ID
Foreign Key
Second-Markable-ID Foreign Key
Coreference-Status
Attribute
Confidence-Status
Attribute
Table 3.3.: Link Table
and Sentence-ID. They point to the primary keys of the corresponding document, paragraph or sentence
table. Containing the word ID and the word string, the word table is able to reconstruct the raw text of
the corpus, as it knows the words’ linear order, and any other (tagged) format using the word features
(part-of-speech tag, number, gender, . . . ). The word features can be defined and added to the word table
in the preprocessing step.
Figure 3.4 shows an excerpt of the word table created for the TüBa-D/Z corpus. The information
in the columns are ordered: word-ID, word string, document-ID, paragraph-ID, sentence-ID, part-ofspeech tag, number, gender, case, person.
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
Für
diese
Behauptung
hat
Beckmeyer
bisher
keinen
Nachweis
geliefert
.
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
65
65
65
65
65
65
65
65
65
65
APPR
PDAT
NN
VAFIN
NE
ADV
PIAT
NN
VVPP
$.
unknown
unknown
singular
unknown
singular
unknown
unknown
singular
unknown
unknown
unknown
unknown
female
unknown
male
unknown
unknown
male
unknown
unknown
unknown
unknown
accusative
unknown
nominative
unknown
unknown
accusative
unknown
unknown
unknown
unknown
unknown
unknown
unknown
unknown
unknown
unknown
unknown
unknown
Figure 3.4.: Example of a word table in the TüBa-D/Z corpus
In the markable table, the Markable-ID is the primary key constituting the unique index of each
markable. The Begin-Word-ID, Head-Word-ID and End-Word-ID are foreign keys, i.e. they refer to the
primary keys in the word tables corresponding to the first word, the head word and the last word in the
markable. Like the word features, the markable features can be defined and added in the preprocessing
53
step.
Figure 3.5 shows the markable table of the three markables in the word table in figure 3.4. The
information in the columns are ordered: markable-ID, document-ID, word-ID of the first word, word-ID
of the last word, word-ID of the head word, named entity class. For instance, markable m323 corresponds
to the proper name Beckmeyer. Therefore, the first, last and head word are the same. As there is no named
entity classification available for TüBa-D/Z, the last column is always annotated as unknown.
322
323
324
4
4
4
1203
1206
1208
1204
1206
1209
1204
1206
1209
unknown
unknown
unknown
Figure 3.5.: Example of a markable table in the TüBa-D/Z corpus
In the link table, the Link-ID is the primary key. It discriminates the link from others. The foreign
keys are the markable-IDs of the first and second markable that are connected via the respective link.
They refer to the primary keys in the corresponding markable tables (Kobdani and Schütze, 2010a).
The coreference status and the confidence status indicate if the connected markables are coreferent or
disreferent and how confident this classification is.
Figure 3.6 shows an excerpt of the link table with the markable m323 mentioned in the word and markable table above. The information in the columns are ordered: link-ID, markable-ID of the antecedent,
markable-ID of the anaphor, coreference status, confidence status. As shown in figure 3.6, markable
m323 , the proper name Beckmeyer, is coreferent with markable m314 , the noun phrase Bremens Häfensenator and with markable m335 , the complex proper name Uwe Beckmeyer but disreferent with markable
m322 , the noun phrase diese Behauptung. As this link table is gold standard, the confidence status is
always 100%.
1144
1152
1153
314
322
323
323
323
335
coref
disref
coref
100
100
100
Figure 3.6.: Example of a link table in the TüBa-D/Z corpus
3.2.4. Coreference Resolution
After the relational database model is created, the coreference resolution step can be started.
Link Generator
In the training step of the classifier, SUCRE creates positive and negative training samples. For each
adjacent coreferent markable pair, a positive instance is created, say <mi ,mj >. Negative training instances are generated by pairing all preceding markables mα that are disreferent to mj with mj . For the
decoding, the system generates all possible markable pairs inside a window of 100 markables (Kobdani
and Schütze, 2010a).
The output of this module is a list of generated links that is saved within the link table (cf. table 3.3)
(Kobdani and Schütze, 2010b).
54
3.3. Visualization with SOMs
Filtering
As the number of disreference links outnumbers the number of coreference links by far, this step is used
for reducing the number of disreference links with the use of prefilters (cf. (4.1.2)). Thereby, links
that definitely connect two disreferent markables are filtered out. For instance, if the antecedent and the
anaphor mismatch in the number value.
Link Feature Extractor
In this step, the values of the link features (cf. (3.2.2)) that have been defined are extracted for creating
the dataset for training and testing a pairwise classifier. For the definition of link features, the regular
definition language is used (see appendix B).
The final samples are represented as link feature vectors. Each component of such a vector corresponds
to a link feature.
Learning (Training)
SUCRE uses four classifiers for which the feature vectors are their input: Decision-Tree (Quinlan, 1993),
Naive-Bayes, Support Vector Machine and Maximum-Entropy. The best results were achieved with
the Decision-Tree-classifier (Kobdani and Schütze, 2010a). Therefore, it is used as basis for the feature
engineering in chapter 4 to 6.
Classification and Decoding
After classification each test sample as coreferent or disreferent, the decoding (clustering) step starts.
Here, the coreference chains (i.e. the clusters) are generated based on the pairwise decisions. In SUCRE,
the best-first clustering is used. That means, for a given markable mj the best predicted antecedent
mii (i.e. with the highest confidence status) is chosen. The starting point is the end of the document,
moving leftwards (Kobdani and Schütze, 2010a). If the number of markables within a document exceeds
a predefined threshold, there is a limit that yields better efficiency and results (Kobdani and Schütze,
2010b).
One traditional possibility for annotating a text with coreference information visually is a text based visualization (e.g. the GATE framework (Cunningham et al., 2002)). The problem that occurs with this kind
of visualization is that the annotation of large documents (e.g. books or detailed reports) requires a lot of
time (Burkovski et al., 2011). Moreover, it does not provide a visualization of the feature space or similarities between links. Furthermore, text based visualizations are limited by the number of lines/colors
that a user can distinguish by annotating coreference information (Burkovski et al., 2011). Another kind
of coreference visualization is constituted by the Self Organizing Maps (SOMs). Here, unsupervised
machine learning methods are combined with visualization and interaction techniques (Burkovski et al.,
2011).
The SOM visualizes coreference information gained by pairwise models. It provides an interactive
presentation of the feature space. Thereby, the user can explore the feature space and is able to annotate
data with coreference information in a fast manner. This approach is aimed to boost and advance human
coreference recognition (Burkovski et al., 2011).
3.3.1. Self Organizing Map
A Self Organizing Map is a “type of artificial neural network”, firstly described in (Kohonen, 1982),
with which one can visualize “high dimensional data” (Burkovski et al., 2011). It can be used as an
55
unsupervised machine learning method.
Burkovski et al. (2011) describe the SOM in a formal way: nodes (i.e. the neurons in this neural
network) are connected to other nodes within a low dimensional topology. A neuron ni ∈ N is defined
by its particular location ri ∈ Rdtopol , where dtopol is the dimension of the topology, and its weight
vector w
~ i ∈ Rdin , where din is the dimension of the input vectors ~x ∈ Rdin .
Training
At the beginning (training time t = 0), all weight vectors w
~ i (0) are initialized based on some data
knowledge or randomly. During training, an input vector ~xk is chosen randomly and is assigned to the
node nj (the best matching unit (BMU)) whose weight vector has the smallest distance with respect to,
say, euclidean distance (cf. formula 3.1):
j = arg min(||~xk − w
~ i (t)||)
i
(3.1)
After each assignment, every weight vector w
~ i is updated with a learning rule (formula 3.2), where α(t)
is a “time decreasing learning coefficient” that controls the influence of the input on the training and hij
denotes the neighborhood function, a distance measure between the node ni , that correlates to the weight
vector w
~ i , and the BMU nj . A common neighborhood function is given in formula 3.3 (cf. (Burkovski
et al., 2011)), where σ(t) is the radius of the neighborhood, which also descreases with time t (Kessler,
2010).
w
~ i (t + 1) = w
~ i (t) + hij (t)α(t)(~xk − w
~ i (t))
(3.2)
||~rj − ~ri ||2
)
(3.3)
2σ(t)2
“Graphically speaking, the weight vectors change their places in feature space to get closer to the data
points and take their neighbors with them” (Burkovski et al., 2011).
After each iteration, the training time t is increased by one (t ← t + 1) and thereby, α(t) and hij (t)
decrease. Then, the next input vector, ~xk+1 , is chosen.
These steps are repeated until a limit of iterations is reached or α(t) is below a predefined threshold.
hij (t) = exp(−
Visualization
There are several ways to visualize this neural network. The most commonly used one is the so-called
“U-matrix”, where the position of nodes and their neighborhood relations are represented as components
(cf. figure 3.7). Here, ui represents the position of a node and uij is the distance between the nodes ni
and nj in the feature space: uij = ||w
~i − w
~ j ||.
Figure 3.7.: U-matrix in (Burkovski et al., 2011)
High U-matrix values indicate that the weight vectors w
~ i of neighboring nodes ni are far apart in
the feature space. Burkovski et al. (2011) represent the U-matrix as a graph (figure 3.8). All U-matrix
values are indicated with a gray scale color: black represents a high value and light gray represents a low
value. The number with which a node ni is labeled indicates the number of assigned input vectors ~xi for
which ni is the BMU. The user can click on each node and receive information about the assigned links
(Burkovski et al., 2011).
Moreover, it is possible to focus on a single feature and reduce the SOM to a component plane (cf.
figure 3.9, 3.10 and 3.11). Thereby, the user can have insights in the impact of a specific feature.
56
Figure 3.8.: Graph of a U-matrix in (Burkovski et al., 2011)
3.3.2. Application of SOMs in coreference resolution
The subsequent section mentions three features, that are described in table 3.4:
Head match
WordNet distance
Markable span
1 if the head of markables matches, otherwise 0.
Jaccard coefficient of WordNet hypernym sets for both markables.
1 if one markable spans the other, otherwise 0.
Table 3.4.: Three features used by Burkovski et al. (2011)
Burkovski et al. (2011) discuss three applications of the visualization with SOMs:
1. SOMs enable to represent “high dimensional coreference data and their features in a low dimensional space”. This way, the user is able to better understand the distribution of coreference data
in the feature space. Figure 3.8 shows the U-matrix-graph for the proportion of coreference links
(i.e. feature vectors) assigned to the matrix nodes. The dark nodes indicate high assignments of
coreference links, whereas light gray nodes comprise none or just a few coreference links. Clusters
or regions of nodes are defined as areas with high density of high-numbered nodes, separated by
nodes without any link assignments or by black edges (i.e. edges with a high U-matrix value). The
regions A − D contain predominantly coreference links, whereas region E also involves disreference links, as there are light gray nodes.
If the user wants to know what feature is accountable for a cluster, he or she can use the component
planes (figure 3.9, 3.10 and 3.11).
57
Figure 3.9.: The component plane for head match Figure 3.10.: The component plane for WordNet
distance
Figure 3.11.: The component plane for markable span
The component plane in figure 3.9 is based on the head match feature. As this is a good indicator
for coreference, the regions A, B and D are mainly dark in this component plane. Thus, they
are constituted by this feature. Figure 3.10 shows the component plane for the WordNet distance
feature. Here, the user can see that this feature is responsible for region C, as here, the region C is
predominantly black. The region B does not have high U-matrix values in this component plane.
Thus, the WordNet distance feature has no positive impact on the creation of region B.
2. The user can find new insights for designing new features by exploring the SOMs. Some regions
in figure 3.8 show both gray nodes and black nodes. Here, the features do not separate coreference
58
3.4. The multi-lingual aspect in SUCRE
and disreference links well enough. One way of solving this is a closer look into the gray nodes.
So, the user can see details on the links assigned to that node and can find reasons why both
coreference and disreference links are assigned to the same node. For instance, figure 3.11 shows
the component plane for the markable span feature. This feature is an indicator for disreference.
Nonetheless, region E in figure 3.11 mainly contains coreference links. If the user checks some
nodes within this region, the reason for this becomes obvious. Most coreference links are assigned
due to the fact that the second markable is an apposition to the first markable. This can be solved
by introducing an apposition feature. In general, the “inspection of nodes with mixed links helps
the user to understand what these links have in common and what new feature may separate them”
(Burkovski et al., 2011).
3. As the annotation of large documents in a text based visualization is not time efficient, the SOMs
provide a great advantage with the option of annotating similar data with coreference information.
Given strong indicators for coreference like the component plane for the head match feature (figure
3.9), the user is able to identify regions. Now it is possible to annotate whole clusters with the
right coreference information by checking for nodes in the respective regions, what kind of links
(coreference/disreference) are predominantly given. If the user is not sure about the right class, he
or she can also label the clusters with confidence values. These confidence values may be used for
a supervised learning algorithm, afterwards (Burkovski et al., 2011).
3.4. The multi-lingual aspect in SUCRE
The relational database model used in SUCRE provides a flexibility in feature engineering, that makes it
possible for using one and the same feature set for several languages. As described in (2.7), SUCRE provides coreference resolution for all six languages presented in SemEval-2010: Catalan, Dutch, English,
German, Italian, and Spanish.
In (Kobdani and Schütze, 2010b), the question is addressed in how far it is possible “to define a
common feature set that can be used for different languages”. Here, they focus on four languages:
Dutch, German, Italian, and Spanish.
With respect to the multi-lingual aspect, three main categories of features are defined, where only the
first two are considered to be relevant for a feature set applicable to different languages:
Identical features: An identical feature is a feature that is identical for all languages with respect to
concept and definition. For instance, the distance between two markables in terms of sentences
or exact or substring match. Kobdani and Schütze (2010b) discriminate three kinds of identical
features:
1. String-based features:
a) Exact string match of the markables’ heads
b) Head of m1 contained in m2
c) Head of m2 contained in m1
d) Any word in m1 is contained in m2 or vice versa
e) Substring match of the markables’ heads
f) Partial match of the head of m1 with any word in m2
g) Partial match of the head of m2 with any word in m1
h) Partial match of any word in m1 with any word in m2 or vice versa
i) Edit distance between the markables’ heads
2. Distance features:
59
a) Distance between m1 and m2 in terms of sentences
b) Distance between m1 and m2 in terms of words
3. Span features:
a) One markable is included in the other one
b) m1 overlaps with m2
As the results based on this feature set outperforms the baselines (i.e. SINGLETONS and ONE
CLUSTER), Kobdani and Schütze (2010b) argue “that these link features should be in the common
feature set of the four languages”.
Universal features: A universal feature is a feature that is identical for all languages with respect to
concept but often has different realizations due to different annotation styles and lexical/grammatical
differences. For example, pronoun type features, semantic class, number or definiteness of a noun
phrase. Kobdani and Schütze (2010b) propose four groups of universal features that are defined
for each language:
1. Noun type features: m1 /m2 is
a) a common noun
b) a proper noun
c) definite
d) indefinite
As adding noun type features to the identical feature set increases the MUC-F1 about more
than 10% (averaged over all four languages), Kobdani and Schütze (2010b) affirm the adding
of noun type features to the feature set.
2. Pronoun type features: m1 /m2 is
a) a first person pronoun
b) a second person pronoun
c) a third person pronoun
Although some more language specific variants of these features in German and Spanish
are even better, these pronoun type features improves the MUC-F1 score with about 5%.
Therefore, the pronoun type features should be part of a multi-lingual feature set.
3. Grammatical features: m1 /m2 is
a) a subject
b) a direct object
c) an indirect object
These six features (three for each markable) improve the MUC-F1 score with about 8%.
Therefore, it is advisable to include the grammatical features to the common feature set.
4. Agreement features: m1 and m2 agree with respect to
a) number
b) natural gender
c) semantic class
These features yield a 7% increase in the MUC-F1 score. Thus, it should be added to the
final multi-lingual feature set too.
60
3.5. Evaluation results in SemEval-2010
Language specific features: A language specific feature is defined just for a specific language. For
example the grammatical gender for German.
The overall results of the four languages are shown in the results of SemEval-2010 - SUCRE gold
annotation in table 3.5. This shows that the multi-lingual feature set presented by Kobdani and Schütze
(2010b) achieves competitive results for coreference resolution.
3.5. Evaluation results in SemEval-2010
SUCRE participated in SemEval-2010 Task 1 on Coreference Resolution in Multiple Languages for gold
and regular closed annotation tracks. It got best results in several categories, including regular closed
annotation tracks for English, German and Italian. Further information of this competition is given in
(2.7).
Table 3.5 shows the results of SUCRE and the best competitor system, TANL-1. The four main
evaluation measures (CEAF, MUC, B3 and BLANC) are used. Additionally, the score for markable
detection (MD), that is on 100% in the gold annotation, is shown above the other scores. SUCRE’s
results for gold closed annotation track of English and German are the best in MUC and BLANC.
Language
System
MD-F1
CEAF-F1
MUC-F1
B3 -F1
BLANC
System
MD-F1
CEAF-F1
MUC-F1
B3 -F1
BLANC
System
MD-F1
CEAF-F1
MUC-F1
B3 -F1
BLANC
System
MD-F1
CEAF-F1
MUC-F1
B3 -F1
BLANC
ca
100
68.7
56.2
77.0
63.6
69.7
47.2
37.3
51.1
54.2
100
70.5
42.5
79.9
59.7
82.7
57.1
22.9
64.6
51.0
de
en
es
it
SUCRE (Gold Annotation)
100
100 100 98.4
72.9 74.3 69.8 66.0
58.4 60.8 55.3 45.0
81.1 82.4 77.4 76.8
66.4 70.8 64.5 56.9
SUCRE (Regular Annotation)
78.4 80.7 70.3 90.8
59.9 62.7 52.9 61.3
40.9 52.5 36.3 50.4
64.3 67.1 55.6 70.6
53.6 61.2 51.4 57.7
TANL-1 (Gold Annotation)
100
100 100 N/A
77.7 75.6 66.6 N/A
25.9 33.7 24.7 N/A
85.9 84.5 78.2 N/A
57.4 61.3 55.6 N/A
TANL-1 (Regular Annotation)
59.2 73.9 83.1 55.9
49.5 57.3 59.3 45.8
15.4 24.6 21.7 42.7
50.7 61.3 66.0 46.4
44.7 49.3 51.4 59.6
nl
100
58.8
69.8
67.0
65.3
42.3
15.9
29.7
11.7
46.9
N/A
N/A
N/A
N/A
N/A
34.7
17.0
8.3
17.0
32.3
Table 3.5.: Results of SUCRE and the best competitor system, TANL-1, in SemEval-2010 Task 1
This shows that SUCRE has been optimized in order to achieve good results on the four evaluation
measures (Kobdani and Schütze, 2010a). For the improvement of SUCRE’s performance within this
61
diploma thesis, the bold values in table 3.5 (i.e. German, Gold annotation, SUCRE and best competitor,
TANL-1) has to be considered.
3.6. The dataset for German coreference resolution
As the results with the feature research in this diploma thesis should be comparable to the results of
SemEval-2010, the dataset of SemEval is used. As mentioned in (2.7), for German coreference resolution, the TüBa-D/Z corpus is used (“Tübinger Baumbank des Deutschen / Zeitungskorpus”), developed
at the University of Tübingen (Hinrichs et al., 2005a). It comprises 794k words from the newspaper “die
tageszeitung (taz)”. It is a syntactically hand-annotated corpus. The annotation scheme is divided into
four levels of syntactic constituence:
1. lexical level
2. phrasal level
3. level of topology
4. sentence level
The annotation contains information about morphology, part-of-speech-tags, lemmas, grammatical
functions, named-entities and anaphora resp. coreference relations [SfS Universität Tübingen (2010)].
For the purpose of SUCRE, the dataset extracted from TüBa-D/Z contains about 455k words (cf. table
2.19 in (2.7)). The training and development set of SemEval-2010 is merged to the new training set used
in this study. Table 3.6 shows for both sets the number of documents, sentences and tokens:
Training set
#sentences
23,362
Test set
136
2,736
Total
1235
26,098
#documents
1099
#tokens
404,759
#tokens
50,287
#tokens
455,046
Table 3.6.: The training and test set of TüBa-D/Z in this study
The dataset is annotated with the gold annotation of the original SemEval-2010 dataset (cf. figure 2.13
in (2.7)) and already transformed into the relational database model (cf. (3.2.3)). Figure 3.12 shows a
further excerpt of the original task dataset from SemEval-2010 (cf. (2.7)).
1 Er er er PPER PPER cas=n|num=sg|gend=masc|per=3
per=3|cas=n|num=sg|gend=masc 2 2 SUBJ SUBJ _ _ _ _ (191)
2 wird werden werden VAFIN VAFIN _ per=sg|num=pres|temp=ind 0 0 ROOT
ROOT _ _ _ _ _
3 wissen wissen wissen VVINF VVINF _ _ 2 2 AUX AUX _ _ _ _ _
4 ,,,$,$,_ _ 3 3 -PUNCT--PUNCT-_ _ _ _ _
5 warum warum warum PWAV PROAV _ _ 3 3 S ADV _ _ _ _ _
6 . . . $. $. _ _ 5 5 -PUNCT--PUNCT-_ _ _ _ _
Figure 3.12.: Another sentence in the original SemEval-2010 task dataset
The token ID (column 1), the word form of the token (column 2), the gold part-of-speech-tag (column
5) and the gold morphological features (column 7) are inserted in the word table.
62
CHAPTER 4
Linguistic error analysis
In this chapter, the classification results of SUCRE are considered. The pairwise classifier labels true
coreferent/disreferent links with a confidence value. If the value is below 50%, the link is classified as
disreferent. If the value is greater or equal to 50%, the link is predicted to be coreferent. As the links
have true labels (coref/disref with 100% confidence status - cf. link table 3.3), the classification decisions
can be divided into true positive (TP), true negative (TN), false positive (FP) and false negative (FN).
Table 4.1 shows the confusion matrix for these classification judgements.
Gold annotation
Calc-Prob≥50
Calc-Prob<50
Sums
Coreference
TP
FN
TP+FN
Disreference
FP
TN
FP+TN
Sums
TP+FP
FN+TN
TP+FP+TN+FN
Table 4.1.: Confusion matrix for classification judgements
Misclassifications are the false positives (i.e. disreference links misclassified as coreferent) and false
negatives (i.e. coreference links misclassified as disreferent).
Each class of misclassification is analysed in a linguistical manner. Thus, the errors are inspected
and reasons for the misclassifications are searched. Moreover, this analysis tries to answer the following
questions:
1. What are frequent problems occurring with false positives and false negatives including some
examples?
2. What features lead to this problem?
3. Is there any linguistic background for this misclassification?
4. How could those problems be solved (i.e. is it possible to implement a new link feature for this
linguistic phenomenon)?
5. If necessary, what modifications/extensions of the pseudo-language have to be done in order to
implement the new feature?
SUCRE’s output includes a list of all correct and incorrect link classifications and the information
about the markables and their positions. For a better error analysis each link gets a unique id. This unique
link-id consists of the first and second markable-ID separated by an x. Depending on the calculation
63
Chapter 4. Linguistic error analysis
probability and the Gold-value the links are divided into TP, FP, TN and FN (cf. table 4.1). The optimal
goal would be getting every link from false positive to true negative and from false negative to true
positive.
In (4.1), the initial configuration is presented in terms of the first results, the initial feature set for
filtering definitely disreferent markable links (prefilters) and the feature set for creating feature vectors
for the classifier’s input (link features). The features are presented in pseudo language and are described
with a paraphrase. The tables 4.4 and 4.5 show how they contribute to the current evaluation scores.
(4.2) describes what differences occur in the baseline of the initial configuration and the baseline given
by SemEval-2010. One step, that includes the removing of three features, provides a new baseline
that is near the one, achieved in SemEval-2010. Afterwards in (4.3) the misclassifications of the false
positives are analysed, based on the new baseline introduced in (4.2), in terms of different classes of
linguistic or processing problems. The leading question is, why were two disreferent markables classified
as coreferent? Some solutions (i.e. new link features) for the problems are proposed. These features will
be implemented later in chapter 5. The analysis of the false positives provides further restrictions on the
feature set in order to improve precision by not losing too much recall. In section (4.4) a similar analysis
is done for false negatives. Here, the leading question is, why are two coreferent markables classified as
disreferent? The goal in this step is to increase recall by not losing too much precision.
4.1. The initial configuration
4.1.1. The initial results
Running SUCRE with the initial feature settings, the evaluation scores are:
MUC-correct
MUC-Precision
MUC-Recall
MUC-f-score
2469
0.403036
0.74909
0.524093
MUC-B3 -f-score
CEAFM -all
13446
CEAFM -Precision
0.649784
CEAFM -Recall
0.649784
CEAFM -f-score
0.649784
BLANC-Attraction-f-score 0.204744
BLANC-Repulsion-f-score 0.973537
BLANC-Precision
0.559756
B3 -all
B3 -Precision
B3 -Recall
B3 -f-score
0.617821
CEAFE -correct
CEAFE -Precision
CEAFE -Recall
CEAFE -f-score
BLANC-Recall
BLANC-f-score
RAND-accuracy
13446
0.645682
0.901311
0.752376
6298.88
0.860503
0.62058
0.721109
0.761964
0.645392
0.948779
Table 4.2.: Initial results of SUCRE
For more details on the meanings of the evaluation scores, see (2.6). The MUC-B3 -F-measure is the
harmonic mean of MUC’s and B3 ’s F-measure. The reason for this combination is described in (2.6).
For instance, given the two baseline scores for MUC and B3 in German (cf. table 2.21 from SemEval2010 (2.7)) the MUC-B3 metric is an acceptable tradeoff as shown in table 4.3. In the case that there
are no coreference links, MUC as well as MUC-B3 return 0% or even “nan”, whereas in the case that
both scores (MUC and B3 ) are non-zero, their harmonic mean rewards both correct coreference links and
correct singleton entities:
64
R
MUC
P
F1
0.0
0.0
0.0
100
24.8
39.7
B3
P
R
F1
S INGLETONS
75.5 100 86.0
A LL - IN - ONE
100 2.4
4.7
MUC-B3
R
P
F1
0.0
0.0
0.0
100
4.4
8.4
Table 4.3.: The usage of MUC-B3
4.1.2. The link features in the prefilter
Links which cannot be coreferent are caught by some link features which model this incompatibility.
Those links are filtered out before creating the feature vectors (cf. (3.2.4)).
1. {(m1h.f1==f1.singular)&&(m2h.f1==f1.plural)}
→ The first markable is singular but the second markable is plural
2. {(m1h.f1==f1.plural)&&(m2h.f1==f1.singular)}
→ The first markable is plural but the second markable is singular
3. {(abs(m2b.stcnum-m1b.stcnum)>2)&&((m1h.f0==f0.P~)||(m2h.f0==f0.P~))}
→ The distance between the two markables is bigger than two sentences and one markable is a
pronoun
4. {(abs(m2b.stcnum-m1b.stcnum)>0)&&((m1h.f0==f0.PRF)||
(m2h.f0==f0.PRF)||(m1h.f0==f0.PRELS)||(m2h.f0==f0.PRELS)||
(m1h.f0==f0.PRELAT)||(m2h.f0==f0.PRELAT))}
→ The markables are not in the same sentence and one markable is a reflexive pronoun or a
relative pronoun
5. {(m2b.f0==f0.PIS)||(m2b.f0==f0.PIAT)||(m2b.f0==f0.PIDAT)}
→ The second markable is an indefinite pronoun
6. {(m1h.f2==f2.female)&&(m2h.f2==f2.male)}
→ The first markable is female but the second markable is male
7. {(m1h.f2==f2.male)&&(m2h.f2==f2.female)}
→ The first markable is male but the second markable is female
8. {(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)&&
((m1h.f0==f0.P~)&&(m2h.f0==f0.P~))}
→ The two markables differ in gender, which is not unknown, and both markables are pronouns
4.1.3. The link features for the feature vectors
1. {abs(m2b.stcnum-m1b.stcnum)==0}
→ Both markables are in the same sentence
→ The second markable is in the subsequent sentence
3. {abs(m2b.stcnum-m1b.stcnum)>1}
→ The distance between the two markables is bigger than one sentence
65
4. {alias(m1h,m2a)||alias(m1a,m2h)}
→ The head of one markable is an alias of the other markable
5. {(seqmatch(m1h,m2h))}
→ Exact string match of both markables’ heads
6. {(strmatchlc(m1h,m2h))}
→ Case-insensitive substring match of both markables’ heads
7. {strmatchlc(m2b,ein)&&(m2b.f0==f0.ART)}
→ Second markable starts with an indefinite article
8. {((m1b.txtpos<=m2b.txtpos)&&(m2e.txtpos<=m1e.txtpos))||
((m2b.txtpos<=m1b.txtpos)&&(m1e.txtpos<=m2e.txtpos))}
→ The first markable includes the second or vice versa
9. {(m1b.txtpos<=m2b.txtpos)&&(m1e.txtpos<=m2e.txtpos)&&
(m1e.txtpos>=m2b.txtpos)}
→ The first markable precedes the second markable but they overlap
10. {(m2b.txtpos<=m1b.txtpos)&&(m2e.txtpos<=m1e.txtpos)&&
(m2e.txtpos>=m1b.txtpos)}
→ The second markable precedes the first markable but they overlap
11. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)}
→ Both markables are proper names
12. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NN)}
→ The first markable is a proper name and the second markable is a common noun
13. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NE)}
→ The first markable is a common noun and the second markable is a proper name
14. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NN)}
→ Both markables are common nouns
15. {(m1h.rewtag == rewtags.SUBJ) && (m2h.rewtag == rewtags.SUBJ)}
→ Both markables are subjects
16. {(m1h.rewtag != rewtags.SUBJ) && (m2h.rewtag != rewtags.SUBJ)}
→ Neither the first nor the second markable is a subject
17. {(m1h.f0==f0.PDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
(m2h.f1==f1.unknown))}
→ The first markable is an attributive demonstrative pronoun and both markables have the same
number or one’s is unknown
18. {(m1h.f0==f0.PDS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a substituting demonstrative pronoun and both markables have the same
19. {(m1h.f0==f0.PIDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is an attributive indefinite pronoun and both markables have the same number or one’s is unknown
66
20. {(m1h.f0==f0.PIS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a substituting indefinite pronoun and both markables have the same number or one’s is unknown
21. {(m1h.f0==f0.PPER)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a personal pronoun and both markables have the same number or one’s
is unknown
22. {(m1h.f0==f0.PRF)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a reflexive pronoun and both markables have the same number or one’s is
unknown
23. {(m1h.f0==f0.PPOSS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a substituting possessive pronoun and both markables have the same
24. {(m1h.f0==f0.PPOSAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is an attributive possessive pronoun and both markables have the same
25. {(m1h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is an attributive relative pronoun and both markables have the same number
or one’s is unknown
26. {(m1h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The first markable is a substituting relative pronoun and both markables have the same number
or one’s is unknown
27. {(m2h.f0==f0.PDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is an attributive demonstrative pronoun and both markables have the
same number or one’s is unknown
28. {(m2h.f0==f0.PDS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a substituting demonstrative pronoun and both markables have the
same number or one’s is unknown
29. {(m2h.f0==f0.PIDAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is an attributive indefinite pronoun and both markables have the same
30. {(m2h.f0==f0.PIS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a substituting indefinite pronoun and both markables have the same
67
31. {(m2h.f0==f0.PPER)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a personal pronoun and both markables have the same number or one’s
is unknown
32. {(m2h.f0==f0.PRF)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a reflexive pronoun and both markables have the same number or one’s
is unknown
33. {(m2h.f0==f0.PPOSS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a substituting possessive pronoun and both markables have the same
34. {(m2h.f0==f0.PPOSAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is an attributive possessive pronoun and both markables have the same
35. {(m2h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is an attributive relative pronoun and both markables have the same
36. {(m2h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
→ The second markable is a substituting relative pronoun and both markables have the same
37. {(m1h.f1==m2h.f1)&&(m1h.f1!=f1.unknown)}
→ Both markables have the same number which is not unknown
→ Both markables have the same gender which is not unknown
→ Both markables have the same case which is not unknown
→ Both markables have the same person which is not unknown
4.1.4. The performance of the 40 features
By running a script that iteratively adds one link feature to the link feature set and runs SUCRE on that
feature set, one gets evaluation scores for every iteration. Putting them together results in table 4.4 and
for reversed iterations in table 4.5.
The order of the features 1 − 40 (cf. 4.1.3) is derived from the original SUCRE feature set, which has
been provided as a starting point for this study.
One drawback of such a huge feature set is that these tables are not very clear. One problem for this
is that the link features are dependent on others. So, a decrease of the MUC-B3 -F1 -score after adding
a feature to the feature set does not indicate a bad feature performance in general as it might contribute
its power in dependence of features that are added later on. Analogously, an increase of the score after a
feature adding may also indicate a positive impact on already added features.
68
Nonetheless, one trend is clear: with very few features, the MUC-score is low (because of a small
recall) and the B3 -score is great (because of no or few coreference links). By adding features, the MUCscore increases and therewith the MUC-B3 -score.
B3
MUC
Features
1
1-2
1-3
1-4
1-5
1-6
1-7
1-8
1-9
1-10
1-11
1-12
1-13
1-14
1-15
1-16
1-17
1-18
1-19
1-20
1-21
1-22
1-23
1-24
1-25
1-26
1-27
1-28
1-29
1-30
1-31
1-32
1-33
1-34
1-35
1-36
1-37
1-38
1-39
1-40
C
0.0
0.0
0.0
0.0
15.0
1108.0
1195.0
1199.0
1199.0
1199.0
1208.0
1251.0
1251.0
1350.0
1692.0
1691.0
1604.0
1604.0
1625.0
1625.0
1650.0
1782.0
1912.0
1912.0
2031.0
2031.0
1875.0
1875.0
1896.0
1896.0
1897.0
1986.0
2063.0
2063.0
2200.0
2203.0
2437.0
2437.0
2456.0
2467.0
2469.0
P
0.0
0.0
0.0
0.0
0.5556
0.4689
0.4518
0.4481
0.4481
0.4481
0.376
0.3826
0.3826
0.3715
0.3693
0.369
0.366
0.366
0.3644
0.3644
0.3601
0.3723
0.3299
0.3299
0.3325
0.3325
0.3206
0.3206
0.3227
0.3227
0.3227
0.3313
0.3331
0.3331
0.381
0.3809
0.4092
0.409
0.4063
0.4013
0.403
R
0.0
0.0
0.0
0.0
0.0046
0.3362
0.3626
0.3638
0.3638
0.3638
0.3665
0.3796
0.3796
0.4096
0.5134
0.513
0.4867
0.4867
0.493
0.493
0.5006
0.5407
0.5801
0.5801
0.6162
0.6162
0.5689
0.5689
0.5752
0.5752
0.5755
0.6025
0.6259
0.6259
0.6675
0.6684
0.7394
0.7394
0.7451
0.7485
0.7491
F1
nan
nan
nan
nan
0.009
0.3916
0.4023
0.4015
0.4015
0.4015
0.3712
0.3811
0.3811
0.3896
0.4296
0.4292
0.4178
0.4178
0.4191
0.4191
0.4189
0.4409
0.4206
0.4206
0.4319
0.4319
0.4101
0.4101
0.4134
0.4134
0.4136
0.4276
0.4348
0.4348
0.4851
0.4853
0.5268
0.5266
0.5259
0.5224
0.5241
C
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
P
1.0
1.0
1.0
1.0
0.9989
0.9114
0.8951
0.893
0.893
0.893
0.8501
0.8478
0.8478
0.8224
0.7599
0.7597
0.7738
0.7738
0.768
0.768
0.7575
0.7443
0.6524
0.6524
0.6276
0.6276
0.6496
0.6496
0.6478
0.6478
0.6477
0.6409
0.6212
0.6212
0.6711
0.6705
0.6614
0.6611
0.6533
0.6449
0.6457
R
0.7549
0.7549
0.7549
0.7549
0.7557
0.8118
0.8177
0.8184
0.8184
0.8184
0.8191
0.8228
0.8228
0.8254
0.8449
0.8449
0.8396
0.8396
0.8416
0.8416
0.8413
0.848
0.852
0.852
0.8585
0.8585
0.8518
0.8518
0.8523
0.8523
0.8547
0.8596
0.8594
0.8594
0.8795
0.8798
0.8936
0.8935
0.8964
0.8985
0.9013
F1
0.8603
0.8603
0.8603
0.8603
0.8605
0.8587
0.8546
0.8541
0.8541
0.8541
0.8343
0.8351
0.8351
0.8239
0.8002
0.8
0.8054
0.8054
0.8031
0.8031
0.7972
0.7928
0.739
0.739
0.7251
0.7251
0.7371
0.7371
0.7361
0.7361
0.737
0.7343
0.7211
0.7211
0.7613
0.761
0.7602
0.7599
0.7558
0.7508
0.7524
MUC-B 3
F1
0.0179
0.5379
0.5471
0.5463
0.5463
0.5463
0.5138
0.5233
0.5233
0.5291
0.559
0.5587
0.5502
0.5502
0.5508
0.5508
0.5492
0.5667
0.5361
0.5361
0.5414
0.5414
0.527
0.527
0.5295
0.5295
0.5298
0.5404
0.5425
0.5425
0.5926
0.5926
0.6223
0.6221
0.6202
0.6162
0.6178
Table 4.4.: Cumulative performance of the 40 original features
69
B3
MUC
Features
40
40-39
40-38
40-37
40-36
40-35
40-34
40-33
40-32
40-31
40-30
40-29
40-28
40-27
40-26
40-25
40-24
40-23
40-22
40-21
40-20
40-19
40-18
40-17
40-16
40-15
40-14
40-13
40-12
40-11
40-10
40-9
40-8
40-7
40-6
40-5
40-4
40-3
40-2
40-1
C
255.0
255.0
255.0
255.0
549.0
549.0
549.0
549.0
549.0
687.0
687.0
687.0
710.0
710.0
736.0
858.0
898.0
898.0
919.0
1001.0
1011.0
1011.0
1017.0
1017.0
1243.0
1258.0
1257.0
1254.0
1254.0
1254.0
1254.0
1254.0
1254.0
1260.0
2379.0
2387.0
2390.0
2496.0
2515.0
2469.0
P
0.7163
0.7163
0.7163
0.7163
0.7439
0.7439
0.7439
0.7439
0.7439
0.6562
0.6562
0.6562
0.6502
0.6502
0.6334
0.5905
0.5995
0.5995
0.5914
0.5753
0.58
0.58
0.5788
0.5788
0.5863
0.5873
0.5888
0.5885
0.5885
0.5885
0.589
0.589
0.5983
0.4651
0.429
0.4613
0.4608
0.4158
0.4082
0.403
R
0.0774
0.0774
0.0774
0.0774
0.1666
0.1666
0.1666
0.1666
0.1666
0.2084
0.2084
0.2084
0.2154
0.2154
0.2233
0.2603
0.2725
0.2725
0.2788
0.3037
0.3067
0.3067
0.3086
0.3086
0.3771
0.3817
0.3814
0.3805
0.3805
0.3805
0.3805
0.3805
0.3805
0.3823
0.7218
0.7242
0.7251
0.7573
0.763
0.7491
F1
0.1397
0.1397
0.1397
0.1397
0.2722
0.2722
0.2722
0.2722
0.2722
0.3164
0.3164
0.3164
0.3236
0.3236
0.3302
0.3613
0.3746
0.3746
0.379
0.3975
0.4013
0.4013
0.4025
0.4025
0.459
0.4627
0.4629
0.4621
0.4621
0.4621
0.4623
0.4623
0.4651
0.4197
0.5381
0.5636
0.5635
0.5368
0.5319
0.5241
C
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
13446.0
P
0.9929
0.9929
0.9929
0.9929
0.9896
0.9896
0.9896
0.9896
0.9896
0.9693
0.9693
0.9693
0.9677
0.9677
0.9637
0.9485
0.9479
0.9479
0.9442
0.9279
0.9292
0.9292
0.9275
0.9275
0.911
0.9089
0.9095
0.9097
0.9097
0.9097
0.9097
0.9097
0.9138
0.8672
0.7004
0.7374
0.7367
0.6526
0.6383
0.6457
R
0.7629
0.7629
0.7629
0.7629
0.784
0.784
0.784
0.784
0.784
0.7933
0.7933
0.7933
0.7953
0.7953
0.7974
0.8049
0.806
0.806
0.807
0.81
0.8108
0.8108
0.8103
0.8103
0.8249
0.8246
0.825
0.8258
0.8258
0.8258
0.826
0.826
0.8256
0.83
0.8967
0.8978
0.898
0.9024
0.9044
0.9013
F1
0.8628
0.8628
0.8628
0.8628
0.8749
0.8749
0.8749
0.8749
0.8749
0.8725
0.8725
0.8725
0.8731
0.8731
0.8727
0.8709
0.8713
0.8713
0.8702
0.865
0.866
0.866
0.8649
0.8649
0.8658
0.8647
0.8652
0.8657
0.8657
0.8657
0.8658
0.8658
0.8674
0.8482
0.7865
0.8097
0.8094
0.7574
0.7484
0.7524
Table 4.5.: Reversed cumulative performance of the 40 original features
70
MUC-B 3
F1
0.2404
0.2404
0.2404
0.2404
0.4152
0.4152
0.4152
0.4152
0.4152
0.4644
0.4644
0.4644
0.4722
0.4722
0.4791
0.5108
0.524
0.524
0.528
0.5447
0.5484
0.5484
0.5494
0.5494
0.6
0.6028
0.6031
0.6026
0.6026
0.6026
0.6028
0.6028
0.6056
0.5615
0.639
0.6646
0.6644
0.6283
0.6218
0.6178
4.2. One problem with distance features
4.2. One problem with distance features
As the MUC-F1 -score (58.4%) and B3 -F1 -score (81.1%) in SemEval-2010 would result in the MUC-B3 F1 -score 67.9% (cf. formula 4.1), the initial MUC-B3 -F1 -score (61.78%) is much below this baseline.
2 · 0.584 · 0.811
0.947248
2 · F1M U C · F1B 3
=
=
≈ 0.67903
(4.1)
F1M U C + F1B 3
0.584 + 0.811
1.395
The best result that is visible in the tables 4.4 and 4.5 is 66.46%. Considering the results in table
4.5, there is a rapid decrease when adding the first three features. These boolean features describe the
distance between the markables in terms of sentences:
F1M U C _B 3 =
But as Hobbs (1986) found out, 98% of the antecedents of a pronoun are in the same or in the previous
sentence. Moreover, McEnery et al. (1997) showed that in 86.64% of cases, the antecedent is within a
window of three sentences (Kobdani and Schütze, 2010b).
Furthermore, all approaches presented in chapter 2 use a feature for sentence distance. Also the German approaches in (2.5) that are based on the TüBa-D/Z corpus engage several features for the distance.
Therefore, it is not plausible why these three features perform that bad. One assumption is that there
is an inconsistency in the original SemEval dataset and the underlying TüBa-D/Z dataset. This inconsistency has to be investigated in future work. For the research in this diploma thesis, one solution is
to remove the first three features about distance in terms of sentences and rerun SUCRE with only 37
features. Thereby, a baseline is created that is similar to the one achieved in SemEval-2010. Table 4.6
shows the scores of the new baseline.
MUC-correct
MUC-Precision
MUC-Recall
MUC-f-score
2390
0.460767
0.725121
0.56348
MUC-B3 -f-score
CEAFM -all
13446
CEAFM -Precision
0.718132
CEAFM -Recall
0.718132
CEAFM -f-score
0.718132
BLANC-Attraction-f-score 0.315814
BLANC-Repulsion-f-score 0.987133
BLANC-Precision
0.611977
B3 -all
B3 -Precision
B3 -Recall
B3 -f-score
0.664425
CEAFE -correct
CEAFE -Precision
CEAFE -Recall
CEAFE -f-score
BLANC-Recall
BLANC-f-score
RAND-accuracy
13446
0.73673
0.898049
0.80943
7139.44
0.864444
0.703393
0.775647
0.742423
0.670918
0.974741
Table 4.6.: Results of the new baseline
So, for the subsequent analysis these scores will be considered as baseline.
4.3. Error analysis in false positives
This section focusses on the analysis of false positives. That means it is to figure out why two disreferent
markables are classified as coreferent. The most often occurring link errors are divided into some main
groups for which a solution could move the links to True Negative (TN).
71
Each group provides some self-created examples and/or some out of SUCRE’s output. Further examples to each group are given in the appendix E.1.
4.3.1. The second markable is indefinite
In some misclassifications the second markable is indefinite. The first markable occurs as a definite
common noun whose head exactly matches the head of the second markable. Such a problem occurs
when a feature like feature no. 5 (Exact string match of both markables’ heads) outweighs a feature like
feature no. 7 (Second markable starts with an indefinite article). The same problem might occur with
feature no. 6 (Case-insensitive substring match of both markables’ heads).
(9)
a.
b.
c.
d.
e.
72
Erst in den 70er Jahren entstanden Übersetzungen und Inszenierungen , die jedoch für die
grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden .
Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension )
, bleibt ( Valle-Inclan , der Exzentriker der Moderne )m1 , auch weiterhin ( ein Geheimtip )m2
.
D. N.
(ID: 60x61); (Calc-Prob:52)
Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la
Reina und Corinna Harfouch als Mari-Gaila verliehen ( der Inszenierung )m1 wichtige
Glanzpunkte .[. . . ]
Man hatte sich am Ende einer erfolgreichen Spielzeit von dieser letzten Premiere im
Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues
Ensemblemitglied hätte geben sollen , mehr versprochen .
Kirchner wird nun erst im Herbst mit ( einer Langhoff-Inszenierung )m2 seine Arbeit aufnehmen
.
(ID: 218x250); (Calc-Prob:52)
Das Bühnenbild von Peter Schubert erzielt mit großen Stahlkäfigen zwar Wirkung , die
lachsfarbene Wandbespannung erscheint aber für die Atmosphäre ( des Stückes )m1 fast
wieder zu schick .[. . . ]
Besonders Margit Bendokat als La Tatula , Bärbel Bolle als die verstorbene Juana la Reina
und Corinna Harfouch als Mari-Gaila verliehen der Inszenierung wichtige Glanzpunkte .
Diese wenigen atmosphärischen Momente lassen ( ein dichtes , interessantes Stück )m2
erahnen , das aber in dieser Fassung weit unter dem Möglichen inszeniert scheint .
(ID: 208x221); (Calc-Prob:52)
Gott guckt uns nicht zu , ( der )m1 hat der Welt längst den Rücken gekehrt “, läßt der
spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn seiner
Grotske “Wunderworte “erklären .[. . . ]
Und wirklich : viel kann Gott in dem kleinen galizischen Dorf trotz all der katholischen
Frömmigkeit seiner Bewohner nicht verloren haben .
Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn
Laureano ( ein einträgliches Geschäft )m2 hinterläßt , möchte sich so mancher in ihrer
Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte .
Bei der Polizei erfuhr ( die alte Dame )m1 , daß es sich bei ihrem Fall nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne .[. . . ]
Sie hat dabei noch Glück gehabt .
( Eine andere alte Dame , der gleiches widerfuhr )m2 , mußte einen Monat auf dem örtlichen
Bahnhof nächtigen , sozusagen als Obdachlose .
(ID: 389x432); (Calc-Prob:83)
f.
Die sonderbare Art und Weise der Hausbesetzung kommt noch aus Zeiten , als aufgrund
bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von ( wilden
Mietern )m1 besetzt wurden .[. . . ]
Inzwischen klagt dieser beim Obersten Gerichtshof , dessen Richter vorsichtshalber auch
gleich in Urlaub gefahren sind .
Einer von ihnen erklärte einer Zeitung schon mal anonym , es sei durchaus rechtens , wenn
man ( wilde Mieter )m2 auf eigene Faust rauswerfe .
(ID: 442x492); (Calc-Prob:52)
In example (9a), there is no apparent reason for creating a coreference link except that the first and
second markable are adjacent. Indeed, there is a copulative construction but such constructions are not
annotated as coreferent in the corpus.
Example (9b) shows a typical combination of substring match of the markables’ head and an indefinite
determiner attending the second markable. This kind of string match will be discussed in (4.3.5).
In (9c), the same problem occurs with an exact string match of the heads.
The last two examples might be solved by modifying the triggering features (i.e. string-matching
features) such that the second markable must not be indefinite.
Example (9d) combines a masculine demonstrative pronoun with an indefinite neuter common noun.
This mismatch of gender is discussed in (4.3.10). Apart from indefinite pronouns, any kind of pronoun
is definite and thus this example again shows a shift from definite to indefinite.
In (9e), one interesting detail is the keyword andere, which indicates that the first and the second markable, if string matched, are disreferent. Similar keywords might be neue∼, alternative∼ or spätere∼.
The tilde means any inflectional suffix or none. Therefore, a further string-matching feature modification
might be a conjunction that returns TRUE in the case that the second markable does not contain any of
the aforementioned keywords.
The last example, (9f), shows a difficult case of indefiniteness as indefinite common nouns in plural
do not bear any determiner starting with ein.
Considering all these cases of false positives with an indefinite second markable, the question comes
up whether there is any coreferent case where the second markable occurs to be indefinite. To check this,
all TPs are analyzed whether the second markable starts with ein. Some results are presented in example
(10):
(10)
a.
b.
c.
Der Gang zum Sozialamt wird zum Spießrutenlauf .
“Manchmal stelle ( ich )m1 mir vor , daß alle in den Büros freundlich zu mir sind “,
beschreibt ( ein Obdachloser )m2 seinen Tagtraum während der Wartezeit .
Freundlichkeit hält auch der Diplom-Psychologe Klaus Hartwig * für eine der wichtigsten
Voraussetzungen im Umgang zwischen Sachbearbeitern und Hilfeempfängern .
(code: 15067-15067x15083-15084); (prob:53)
( Neue Studie zum Rauchen )m1 [. . . ]
Was Columbus vor 500 Jahren von den Indianern als Genußmittel nach Europa brachte ,
ist längst zum schädlichen Laster degradiert : der Tabak .
( Eine neue Studie des britischen Epidemiologen Richard Peto )m2 hat dieses Risiko jetzt
exakt quantifiziert .
(code: 18844-18847x18872-18879); (prob:83)
Naheliegende Frage : Warum wehrt sie sich dann so vehement gegen ( ein generelles Verbot
für Zigarettenreklame )m1 ?[. . . ]
Warum wird überhaupt noch geworben , wenn es so nutzlos ist ?
Acht von zwölf Gesundheitsministern der EG haben sich im November des Vorjahres für
( ein generelles Verbot von Zigarettenreklame )m2 ausgesprochen , vier ( die Minister der
Bundesrepublik , Großbritanniens , Dänemarks und der Niederlande ) waren dagegen .
(code: 29128-29132x29159-29163); (prob:83)
73
d.
e.
f.
in seinen Zeitschriften publizieren so “unbekannte Talente “wie Franz Kafka , Robert
Walser , ( Robert Musil )m1 und andere .[. . . ]
Darüber hinaus ist noch ein von Anne Gabrisch 1987 herausgegebener Auswahlband erhältlich : Franz Blei - Porträts .
Die fehlende Nachwirkung Bleis dürfte ihren Grund darin haben , daß der Schriftsteller
Blei keinen wiedererkennbaren Stil , keinen durchgängigen Erzählton besitzt : weder den
ironisch-epischen eines Thomas Mann , den essayistisch-analytischen ( eines Robert Musil )m2
noch den skeptisch-melancholischen eines Joseph Roth .
(code: 33832-33833x34016-34018); (prob:83)
Aber Schmidt-Braul konnte weder ein zeitweilig favorisiertes Management-buy-outVerfahren zuwege bringen , da er keine Finanzierungsquellen aufzuschließen vermochte ,
noch kompetente Käufer für den Verlag interessieren .
Dort , wo ( sich )m1 dennoch ( eine Verkaufsvariante )m2 abzeichnete , schoß die Treuhand quer , so bei der Bewerbung der Volker-Spieß-Gruppe aus Westberlin , die zuvor
schon den Morgenbuch-Verlag übernommen hatte .
Hier spielte nämlich die Immobilie eine entscheidende Rolle , die eigentlich zu Volk und
Welt gehörte , aber inzwischen von der Treuhand für die Bundesregierung beansprucht
wird , handelt es sich bei dem Haus in der Nähe des Postdamer Platzes doch um ein lukratives 50-Millionen-Objekt .
(code: 36712-36712x36714-36715); (prob:52)
Harald Wolf , PDS-Sprecher für Stadtplanung , monierte darüber hinaus , daß es bisher
keine seriöse Untersuchung zum Innenstadtring gebe , die ( eine Öffnung der Oberbaumbrücke für den Individualverkehr )m1 nahelegt .[. . . ]
An einem » Runden Tisch « sollten Bürger , Initiativen , Verwaltungen und Bezirkspolitiker beteiligt werden , fordert der Umweltverband .
( Eine Öffnung der Oberbaumbrücke für den Autoverkehr )m2 werde die Klimabelastung
verschärfen - insbesondere den Sommersmog .
(code: 42728-42734x42778-42784); (prob:83)
As example (10) shows, there are indeed some coreferent links with an indefinite second markable.
In (10a), there is a shift from direct to indirect speech and thus from first person, definite (all personal
pronouns are definite) to third person, indefinite. Here, the modifications of the string matching feature
described above would not take any effect on this right classification.
The example (10b) joins a headline with a first mention of the entity in the subjacent text. Such a
headline can be captured as a common noun in singular without any determiner.
In example (10c), the coreference can be captured because all words in both markables are the same.
This property can be used as additional factor for the modified string-matching feature.
Example (10d) uses a special expression of a proper name. eines Robert Musil can be captured as
being coreferent with Robert Musil by checking whether the head of the second markable is a proper
name.
One kind of coreference link with an indefinite second markable is given in (10e). Here, the reflexive
pronoun sich is cataphoric and its antecedent is on its right. There is no string match such that any
modification of those features would not take any effect on this instance.
Example (10f) contains a similar markable pair as in (10c). Only the last words differ but here, both
are compound words with the same head.
4.3.2. Wrong assignment of a relative pronoun
A relative pronoun always refers to an antecedent in the leftward matrix clause. But it would be too
simple to exclude all cases where a relative pronoun as first markable precedes the second markable, since
although a relative pronoun always refers to a leftward markable it might be coreferent to a markable in
74
the same sentence on its right-hand side. This can be the case when the relative pronoun constitutes a
grammatical function which is referred to by a reflexive pronoun or a possessive pronoun like in (11).
(11)
a.
b.
Der Hund, dem sein Herrchen sein Stöckchen gibt,. . .
Der Mann, der sich rasierte, . . .
In (11) all underlined markables are coreferent. So, for the relative pronoun as first markable, the
features has to enforce the second markable to be a reflexive pronoun or a possessive pronoun.
A lot of false positives concerning the association with relative pronouns show a connection between
a relative pronoun as first markable and a succeeding common noun as second markable. The examples
in (12) show some cases of wrong assignment of the relative pronoun.
(12)
a.
b.
c.
d.
Der neben Garcia Lorca bedeutendste spanische Dramatiker des 20. Jahrhunderts wurde
für das deutschsprachige Theater spät entdeckt .
Erst in den 70er Jahren entstanden Übersetzungen und ( Inszenierungen )m1 , ( die )m2
jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden .
Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension
) , bleibt Valle-Inclan , der Exzentriker der Moderne , auch weiterhin ein Geheimtip .
Aus seinem umfänglichen dichterischen Schaffen ragen vor allem die von ihm kreierten
esperpentos heraus , Schauerpossen , die die von Leidenschaft und Gewalt deformierte
Gesellschaft wie durch einen Zerrspiegel betrachten .
Zu ( diesem Genre )m1 gehören neben den Wunderworten ( ( die )m2 im Original als Tragikomödie
untertitelt ist ) auch die Dramen Glanz der Boheme und die Trilogie Karneval der Krieger
.
Es sind sperrige , sprachgewaltige Grotesken , die Mystik und Mythen karikieren und eine
erhebliche Fortschrittsskepsis ausdrücken .
Denn das darf man nur , wenn man eine Ersatzwohnung ...
Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende ( kommunaler
Sozialmieter )m1 , ( die )m2 ihre Zahlungen eingestellt haben - Kündigung droht ihnen
deshalb nicht .
Die Stadt hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten einzutreiben
.
(ID: 512x514); (Calc-Prob:51)
Kürzungspläne des Senats , etwa die angedachte und schließlich verworfene Schließung
der Deutschen Oper , würden nur das Angebot mindern und wirkten sich » schädigend auf
den Tourismus aus « .
Abseits aller kulturellen Angebote hält Berlin für Busch-Petersen , ( der )m1 bis 1989 »
hinter der Mauer ausgehalten hat « , einen gewaltigen Trumpf in der Hand , wie er erst
kürzlich wieder feststellen konnte : ( Ein schwedischer Gast , den er durch den Ostteil
führte )m2 , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen .
Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von dem , was gewesen
ist , sollten wir erhalten « .
(ID: 1430x1439); (Calc-Prob:53)
Example (12a) shows a difficult case since this combination of markables is absolutely possible. The
relative pronoun die can be used for markables in any gender in plural and thus, die can refer to both
Übersetzungen und Inszenierungen and only to Inszenierungen as both markables are in plural.
In general, this problem cannot be solved in an easy manner. Possibly, even by considering the whole
context including world knowledge, this uncertainty might remain unanswered. One solution might be
75
the annotation. If one considers the entry for the relative pronoun and the corresponding antecedent in
the word table, the value for gender (i.e. both) leaps to the eye (cf. figure 4.1).
183
184
185
187
Übersetzungen
und
Inszenierungen
die
1
1
1
1
1
1
1
1
10
10
10
10
NN
KON
NN
PRELS
plural
unknown
plural
plural
female
unknown
female
both
nominative
unknown
nominative
nominative
unknown
unknown
unknown
unknown
Figure 4.1.: The relative pronoun referring to a conjunction
One indicator that the relative pronoun refers to both Übersetzungen and Inszenierungen might be the
gender value both, which is implemented in various corpora in SUCRE, although it might not be necessary because of the gender value unknown. It is mainly used for die, as it can refer both to plural
masculine antecedents and to plural feminine antecedents. As the two antecedent candidates Übersetzungen und Inszenierungen and Inszenierungen provide different syntactic structures, this is a task of
syntactical disambiguation rather than pure coreference resolution. Given a syntactic parser with an adequate disambiguation module, by definition, the gender could be set to both for the indication of the
reference to a conjunction, as the connected markables might have different gender values (although this
is not the case in the example in figure 4.1, where both markables are female). So, a possible feature is
one that checks whether m2 is a relative pronoun and then if m1 is annotated as a part of a conjunction.
If both is true and the gender value for m2 is both, the feature would vote for a disreference between m1
and m2 . As the gender value both also refers to markables that do not contain entities with a clear gender
(e.g. Staatsangehörige), an additional check for this feature might be the gender of m1 . If m1 does not
have the gender value both, then the feature votes for a disreference between m1 and m2 , otherwise, it
would proceed in the aforementioned way.
However, the annotation of the relative pronoun die always contains the gender value both, if it is in
plural. Here, the annotation could be advanced for solving this problem.
In example (12b), the antecedent for the relative pronoun is preceding but not adjacent. In German, it
is unusual to have markables between a relative pronoun and its preceding antecedent (only verb forms,
a comma/parenthesis or a preposition is allowed). This problem can be solved by checking whether there
is any markable between m1 and m2 or whether the distance between m1 and m2 is greater than 2 (i.e.
a comma/parenthesis and a possible preposition).
Example (12c) combines the relative pronoun in m2 with its possibly antecedent on the left, which is
however embedded in another markable which is the right antecedent to m2 . The head of m1 has the
relation tag GMOD indicating a modifier in genitive. This combination can be checked in a feature but it
is also possible that this kind of markable connection shows a right coreference link.
In (12d) the relative pronoun is found in m1 . As mentioned above with example (11) this can only be
the case if the second markable is a possessive or reflexive pronoun.
One kind of example that frequently occurs in the true positives was the inclusion of m2 in m1 as it
is the case in example (13a). Here, a feature might check whether m2 is within m1 , given that m2 is a
relative pronoun.
(13)
76
a.
1929 wurde er wegen seiner Gegnerschaft zur Diktatur Primo de Riveras kurzfristig inhaftiert , leitete ab 1933 dann aber die spanische Kunstakademie in Rom .
esperpentos heraus , ( Schauerpossen , ( die )m2 die von Leidenschaft und Gewalt
deformierte Gesellschaft wie durch einen Zerrspiegel betrachten )m1 .
Zu diesem Genre gehören neben den Wunderworten ( die im Original als Tragikomödie
.
4.3.3. Relative proximity in context
One problem that often occurs is a reference between a markable m1 and m2 which is admittedly possible
(i.e. there is no incompatibility between them) but nevertheless markable m2 is disreferent to m1 and
refers to an entity denoted by a markable between m1 and m2 . Some examples of these false positives
are given in (14):
(14)
a.
Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn ( seiner )m1
Grotske “Wunderworte “erklären .
Frömmigkeit ( seiner )m2 Bewohner nicht verloren haben .
Laureano ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano
von Jahrmarkt zu Jahrmarkt geschoben hatte .
b.
In der Inszenierung von Armin Holz in den Kammerspielen des Deutschen Theaters scheint
dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden .
Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit ,
( sich )m2 erst einmal sinnhaft vorzustellen .
Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur
Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica
agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit
über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen
kann .
(ID: 156x163); (Calc-Prob:52)
c.
Hamburg ( ap ) - Ein zwei Jahre alter Schäferhund namens “Prinz “hat im Hamburger
Stadtteil Altona eine Wohnung besetzt .
( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag ( sein )m2 zukünftiges Heim gezeigt
.
Das gefiel dem Hund so gut , daß er unmittelbar hinter der Tür Stellung bezog und niemanden mehr durchließ .
(ID: 325x328); (Calc-Prob:51)
d.
( Sie )m1 hat dabei noch Glück gehabt .[. . . ]
Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen
Dann starb ( sie )m2 dort .
(ID: 428x435); (Calc-Prob:53)
e.
Bei der Polizei erfuhr die alte Dame , daß es sich bei ihrem Fall nicht um ( ein Vergehen
handele , welches von Amts wegen verfolgt werden könne )m1 .[. . . ]
Helena begab sich zu Gericht .
( Dieses )m2 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen .
(ID: 396x400); (Calc-Prob:52)
77
f.
Daß ( Nawrocki )m1 von dieser bigotten Inszenierung profitiert , ist weder sein Verdienst
noch von ihm gewollt .
Mit ( ihrem )m2 neuen Geschäftsführer steht die privatwirtschaftliche Marketing GmbH
nunmehr gleichgewichtig neben der öffentlich-rechtlichen Olympia GmbH .
An finanzieller Potenz und Aktionsradius wird sie ihr bald überlegen sein .
(ID: 668x673); (Calc-Prob:51)
Example (14a) couples two possessive pronouns that show an exact string match: seiner. But m1 refers
to der spanische Dichter Ramon del Valle-Inclan and m2 to dem kleinen galizischen Dorf, a markable
that is between m1 and m2 .
In (14b) the personal pronoun er is combined with the reflexive pronoun sich. Such a combination in
different sentences is excluded by the prefilters (4.1.2) but in the current sentence, there are finite embedded clauses and thus several subjects. A reflexive pronoun can only refer to a subject (cf. (4.3.4)). In this
case, the reflexive pronoun refers to an implicit subject, because it occurs in an infinitival subordinated
clause, whose subject is controlled by the superordinated accusative object den Figuren des Stücks (cf.
control verbs in (4.4.1)).
The markable pair in (14c) is a very descriptive example for an ambiguity. Markable m2 (sein) can
refer to Der 24jährige Besitzer as well as to dem Tier. With regard to semantics, the interest of an animal
has probably more focus on its own home and not on the one of its owner. So, the given reading is
unlikely and markable m2 has to corefer with the intermediate markable dem Tier.
In example (14e), markable m2 is a demonstrative pronoun. Those have the property to corefer with
those compatible expressions that are most salient. One indicator for salience is proximity in context: the
closer an expression the more salient it is, if it is compatible. Thus, the markable Gericht is more salient
than m1 (ein Vergehen . . . ).
One exception for a more salient markable being between the connected markables is given in (14f).
Here, m2 is cataphoric and thus corefers with the succeeding antecedent die privatwirtschaftliche Marketing GmbH.
At this point it is not possible to implement a feature that captures this phenomenon, since link features
are only defined on two markables m1 and m2 . However, a possible feature would be one that returns
1 (i.e. TRUE) if none of the markables between m1 and m2 is compatible with m2 with respect to all
distinctive atomic features like gender, number, person or semantic class. Given the constants m1 and
m2 referring to antecedent and anaphor, a first order predicate logic representation of this feature might
be:
∀m3 ((m1b.txtpos < m2b.txtpos) ∧ (m1e.txtpos < m2e.txtpos)
∧ (m1b.txtpos < m3b.txtpos) ∧ (m1e.txtpos < m3e.txtpos)
∧ (m3b.txtpos < m2b.txtpos) ∧ (m3e.txtpos < m2e.txtpos))
⇒ ((m3h.f 1 ! = m2h.f 1) ∨ (m3h.f 2 ! = m2h.f 2)
∨ (m3h.f 4 ! = m2h.f 4) ∨ (m3h.semcls ! = m2h.semcls)))
It says that for all markables m3 (in a given set of markables) if m1 precedes m2 and m1 precedes m3
and m3 precedes m2 (i.e. there is the linear order m1 < m3 < m2 ), then m3 is incompatible with
m2 in at least one of the features number, gender, person or semantic class. If they are compatible in
all of those features, the implication’s consequence is false and the feature returns 0 (i.e. FALSE) (as
(1 ⇒ 0) ⇔ 0).
A possible way of introducing such a universal quantified m3 is the shift from markable pairs to triples
containing m1 , m2 and a set M that contains all markables inbetween.
The effect of this final feature (with an implicit universal quantified m3 ) will be shown by example
(14d), repeated in example (15):
(15)
78
Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen Bahnhof nächtigen , sozusagen als Obdachlose .
(ID: 428x435); (Calc-Prob:53)
Given the markables ( Sie )m1 and ( sie )m2 , every further markable between m1 and m2 has to be
incompatible to m2 in at least one of the atomic features: gender, number, person or semantic class.
One possible markable between m1 and m2 is “Eine andere alte Dame”m3 . This markable has the
atomic features < gender, f emale >, < number, singular >, < person, third > and
< semantic class, person >. All of these atomic features match to ( sie )m2 . Thus the link feature
returns 0 - one vote for m1 and m2 being disreferent.
One have to remark that this solution just captures anaphoric (rather than cataphoric) coreference
relations. Thus, example (14f) would not be captured. But cataphoric relations are less common than
anaphoric relations (e.g. in the TüBa-D/Z version used in (Klenner and Ailloud, 2009), there are 13,818
anaphoric relations but just 1,031 cataphoric relations.) and will be ignored in this solution.
4.3.4. Reflexive pronouns and non-subjects
According to Canoo.net (2011c), there are three kinds of reflexive verb constructions. True reflexive
verbs, reflexive variants of a verb and reflexively used verbs.
True reflexive verbs are those that inherently subcategorize a reflexive pronoun (e.g. “sich schämen”
(“to be embarrassed”)). Those governed reflexive pronouns are semantically empty and they cannot
be exchanged by a personal pronoun. Nevertheless, these pronouns are still caught as markable in the
TüBa-D/Z corpus.
This is the same issue with reflexive variants of a verb. Although the full verb can be used with nonreflexive pronouns, the meaning of the verb is changed (usually) with the use of a reflexive pronoun
(“sich ärgern” vs. “jemanden ärgern” (“to be annoyed” vs. “to bother somebody”)).
On the other hand, the reflexively used verbs govern a referring reflexive pronoun (e.g. “sich rasieren”
(“to shave”)). These reflexive pronouns always refer to the subject in the same sentence (cf. (Canoo.net,
2011c)).
The false positives in (16) have one problem in common: they combine a reflexive pronoun with a
non-subject (e.g. an accusative or dative object):
(16)
a.
b.
c.
Und so mischten sich beim nicht gerade enthusiastischen Schlußapplaus , als Armin Holz
zu seinen Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch lautstarke Unmutsbekundungen .
Man hatte ( sich )m1 am Ende ( einer erfolgreichen Spielzeit )m2 von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt
als neues Ensemblemitglied hätte geben sollen , mehr versprochen .
Kirchner wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit aufnehmen
.
(ID: 239x240); (Calc-Prob:51)
So verteilte die russische Delegation eine Prognose über die Entwicklung der russischen
Wirtschaft .
Beim Haushaltsdefizit orientierten ( sich )m1 die amtlichen Statistiker noch an der GaidarCamdessus-Vereinbarung : Es soll ( sich )m2 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen .
Gegenwärtig jedoch liegt es bei 17 Prozent .
(ID: 8248x8252); (Calc-Prob:83)
Es wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt .
Der Kern , um ( den )m1 das Karussell ( sich )m2 drehte , löst sich somit auf .
79
d.
Hat es den Münchner Rummel überhaupt gegeben ?
(ID: 8376x8378); (Calc-Prob:52)
Nun steht es still - und alle Welt wundert sich , daß es sich nicht von der Stelle bewegt hat
.
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht
dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m1 mit ( einem IWFStandardprogramm )m2 die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen
, wie Boris Jelzin den Kohls und Bushs klarmachen konnte .
Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G7-Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien , Italien und Kanada - auch wenn es nur die konkrete Zusage gab , die erste IWFKredittranche von einer Milliarde Dollar im August zu überweisen .
(ID: 8206x8207); (Calc-Prob:51)
Although the right subject precedes the reflexive pronoun in example (16a), the reflexive pronoun is
linked with the succeeding genitive modifier of a PP-adjunct.
In (16b) two reflexive pronouns are connected. A reflexive pronoun, which is classified as accusative
object, cannot be the subject of a sentence. Thus, both reflexive pronouns need a subject. In the first
case, the subject is directly adjacent (die amtlichen Statistiker) and in the second case, the subject is the
preceding pronoun (Es).
Actually, it is possible to combine a relative pronoun with a reflexive pronoun (cf. example (11)) but
this relative pronoun has to constitute the subject of the relative clause. This is not the case in (16c). Here,
the relative pronoun is an accusative object in a PP-adjunct, whereas the right subject is das Karussell.
In example (16d) the correct antecedent of the reflexive pronoun is the markable die Schwierigkeiten
Rußlands and not, as proposed, the dative object of a PP-adjunct.
To model this issue, the features concerning reflexive pronouns (i.e. the features no. 22 and no. 32 in
(4.1.3)) have to enforce the other (non-reflexive) markable to be a subject.
4.3.5. Problems with substring-matches
In German compound words are not separated by white spaces as it is the case in English. Thus any
compound word AB containing the common nouns A and B substring-matches with A or B:
(17)
a.
b.
c.
SchäferhundAB - HundB
VertragB - ArbeitsvertragAB
HundA - HundehalterAB
Example (17a) is a possible link of coreferent markables that returns a positive substring-matching feature
value. The markables have the same head (both are dogs). On the other hand the markables in example
(17b) might be coreferent but it is unlikely since usually the more informative markable precedes its
repeated mention, that only contains enough information to be coreferred with the first markable - in
an extreme case, this is a pronoun (cf. “Typically, but not always, names and other descriptions are
shortened in subsequent mentions” - (Versley, 2006)). In example (17c) there is a case which definitely
connects two disreferent markables since their heads are different (i.e. a dog vs. a dog owner).
A list of further examples of false positives with a positive substring-matching extracted from the
corpus is shown in example (18).
(18)
80
a.
Das gefiel ( dem Hund )m1 so gut , daß er unmittelbar hinter der Tür Stellung bezog und
niemanden mehr durchließ .[. . . ]
Erst die Feuerwehr konnte beide durch das Fenster befreien .
b.
c.
d.
e.
Herrchen wollte ( den Hundefänger )m2 holen .
(ID: 331x345); (Calc-Prob:52)
Um so mehr , als man das Absurde an dieser Praxis noch auf die Spitze treiben kann .
Im Fall Helena G. verurteilte das Gericht ( die wilde Mieterin )m1 zur Zahlung von ( Miete )m2
.
Doch hätte die nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können .
(ID: 502x503); (Calc-Prob:52)
Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen in
faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit ( erhobenem Arm )m1
“unterschrieben hat .[. . . ]
Schließlich stehen sie im Dienst einer großen und gerechten Sache , dem Bankkonto des
IOC .
Also , wenn wir die Briten richtig verstanden haben wollen , handelt es sich bei Juan und
seinen 94 Komplizen aus ( dem Lausanner Marmorpalast )m2 um die korrupteste Bande
auf Gottes Erdboden .
(ID: 577x623); (Calc-Prob:52)
Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei ( 1,98 m )m1 .[. . . ]
Der Keniate Yobes Ondieki lief über 5.000 Meter in 13:03,58 Minuten eine neue Jahresweltbestleistung .
Damit war er über fünf Sekunden schneller als der hoffnungsvolle Ohrringträger Dieter
Baumann ( Leverkusen ) in ( seinem )m2 deutschen Rekordlauf am 6. Juni in Sevilla .
(ID: 8433x8441); (Calc-Prob:53)
Wie Nawrocki gestern erklärte , erhält Fuchs keine Abfindung , da ( sein Vertrag )m1 regulär am 15. August ausläuft .[. . . ]
Nawrocki selbst erhält für seinen Doppeljob kein zusätzliches Salär .
( Sein Geschäftsführervertrag mit der Marketing GmbH )m2 ist unbefristet .
(ID: 814x836); (Calc-Prob:52)
In (18a) the substring hund matches both markables but as the second markable is a compound word,
they have different heads: (dog vs. dog catcher). This mismatch can be handled by checking whether
one markable starts or ends with the other markable. For this case, some new functions have to be
introduced to the pseudo language. (cf. (5.1.4) and appendix B). One problem that occurs with this
approach are markables with the same head but with inflectional suffixes. For this reason, the feature has
to be modified in the sense that it returns 0 (FALSE) in the case that one markable starts with the other
and does not end with an inflectional suffix.
Another case of substring match is the morphological derivation. For example in (18b), the noun
Mieterin is a derivation of Miete, thus they have the substring Miete in common but refer to absolutely
different entities. But this is not always true as in example (19) one markable is the diminutive of the
other and might corefer with it.
(19)
Kindm1 - Kindleinm2
At this point, it is not exactly clear how to solve this problem. One way could be to take into account
inflectional suffixes but no derivational suffixes except diminutives.
The example (18c) shows a surprising case of two markables that have accidently three sequential
letters in common: Arm vs. Marmorpalast. This problem also occurs with pronouns or other markables
that consists of a very short string. An extreme case is given in (18d). Here, the abbreviation for meter,
m, allegedly corefers with the possessive pronoun seinem, which ends with the inflectional suffix ∼em.
One way of solving this problem might be a comparison of the relative length of each markable. If
one markable has length 3 or less, the other markable must not have the fourfold or sixfold length as it is
the case above.
81
In addition, if two markables string-match, none of them may be a pronoun.
As discussed with example (17b) it is unlikely that a more informative compound word succeeds a
coreferent word. This is also the case in (18e). Here, Vertrag might be coreferent with the succeeding
Geschäftsführervertrag but this coreference in unlikely. For this reason, the feature sketched above for
example (17a) has to enforce the compound word preceding the non-compound word.
Some interesting cases of true coreference between two markables that substring-match are given in
(20):
(20)
a.
Daß Einbruch auch strafbar ist , wenn der Einbrecher nicht mit einem Sack auf dem
Rücken und einer Maske vor dem Gesicht das Weite sucht , ist eine Erkenntnis , die auch
nach Ansicht ( des polnischen Bürgerombudsmanns )m1 die Auffassungsgabe der polnischen Polizei bei weitem übersteigt .[. . . ]
man wilde Mieter auf eigene Faust rauswerfe .
( Ombudsmann Tadeusz Zielinski )m2 will indessen nicht einsehen , daß gewöhnliche Einbrecher besser behandelt werden als die rechtmäßigen Eigentümer bzw. Mieter .
(ID: 479x494); (Calc-Prob:52)
b.
Als ob das irgend etwas mit Sport zu tun hätte , daß ( Hans Anton )m1 ein paar Jährchen
in faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem
Arm “unterschrieben hat .[. . . ]
Na bittschön , Vollbeschäftigung in Herzogenaurach , und irgendwelche Schuhe müssen
die Sportler ja anziehen !
( Der Spanier-Hansl )m2 kümmert sich wenigstens .
(ID: 571x593); (Calc-Prob:52)
c.
( BUNDESRAT )m1 [. . . ]
Berlin .
( Der Bundesrat )m2 soll sich auf Initiative Berlins für einen verbesserten Kündigungsschutz für Gewerbetreibende einsetzen .
(ID: 1205x1209); (Calc-Prob:52)
Example (20a) shows a combination of valid compound word linkage and an inflectional suffix: Bürgerombudsmanns has the inflectional suffix ∼s but corefers with Ombudsmann, which has no such suffix.
In this case, the modified string-matching feature would deny that m1 ends with m2 . The markable pair
in (20b) resembles this problem. Here, a familiar form of the proper name Hans with a suffix ∼l is used
within a compound word.
At this point, it is not clear how to combine these two issues as the pseudo language is not expressive
enough to use string concatenation (e.g. ’m2h + s’ or ’m1h + l’) as input for a function returning TRUE
in the case, that one string ends with the other.
The markables in (20c) reveal a less problematic case. The first markable BUNDESRAT is the capitalized form of the second markable Bundesrat. One way of solving this is to change the case-sensitive
exact string matching feature (i.e. seqmatch) to a case-insensitive one (i.e. seqmatchlc).
4.3.6. “Es“(“it“) as expletive pronoun in German
The pronoun es differs from others (e.g. er, sie) in several ways. Canoo.net (2011a) mentions 5 different
uses of es:
Personal pronoun: “es” replaces expressions referring to a real-world entity:
(21)
82
Wo ist das Telefon? Es steht auf dem Tisch.
Placeholder for a clause: “es” can be used as a placeholder for a clause that constitutes the subject
or object:
Es freut uns, dass unsere Mannschaft gewonnen hat.
(22)
Provisional subject: The so-called “Vorfeld-es” can be used as a provisional subject that precedes the
actual subject in the sentence. This way, the linear order of constituents in a German declarative
sentence (e.g. the finite verb at the second position) is ensured:
Es steht ein Schrank im Gang.
(23)
Formal subject: “es” functions as merely formal subject of impersonal uses of verbs. In this kind of
use, es is semantically empty and has no reference:
(24)
a.
Es regnet stark.
b.
Es handelt sich um ein Missverständnis.
c.
Wie geht es dir? Mir geht es gut.
d.
Es gibt gute Sachen.
Formal object: “es” can also be used as a formal accusative object in some idiomatic expressions:
(25)
a.
Wir hatten es eilig.
b.
Sie haben es im Leben sehr weit gebracht.
The only kind of use of the pronoun “es” that is of particular interest in the coreference resolution task
is the personal pronoun. In contrast to English, this kind of usage is less common in German, as a lot of
entities with a neuter natural gender are expressed with a common noun that has the grammatical gender
female or male (cf. der Computermale or die Glühbirnef emale ).
Thus, the other four kinds of es-usages described above do not constitute anaphoric markables in
the sense of noun phrase resolution we have defined for SUCRE. Nevertheless, they are detected as
markables in the corpus and therefore, a new group of false positives has been discovered. Example (26)
shows one instance of this group:
(26)
( Es )m1 gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes nicht
herausstellt .
Allein schon vom Blick auf die Landkarte , in der die große Stadt Berlin mitten im Land
Brandenburg liegt , macht überdeutlich , daß ( es )m2 in Zukunft eine sehr enge wirtschaftliche
, verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden Länder geben wird .
Dies ist auch aus alternativer Sicht wünschenswert .
(ID: 12125x12135); (Calc-Prob:53)
A possible solution for these non-referring pronouns might be a lexical feature that checks whether
the governing verb is a verb form of geben, regnen or handeln. However, the expressiveness of SUCRE’s
feature definition language is limited to the word-ID of the governing verb, not to its string representation.
As soon as the expressiveness has been increased, the performance of such a feature has to be checked.
However, up to now there is no clue pointing out that a given pronoun es is a referring personal pronoun
or non-referring. Thus the only possibility to capture this phenomenon is to create a feature which returns
FALSE in the case that a personal pronoun equals es.
83
4.3.7. Units, currencies, month names, weekdays and the like
Sometimes the head of a markable is a dimensional unit, a currency, a month name, a weekday or the
like. Although it is not impossible that some of these markables are coreferring, this is rather unusual:
(27)
a.
b.
c.
d.
Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard Schwaetzer ( FDP ) selbst mit ( ihren circa 15.000 Mark monatlichen Bruttoeinkommen )m1 und
ihrem nicht unerheblichen Steuersatz .
Die Ministerin besaß bis Anfang 1991 eine Wohnung in der Bonner Riemannstraße in
Bonn , mit deren Erwerb sie ( bis zu tausend Mark Steuern )m2 im Monat sparen konnte .
Auf großstädtische Verhältnisse umgerechnet nehmen sich solche Summen noch ganz anders aus .
(ID: 1097x1108); (Calc-Prob:83)
( Nur zehn Prozent der umgewandelten Wohnungen )m1 sind an die dort wohnenden Mieter verkauft worden , ein Drittel der Mieter hat die Wohnung verlassen müssen , viele
andere haben mit Kündigungsprozessen zu kämpfen .[. . . ]
Denn das ist häufig der Fall .
( Zwischen 60 und 70 Prozent der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den
Mieter herausbekommen , um die Wohnung , die dann im Preis steigt , besser verkaufen
zu können .
(ID: 988x1062); (Calc-Prob:83)
Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über ( 400 Meter
Hürden )m1 hinnehmen .[. . . ]
Seine 48,18 Sekunden reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen .
Über ( 800 Meter )m2 setzte sich der Kenianer William Tanui mit 1:43,62 Minuten gegen
den Weltjahresbesten Johnny Gray ( USA , 1:44,19 ) durch .
(ID: 8476x8481); (Calc-Prob:83)
Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde ( 20,10 Sekunden )m1
benötigte .[. . . ]
Eine Niederlage mußte auch Weltmeister Samuel Matete ( Sambia ) über 400 Meter Hürden hinnehmen .
( Seine 48,18 Sekunden )m2 reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen
.
(ID: 8473x8478); (Calc-Prob:83)
In (27a) the former German currency Mark constitutes the markable. In this case, it only expresses
an amount of money that is earned monthly and no specific entity as it might be within the context of,
say, a bank robbery. The subsequent examples (27b), (27c) and (27d) contain two allegedly coreferent
markables that express information about percentages, local distances and time spans. None of them are
usually coreferring.
Thus, a simple way of handling this issue is the creation of a link feature that returns FALSE in the
case that the head of the first or second markable exact string matches with one string of a particular set
of keywords (e.g. {Mark, Dollar, Prozent, Meter, Juli, Sekunden, Mittwoch, . . . }) and TRUE otherwise.
An alternative version is to create a prefilter feature that discards all markable pairs which contain a
unit or any other keyword as a markable head.
4.3.8. Problems with the alias feature
The alias function returns TRUE in the case that one markable is the alias of the other and FALSE
otherwise. Two possible TRUE-cases are given in example (28).
84
(28)
a.
b.
Fußballclubm1 - FCm2
Deutsche Markm1 - DM m2
Another positive example of the alias function is in (29), although here, the alias is even expressed
in the corresponding markable:
(29)
Berlin ( taz ) - ( Die staatliche Zentralstelle für Sicherheitstechnik ( ZFS ) in )m1 Düsseldorf
hat einen Riesenauftrag an Land gezogen .
Die Wissenschaftler ( der ZFS )m2 dürfen in den nächsten Jahren bis zu 1.000 Atommüllfässer
aufmachen und ihren strahlenden Inhalt überprüfen .
Die Fässer sollen zur Konditionierung aus dem Atommüllager Gorleben in eine Lagerhalle
nach Duisburg-Wanheim transportiert werden .
(ID: 6064x6067); (Calc-Prob:50)
In the set of false positives, the alias function fires six times and just one time in the set of true
positives. Three of those false positives are given in (30):
(30)
a.
b.
c.
Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFRNationalspieler wurde vom ( FC St. Pauli Hamburg )m1 zunächst für ein Jahr an Vorwärts
Steyr ausgeliehen .[. . . ]
TENNIS.
In der ersten Runde ( des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main )m2 trifft
die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf
auf Außenseiter Neuseeland .
(ID: 8573x8583); (Calc-Prob:50)
WECHSEL
Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : ( Der frühere CSFRNationalspieler )m1 wurde vom ( FC St. Pauli Hamburg )m2 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen .
Leihgebühr : 100.000 Mark .
(ID: 8572x8573); (Calc-Prob:50)
Vorweg : ( Sieben spannende Seiten Widmungen an die Menschen , die Torsten Schmidt
als Junkies knipste )m1 :[. . . ]
S. P.
Ausstellung : bis 6. 8. im Schlachthof ; Buch : “Ich bin einmalig , und daß ich lebe , das
freut mich . Menschen in der Drogenszene “, Rasch und Röhring Verlag , ( 29.80 DM )m2
(ID: 2927x2954); (Calc-Prob:50)
In example (30a) the Fußballclub, “FC”, is misinterpreted as Federation Cup and in (30b) as frühere
CSFR-Nationalspieler, where the F corresponds to frühere and the C to CSFR-Nationalspieler.
The most unlikely alias interpretation is given in example (30c). Here, the Deutsche Mark, “DM”, is
linked with a large markable that contain the sequential words die and Menschen.
Thus, one possibility is to completely remove the alias feature since it has an insufficient predictive
power.
4.3.9. First markable begins with “kein“
Markables that begin with the quantifier kein cannot refer to any other markable outside the quantifier’s
scope. Example (31) shows an instance of such a true coreference. Here, all underlined markables are
coreferent:
(31)
Kein Hund, dem sein Herrchen sein Stöckchen gibt, ärgert sich.
85
A negative example is given in (32):
(32)
» ( Keine Gewalt )m1 « war ein Slogan der großen Novemberdemonstrationen .[. . . ]
Im puren , physischen Sinn war es auch eine weitgehend » gewaltlose « Revolution .
( Die andere Gewalt , die Gewalt der Geschichte , des Alltags , unserer Gefühle und
Vorurteile )m2 , entlud sich dagegen , und sie entlädt sich immer noch .
(ID: 1881x1892); (Calc-Prob:83)
Here, there is an exact string match between the markables’ heads but m2 is not in the scope of the
first markable’s quantifier, kein.
Thus, the only possible anaphora for a markable starting with kein is a relative pronoun, a possessive
pronoun or a reflexive pronoun. These pronouns have to occur in the same sentence as m1 .
A possible feature for this mismatch returns TRUE in the case that the first markable begins with kein
and the second markable is a pronoun of the kinds described above and is in the same sentence. It returns
FALSE otherwise.
The same solution can be done for the quantifier jede∼.
4.3.10. Disagreement in gender and number
This group of false positives is very surprising since a mismatch in gender or number should have been filtered out in the pre-filters (4.1.2). There are several problems that are responsible for this mismatch. Most
of the time, the problem is that some kinds of attributive pronouns (e.g. attributive possessive pronouns
(P P OSAT )) are labeled unknown with respect to some atomic features (e.g. < gender, unknown >).
First, one remark on the annotation of attributive pronouns, e.g. possive pronouns. There are in a way
two kinds of annotation for a possessive pronoun. The first one constitutes the syntactical agreement with
the respective NP-head and the second one constitutes the coreferential agreement with the antecedent.
In example (33) the possessive pronoun seine syntactically agrees with the noun Schuhe and coreferentially agrees with the subject Peter. One can say that this coreferential agreement is a necessary condition
for coreference but no sufficient condition, as there might be another entity with the same coreferential
agreement features.
In the first case (syntactical agreement), the possessive pronoun in (33) has the number value plural
and the gender value male. These values can be considered as based on morphological suffixes: as
seine ends with an e, it cannot agree with an NP-head in singular and non-female, because this e-suffix
indicates either a plural-number or a female-gender in singular. This annotation is the one that is returned
by a parser and that is the only important one for the grammaticality in German, as the noun phrases sein
Schuhe or seine Schuh are not well-formed.
But this annotation is not relevant for coreference resolution. Here, another kind of annotation is
needed that is based on the coreferential agreement. Here, the possessive pronoun in example (33) has
the number value singular and the gender value male or neutral, which are the exact complementary
annotation possibilities as for the syntactical agreement of seine. The coreferential agreement can be
considered to be based on the morphological stem: as seine starts with sein (rather than ihr) it cannot
agree with an antecedent which is singular and female or plural (e.g. Maria/die Männer : seine). This
annotation is usually not returned by a German parser.
(33)
Peterm1 holt seinem2 Schuhe ab.
Figure 4.2 shows another excerpt of the original task dataset from SemEval-2010 (cf. (2.7)).
However, the reason, why almost all attributive pronouns are labeled unknown with respect to the
grammatical attributes, is that there is no gold annotation for them in the original SemEval dataset (cf.
column 7 in line 5 in figure 4.2 but only the automatic parsed annotation (column 8). Thus, the system
returns unknown due to the lack of gold annotation.
86
1 Heute heute heute ADV ADV _ _ 2 2 ADV ADV _ _ _ _ _
2 wählen wählen wählen VVFIN VVFIN _ per=3|num=pl|temp=pres|mood=ind
0 0 ROOT ROOT _ _ _ _ _
3 die die d ART ART cas=n|num=pl|gend=fem cas=n|num=pl|gend=fem 4 4
DET DET _ _ _ _ (179
4 Schottinnen Schottin Schottin NN NN cas=n|num=pl|gend=fem
cas=n|num=pl|gend=fem 2 2 SUBJ SUBJ _ _ _ _ 179)
5 ihr ihr ihr PPOSAT PPOSAT _ cas=a|num=sg|gend=neut 6 6 DET DET _ _
_ _ (4|(179)
6 Parlament Parlament Parlament NN NN cas=a|num=sg|gend=neut
cas=a|num=sg|gend=neut 2 2 OBJA OBJA _ _ _ _ 4)
7 . . . $. $. _ _ 6 6 -PUNCT--PUNCT-_ _ _ _ _
Figure 4.2.: Another sentence in the original SemEval-2010 task dataset
The excerpt in figure 4.2 shows the original dataset of the sentence “Heute wählen die Schottinnen
ihr Parlament”. As described above, the automatic annotation (column 8), that is returned by a German
parser, corresponds to the syntactical agreement (i.e. the possessive pronoun ihr has the number value
singular and the gender value neutral as it agrees with the NP-head Parlament.) The right antecedent
for the possessive pronoun is the noun phrase die Schottinnen, which has completely different values in
terms of case, number and gender.
Consequently, the syntactic agreement with possessive pronouns are irrelevant for coreference resolution, but it is possible to narrow the achievable attribute values by considering the morphological stem of
the possessive pronoun. Canoo.net (2011b) presents table 4.7:
singular
plural
1st person
2nd person
mein∼
unser∼
dein∼
euer∼
3rd person
male
sein∼
female
ihr∼
ihr∼
neuter
sein∼
Table 4.7.: Table of the possessive pronouns
Therefore, if the possessive pronoun starts with sein, the attribute values are <number,singular>, <person,3> and <gender,male/neutral> and so on.
A few examples of the false positives based on this unknown-disagreement are given in (34). For each
example, the attributes are examined and the problem is figured out. Afterwards, a solution is proposed:
(34)
a.
Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich
erst einmal sinnhaft vorzustellen .[. . . ]
kann .
So kann die Groteske nicht zu ( ihrer )m2 Wirkung kommen , kann nichts von der Form
ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen .
87
(ID: 156x179); (Calc-Prob:53)
581 er 2 2 29 PPER singular male nominative third
692 ihrer 2 2 31 PPOSAT unknown unknown unknown unknown
Problem: The possessive pronoun ihrer is labeled unknown for the attributes number,
gender, case and person. The personal pronoun er is singular and male but ihrer has
just two other options: (singular and female) or (plural and unknown). The only given
restriction in the prefilters (4.1.2) is a mismatch between female and male.
Solution: One solution is to re-annotate those pronouns manually. Since it is not possible
to decide which attributes to use for a kind of possessive pronoun with the stem ihr (singular and female) or (plural and unknown) (cf. table 4.7), one way is to keep the gender
unknown and introduce a new number value both_ihr and a new restriction in prefilters
such that if a markable has the number both_ihr, the other markable must not be male or
neuter if it is singular. An alternative could be the introduction of a pre-filter that checks
whether the string representation of a markable starts with ihr instead of introducing a
number value for this. Both ways are possible and would lead to the same result. For
pursuing the goal of filling the lack of annotation, the first option with the number value
both_ihr is preferred to the second one, that leaves the number value unknown.
b.
( Deutschen Theater )m1 , mit ( der )m2 übrigens Ignaz Kirchner ursprünglich sein Debüt als neues Ensemblemitglied hätte geben sollen , mehr versprochen .
.
(ID: 242x243); (Calc-Prob:51)
953 Theater 2 2 41 NN singular neutral dative unknown
956 der 2 2 41 PRELS singular female dative unknown
Problem: The first markable Deutschen Theater has the gender value neutral, whereas
the second markable der is the dative form of a relative pronoun with the gender value
female. Thus, there is a mismatch between female and neutral. But there is no restriction
in the prefilters for a mismatch between male and neutral or between female and neutral
given that one markable is not a pronoun (cf. feature no. 8 in (4.1.2)) as there are neutral
common nouns in German (e.g. Mädchen) that can be referred to by female pronouns.
Solution: An introduction of four further features in the prefilters (for each combination
of mismatch of male/female and neutral given that the first markable’s head is not within
a special list of keywords (e.g. Mädchen)) solves this problem.
c.
88
Der zweieinhalbstündige Theaterabend in den Kammerspielen des Deutschen Theaters
blieb dann auch entsprechend unentschieden .
( Viele Gedanken )m1 , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt ,
wie zum Beispiel die Inzest-Anspielung zwischen Pedro und ( seiner )m2 Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) .
Ramon del Valle-Inclans Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als ein
weiterer Gag sein kann .
(ID: 188x193); (Calc-Prob:51)
4.4. Error analysis in false negatives
727 Gedanken 2 2 33 NN plural male nominative unknown
747 seiner 2 2 33 PPOSAT unknown unknown unknown unknown
Problem: The possessive pronoun seiner is singular and male/neutral (cf. table 4.7) but
the common noun Gedanken is plural. So, there is a mismatch between singular and
plural that is caused by the insufficient annotation of attributive possessive pronouns.
Solution: To solve this problem one can annotate all possessive pronouns with the stem
sein as singular and introduce a new gender value non_fem and some corresponding
pre-filters that prevent to include markable pairs containing sein and any markable with
the gender female.
This section focusses on the analysis of false negatives. Therefore, it is to figure out why two coreferent
markables are misclassified as disreferent. Again, the most often and clearest link errors are grouped
together into some main groups, for which one or several possible solutions are proposed and later
implemented in chapter 5 in order to move the links to True Positive (TP).
Each group provides some examples from SUCRE’s output. Further examples to each group are given
in the appendix E.2.
4.4.1. Reflexive pronouns with non-subjects or considerable sentence
distance
In this group, two main problems are revealed. First, based on the architecture of SUCRE, there are two
kinds of false negatives (FN) that have to be distinguished from each other. Second, there are more
complex syntactic structures that enforces the system to allow a reflexive pronoun to corefer with, say,
an accusative object in the same (global) sentence.
(35)
a.
b.
c.
Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß
er die ( ihm )m1 anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein
Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .[. . . ]
Der Spanier-Hansl kümmert sich wenigstens .
Um ( sich )m2 und auch um seine Kumpels .
(ID: 558x595); (Calc-Prob:0)
Auch beim Nato-Verbündeten Türkei , in die ( Außenminister Klaus Kinkel )m1 in der
kommenden Woche reisen wird , werde unvermindert gefoltert .[. . . ]
Immer mehr Menschen “verschwinden “oder werden von “offenkundig staatlich geduldeten Todesschwadronen “ermordet , sagte Deile .
Er appellierte an Kinkel , ( sich )m2 für die politischen Häftlinge einzusetzen .
(ID: 5023x5031); (Calc-Prob:0)
Er appellierte an ( Kinkel )m1 , ( sich )m2 für die politischen Häftlinge einzusetzen .
SEITE 8
(ID: 5030x5031); (Calc-Prob:7)
In example (35a), the output of SUCRE proposes to classify the markable ihm and the markable sich
as coreferent (what actually has not been done), although there are several sentences between them and
ihm obviously does not constitute the subject. In (35b), the subject in m1 , Außenminister Klaus Kinkel,
89
is repeated several sentences later as Kinkel, adjacent to m2 . These proposals are very surprising, as one
prefilter feature (no. 4 in (4.1.2)) excludes such links to be classified at all. The clarification for this
confusion is that there are two kinds of false negatives (FN) based on the architecture of SUCRE (cf.
(3.2)). First, the links created by the link generator are filtered (cf. (4.1.2)) and afterwards they are used
for training/testing. Now, one can use the term false negatives (FN) as a group of instances that have
been misclassified as negative (i.e. disreferent).
Another usage of false negatives (FN) comes up after the clustering step, where SUCRE uses bestfirst clustering to create coreference chains out of the pairwise decisions of the aforementioned classifier.
Now, the system might consider two markables mi and mj to be disreferent, because they do not occur in
the same predicted cluster. This prediction is compared with the true partition (gold-standard coreference
information). If mi and mj belong to the same true cluster, they should have been clustered (rather than
classified) as coreferent. Thus, the link connecting mi and mj is an instance of this usage of false
negatives (FN).
And this is exactly the case in example (35a) and (35b) as SUCRE’s output of the false negatives starts
after the clustering step and corresponds to the comparison of predicted partition and true partition. In
order to improve the classification part of SUCRE, its output has to be restricted to the links that are
created for the classifier’s input.
Example (35c) reveals another problem. It concerns complex syntactic structure. Here, there is an
infinitival embedded clause containing the second markable, sich. This clause functions as argument to
the finite main verb appellierte. This kind of infinite clause has no subject and therefore, there is no
explicit antecedent for the reflexive pronoun. In most syntactic theories, the embedded infinite main verb
einzusetzen nevertheless needs a subject. Most of the time, this subject is the subject or direct/indirect
object of the superordinated (matrix) clause. This is the case when the infinite clause is an argument of a
so-called control verb. This group of verbs is split up into verbs that enable the control of subjects and
those that enables the control of objects. Example (36) shows some examples of control:
(36)
a.
Peteri verspricht Mariaj , sichi umzudrehen. (→ subject control verb)
b.
Mariaj bittet Peteri , sichi umzudrehen. (→ direct object control verb)
c.
Mariaj empfiehlt Peteri , sichi umzudrehen. (→ indirect object control verb)
d.
Petersi Versuch, sichi umzudrehen, missfiel Mariaj . (→ deverbal subject control noun)
Example (36b) shows the same pattern of direct object control as in (35c). The verb versprechen
(cf. example (36a)) on the other hand enables subject control whereas the verb empfehlen (cf. example
(36c)) enables indirect object control. A special kind of control are deverbal nouns that are derived
from control verbs. In (36d) the noun Versuch is derived from the control verb versuchen, which enables
subject control.
One way of solving the problem of false negatives concerning reflexive pronouns in embedded clauses
governed by control verbs is to check whether there is a intermediate dependency relation between a
control verb or a deverbal noun derived from a control verb and the reflexive pronoun (i.e. a dependency
relation between the control verb and the main verb governing the reflexive pronoun). This can be done
by using a list of subject control verbs and a list of object control verbs and then assign coreference to
the markable if it is annotated with the right grammatical function and disreference otherwise.
4.4.2. Semantic Relations between the markables
This group is based on the idea that two markables can corefer although they do not share any syntactical
or string-based features. This is often the case, when both markables have common nouns as heads that
do not substring match with each other. For implementing features that check for a specific semantic
relation, there has to be a feature access to an appropriate knowledge base for ontological information
like in GermaNet.
90
The examples below show some false negatives that occur because of such an implicit relation between
two common nouns that cannot be captured string-based:
(37)
a.
Als ein Bekannter ( des Hundehalters )m1 versuchte , die Wohnung zu räumen , wurde er
gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m2 .
(ID: 335x340); (Calc-Prob:10)
b.
Der 24jährige Besitzer hatte dem Tier am Vortag ( sein zukünftiges Heim )m1 gezeigt .[. . . ]
Als ein Bekannter des Hundehalters versuchte , ( die Wohnung )m2 zu räumen , wurde er
gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers .
(ID: 329x337); (Calc-Prob:5)
c.
Der 24jährige Besitzer hatte ( dem Tier )m1 am Vortag sein zukünftiges Heim gezeigt .
niemanden mehr durchließ .
Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers .
(ID: 326x331); (Calc-Prob:5)
d.
Daraufhin weigerte sich ( Daimler-Benz )m1 , ihn nach Abschluß seiner Ausbildung als
Schlosser zu übernehmen - der Artikel sei nämlich ein “Bekenntnis zur Gewalt “.
Es sei zu befürchten , daß der junge Mann in bestimmten Situationen auch im Betrieb
Gewalt befürworten werde , argumentierte ( das Unternehmen )m2 .
Das Bundesarbeitsgericht teilte den Standpunkt und wies die Klage auf Einstellung ab .
(ID: 6010x6022); (Calc-Prob:7)
e.
Aufgrund des gleichen Paragraphen gibt es in ( Warschau )m1 inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen
deshalb nicht .
( Die Stadt )m2 hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten
einzutreiben .
Jemanden vor die Tür setzen dürfen die allerdings auch nicht .
(ID: 511x519); (Calc-Prob:5)
In example (37a), the markables des Hundehalters and des Besitzers should have been coreferred,
since Hundehalters is a compound word with the head Halter, which is a synonym to Besitzer.
Thus, a possible feature has to check the semantic relation of synonymy with respect to the heads
of the markables (possibly a compound word). If there is a positive relation, the feature returns TRUE,
otherwise FALSE.
Another kind of semantic relationship is given in example (37b) and (37c). Here, the pairs of the
respective markable heads, <Heim,Wohnung> as well as <Tier,Hund> show the semantic relation of
hyponymy or hypernymy: an apartment (Wohnung) is a type of home (Heim) and one type of animal
(Tier) is a dog (Hund).
Therefore, one can implement another feature that checks whether the markables’ heads are in an hyponymy relation with each other. If there is a positive relation, again the feature returns TRUE, otherwise
FALSE.
The markable pairs in (37d) and (37e) contain a proper name (Daimler-Benz or Warschau) and a
common noun that groups this proper name in a special category (e.g. a company (Unternehmen) or a
city (Stadt)).
91
In the case that the ontological knowledge base (e.g. GermaNet) does not provide any information
about the given proper name, for this purpose, there will be need for a named entity recognizer (which
is not available in the TüBa-D/Z corpus) and for the aforementioned knowledge about synonymy and
hyponymy. Then, an implementation of a feature would return TRUE in the case that one markable is
a named entity and its category (predicted by NER) is (a synonym/hyponym of) the head of the other
markable, and FALSE otherwise.
4.4.3. Both markables contain a common, possibly appositive proper
name
This group arose because of the fact that proper names (in particular surnames) are very often expressed
appositively to a head (a common noun or a first name). However, the respective features in the given
feature set (no. 11 - 13 in (4.1.3)) just check for proper names in the markables’ heads.
Some false negatives for this problem are given in (38):
(38)
92
a.
( Ramon Valle-Inclan )m1 [. . . ]
) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m2 , auch weiterhin ein Geheimtip
.
b.
Als mir Anfang des Jahres ( Martin Flug )m1 sein Manuskript Treuhand-Poker - Die Mechanismen des Ausverkaufs auf den Verlagstisch legte , schien mir manches recht überzogen
, und ich bat ihn , mir die haarsträubendsten Geschichten mit Dokumenten zu belegen , da
ich wenig Lust verspürte , gleich nach Erscheinen verklagt zu werden .[. . . ]
Doch in jedem einzelnen Fall konnte er mich von der Sauberkeit seiner Recherche überzeugen , und die Tatsache , daß bis heute - drei Monate nach der Erstauslieferung - keine Einstweiligen Verfügungen bei uns herniedergegangen sind , scheinen ihm zusätzlich recht zu
geben .
Heute würde ich wahrscheinlich nicht mehr so skeptisch fragen , denn das , was ich in den
letzten Wochen in meiner unmittelbaren Umgebung , der Ostberliner Verlagsszene , erlebt
habe , stellt ( Flugs )m2 Report noch um einiges in den Schatten .
(ID: 9647x9680); (Calc-Prob:9)
c.
Im November 1990 sollen sie ( den Angolaner Amadeu Antonio )m1 so zusammengeschlagen haben , daß der 28jährige starb .[. . . ]
Sie wehrte sich beredt gegen die Verteidiger , die ihrer Darstellung nicht glauben wollten
.
Durch ihre Schilderung sind die Vorgänge , die ( Amadeu Antonio )m2 das Leben kosteten
, noch weniger klar als vorher .
(ID: 5116x5211); (Calc-Prob:5)
d.
Daran ist dann offensichtlich auch der Versuch von Schmidt-Braul gescheitert ,
( den Luftfahrtunternehmer Dornier )m1 für die Übernahme zu gewinnen .
( Silvius Dornier )m2 wußte aus seinen zähen Verhandlungen mit Daimler-Benz , denen er
einen Großteil seiner Aktien verkauft hatte , daß auch bei der Treuhand etwas rauszuschlagen wäre .
Wenn sie schon die teure Immobilie einsackte , sollte sie wenigstens noch den Verlag
entschulden und etwas Geld für die Anschubfinanzierung locker machen .
(ID: 9868x9870); (Calc-Prob:7)
e.
f.
( Frieda Mermet )m1 heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer
Bergkaff ist sie .
Beim Walser-Ensemble , das die Briefe an ( die » Liebe Frau Mermet )m2 « auf die Bühne
bringt , hockt die Holde tatsächlich im Dichterolymp .
Ein goldener Bilderrahmen , ganz Gelsenkirchener Barock , schwebt über der Szene .
(ID: 13022x13029); (Calc-Prob:12)
“Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich Bundeskanzler Helmut Kohl auf seiner Pressekonferenz mit ( dem russischen Präsidenten Boris
Jelzin )m1 am Mittwoch in München verteidigen .
Er hatte ( seinen neuen “Duzfreund “Boris )m2 nämlich aus Versehen als sowjetischen
Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen .
WAFFENSCHMUGGEL IM KLEINEN RAHMEN
(ID: 10091x10097); (Calc-Prob:8)
In example (38a), the head of m1 is the forename Ramon whereas the surname Valle-Inclan is the last
word in m1 but the head in m2 . The same problem is shown in (38b) but here, the surname is inflected,
so that a simple exact string match would not have sufficient predictive power.
A similar example is given in (38c). Here, again the head of m2 is appositively given in m1 but this
time, they share a common last word.
One possibility of solving this problem is a modification of the given features that check for proper
names (cf. no. 11 - 13 in (4.1.3)) such that they check whether the head of one markable is a proper
name and if this is true, then if there is a substring match between this head and any word of the other
markable. If both is the case, the feature returns TRUE and otherwise FALSE.
In (38d) and (38e), there are more complex cases, because the NE-heads of m2 (or m1 ) are not present
in m1 (or m2 ) at all. In this case, the feature sketched above has to be modified to simply check the head
and the last word of the markable for being a proper name and occurring in the other markable. But this
is not possible, if the common proper name is neither the head, the first or the last word. Admittedly,
one can check for the number of common words, but there is no possibility to check for all common
words whether they are proper names. Here, the pseudo-language should be extended. For instance, the
implementation of bounded variables that provide word attributes (e.g. f0 for part-of-speech) could solve
that problem (cf. example (39))
(39)
seqmatch(X,m1a)&&seqmatch(X,m2a)&&(X.f0==NE)
Another case is shown in example (38f): both markables’ heads are common nouns (Präsidenten vs.
Duzfreund) that do not share any common features. They are even in no semantic relation. This case can
be captured by the aforementioned feature because at least one proper name is one markable’s last word.
93
94
CHAPTER 5
Implementation of the features
In this chapter, the features that are proposed in the linguistic analysis in chapter 4 as well as in some
approaches presented in (2.5) are implemented in SUCRE’s regular link feature definition language.
In (5.1), all features that are proposed in the analysis of the false positives are implemented in their
groups (as arranged in chapter 4). In (5.2), the features of the false negatives analysis for which an
implementation is possible are coded. Finally, in (5.3), the implementations of the four features from the
German approaches in (2.5) are presented.
For each feature, the idea and its function are described briefly. Further descriptions on syntax and
functions of the feature definition language are presented in appendix B. Every feature is exemplified by
two instances, mostly a coreferent and a disreferent one. Finally, the feature is evaluated as a modification
of the baseline feature set. However, sometimes this evaluation is misleading, as an improvement with
respect to the baseline does not always mean a positive contribution to the final feature set. The reason for
this is based on the dependencies of a link feature to others. As the final feature set contains largely nonoriginal features, the dependencies to final features might be quite different as to the original features.
However, sometimes the trend is steady.
At the end of each group, the best performing features resp. a combination of them is chosen for the
final feature set.
5.1. Features for False Positives
5.1.1. The second markable is indefinite
This group addresses the indefiniteness of m2 in the case of exact string match and substring match (cf.
(4.3.1)).
Indef1 : The feature Indef1 extends the string matching feature by disallowing m2 to start with ein:
a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}
The element (strmatchlc(m2b,ein)==0) checks, whether there is a substring match of the
first word in m2 and the string ein, thereby capturing all indefinite articles like ein, eine, einer,
eines, einem, . . . . If this is true, then strmatchlc(m2b,ein) is 1 and the equation 1==0
returns FALSE.
In (40), two examples are given:
(40)
a.
b.
<Ein Mannm1 , Der Mannm2 >
<Die Fraum1 , Eine Fraum2 >
95
Chapter 5. Implementation of the features
In (40a), m2 does not start with an indefinite article and the heads of m1 and m2 string match.
Therefore, the feature Indef1 returns TRUE.
In (40b), m2 starts with an indefinite article and thus, although the heads of m1 and m2 string
match, the feature Indef1 returns FALSE.
Table 5.1 shows the result of the comparison of the new baseline against the addition of Indef1 .
Feature set
New baseline
+ Indef1
Difference:
MUC-B3 -f-score
66.44%
68.39%
1.95%
Table 5.1.: Evaluation of the feature Indef1
Indef2 : The feature Indef2 extends the feature Indef1 by the check for a keyword like andere, that
indicates disreference:
a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
(strmatchlc(andere,m2a)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
(strmatchlc(andere,m2a)==0)}
The element (strmatchlc(andere,m2a)==0) checks, whether there is a substring match
between the words in m2 and the string andere. If this feature performs well, one can decide to
extend this feature by further keywords like neue, alternative, . . . .
(41)
a.
b.
<Ein Mannm1 , Der Mannm2 >
<Ein Mannm1 , Der andere Mannm2 >
As shown above, example (41a) returns TRUE for the elements in Indef1 . The item
(strmatchlc(andere,m2a)==0) returns TRUE as the predicate strmatchlc(andere,m2a)
returns FALSE and thus 0==0 is true. Therefore, the overall feature returns TRUE.
In example (41b), strmatchlc(andere,m2a) returns TRUE and therefore 1==0 gets false
and the overall feature returns FALSE.
Feature set
New baseline
+ Indef2
Difference:
MUC-B3 -f-score
66.44%
66.81%
0.37%
Indef3 : The feature Indef3 extends the feature Indef1 by the possibility of m1 being a headline, i.e. it
lacks a determiner:
a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||
(m1b.rewtag != rewtags.DET))}
96
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||
(m1b.rewtag != rewtags.DET))}
The element ((strmatchlc(m2b,ein)==0)||(m1b.rewtag != rewtags.DET)) has
to be understood as an implication: if m1 starts with a determiner, then m2 may not start with an
indefinite article. As headlines often lack determiners, the repetition in the text below might corefer with it, although it starts with an indefinite article.
(42)
a.
b.
<Neue Studie zum Rauchenm1 , Eine neue Studiem2 >
<Eine neue Studiem1 , Eine Studiem2 >
The example in (42a) returns TRUE as m1 lacks a determiner, whereas the example in (42b) returns
FALSE due to the fact that both markables contain an indefinite article (i.e. a determiner).
Feature set
New baseline
+ Indef3
Difference:
MUC-B3 -f-score
66.44%
67.86%
1.42%
Indef4 : The feature Indef4 extends the feature Indef1 by the case that m1 and m2 are proper names and
m2 is adding an indefinite article journalistically:
a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))}
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))}
The element ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE)) has to be understood
as an implication: if m2 starts with an indefinite article, then it has to be a proper name. Proper
names usually corefer, if there is a string match.
(43)
a.
b.
<Robert Musilm1 , eines Robert Musilm2 >
<Ein Mannm1 , eines Mannesm2 >
The example in (43a) returns TRUE as m2 starts with an indefinite article but also contains a proper
name as its head. On the other hand, in (43b), there is no proper name and thus, the feature returns
FALSE.
Feature set
New baseline
+ Indef4
Difference:
MUC-B3 -f-score
66.44%
68.43%
1.99%
97
Indef5 : The feature Indef5 allows m2 to start with an indefinite article in the case that all words in both
markables are the same. As shown in example (10c), if both markables are exactly the same, a
coreference relation can also hold if they start with an indefinite article.
a) {seqmatch(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||
(seqmatch(m1a,m2a)==max(m1.wrdlen,m2.wrdlen)))}
b) {strmatchlc(m1h,m2h)&&((strmatchlc(m2b,ein)==0)||
(seqmatch(m1a,m2a)==max(m1.wrdlen,m2.wrdlen)))}
The predicate max(m1.wrdlen,m2.wrdlen) returns the length of the longer markable. The
predicate seqmatch(m1a,m2a) returns the number of common words in m1 and m2 . As one
markable might be included in the other one, the idea is to compare the number of common
words against the length of the longer markable (in the case that they differ). Again, the element ((strmatchlc(m2b,ein)==0)||(seqmatch(m1a,m2a)==
max(m1.wrdlen,m2.wrdlen))) has to be understood as an implication: if m2 starts with
an indefinite article, then m1 and m2 has to be exactly the same.
(44)
a.
b.
<Ein generelles Verbotm1 , Ein generelles Verbotm2 >
<Ein generelles Verbotm1 , Ein absolutes Verbotm2 >
The example in (44a) returns TRUE as m2 starts with an indefinite article, however m1 and m2
are exactly the same. With example (44b), the feature returns FALSE, as m1 and m2 differ in one
word: max(m1.wrdlen,m2.wrdlen) returns 3 and seqmatch(m1a,m2a) returns 2, thus
2==3 is FALSE and so the overall feature result.
Feature set
New baseline
+ Indef5
Difference:
MUC-B3 -f-score
66.44%
68.33%
1.89%
The selection of the final features
One have to keep in mind that the features Indef2 up to Indef5 are all based on Indef1 . They are grouped
together as all of them constitute a kind of exception for the string match or for a disreference based on
an indefinite anaphora and are therefore intended to relativize the prediction of Indef1 . So, if one of the
features is worse than Indef1 (68.39%), it does not perform sufficiently. This is the case for the features
Indef2 (66.81%), Indef3 (67.86%) and Indef5 (68.33%). The feature Indef4 performs slightly better
(68.43%). As it also contains Indef1 , Indef4 is selected as final feature.
5.1.2. Wrong assignment of a relative pronoun
This group concerns the problem of the relative pronouns. Given that m1 or m2 is a relative pronoun its
anaphor or antecedent has to be a specific pronoun or at a specific position (cf. (4.3.2)).
Relpron1 : The feature Relpron1 is the following implication: if m1 is a relative pronoun, then it has
to constitute the subject and m2 has to be a reflexive pronoun, a possessive pronoun or a relative
pronoun too:
98
a) {(((m2h.f0==f0.PRF)||(m2h.f0==f0.PPOS∼)||(m2h.f0==f0.PREL∼))&&
(m1h.rewtag==rewtags.SUBJ))||(m1h.f0!=f0.PRELS)}
The term PPOS∼ stands for attributive and substituting possessive pronouns.
(45)
a.
Ein Hund, derm1 seinm2 Herrchen beißt . . .
b.
Ein Hund, derm1 (sein Herrchen)m2 beißt . . .
Example (45a) returns TRUE as m1 is a relative pronoun which constitutes the subject of the verb
beißen and m2 is an attributive possessive pronoun. On the other hand, in (45b) the head of m2 is
a common noun and thus, the feature returns FALSE.
Table 5.6 shows the result of the comparison of the new baseline against the addition of Relpron1 .
Although there is a deterioration of 3.11%, this feature is considered for the final feature set, as it
contributes to the final score (i.e. a removal would worsen the score about 0.09% (table 5.7)).
Feature set
New baseline
+ Relpron1
Difference:
MUC-B3 -f-score
66.44%
63.33%
-3.11%
Table 5.6.: Evaluation of the feature Relpron1
Feature set
Final set
- Relpron1
Difference:
MUC-B3 -f-score
73.06%
72.97%
-0.09%
Table 5.7.: The final set without Relpron1
Relpron2 : The feature Relpron2 is the following implication: if m2 is a relative pronoun, then the
distance between m2 and m1 has to be less or equal to 3. The reason for the number 3 is that usually at least the comma separates the relative pronoun from its preceding antecedent. Sometimes,
the relative pronoun is within a prepositional phrase and then, there are two tokens between the
antecedent and m2 . Thus, the distance between m1 and m2 is at most 3 in terms of tokens.
a) {(abs(m2h.txtpos-m1e.txtpos)<=3)||(m2h.f0!=f0.PRELS)}
(46)
a.
(Ein Hund)m1 , derm2 . . .
b.
Zu (diesem Genre)m1 gehören neben den Wunderworten (diem2 . . .
In example (46a), m2 is a relative pronoun and its antecedent is two tokens apart. Therefore,
Relpron2 returns TRUE. The markable m2 in (46b) however is six tokens apart. Thus, the feature
returns FALSE.
Feature set
New baseline
+ Relpron2
Difference:
MUC-B3 -f-score
66.44%
65.26%
-1.18%
99
Relpron3 : The feature Relpron3 tries to capture the possibility of m2 referring to an embedded noun
phrase. In example (12c) the noun phrase Tausende kommunaler Sozialmieter corefers with the
suceeding relative pronoun die. However, m1 was set to the embedded phrase kommunaler Sozialmieter. As this is not always wrong, this feature constitutes just a proposal for solving a problem like
this.
a) {(m2h.f0!=f0.PRELS)||(m1h.rewtag!=rewtags.GMOD)}
The feature says: if m2 is a relative pronoun, then m1 may not be a genitive modifier.
Feature set
New baseline
+ Relpron3
Difference:
MUC-B3 -f-score
66.44%
64.28%
-2.16%
Although Relpron1 shows a bad performance in the feature set of the new baseline, it contributes to
the final feature set, as a removal worsens the final score slightly. Relpron2 and Relpron3 show both
a deterioration that is also visible with the final feature set. Therefore, Relpron1 is the only feature
selected for the final feature set.
5.1.3. Reflexive pronouns and non-subjects
Reflex1 : This single group concerns the case that a reflexive pronoun is related to a non-subject. Examples of such misclassifications are given in (4.3.4). A simple way of solving this (and thereby
disregarding complex structures like syntactic control) is the following feature:
a) {((m2h.f0==f0.PRF)&&(m1h.rewtag == rewtags.SUBJ))||
((m1h.f0==f0.PRF)&&(m2h.rewtag == rewtags.SUBJ))}
The feature returns TRUE in the case that m2 is a reflexive pronoun and m1 is a subject or vice
versa (for the case of cataphoric reflexive pronouns).
Table 5.10 shows the result of the comparison of the new baseline against the addition of Reflex1 .
Although, the addition to the base feature set shows a slight improvement, the feature does not
perform well with the final features (cf. table 5.11).
Feature set
New baseline
+ Reflex1
Difference:
MUC-B3 -f-score
66.44%
66.69%
0.25%
Table 5.10.: Evaluation of the feature Reflex1
Feature set
Final set
+ Reflex1
Difference:
MUC-B3 -f-score
73.06%
72.79%
-0.27%
Table 5.11.: The final set with Reflex1
As an addition of Reflex1 to the final feature set would result in an deterioration of 0.27%, there will be
no features selected from this group to the final feature set.
100
5.1.4. Problems with substring-matches
The features that are implemented in this group are based on an important issue - the German morphology. As a simple string matching feature often does not suffice, two further topics will be focussed in this
group: inflectional suffixes and German compound words. The latter differ from English as a German
compound word is always realized in one string (cf. (4.3.5)).
For some German compound words like Schäferhundm1 vs. Hundm2 two new functions have been
added to the function set of SUCRE’s pseudo language (cf. appendix B) in order to increase expressiveness: bswitch (bswitchlc) and eswitch (eswitchlc).
bswitchlc(m1h,m2h) returns true in the case that the first markable’s head begins with the second
markable’s head (e.g. “Hundehalter”m1h - “Hund”m2h )
eswitchlc(m1h,m2h) returns true in the case that the first markable’s head ends with the second
markable’s head (e.g. “Schäferhund”m1h - “Hund”m2h )
The following features base on the idea that the substring matching feature strmatch/2 is only
meaningful, if the exact string matching feature seqmatch/2 does not take effect.
Substr1 : This feature checks, whether the head of m2 is the compound head of the head of m1 (i.e.
whether m1h ends with m2h ). It is a modification of the feature Indef1 , which has been implemented in (5.1.1).
a) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)
&&eswitchlc(m1h,m2h)}
Substr1 returns TRUE, if the head of m1 ends with the head of m2 and m2 does not start with an
indefinite article. It returns FALSE otherwise. The reason, why this feature ignores the other way
round (i.e. m2h ending with m1h ), is that repetitions of a reference are usually less informative
than preceding ones.
(47)
a.
b.
<Ein Schäferhundm1 , Der Hundm2 >
<Ein Hundehalterm1 , Der Hundm2 >
With example (47a), Substr1 returns TRUE, as the head of m1 ends with the head of m2 and there
is no indefinite article initializing m2 . In example (47b), the head of m1 starts with the head of
m2 . So, the feature returns FALSE.
Table 5.12 shows the result of the comparison of the new baseline against the addition of Substr1 .
Feature set
New baseline
+ Substr1
Difference:
MUC-B3 -f-score
66.44%
68.94
2.5%
Table 5.12.: Evaluation of the feature Substr1
Substr2 : This feature is addressed to inflectional suffixes. It claims the premise that the markables’
heads case-sensitively substring match but do not exact string match and m2 is definite. There are
two possibilities: m1h is an inflected version of m2h or vice versa. For instance, if m1h is inflected,
then it begins with m2h and ends with one of three possible inflectional suffixes: s, es and e. To be
able to exclude an inflected compound word, the threshold for the edit distance between m1h and
m2h is set to 3:
101
a) {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)
&&(strmatchlc(m2b,ein)==0)&&(editdist(m1h,m2h)<3)
&&
((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||
eswitch(m1h,e)))
|| (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||
eswitch(m2h,e))) )}
(48)
a.
b.
<Ein Hundm1 , des Hundesm2 >
<Der Hundm1 , des Hundehaltersm2 >
With the markable pair in (48a), the feature returns TRUE as m2h starts with m1h and ends with
the inflectional suffix es. Although, with respect to the suffix s, this is the same case in example
(48b), the feature returns FALSE, as the edit distance between m1h and m2h is 8 (rather than being
below 3).
However, this feature ignores compound words and the case in which both markables are (differently) inflected.
If an option for compound words (i.e. eswitchlc(m1h,m2h)) is offered, the performance is
worse than in a single check for inflection (cf. table 5.14).
Feature set
New baseline
+ Substr2
Difference:
MUC-B3 -f-score
66.44%
70.37%
3.93%
Feature set
New baseline
+ Substr2 + c.w.
Difference:
MUC-B3 -f-score
66.44%
69.82%
3.38%
Table 5.14.: Substr2 with compound words
Substr3 : The feature Substr3 is an extension of the previous feature Substr2 : beside the aforementioned inflectional suffixes, the feature allows the suffixes chen and lein, which function as diminutive (a small version of the respective common noun). Thus, the edit distance threshold is set to
5:
a) {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)
&&(strmatchlc(m2b,ein)==0)&&(editdist(m1h,m2h)<5)
&&
eswitch(m1h,e)||eswitch(m1h,chen)||eswitch(m1h,lein)))
|| (bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||
eswitch(m2h,e)||eswitch(m2h,chen)||eswitch(m2h,lein))) )}
Feature set
New baseline
+ Substr3
Difference:
MUC-B3 -f-score
66.44%
70.30%
3.86%
102
Substr4 : This feature is based on example (18c), where the noun Arm is related to the noun Marmorpalast because of a substring match. The idea is that if the length of one markable is less or equal
to 3, then the other markable may not have more than double length:
a) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
(((editdist(m1h,#)>3)||(editdist(m2h,#)<=2*editdist(m1h,#)))&&
((editdist(m2h,#)>3)||(editdist(m1h,#)<=2*editdist(m2h,#))))}
The feature Substr4 returns TRUE, if the heads of m1 and m2 substring match and the length of
mi is bigger than 3 or the length of mj is less or equal to the double length of mi . For computing
the length of a markable, its edit distance to a dummy like “#” that definitely cannot occur in the
markable is used. The element ((editdist(m1h,#)>3)||(editdist(m2h,#)<=
2*editdist(m1h,#))) is the implication: if the length of m1h is less or equal to 3, then m2h
has to be less or equal to the double length of m1h .
(49)
a.
<Der Armm1 , des Armesm2 >
b.
<Der Armm1 , der Marmorpalastm2 >
In both examples in (49), the heads of m1 and m2 substring match. With example (49a), Substr4
returns TRUE, as the length of m2h is not greater than twice the length of m1h , given that m1h has
the length 3. On the other hand, in (49b) m2h has the length 12, whereas m1h has the length 3.
Therefore, Substr4 returns FALSE.
Feature set
New baseline
+ Substr4
Difference:
MUC-B3 -f-score
66.44%
67.63%
1.19%
Substr5 : As pronouns are a very small closed class of words, a string match between them is quite
meaningless. The feature Substr5 extends Indef1 by the condition, that neither m1 nor m2 is a
pronoun:
a) {seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}
b) {strmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)&&
((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}
Here, the feature Substr5 is divided into two parts: one for substring match and one for exact string
match. The feature Substr5a returns TRUE, if the heads of m1 and m2 exact string match and are
no pronouns and FALSE otherwise. Substr5b checks for the same with the substring matching
feature.
(50)
a.
<Die Mutterm1 , Die Mutterm2 >
b.
<1,98 mm1 , seinemm2 >
103
In example (50a), m1 and m2 are exactly the same. Thus, they substring match and exact string
match and neither is a pronoun. Thus, both Substr5a and Substr5b return TRUE. In example (50b),
the head of m1 is an abbreviation for the unit Meter and this m is part of the inflectional suffix em
of the possessive pronoun sein. So, Substr5a returns FALSE, as there is no exact string match and
Substr5b returns FALSE, as m2 is a pronoun.
Feature set
New baseline
+ Substr5
Difference:
MUC-B3 -f-score
66.44%
67.64%
1.20%
Substr6 : This feature is based on a discovery in example (20c), where the headline noun BUNDESRAT
does not exact string match with the noun Bundesrat. Substr6 constitutes the simple modification
of case-sensitive exact string match to case-insensitive exact string match. So, instead of feature
{seqmatch(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}, Substr6 proposes:
a) {seqmatchlc(m1h,m2h)&&(strmatchlc(m2b,ein)==0)}
Thereby, a markable pair like the one in (20c) would be handled as exact strings.
There is a slight improvement concerning the new baseline, but a deterioration in the case of the
final feature set.
Feature set
New baseline
+ Substr6
Difference:
MUC-B3 -f-score
66.44%
66.45%
0.01%
Feature set
Final set
+ Substr6
Difference:
MUC-B3 -f-score
73.06%
72.94%
-0.12%
Table 5.19.: The final set with Substr6
This group has presented six features concerning string match. The feature Substr1 shows a less improvement than with Substr2 . The idea of diminutives in Substr3 does not work well enough, as Substr2
performs better. Although Substr4 improves the score, it is not compatible with Substr2 and as this feature outperforms Substr4 , Substr4 will be disregarded. Feature Substr5 can be combined with Substr2
and also show good performance. The last feature, Substr6 , show a minimal improvement in the new
baseline, but it worsens the final score.
Therefore, for the final feature set, Substr2 and Substr5 are selected.
5.1.5. “Es“(“it“) as expletive pronoun in German
This group addresses the pronoun es, which is very often non-referring (cf. (4.3.6)). As there are no clues
for figuring out whether an es-pronoun refers or not, the following feature just checks for the presence of
such pronouns and returns TRUE, if they are not present.
104
Es1 : The feature Es1 checks, whether both markables are different from es. If so, it returns TRUE,
otherwise FALSE:
a) {(seqmatchlc(m1h,es)==0)&&(seqmatchlc(m2h,es)==0)}
This feature is very restrictive and disallows any occurence of the pronoun es; although it might be
referential.
(51)
a.
b.
<Ein Autom1 , Das Autom2 >
<Das Autom1 , esm2 >
In example (51a), neither m1 nor m2 equals es. Thus, the feature returns TRUE. In (51b), m2
might corefer with m1 , however the feature returns FALSE.
Table 5.20 shows the result of the comparison of the new baseline against the addition of Es1 .
Feature set
New baseline
+ Es1
Difference:
MUC-B3 -f-score
66.44%
66.01%
-0.43%
Table 5.20.: Evaluation of the feature Es1
Es2 : The feature Es2 checks, whether one markable is different from es. If so, it returns TRUE, otherwise FALSE:
a) {(seqmatchlc(m1h,es)==0)||(seqmatchlc(m2h,es)==0)}
Now, the feature is less restrictive and allows one markable to be equal to es. However, if both
markables equal es, Es2 returns FALSE.
(52)
a.
b.
<Ein Autom1 , esm2 >
<Esm1 , esm2 >
Now, example (52a) is accepted and the feature returns TRUE. In (52b), both markables equal es
and the feature returns FALSE.
Table 5.21 shows the result of the comparison of the new baseline against the addition of Es2 .
Feature set
New baseline
+ Es2
Difference:
MUC-B3 -f-score
66.44%
66.44%
0.0%
Es3 : In contrast to Es1 and Es2 , Es3 is distributed over the personal pronoun features of m1 and m2 . The
modification of the personal pronoun features restricts them in as much as the personal pronoun
features return TRUE if the respective personal pronoun differs from es:
The feature Es3 checks, whether one markable is different from es. If so, it returns TRUE, otherwise FALSE:
105
a) {(m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)}
b) {(m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)}
Table 5.22 shows the result of the comparison of the new baseline against the addition of Es3 . This
addition shows a deterioration of 0.21%. However, this is not the case with the final feature set as
a removal also worsen the score (cf. table 5.23).
Feature set
New baseline
+ Es3
Difference:
MUC-B3 -f-score
66.44%
66.23%
-0.21%
Feature set
Final set
- Es3
Difference:
MUC-B3 -f-score
73.06%
72.76%
-0.3%
Table 5.23.: The final set without Es3
The addition of feature Es1 shows a deterioration of the new baseline. As this is also the case with the
final feature set, Es1 will be ignored. Es2 does not show any effect on the new baseline and thus, it will
not be considered as a final feature. Although, there is also a deterioration with Es3 , it will be selected as
a final feature, since it contributes to the final score (as a removal worsens the final score about 0.3%).
5.1.6. Units, currencies, month names, weekdays and the like
This group addresses some keywords which are very unlikely to corefer with others. Among these
keywords are units like Meter or Prozent, currencies like Dollar or Mark, month names, week days and
the like (cf. (4.3.7)).
Unit1 : This feature checks for inequality of every keyword with markable m1 . The focus on m1 is
sufficient, as this problem only occurs in cases of an exact string match. This feature is used as a
link feature (as opposed to the prefilter feature Unit2 ).
a) {((seqmatchlc(m1h,Mark))==0)&&((seqmatchlc(m1h,Meter))==0)
&&((seqmatchlc(m1h,Prozent))==0)&&((seqmatchlc(m1h,Sekunden))==0)
&&((seqmatchlc(m1h,Juli))==0)&&((seqmatchlc(m1h,Milliarden))==0)
&&((seqmatchlc(m1h,Dollar))==0)&&((seqmatchlc(m1h,Jahr))==0)
&&((seqmatchlc(m1h,Mittwoch))==0)}
The feature returns TRUE in the case that the head of m1 is neither identical to Mark, nor to Meter,
. . . , nor to Mittwoch. If m1 equals any of the keywords, the feature returns FALSE.
The set of keywords is based on the false positives described in the previous chapter. For an adaption to other corpora, corresponding keywords like Montag, Dienstag, Januar, Februar, Stunden,
Millionen, . . . have to be added. For the sake of simplicity, they are left out in this study.
Table 5.24 shows the result of the comparison of the new baseline against the addition of Unit1 .
Feature set
New baseline
+ Unit1
Difference:
MUC-B3 -f-score
66.44%
66.42%
-0.02%
Table 5.24.: Evaluation of the feature Unit1
106
Unit2 : This feature checks for equality of any keyword with markable m1 . In contrast to Unit1 , Unit2
is used as a prefilter feature.
a) {seqmatchlc(m1h,Mark) || seqmatchlc(m1h,Meter) ||
seqmatchlc(m1h,Prozent) || seqmatchlc(m1h,Sekunden) ||
seqmatchlc(m1h,Juli) || seqmatchlc(m1h,Milliarden) ||
seqmatchlc(m1h,Dollar) || seqmatchlc(m1h,Jahr) ||
seqmatchlc(m1h,Mittwoch)}
The feature returns TRUE in the case that the head of m1 is equal to one of the keywords and
FALSE otherwise.
Table 5.25 shows the result of the comparison of the new baseline against the addition of Unit2 .
Feature set
New baseline
+ Unit2
Difference:
MUC-B3 -f-score
66.44%
66.88%
0.44%
Table 5.25.: Evaluation of the feature Unit2
The evaluations in table 5.24 and 5.25 show that the usage of Unit2 (i.e. the prefilter version) performs
better than the usage of the vector feature Unit1 . Thus, Unit2 is selected as final feature for the final
prefilter feature set.
5.1.7. First markable begins with “kein“
This group concerns markables that start with the quantifiers kein or jede, as the possibilities of coreferring markables for these are very restricted (cf. (4.3.9)). The only possible anaphors are a relative
pronoun, a possessive pronoun or a reflexive pronoun.
Kein1 : This feature checks whether the anaphor is a reflexive, a relative or a possessive pronoun:
a) {((strmatchlc(m1b,kein)||strmatchlc(m1b,jede))==0)||((m2h.f0==f0.PRF)||
(m2h.f0==f0.PRELS)||(m2h.f0==f0.PPOSAT))}
This feature has the meaning of the implication: if m1 is quantified with kein or jede, then m2 has
to be a pronoun of the kinds mentioned above.
(53)
a.
(Jeder Hund)m1 , derm2 sein Herrchen beißt . . .
b.
<Keine Gewaltm1 , Die andere Gewaltm2 >
In example (53a), Kein1 returns TRUE as m1 starts with the quantifier Jeder but m2 is a relative
pronoun. On the other hand, in example (53b), m2 is a definite noun phrase. Thus, Kein1 returns
FALSE.
Table 5.26 shows the result of the comparison of the new baseline against the addition of Kein1 .
107
Feature set
New baseline
+ Kein1
Difference:
MUC-B3 -f-score
66.44%
66.37%
-0.07%
Table 5.26.: Evaluation of the feature Kein1
Kein2 : This feature is based on the insight, that a link containing an antecedent that is quantified by
kein or jede is predominantly proposed because of a positive string matching feature. Therefore,
Kein2 is a modification of the string matching features, excluding a markable that starts with such
a quantifier at all:
a) {strmatchlc(m1h,m2h)&&
(strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)}
b) {seqmatch(m1h,m2h)&&
(54)
a.
b.
<Eine Gewaltm1 , Die Gewaltm2 >
<Keine Gewaltm1 , Die andere Gewaltm2 >
In example (54a), m1 is not quantified by kein or jede and thus Kein2 returns TRUE, whereas in
(54b) again, the feature returns FALSE.
Table 5.27 shows the result of the comparison of the new baseline against the addition of Kein2 .
Feature set
New baseline
+ Kein2
Difference:
MUC-B3 -f-score
66.44%
66.61%
0.17%
Table 5.27.: Evaluation of the feature Kein2
As the performance of Kein2 in the base feature set is better than the one of Kein1 (and so with the final
feature set), Kein2 is selected as final feature.
5.1.8. Disagreement in gender and number
This group is based on the discovery made in (4.3.10). Although there are several prefilter features
checking for a disagreement in gender or number, some markable pairs, in particular with an attributive
possessive pronoun, are listed as false positives. The reason for this was a suboptimal annotation of the
original SemEval-2010 dataset. The steps presented in this group contain re-annotations with the use of
heuristics as well as introductions of new prefilter features and extensions of the grammatical attributes
number and gender.
Agree1 : This step comprises the modification of the number attribute, the modification of the word
table and the introduction of two new prefilter features.
108
1. New number value both_ihr: The number value both_ihr denoting the two possibilities of
possessive pronouns starting with ihr∼ is added to the existing number values singular, plural and unknown:
0
0
1 singular
1 singular
2 unknown ⇒ 2 unknown
3 plural
3 plural
4 both_ihr
2. The modification of attributive possessive pronouns with the stem ihr∼ in the word table: in
each entry, the unknown in column 7 is transformed into both_ihr:
692 ihrer 2 2 31 PPOSAT unknown unknown unknown unknown
⇓
692 ihrer 2 2 31 PPOSAT both_ihr unknown unknown unknown
3. Finally, two new prefilter features for the number value both_ihr are introduced in the prefilters:
a) {(m1h.f1==f1.both_ihr)&&((m2h.f2==f2.male)||
(m2h.f2==f2.neutral))&&(m2h.f1==f1.singular)}
b) {(m2h.f1==f1.both_ihr)&&((m1h.f2==f2.male)||
(m1h.f2==f2.neutral))&&(m1h.f1==f1.singular)}
The features discard a markable pair, if m1 has the number value both_ihr, whereas m2
has the value singular and the gender value male or neuter; or vice versa with m2
being annotated with both_ihr.
Table 5.28 shows the result of the comparison of the new baseline against the addition of Agree1 .
Although the modification step Agree1 shows a bad performance on the new baseline, if it would
be undone in the final configuration, the score would be worse about 0.23%.
Feature set
New baseline
+ Agree1
Difference:
MUC-B3 -f-score
66.44%
65.43%
-1.01%
Table 5.28.: Evaluation of the feature Agree1
Feature set
Final set
- Agree1
Difference:
MUC-B3 -f-score
73.06%
72.83%
-0.23%
Table 5.29.: The final set without Agree1
Agree2 : This step only includes the introduction of four new prefilter features that checks for a disagreement between the gender values male/female and neuter. Additionally, the special
case like in the markable pair <Das Mädchen, sie> can be checked for each keyword. Here, only
Mädchen is used:
a) {(m1h.f2==f2.male)&&(m2h.f2==f2.neutral)}
b) {(m1h.f2==f2.neutral)&&(m2h.f2==f2.male)}
c) {(m1h.f2==f2.female)&&(m2h.f2==f2.neutral)&&
(seqmatch(m2h,Mädchen)==0)}
d) {(m1h.f2==f2.neutral)&&(m2h.f2==f2.female)&&
(seqmatch(m1h,Mädchen)==0)}
The features return TRUE (i.e. discard a markable pair), if m1 and m2 show a mismatch in terms
of the gender value neuter. In addition to that, the latter two features check for the inequality of
the female markable with the string Mädchen.
109
Feature set
New baseline
+ Agree2
Difference:
MUC-B3 -f-score
66.44%
66.96%
0.52%
Agree3 : This step comprises the modification of the gender attribute, the modification of the word
table and the introduction of two new prefilter features.
1. New gender value non_fem: The gender value non_fem denoting the incompatibility of possessive pronouns starting with sein∼ and female antecedents is added to the existing gender
values:
0
1 male
2 unknown
3 female
4 neutral
5 both
⇒
0
1 male
2 unknown
3 female
4 neutral
5 both
6 non_fem
2. Modification of attributive possessive pronouns with the stem sein∼ in the word table: in
each entry, the unknown in column 8 is transformed into non_fem:
747 seiner 2 2 33 PPOSAT unknown unknown unknown unknown
⇓
747 seiner 2 2 33 PPOSAT singular non_fem unknown unknown
3. Finally, two new prefilter features for the gender value non_fem are introduced in the prefilters:
a) {(m1h.f2==f2.non_fem)&&(m2h.f2==f2.female)}
b) {(m2h.f2==f2.non_fem)&&(m1h.f2==f2.female)}
The features discard a markable pair, if m1 has the gender value non_fem and m2 is female
or vice versa.
As this addition shows a bad performance for the baseline as well as for the final configuration
(cf. 5.32), Agree3 which is addressed to the possessive pronouns sein∼ seems to work anyhow
different than Agree1 for the possessive pronouns ihr∼.
Feature set
New baseline
+ Agree3
Difference:
MUC-B3 -f-score
66.44%
66.02%
-0.42%
110
Feature set
Final set
+ Agree3
Difference:
MUC-B3 -f-score
73.06%
73.0%
-0.06%
Table 5.32.: The final set with Agree3
5.2. Features for False Negatives
Since the performance of Agree1 shows a positive contribution to the final configuration, this modification step is used for it. Agree2 also shows good performance. The only issue touches Agree3 . Here,
there is no clear reason for this deterioration with the new baseline and in the final configuration. This
has to be figured out in future work (cf. (7.3)). Thus, for the final configuration, Agree1 and Agree2 are
used.
5.2. Features for False Negatives
5.2.1. Both markables contain a common, possibly appositive proper
name
This group is based on the insight, that proper names are not always the head of a markable but an apposition to a forename which is not mentioned in the other markable or to a common noun (cf. (4.4.3)). As
the features for proper names in the original feature set concern only the markables’ heads, an extension
is needed.
Proper1 : This feature modifies the original feature that handles markables whose heads are proper
names. Proper1 checks additionally if the head of one markable is contained in the other:
a) {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)
&&(seqmatch(m1h,m2a)||seqmatch(m2h,m1a))}
This feature returns TRUE in the case that both markables’ heads are proper names and the head
of one markable is in the other, and FALSE otherwise.
(55)
a.
b.
<(Ramonm1 h Valle-Inclan)m1 , (Valle-Inclanm2 h , der Exzentriker der Moderne)m2 >
<(Peterm1 h Müller)m1 , (Mariam2 h Maier)m2 >
In example (55a), the feature returns TRUE, as both markables’ heads are proper names (i.e. Ramon and Valle-Inclan) and the head of one markable (i.e. m2h: Valle-Inclan) occurs in the other
markable (here, the last word in m1 ). However in example (55b), there is no head of one markable
occurring in the other and thus, Proper1 returns FALSE.
Table 5.33 shows the result of the comparison of the new baseline against the addition of Proper1 .
Feature set
New baseline
+ Proper1
Difference:
MUC-B3 -f-score
66.44%
66.80%
0.36%
Table 5.33.: Evaluation of the feature Proper1
Proper2 : This feature is created as a named entity can also have a common noun as head (cf. example
(38c): “den Angolanerm1h Amadeu Antonio” vs. “Amadeum2h Antonio”). Thus, it also checks the
last word of each markable for a match in the other markable:
a) {((m1h.f0==f0.NE)&&(strmatchlc(m1h,m2a)))||
((m2h.f0==f0.NE)&&(strmatchlc(m2h,m1a)))||
((m1e.f0==f0.NE)&&(strmatchlc(m1e,m2a)))||
111
((m2e.f0==f0.NE)&&(strmatchlc(m2e,m1a)))}
Proper2 returns TRUE, if the head or last word of m1 is a proper name that occurs in m2 or vice
versa and FALSE otherwise.
(56)
a.
<(den Luftfahrtunternehmerm1 h Dornier)m1 , (Silviusm2 h Dornier)m2 >
b.
<(Der Fußball-Profim1 h Manuel Neuer, der . . . )m1 , (Der Nationaltorwartm2 h Manuel
Neuer, der . . . )m2 >
The feature Proper2 returns TRUE in the case of example (56a): m1 (resp. m2 ) contains a proper
name as its last word that occurs in m2 (resp. m1 ). However, this feature cannot handle common
proper names that are neither head nor last word of a markable, as it is the case in example (56b),
where a finite verb of a relative clause will determine the last word of each markable. For this,
the relational database model or the expressiveness of the feature definition language has to be
extended (cf (4.4.3); (7.3)).
Table 5.34 shows the result of the comparison of the new baseline against the addition of Proper2 .
Feature set
New baseline
+ Proper2
Difference:
MUC-B3 -f-score
66.44%
65.64%
-0.8%
Table 5.34.: Evaluation of the feature Proper2
The performance of Proper1 means an improvement to the new baseline, whereas Proper2 worsens the
score. As this trend is also obvious in the final feature set, the feature Proper1 is selected as final feature.
5.3. Features from inspirations of German approaches in
(2.5)
This group contains three features that arose from the inspiration of some German approaches, that are
presented in section (2.5).
Inspire1 : The idea of this feature is to simply check whether m2 constitutes a subject. The idea is
the information status: discourse-given referents are usually mentioned sentence-initially in the
canonical subject position. Therefore, if m2 is a subject, it is more likely coreferent with a preceding markable. Original features just focus on the commonality of grammatical functions or case
but do not focus on the subject role of m2 :
a) {(m2h.rewtag == rewtags.SUBJ)}
Table 5.35 shows the result of the comparison of the new baseline against the addition of Inspire1 .
Although there is a slight improvement with respect to the baseline, the adding of this feature to
the final feature set worsens the final score (5.36).
112
5.3. Features from inspirations of German approaches in (2.5)
Feature set
New baseline
+ Inspire1
Difference:
MUC-B3 -f-score
66.44%
66.48%
0.04%
Table 5.35.: Evaluation of the feature Inspire1
Feature set
Final set
+ Inspire1
Difference:
MUC-B3 -f-score
73.06%
73.02%
-0.04%
Table 5.36.: The final set with Inspire1
Inspire2 : This feature functions as prefilter and discards all markable pairs in which one markable has
the grammatical function of a copula predicate (i.e. the rewtag PRED). For instance, in the sentence
(Peter)m1 ist (ein Bauer)m2 , the markable m2 is predicative and non-referring; it rather describes
Peter’s property of being a farmer.
a) {(m1h.rewtag == rewtags.PRED)||(m2h.rewtag == rewtags.PRED)}
Inspire2 returns TRUE in the case of m1 or m2 being a copula predicate (and thereby discards the
markable pair) and FALSE otherwise.
In contrast to the baseline, the addition of Inspire2 to the final prefilter feature set worsens the
score about 0.23%.
Feature set
New baseline
+ Inspire2
Difference:
MUC-B3 -f-score
66.44%
66.58%
0.14%
Feature set
Final set
+ Inspire2
Difference:
MUC-B3 -f-score
73.06%
72.83%
-0.23%
Table 5.38.: The final set with Inspire2
Inspire3 : This clause-boundness feature is described in (Klenner and Ailloud, 2009) as a global feature
and is reimplemented as a link feature in (Broscheit et al., 2010b). It is based on binding theory.
The idea is that if both m1 and m2 are governed by the same verb (i.e. they are in the same
subclause) and none of them is a reflexive pronoun, a possessive pronoun or one is the apposition
of the other, m1 and m2 are disreferent:
a) {(m1h.stcnum==m2h.stcnum)&&
(m1h.f0!=PRF)&&(m2h.f0!=PRF)&&
((m1h.f0==POS∼)=0)&&((m2h.f0==POS∼)=0)&&
(m1h.rewtag != rewtags.APP)&&(m2h.rewtag != rewtags.APP)}
The feature Inspire3 returns TRUE (i.e. discards the markable pair) if m1 and m2 are in the same
sentence (as having the same governor, i.e. (m1h.rewpos==m2h.rewpos) returns zero values for
MUC) but neither is a reflexive or possessive pronoun and neither is an apposition.
Feature set
New baseline
+ Inspire3
Difference:
MUC-B3 -f-score
66.44%
66.44%
0.0%
113
Inspire4 : This feature concerning the first/second person is proposed by Broscheit et al. (2010b). It
checks whether both m1 and m2 are first or second person (i.e. non-third person). This feature
might be an approximation of a dialog:
a) {((m1h.f4==f4.first)||(m1h.f4==f4.second))
&&((m2h.f4==f4.first)||(m2h.f4==f4.second))}
This feature returns TRUE in the case that m1 and m2 are first or second person. It returns FALSE,
if m1 or m2 is third person.
Feature set
New baseline
+ Inspire4
Difference:
MUC-B3 -f-score
66.44%
66.42%
-0.02%
Although all features proposed in (2.5) are linguistically motivated, their performance on the final feature
set is bad. Inspire1 and Inspire2 show a positive effect on the new baseline, but the dependencies to other
features in the final feature set eliminate this slight improvement of Inspire1 and Inspire2 . Therefore,
no feature from this group is selected for the final feature set.
114
CHAPTER 6
Evaluation of the implemented link features
This chapter presents the results of choosing the best performing features of the ones proposed in the
previous chapter. In (6.1), the final link features are presented and described. In (6.2) the modified
prefilter feature set is presented and the features are described. As the improvements of the feature
set include several modifications of some features, the clearest way to show the performance of each
improvement is to group them together into several improvement steps. These and their contributions to
the final score are presented in (6.3). The final scores are presented in (6.4). Here, all four evaluation
measures introduced in (2.6) are used for showing the performance gain of the final feature set (with or
without further modifications of the prefilters and dataset) in comparison to the new baseline (cf. (4.2))
and the SemEval-2010 baseline. The final scores are also compared to the official scores of SemEval2010 in the setting German, closed, gold. In (6.5), the final features are checked for their contribution to
the final score. As there are still dependencies between the features, the series of results after each feature
addition is not always strong monotonic increasing. However in the reversed addition order, the increase
is almost weak monotonic, indicating that no feature worsens the final score. In (6.6), two additional
evaluations are done: one for a dissonance in the final feature set and one for checking how the three
sentence distance features contribute, if added to the final feature set.
6.1. The final link feature set
1. {strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)&&(editdist(m1h,m2h)<3)&&
((bswitchlc(m1h,m2h)&&(eswitch(m1h,s)||eswitch(m1h,es)||eswitch(m1h,e)))||
(bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||eswitch(m2h,e))))&&
((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))&&
(strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0)&&
((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)}
⇒ The heads of m1 and m2 substring match but do not exact string match and their edit distance
is less than 3.
Moreover, the head of m1 starts with the head of m2 and ends with the suffix “s”, “es” or “e”
or vice versa.
m2 does not start with an indefinite article or its head is a proper name.
m1 does not start with the quantifier “kein” or “jede”.
Neither m1 nor m2 is a pronoun.
2. {seqmatch(m1h,m2h)&&
((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE))&&
⇒ The heads of m1 and m2 exact string match,
115
Chapter 6. Evaluation of the implemented link features
m2 does not start with an indefinite article or its head is a proper name and
m1 does not start with the quantifier “kein” or “jede”.
3. {((m1b.txtpos<=m2b.txtpos)&&(m2e.txtpos<=m1e.txtpos))||((m2b.txtpos<=m1b.txtpos)&&
(m1e.txtpos<=m2e.txtpos))}
⇒ m1 includes m2 or vice versa.
4. {(m1b.txtpos<=m2b.txtpos)&&(m1e.txtpos<=m2e.txtpos)&&(m1e.txtpos>=m2b.txtpos)}
⇒ m1 precedes m2 but they overlap.
5. {(m2b.txtpos<=m1b.txtpos)&&(m2e.txtpos<=m1e.txtpos)&&(m2e.txtpos>=m1b.txtpos)}
⇒ m2 precedes m1 but they overlap.
6. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NE)&&(seqmatch(m1h,m2a)||(seqmatch(m2h,m1a)))}
⇒ Both markables are proper names and the head of one markable is contained in the other.
7. {(m1h.f0==f0.NE)&&(m2h.f0==f0.NN)}
⇒ m1 is a proper name and m2 is a common noun.
8. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NE)}
⇒ m1 is a common noun and m2 is a proper name.
9. {(m1h.f0==f0.NN)&&(m2h.f0==f0.NN)}
⇒ Both markables are common nouns.
10. {(m1h.rewtag == rewtags.SUBJ) && (m2h.rewtag == rewtags.SUBJ)}
⇒ Both markables are subjects.
11. {(m1h.rewtag != rewtags.SUBJ) && (m2h.rewtag != rewtags.SUBJ)}
⇒ Neither m1 nor m2 is a subject.
12. {m1h.f0==f0.PDS}
⇒ m1 is a substituting demonstrative pronoun.
13. {m1h.f0==f0.PIS}
⇒ m1 is a substituting indefinite pronoun.
14. {(m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)}
⇒ m1 is a personal pronoun but does not exact string match with “es”.
15. {(m1h.f0==f0.PPOSS)||(m1h.f0==f0.PPOSAT)}
⇒ m1 is a substituting or attributive possessive pronoun.
16. {m1h.f0==f0.PRF}
⇒ m1 is a reflexive pronoun.
17. {(((m2h.f0==f0.PRF)||(m2h.f0==f0.PPOS∼)||(m2h.f0==f0.PREL∼))&&
(m1h.rewtag==rewtags.SUBJ))||(m1h.f0!=f0.PRELS)}
⇒ If m1 is a substituting relative pronoun, then m2 is a reflexive pronoun, a possessive pronoun
or also a relative pronoun.
18. {m2h.f0==f0.PDS}
⇒ m2 is a substituting demonstrative pronoun.
19. {m2h.f0==f0.PIS}
⇒ m2 is a substituting indefinite pronoun.
116
6.2. The final prefilter feature set
20. {(m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)}
⇒ m2 is a personal pronoun but does not exact string match with “es”.
21. {(m2h.f0==f0.PPOSS)||(m2h.f0==f0.PPOSAT)}
⇒ m2 is a substituting or attributive possessive pronoun.
22. {m2h.f0==f0.PRF}
⇒ m2 is a reflexive pronoun.
23. {(m2h.f0==f0.PRELAT)||(m2h.f0==f0.PRELS)}
⇒ m2 is a substituting or attributive relative pronoun.
⇒ Both markables have the same number which is not unknown.
⇒ Both markables have the same gender which is not unknown.
6.2. The final prefilter feature set
1. {seqmatchlc(m1h,Mark) || seqmatchlc(m1h,Meter) ||
seqmatchlc(m1h,Prozent) || seqmatchlc(m1h,Mittwoch) ||
seqmatchlc(m1h,Sekunden) || seqmatchlc(m1h,Juli) ||
seqmatchlc(m1h,Milliarden) || seqmatchlc(m1h,Dollar) ||
seqmatchlc(m1h,Jahr) }
⇒ A link is filtered out, if the head of m1 exact string matches with units, currencies, month
days and the like.
2. {(m1h.f2==f2.male)&&(m2h.f2==f2.neutral)}
⇒ A link is filtered out, if the gender of m1 is “male” and the one of m2 is “neuter”.
3. {(m1h.f2==f2.neutral)&&(m2h.f2==f2.male)}
⇒ A link is filtered out, if the gender of m1 is “neuter” and the one of m2 is “male”.
4. {(m1h.f2==f2.female)&&(m2h.f2==f2.neutral)&&(seqmatch(m2h,Mädchen)==0)}
⇒ A link is filtered out, if the gender of m1 is “female” and the one of m2 is “neuter” and m2
does not exact string match with “Mädchen”.
5. {(m1h.f2==f2.neutral)&&(m2h.f2==f2.female)&&(seqmatch(m1h,Mädchen)==0)}
⇒ A link is filtered out, if the gender of m1 is “neuter” and the one of m2 is “female” and m1
does not exact string match with “Mädchen”.
6. {(m1h.f1==f1.both_ihr)&&((m2h.f2==f2.male)||(m2h.f2==f2.neutral))&&(m2h.f1==f1.singular)}
⇒ A link is filtered out, if the number of m1 is “both_ihr” and the one of m2 is “singular” and
its gender is “male” or “neuter”.
7. {(m2h.f1==f1.both_ihr)&&((m1h.f2==f2.male)||(m1h.f2==f2.neutral))&&(m1h.f1==f1.singular)}
⇒ A link is filtered out, if the number of m2 is “both_ihr” and the one of m1 is “singular” and
its gender is “male” or “neuter”.
8. {(m1h.f1==f1.singular)&&(m2h.f1==f1.plural)}
⇒ A link is filtered out, if the number of m1 is “singular” and the one of m2 is “plural”
9. {(m1h.f1==f1.plural)&&(m2h.f1==f1.singular)}
⇒ A link is filtered out, if the number of m1 is “plural” and the one of m2 is “singular”
117
10. {(abs(m2b.stcnum-m1b.stcnum)>2)&&((m1h.f0==f0.P∼)||(m2h.f0==f0.P∼))}
⇒ A link is filtered out, if the markables are more than two sentences apart from each other and
at least one of them is a pronoun.
11. {(abs(m2b.stcnum-m1b.stcnum)>0)&&((m1h.f0==f0.PRF)||(m2h.f0==f0.PRF)
||(m1h.f0==f0.PRELS)||(m2h.f0==f0.PRELS)||(m1h.f0==f0.PRELAT)||(m2h.f0==f0.PRELAT))}
⇒ A link is filtered out, if the markables are not in the same sentence and one of the markables
is a reflexive or a relative pronoun.
⇒ A link is filtered out, if m2 is a kind of indefinite pronoun.
13. {(m1h.f2==f2.female)&&(m2h.f2==f2.male)}
⇒ A link is filtered out, if the gender of m1 is “female” and the one of m2 is “male”.
14. {(m1h.f2==f2.male)&&(m2h.f2==f2.female)}
⇒ A link is filtered out, if the gender of m1 is “male” and the one of m2 is “female”.
15. {(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)&&((m1h.f0==f0.P∼)}
⇒ A link is filtered out, if the genders of m1 and m2 are different but not unknown and m1 is a
pronoun.
6.3. Evaluation of improvement steps
As the final feature set does not only contain new added features but lacks of original features and
contains modified features, the clearest way of showing the improvements of each basic step (i.e. feature
deletion, feature insertion, feature modification and feature merging) and thereby not going too far in
details is to create groups of such basic steps. Subsequently, these groups are called improvement steps.
No.
1
2
3
4
5
6
7
8
9
10
11
Description
Involved final features
Feature engineering
Indefiniteness of the second markable
1; 2
Wrong assignment of a relative pronoun in m1
17
Problems with substring-matches
1
“Es“(“it“) as expletive pronoun in German
14; 20
Problems with the alias-feature
First markable begins with kein/jede
1; 2
A common, possibly appositive proper name
6
Simplification of the original pronoun features
12 - 23
Deletion of features concerning equality in case and person
Modification of word table, number attribute and prefilters
Disagreement in gender and number
Prefilters: 2 - 7
Units, currencies, month names, weekdays and the like
Prefilters: 1
Table 6.1.: The steps from the new baseline to the final feature set
In table 6.1, every improvement step is listed with its number (i.e. the position in the change from the
new baseline to the final feature set) and the final features that are involved in this step. The order in
which the improvement steps are set is irrelevant. It is based on the order of groups in chapter 4 and 5,
but separates the improvement steps modifying prefilter features.
118
6.3. Evaluation of improvement steps
The improvement steps from no. 1 - 9 are based on pure feature engineering, whereas improvement
steps no. 10 and 11 address the modification of the prefilter feature set, the annotation in the word table
and the modification of the number attribute (i.e. the insertion of a new value).
The basic steps in the respective improvement step are:
1. Insertion of ((strmatchlc(m2b,ein)==0)||(m2h.f0==f0.NE)) in substring and exact string matching features.
2. Insertion of feature no. 17
Deletion of the features:
(m1h.f0==f0.PRELS)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
(m2h.f1==f1.unknown))
(m1h.f0==f0.PRELAT)&&((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||
(m2h.f1==f1.unknown))
3. Modification of the substring matching feature:
strmatch(m1h,m2h)&&(seqmatch(m1h,m2h)==0)&&(editdist(m1h,m2h)<3)&&
eswitch(m1h,e)))||
(bswitchlc(m2h,m1h)&&(eswitch(m2h,s)||eswitch(m2h,es)||
eswitch(m2h,e))))
Insertion of the requirement of non-pronominality:
((m1h.f0==f0.P∼)==0)&&((m2h.f0==f0.P∼)==0)
4. Modification of the pronoun features concerning personal pronouns in m1 and m2 :
(m1h.f0==f0.PPER)&&(seqmatchlc(m1h,es)==0)
(m2h.f0==f0.PPER)&&(seqmatchlc(m2h,es)==0)
5. Removal of the alias-feature of the original feature set
6. Insertion of (strmatchlc(m1b,kein)==0)&&(strmatchlc(m1b,jede)==0) in the
string matching features
7. Modification of the named entity feature
(m1h.f0==f0.NE)&&(m2h.f0==f0.NE) with the condition that the head of one markable
has to be contained in the other markable:
(seqmatch(m1h,m2a)||(seqmatch(m2h,m1a)))
8. Deletion of the features concerning attributive demonstrative pronouns (P DAT ) and attributive
indefinite pronouns (P IDAT ), as they are not annotated as markables in the dataset.
Deletion of the number condition, as a disagreement is excluded by the prefilters anyway:
((m1h.f1==m2h.f1)||(m1h.f1==f1.unknown)||(m2h.f1==f1.unknown))
Merging of the remaining pronoun features of the same class and markable:
e.g. (m1h.f0==f0.PPOSS) and (m1h.f0==f0.PPOSAT)
⇒ (m1h.f0==f0.PPOSS)||(m1h.f0==f0.PPOSAT)
9. Deletion of the last two features in the original feature set:
(m1h.f3==m2h.f3)&&(m1h.f3!=f3.unknown) (equality in case)
(m1h.f4==m2h.f4)&&(m1h.f4!=f4.unknown) (equality in person)
as they do not contribute to a better score and the person value of two coreferent markables can in
fact be different (e.g. in the case of a shift from direct speech to indirect speech or vice versa).
10. Inclusion of the features Agree1 and Agree2 :
Introduction of the new number value both_ihr, that describe the number value of a possessive
119
pronoun starting with ihr.
Annotation of every possessive pronoun starting with ihr with the number value both_ihr.
Insertion of the prefilter features no. 2 - 7 concerning the disagreement with the gender neuter and
the number value both_ihr.
11. Introduction of the prefilter feature no. 1 concerning units, month names and the like.
The performance of each improvement step in turn is listed in table 6.2. Each step shows an improvement. The best improvements in these steps are done by the improvement step no. 1, indefiniteness, and
no. 3, substring match, underlining the importance of checking the definiteness of m2 and the substring
match in coreference resolution. The improvement steps no. 10 and 11 provide a performance gain of
about 1.1%. Thus, the quality of the annotation as well as appropriate prefilter features are significant
for the performance of a coreference resolver. Sometimes, the improvement is very slight (as for the
removal of the alias-feature). But by removing features that do not perform well, the resulting feature
set becomes smaller and thus clearer. Moreover, it reduces runtime of the training of the decision tree
classifier significantly.
Improvement
step
New baseline
+ Indefiniteness
+ Relative pronoun
+ Substring match
+ es-pronoun
+ alias-removal
+ keine/jede with m1
+ Proper name
+ Simplification
+ Deletion case/pers.
+ Disagreement
+ Units
MUC
C
2390
2339
2335
2222
2194
2190
2189
2229
2235
2226
2248
2243
P
0.4608
0.492
0.4929
0.5431
0.5539
0.555
0.5578
0.5589
0.56
0.5621
0.5714
0.5821
R
0.7251
0.7096
0.7084
0.6742
0.6657
0.6644
0.6641
0.6763
0.6781
0.6754
0.682
0.6805
F1
0.5635
0.5811
0.5814
0.6016
0.6047
0.6048
0.6064
0.612
0.6134
0.6136
0.6219
0.6275
P
0.7367
0.7779
0.7802
0.8359
0.8489
0.8498
0.8516
0.8477
0.8487
0.8517
0.8545
0.8604
B3
R
0.898
0.8948
0.895
0.887
0.8856
0.8854
0.8849
0.8887
0.8905
0.8887
0.8893
0.8889
F1
0.8094
0.8322
0.8337
0.8607
0.8669
0.8673
0.8679
0.8677
0.8691
0.8698
0.8716
0.8744
MUC-B3
F1
0.664425
0.68437
0.685011
0.708185
0.712397
0.712634
0.713943
0.717789
0.719219
0.719554
0.725828
0.730662
Table 6.2.: Performance of the improvement steps
6.4. The final scores
In this section, the question is addressed, how well the final feature set performs compared to the new
baseline and to the SemEval-2010 baseline for SUCRE. This is shown in table 6.3. A comparison between the final scores of SUCRE and the results of SemEval-2010 in German, closed, gold (i.e. the
participants TANL-1, UBIU and SUCRE) is provided in table 6.4. With respect to the performance
differences, this section provides information about all four evaluation scores: MUC, B3 , CEAF and
BLANC.
The columns in table 6.3 mean:
• Evaluation score: all kinds of evaluation scores that are available in SUCRE
• New baseline: the scores of the new baseline gained by removing the first three features from the
original feature set (cf. (4.2))
• SemEval-2010: The scores of SUCRE in German, closed, gold, extracted from the literature
120
6.4. The final scores
• Feature engineering: The scores achieved by just modifying the link feature set
• Including modification of word table, number attribute and prefilters: This is the final score including all 11 improvement steps
The focus on the evaluation scores MUC and B3 is indicated by the highlighted lines of the respective
f-scores.
Evaluation score
New baseline
(cf. (4.2))
SemEval2010
Feature engineering
MUC-correct
MUC-Precision
MUC-Recall
MUC-f-score
B3 -all
B3 -Precision
B3 -Recall
B3 -f-score
CEAFM -all
CEAFM -Precision
CEAFM -Recall
CEAFM -f-score
CEAFE -correct
CEAFE -Precision
CEAFE -Recall
CEAFE -f-score
BLANC-Attraction-f-score
BLANC-Repulsion-f-score
BLANC-Precision
BLANC-Recall
BLANC-f-score
MUC-B3 -f-score
RAND-accuracy
Number of link features
2390
0.460767
0.725121
0.56348
13446
0.73673
0.898049
0.80943
13446
0.718132
0.718132
0.718132
7139.44
0.864444
0.703393
0.775647
0.315814
0.987133
0.611977
0.742423
0.670918
0.664425
0.974741
37
2439
0.481
0.74
0.584
13446
0.736
0.904
0.811
2226
0.562121
0.675364
0.613561
13446
0.851721
0.888692
0.869814
13446
0.795329
0.795329
0.795329
8277.64
0.872617
0.815532
0.843109
0.416529
0.993164
0.704375
0.70532
0.704848
0.719554
0.986487
25
0.729
0.729
0.729
0.618
0.782
0.664
0.6790
Including modification
of word table, number
attribute and prefilters
2243
0.582144
0.680522
0.6275
13446
0.860368
0.888931
0.874416
13446
0.80299
0.80299
0.80299
8395.35
0.875154
0.827128
0.850464
0.426256
0.993411
0.713553
0.706246
0.709881
0.730662
0.986972
25
Table 6.3.: The final scores
MUC: The MUC-correct value, the absolute number of correct coreference links, is biggest in SemEval.
Although, this value is not given in the literature, it can be predicted by the following formula, as
it correlates with the MUC-Recall:
0.74
M U C RecallSemEval
· M U C correctN ew baseline =
· 2390 ≈ 2439
M U C RecallN ew baseline
0.725121
(6.1)
After feature engineering, the MUC-correct value is about 213 links smaller. By modifying prefilters, word table and number attribute, it slightly rises by about 17 links. However, the precision
of MUC rises by about 10% compared with SemEval, such that the MUC-f-score of 62.75% is
about 4.35% bigger than in SemEval-2010. The reason for the decrease of recall might be due to
the fact that most features in the final feature set are based on linguistic analyses of false positives,
i.e. they try to constrain a vote for coreference.
121
B3 : A similar scene is given with B3 . The recall of 90.4% in SemEval-2010 slightly decreases to
88.9% after feature engineering and the improvement steps no. 10 and 11. However, the precision
increases about 12.4% up to 86%. Therefore, the final B3 -f-score is about 6.3% better than in
SemEval-2010.
MUC-B3 -f-score: The harmonic mean of MUC-f-score and B3 -f-score, which is focussed as an appropriate trade-off between MUC and B3 within this study, shows the same improvement as in MUC
and B3 : the SemEval-2010 baseline of 67.9% (66.4% in the new baseline) is outperformed by the
final system configuration, which achieves 73%. This means an improvement of 5.1% compared
to SemEval-2010 and even 6.6% compared to the new baseline.
CEAF: Depending on the similarity measure of the one-to-one entitiy alignment, one can distinguish
between mention-based CEAF (CEAFM ) and entity-based CEAF (CEAFE ) (see (2.6.3)). The
most common version of CEAF is the mention-based CEAF, for which precision and recall are
identical in the case of true markables.
In CEAFM , there is a clear improvement from 72.9% in SemEval-2010 (71.8% in the new baseline) up to 80.3%.
In SemEval-2010, there are no results for CEAFE . Thus, the final results are only compared to the
new baseline. Here, both in precision and recall, there is an improvement. The precision increases
by 1.1% and the recall even increases by 12.4%. So, the CEAFE -f-score is increased by 7.5%.
BLANC: The BLANC-Attraction-f-score (the BLANC-score for coreference links, see F 1c in table
2.16 in (2.6.4)) can only be compared to the new baseline. Here, there is a clear improvement in
the final scores of about 11%. The BLANC-Repulsion-f-score (the BLANC-score for disreference
links, see F 1d in table 2.16 in (2.6.4)) of the final scores shows a slight improvement of about
0.6%. With respect to the BLANC score, the recall of the final scores is much smaller (about
7.6%) than in SemEval-2010. However, the precision is significantly better (about 9.5%). Thus,
the overall BLANC score of the final scores is with 71% about 4.6% better than in SemEval-2010.
Number of features: The number of features clearly decreases by removing non-performing features
or merging two similar features together (see improvement step no. 8). This results in a faster
training step, in a clearer feature set and in the prevention of too much dependencies between the
link features.
CEAF
P
R
F1
SemEval-2010
German
closed × gold
SUCRE
72.9 72.9 72.9
TANL-1 77.7 77.7 77.7
UBIU
67.4 68.9 68.2
The final scores (rounded)
SUCRE
80.3 80.3 80.3
R
B3
P
F1
R
58.4
25.9
21.9
90.4
77.2
73.7
73.6
96.7
77.9
81.1
85.9
75.7
78.2
54.4
60.0
61.8
75.1
77.2
66.4
57.4
64.5
62.8
88.9
86.0
87.4
70.6
71.4
71.0
R
MUC
P
F1
74.4
16.4
22.1
48.1
60.6
21.7
68.1
58.2
BLANC
P
F1
Table 6.4.: SemEval-2010 Results - German, closed, gold vs. Final scores
Table 6.4 provides the comparison of the final configuration with the SemEval-2010 participants in
German, closed, gold. The involved evaluation scores with recall (R), precision (P) and f-score (F1 )
122
6.5. The performance of each feature in the final feature set
are: CEAFM , MUC, B3 and BLANC. The participants are SUCRE, TANL-1 and UBIU (for more details
on SemEval-2010 see (2.7)).
With respect to the SemEval-2010 competition, the best results in MUC and BLANC are achieved by
SUCRE, whereas it is outperformed by TANL-1 in B3 and CEAF. There is a bias that TANL-1 has a
greater precision but a lower recall than SUCRE.
Now, the final scores outperform every participant in each evaluation score with respect to f-score:
CEAF: Here, the final configuration outperforms the winner TANL-1 by 2.6%. As the CEAFM -score
is identical for recall and precision, the same improvements are given in recall and precision.
MUC: In SemEval-2010, SUCRE achieved best results in MUC and now outperforms itself by 4.4%.
However, the final configuration shows a significant decrease in the MUC-Recall. So, SUCRE2010
is better in recall and TANL-1 still outperforms the final system in precision.
B3 : The winner concerning B3 , TANL-1, is outperformed by the final system by 1.5%. As with MUC,
the precision is outperformed by TANL-1, whereas the recall is outperformed by SUCRE2010 .
BLANC: SUCRE performs best in SemEval-2010 in BLANC. The final system of SUCRE outperforms
itself by 4.6%. As with MUC and B3 , the final system is outperformed in recall by SUCRE2010
and in precision by TANL-1.
6.5. The performance of each feature in the final feature set
In table 6.5 and 6.6 the final features are checked for their contribution in the given order. As there are
still dependencies between the features, the addition of one feature might worsen the final score so far, as
it is the case with features no. 1 - 17 in table 6.5. But since in the reversed addition order, the increase is
weak monotonic except a slight decrease with features no. 9 (cf. features no. 25 - 9 in table 6.6), which
is not present in table 6.5 (cf. features no. 1 - 9), one can conclude that no feature worsens the final score.
6.6. Additional evaluations
m2 may be an indefinite pronoun?
One might be confused by the fact that the final feature set needs feature no. 19, which checks for m2
being an indefinite pronoun:
19. m2h.f0==f0.PIS
although, there is the prefilter feature no. 12, which discards all markable pairs in which m2 is any kind
of an indefinite pronoun:
However, if vector feature no. 19 is removed, the MUC-B3 -score decreases (cf. table 6.7).
Feature set
Final set
- (m2 == PIS)
Difference:
MUC-B3 -f-score
73.06%
73.00%
-0.06%
Table 6.7.: The performance without vector feature no. 19
123
MUC
Features
1-1
1-2
1-3
1-4
1-5
1-6
1-7
1-8
1-9
1 - 10
1 - 11
1 - 12
1 - 13
1 - 14
1 - 15
1 - 16
1 - 17
1 - 18
1 - 19
1 - 20
1 - 21
1 - 22
1 - 23
1 - 24
1 - 25
C
175
1158
1158
1158
1158
1197
1457
1456
1449
1449
1356
1356
1356
1355
1449
1449
1227
1227
1230
1355
1674
1788
2077
2128
2243
P
0.7292
0.5212
0.5212
0.5212
0.5212
0.5245
0.4834
0.4896
0.5254
0.5254
0.5394
0.5394
0.5394
0.5394
0.5223
0.5223
0.5298
0.5284
0.5286
0.5268
0.5589
0.5488
0.5739
0.5737
0.5821
R
0.0531
0.3513
0.3513
0.3513
0.3513
0.3632
0.4421
0.4417
0.4396
0.4396
0.4114
0.4114
0.4114
0.4111
0.4396
0.4396
0.3723
0.3723
0.3732
0.4111
0.5079
0.5425
0.6302
0.6456
0.6805
F1
0.099
0.4197
0.4197
0.4197
0.4197
0.4292
0.4618
0.4644
0.4787
0.4787
0.4668
0.4668
0.4668
0.4666
0.4774
0.4774
0.4373
0.4368
0.4375
0.4618
0.5322
0.5456
0.6007
0.6076
0.6275
P
0.9928
0.9249
0.9249
0.9249
0.9249
0.9213
0.8766
0.8809
0.8994
0.8994
0.9142
0.9142
0.9142
0.9143
0.8972
0.8972
0.9205
0.9205
0.9203
0.9101
0.8924
0.879
0.8692
0.8647
0.8604
B3
R
0.7622
0.8158
0.8158
0.8158
0.8158
0.8182
0.8381
0.8385
0.8384
0.8384
0.8289
0.8289
0.8289
0.8288
0.8379
0.8379
0.8197
0.8198
0.82
0.8259
0.847
0.8548
0.8756
0.8793
0.8889
F1
0.8623
0.8669
0.8669
0.8669
0.8669
0.8667
0.8569
0.8592
0.8678
0.8678
0.8695
0.8695
0.8695
0.8695
0.8666
0.8666
0.8672
0.8672
0.8673
0.866
0.8691
0.8667
0.8724
0.8719
0.8744
MUC-B3
F1
0.17758
0.565604
0.565604
0.565604
0.565604
0.574085
0.600162
0.60294
0.617034
0.617034
0.60745
0.60745
0.60745
0.607291
0.615662
0.615662
0.581387
0.580987
0.5816
0.602396
0.660137
0.669663
0.711503
0.716129
0.730662
Table 6.5.: Cumulative performance of the final feature set
A possible reason for this might be the following slight difference: the prefilter feature refers to the first
word in the markable and the final link feature to the head word. Usually, substituting indefinite pronouns
occur without determiner, but there are still examples where an indefinite pronoun is initialized by an
adverb or an definite article, like in so mancherm2 , die anderenm2 and die allermeistenm2 . However, if
the prefilter feature is modified to (m2h.f0==f0.PIS), the MUC-B3 -f-score decreases to 72.99%.
The final feature set and the three sentence distance features
Three features have been removed in (4.2):
If these features are re-added to the final feature set, the result decreases enormously (cf. table 6.8):
124
6.6. Additional evaluations
MUC
Features
25 - 25
25 - 24
25 - 23
25 - 22
25 - 21
25 - 20
25 - 19
25 - 18
25 - 17
25 - 16
25 - 15
25 - 14
25 - 13
25 - 12
25 - 11
25 - 10
25 - 9
25 - 8
25 - 7
25 - 6
25 - 5
25 - 4
25 - 3
25 - 2
25 - 1
C
0
0
318
318
353
681
681
706
705
721
873
899
910
918
1173
1236
1234
1234
1234
1749
1749
1749
1750
2158
2243
P
0.0
0.0
0.8051
0.8051
0.7879
0.6637
0.6637
0.6586
0.6589
0.6369
0.62
0.6635
0.6647
0.6643
0.6528
0.6581
0.6592
0.6592
0.6592
0.7081
0.7081
0.7081
0.7108
0.5789
0.5821
R
0.0
0.0
0.0965
0.0965
0.1071
0.2066
0.2066
0.2142
0.2139
0.2188
0.2649
0.2728
0.2761
0.2785
0.3559
0.375
0.3744
0.3744
0.3744
0.5306
0.5306
0.5306
0.5309
0.6547
0.6805
F1
nan
nan
0.1723
0.1723
0.1886
0.3151
0.3151
0.3233
0.323
0.3257
0.3712
0.3866
0.3901
0.3925
0.4606
0.4778
0.4776
0.4776
0.4776
0.6067
0.6067
0.6067
0.6079
0.6145
0.6275
P
1.0
1.0
0.9972
0.9972
0.9956
0.9785
0.9785
0.9756
0.976
0.9741
0.9527
0.9544
0.9538
0.9523
0.9344
0.9349
0.9354
0.9354
0.9354
0.9237
0.9237
0.9237
0.9251
0.8644
0.8604
B3
R
0.7549
0.7549
0.7775
0.7775
0.7783
0.7956
0.7956
0.7978
0.7978
0.7995
0.8024
0.8036
0.8047
0.8048
0.8218
0.8246
0.8247
0.8247
0.8247
0.8558
0.856
0.856
0.8576
0.8814
0.8889
F1
0.8603
0.8603
0.8738
0.8738
0.8737
0.8776
0.8776
0.8778
0.878
0.8782
0.8711
0.8726
0.8729
0.8724
0.8745
0.8763
0.8766
0.8766
0.8766
0.8885
0.8886
0.8886
0.8901
0.8728
0.8744
MUC-B3
F1
0.287855
0.287855
0.310187
0.463743
0.463743
0.472507
0.472204
0.475124
0.520547
0.535788
0.53926
0.541383
0.603412
0.618392
0.618278
0.618278
0.618278
0.721002
0.721046
0.721046
0.722389
0.721198
0.730662
Table 6.6.: Reversed cumulative performance of the final feature set
Feature set
Final set
+ sentence distance
Difference:
MUC-B3 -f-score
73.06%
64.02%
-9.04%
Table 6.8.: The performance with the final features and sentence distance
125
126
CHAPTER 7
Summary and conclusions
7.1. Summary
This diploma thesis intended to improve the performance of SUCRE’s coreference resolution in the
German language with the use of a more linguistic background. Therefore, misclassified markable pairs
(i.e. links) are considered in their context to determine what linguistic phenomenon is responsible for
this coreference or disreference and how this phenomenon could be modeled in a link feature in order to
provide a feature set that is able to rightly classify the respective markable pairs afterwards.
Based on the architecture of the mention-pair model used in SUCRE, this task just focuses on the
resulting feature vectors, i.e. the input for the pairwise classifier. Thus, one issue which has to be kept
in mind is that this work only modifies the classifier’s result rather than the overall result, i.e. the output
of the clustering step. And if the classification is improved, it might not take effect on the clustering (see
Combining classification and clustering in (2.3.1))
Chapter 2 first outlined the term of coreference and coreference resolution. Two markables are coreferent, if they refer to the same real-world entity. The coreference resolution over a set of markables yields
a coreference partition in which every cluster constitutes an equivalence class.
In an end-to-end coreference system, the input is raw text. Thus the markables still have to be detected.
This is a hard task as there are difficulties like nested markables.
In (2.3), three models based on supervised machine learning are presented. The mention-pair model
uses a pairwise classifier which checks for two markables whether they are coreferent or not. Afterwards, a clustering step uses these pairwise decisions to create a final coreference partition. This model
is used in SUCRE. However, one drawback of this model is that it does not provide a perspective over
the entire cluster and thereby allows two disreferent markables ending up in the same cluster. This issue
is addressed by the entity-mention model. Here, a markable is compared with a cluster of preceding
markables and thereby it is ensured that every markable in a cluster is compatible with each other. Another problem of the mention-pair model is that every candidate antecedent is considered independently
from each other and thus, there is no chance for comparing one candidate against another. This flaw is
addressed by the ranking model. A training instance of the ranking model is a triple of markables: the
respective markable, its true candidate antecedent and a false one.
Apart from the supervised approaches, Cardie and Wagstaff (1999) present an unsupervised clustering
method for coreference resolution (cf. (2.4)). Advantages of a pure clustering task is that there is no need
for labeled training data and that it includes local as well as global constraints. In contrast to the models
using a pairwise classifier, whose input is a feature vector, which corresponds to a markable pair, Cardie
and Wagstaff (1999) represent each markable as a feature vector comprising 11 markable features. As a
distance measure, Cardie and Wagstaff (1999) use a combination of feature weights and incompatibility
values.
Coreference resolution has been researched predominantly in English. In (2.5), five approaches for
127
Chapter 7. Summary and conclusions
German coreference resolution were presented. Hartrumpf (2001) (cf. (2.5.1)) applies a hybrid approach
that combines “syntactico-semantic” rules with corpus statistics. He uses 18 coreference rules that “license possible coreference”. For all licensed markable pairs, all possible partitions are created, however
pruned if certain conditions are violated.
The second approach is from Strube et al. (2002) (cf. (2.5.2)). They show the necessity of a more
complex string matching feature. When they implemented two features for the minimum edit distance
between m1 and m2 , they improved the performance for non-pronouns (i.e. definite noun phrases and
proper names).
Versley (2006) implements hard and weighted soft constraints, whose weights are estimated with a
maximum entropy model. He focusses on the feature design of proper names and definite noun phrases.
He uses several external sources like hypo-/hypernymy information from GermaNet for what he calls
coreference bridging.
In (2.5.4), the approach of Klenner and Ailloud (2009) is shown. They use a memory-based pairwise
classifier and a modification of a Zero-One Integer Linear Programming system (ILP). This ILP framework is used as a clustering method that enables the use of global constraints like clause-boundedness
(see binding theory). In addition, they show that the first solution of their algorithm (“Balas-First”) is
very often the best and thus “optimization [. . . ] is not needed”.
Broscheit et al. (2010b) try to create a coreference system that is freely avaible for further research in
coreference resolution. Therefore, they extend the coreference system BART for the use of German data.
Beside the feature set from Klenner and Ailloud (2009), they implement some features concerning the
first/second person or quoted speech. The performance is best with a maximum entropy classifier and a
separation for pronouns and non-pronouns (“split”).
In (2.6), the four most common evaluation scores (i.e. MUC, B3 , CEAF and BLANC) are described.
The MUC-score focusses on coreference links, disregarding disreference links and singletons. It
counts each link that occurs both in the predicted partition and in the true partition. By dividing it
by the number of links in the true partition or in the predicted partition, one can get the recall or the
precision of MUC. Their harmonic mean is the f-score of MUC.
The B3 -score on the other hand disregards links and measures the overlap of predicted and true clusters. For each markable, the number of common markables in the respective cluster is divided by the
number of markables in the respective true or predicted cluster. This results in recall and precision,
whose harmonic mean again is the f-score.
One drawback of B3 is that clusters may be used several times for an alignment of predicted and true
clusters. The solution for this is done by CEAF. It only allows a one-to-one alignment between clusters.
The clusters are chosen according to a similarity measure (Luo, 2005). Depending on different measures,
one can distinguish between CEAFM and CEAFE .
The most complex evaluation score is the BiLateral Assessment of Noun-Phrase Coreference (BLANC).
BLANC computes all coreference and disreference links in a cluster. For each link class, precision and
recall are computed by dividing the number of right links by the number of predicted links or by the
number of true links (cf. table 2.15). The final BLANC score is the arithmetic mean of f-scores for both
link classes.
The SemEval-2010 competition in coreference resolution is sketched in (2.7). It provides analyses of
six languages including German and English. Based on the properties closed/open (meaning the use of
external sources) and gold/regular (meaning the use of automatic annotation), there are four evaluation
settings. Here, SUCRE performs best in closed × regular in German, English and Italian.
Chapter 3 presented the SUCRE system. For coreference resolution, it uses a mention-pair model.
For the feature definition, there are the relational database model and the related regular feature definition
language. Feature engineering and the classification are clear separated. The output of a preprocessing
step is the relational database model containing at least the word table, the markable table and the link
table, which are connected to each other via foreign keys.
The use of a regular feature definition language enables to simply express features and to have access
to the different levels in the relational database model.
128
7.1. Summary
After creating links and thereof feature vectors, a classifier (e.g. a decision tree classifier) is trained.
The classification results (i.e. the pairwise decisions) are used for best-first clustering in order to get to
the final coreference chains (i.e. the final partition).
In (3.3), the interactive visualization of the feature space with the use of Self Organizing Maps was
presented. The advantage of presenting high dimensional coreference data in a low dimensional space
enables the user to better understand the distribution of coreference data in the feature space. By exploring areas in the feature space with gray nodes (i.e. with nodes that are assigned with coreference and
disreference links), one can find new insights for feature engineering. Moreover, SOMs help to annotate
a larger amount of data faster.
Kobdani and Schütze (2010b) show the multi-lingual aspect of SUCRE by implementing identical and
universal features that are applicable on several languages. Here, the proposed languages are Dutch,
German, Italian and Spanish.
In chapter 4, the linguistic analysis of false positives and false negatives took place. The first issue
was a dissonance between the MUC-B3 -f-score of SemEval-2010 and of the initial configuration. This
was mitigated by removing three features concerning the distance in terms of sentences.
Within the analysis of the false positives, several groups of most frequent link errors were found. One
issue was the definiteness of the anaphor. Here, an exact string match outweighs an existing feature
for indefinite anaphors. A solution was to exclude the indefiniteness of m2 within the string matching
features.
Another problem was the wrong assignment of relative pronouns, for instance m1 as a relative pronoun
is assigned to a succeeding markable with a grammatical function that is not permitted or m2 as a relative
pronoun is related to an antecedent which is not directly adjacent as it is usually the case.
Sometimes, the right antecedent of an anaphor was between the predicted antecedent and m2 . This
problem cannot be solved as there is no way for checking the context of a markable (i.e. whether there
are compatible markables closer to m2 ) and the link features are only defined on two markables and do
not provide a set of markables between m1 and m2 .
A further group addressed the case where a reflexive pronoun is related to a non-subject in the same
sentence. A solution would be to extend the feature concerning reflexive pronouns by the check for the
subject role of the other markable. However, this feature does not perform well and thus has been ignored
in the final feature set.
The most interesting issue were link errors concerning substring matches. In the case of compound
words, two new predicates have been introduced, however, the combination of compound words and
inflectional suffixes cannot be handled yet.
The German personal pronoun es is very often non-referring. However, it has always the state of a
markable. A way of solving this was to check whether a given personal pronoun differs from es.
Common nouns like Mark (a former German currency), Prozent, Meter or Sekunden usually do not
corefer with other markables that string match. A simple link feature that checks for such a keyword and
returns TRUE, if the markables’s heads are different, does not perform well enough. A better solution
was to code this feature as a prefilter feature that excludes all links containing such keywords.
The alias feature in the inital feature set produces few false positives but no real true positive.
Removing this feature slightly improves the overall results.
If a markable m1 starts with the quantifier kein or jede, the succeeding markable m2 is restricted to a
small set of possible NP-forms (e.g. possessive pronouns or reflexive pronouns). This most often occurs
in the case of a string match. Modifying the string matching features with a condition described above,
improves the results.
Another problem in the false positives was the disagreement in grammatical attributes like number,
gender or person which was traced back to the lack of gold annotation of attributive pronouns like ihr in
the original SemEval-2010 dataset. A re-annotation with the use of heuristics could partially solve this
problem.
By analysing the false negatives, it became obvious that the false negative output of SUCRE was
misleading. Not the coreference links that are misclassified as disreferent were printed out but links that
129
come up by regarding the final cluster.
In the case of reflexive pronouns, one has to take into account complex syntactic structures like with
control verbs.
Another issue was that markables whose heads are common nouns might corefer if the common nouns
are related semantically. This cannot be captured by SUCRE as there are no external information sources
like GermaNet.
A last problem concerned common proper names that do not constitute the head of a markable. Here, a
check, whether one markable head that is a proper name occurs anywhere in the other markable, improves
the final score.
In chapter 5, the features proposed in chapter 4 and in some German approaches in (2.5) were implemented and evaluated with the new baseline. Sometimes, this evaluation misled as the results were
opposed to the evaluation on the final feature set. This conflict is based on the different features in the
baseline feature set and in the final feature set, as different features mean different dependencies.
The best performing features were selected for the final feature set.
7.2. Conclusion
The final scores: As a bottom line under the results in chapter 6, one can conclude that the linguistic
analyses of the false positives/negatives as well as further modifications of the feature set yield
very good outcomes. The f-scores of each evaluation measure are increased in comparison to the
new baseline (cf. (4.2)) and the results for SUCRE in SemEval-2010.
Moreover, whereas SUCRE only wins with respect to MUC and BLANC in SemEval-2010, now,
the final system is best for all four evaluation measures and thereby also outperforms TANL-1 in
CEAF and B3 .
However, one problem is that the MUC-Recall decreases significantly (from 74% down to 68%).
A possible reason for this deterioration might be the fact, that most new features are based on the
analysis of false positives and thereby try to create new constraints for coreference.
On the other hand, the MUC-Precision increases much more than the recall decreases, in particular
from 48.1% up to 58.2%.
Similar trade-offs for precision and recall are also given with B3 and BLANC.
The reason for this kind of improvement is based on the fact that it is more simple to find restrictions for coreference resolution (i.e. indicators for disreference) than characteristics/indicators for
coreference.
SUCRE architecture: The benefit of the architecture of SUCRE in this feature research was considerable. By the use of the relational database model and the feature definition language, it was
possible to implement fine-grained features that can model certain linguistic patterns and thereby
improve the quality of features. However, the expressiveness of the definition language and also
the set of attributes for words or markables could be extended further (see (7.3) for future work).
Fewer features: A further result of the feature engineering within this diploma thesis is the decrease
of the number of features. In the initial configuration, the number of features is 40, in the new
baseline there are 37 features and now in the final feature set, the number of features is 25. Keeping
in mind that the feature values constitute the input for the decision tree classifier, the runtime for
the training step increases exponentially with the number of link features. Table 7.1 shows the
runtime1 comparison for the initial configuration, the new baseline and the final feature set. While
the runtime for the test step is roughly constant, there is a significant difference between 40 features
1 An
130
IMS-server (“koenigspinguin”) was used with 4 x Dual-Core Opteron 8216, 2,4 GHz and 32 GByte memory
7.3. Future work
in the initial feature set and 25 features in the final feature set. Further investigations in the feature
set will be more comfortable resp. with respect to time, more research is possible.
Training
Test
Total
Runtime comparison
Initial configuration
New baseline
40 features
37 features
4093 seconds
1953 seconds
≈ 68.21 minutes
= 32.55 minutes
557 seconds
541 seconds
≈ 9.28 minutes
≈ 9 minutes
4650 seconds
2494 seconds
= 77.5 minutes
≈ 41.6 minutes
Final feature set
25 features
496 seconds
≈ 8.3 minutes
519 seconds
≈ 8.7 minutes
1015 seconds
≈ 16.9 minutes
Table 7.1.: Runtime comparison
Moreover, Kobdani et al. (2010) argue that “removing redundant features often improves classification accuracy. Smaller feature sets also can be more robust and cut down classification times.”
Furthermore, a smaller feature set provides a clearer overview about each feature and its contribution to the final score. As the link features show dependencies to others, pruning features avoids
too many dependencies.
Annotation quality: The quality of annotation is not perfect in the used TüBa-D/Z dataset (cf. (4.3.10)).
By improving the annotation (e.g. for attributive possessive pronouns like ihr) the performance of
the system improves considerably. With respect to the improvement steps no. 10 (i.e. modifying
the word table and the number attribute, see table 6.1), the performance gains 0.6%.
So, one can conclude that the quality of annotation is important for a good performance.
7.3. Future work
Although the system’s performance has been improved significantly, the result is certainly not optimal
yet. There are several areas in which further work should be done. This section proposes four issues to
be further improved resp. investigated. As this way of doing feature engineering shows a great effect on
the performance of SUCRE, the feature engineering using linguistic analysis should be continued. There
are a lot of linguistic phenomena that cannot be captured because of a lack of enough expressivity of the
feature definition language or external knowledge, if it should be used (cf. closed settings in SemEval2010). Moreover, the system of SUCRE shows some problems that have to be solved in order to get
reliable data.
Pseudo language/Relational database model: The feature definition language and the relational
database model are not flexible enough for capturing all linguistic phenomena in German. Subsequently, some extensions are proposed and exemplified by some linguistic problems that occured
in the course of of this study.
• Hartrumpf (2001) uses a unification mechanism instead of the identity of feature values. A
drawback in SUCRE’s identity check is the dealing with the feature value unknown. If one
wants to check the agreement of two markables in terms of gender, the setting of unknown
has to be excluded. For instance the prefilter feature
{(m1h.f2!=m2h.f2)&&(m1h.f2!=f2.unknown)&&(m2h.f2!=f2.unknown)
&&((m1h.f0==f0.P∼)}
131
checks whether the genders of two markables mismatch. Such a mismatch does not include the value unknown, i.e. female mismatches male but female does not mismatch
unknown. If the equality predicate ==/2 resp. inequality predicate !=/2 is redefined in
a unification check, say =u /2 resp. !=u /2, such that unknown is unifiable with any other
value, the resulting prefilter feature would be:
{(m1h.f2!=u m2h.f2)&&((m1h.f0=u f0.P∼)}.
• In the analysis of false positives, one insoluble problem was the case where m1 is compatible
with m2 but precedes the right antecedent (cf. (4.3.3)). Due to the local perspective of
link features that is based on markable pairs, this problem remains unsolvable. A possible
solution might be the extension from markable pairs to a triple containing m1 , m2 and a set of
markables M that are inbetween. A possible link feature could be one that uses the markable
m3 that is universally quantified over M . So one can check for incompatibility between m2
and m3 .
This or other solutions for this problem of “relative proximity in context” can be researched
in future work.
• As morphology is very predominant in German (other than in English), a good way of defining strings is necessary. SUCRE provides the predicates seqmatch/2 and strmatch/2
in a case-sensitive and case-insensitive version, which check for exact string match or substring match between two words. However, these predicates are not expressive enough for
German morphology. In order to capture valid compound words, during this study, the predicates bswitch/2 and eswitch/2 (case-sensitive and case-insensitive) have been introduced for checking if one string starts or ends with the other. Now one can vote for a coreference with the markable pair <Schäferhundm1h , Hundm2h > and for a disreference with the
markable pair <Hundefängerm1h , Hundm2h >, although both markable pairs show a substring
match.
However, these two predicates are still not expressive enough. Consider the markable pair
<Bürgerombudsmannsm1h , Ombudsmannm2h >. Here, the head of m1 is the compound word
Bürgerombudsmann with the inflectional suffix s. This suffix prevents a positive response for
the check whether the head of m1 ends with the head of m2 (i.e. eswitchlc(m1h,m2h)).
A possible solution for this might be a string concatenation predicate +c /2 which concatenates two strings to a new one which functions as predicate input. This way, the markable pair <Bürgerombudsmannsm1h ,Ombudsmannm2h > could be captured by the expression:
eswitchlc(m1h,m2h+c s).
An alternative solution could be the general introduction of a lemmatized version of each
word in the word table.
• Although the markable pair <(derm1b Rennfahrerm1h Michael Schumacher, der sieben mal
Weltmeister wurdem1e ), (derm2b Weltmeisterm2h Michael Schumacher, der 1969 geboren
wurdem2e )> obviously contain two coreferent markables, SUCRE cannot handle this, as the
markables’ heads do not show a match.
There might be two ways of solving this drawback. On the one hand, it becomes obvious,
that three “pointers” (i.e. first word, head word and last word) are not enough for large noun
phrases. So an extension of the possible “pointers” in a markable could yield a solution. For
instance, if the markable head is a common noun and the succeeding word is a proper name,
it should be pointed to by a name-word-pointer m1n, otherwise m1n is identical to the head
m1h. Thereby, a check of exact string match between m1n and m2n would yield a positive
response, whereas a simple head match check would deny coreference.
On the other hand, a more flexible solution is the introduction of an existential bounded
variable, say ∃X. This means, a variable that corresponds to a string or word object and is
132
7.3. Future work
bounded within the respective feature. A possible solution for the problem sketched above
might be a feature like this:
{seqmatch(X,m1a)&&seqmatch(X,m2a)&&(X.f0==NE)}.
This feature checks whether there is common word X in both markables and whether this
common word is a proper name.
The use of bounded variables could also be used in the context of lists.
• A further extension for the pseudo language is the introduction of lists. This enables more
readable features in the cases, several options have to be checked. For instance, consider valid
case suffixes for German nouns. The new introduced substring feature checks for three suffixes: (eswitch(m1h,s)||eswitch(m1h,es)||eswitch(m1h,e)). If the feature definition language is extended with bounded variables, a list structure and a member
predicate member/2, this check can be simplified to:
(eswitch(m1h,X)&& member(X,[s,es,e])).
Another task for which the usage of lists is helpful would be the check for a control verb (cf.
example (36d) in (4.4.1)). For instance, given the markable pair <Peterm1 ,sichm2 >, where
Peter constitutes the object of a Verb V , one can check whether there is a dependency relation
between V and the main verb governing sich. If this is true, a check whether V is in a list of
object control verbs [bitten, überreden, . . . ], could indicate a confident coreference between
Peter and sich.
The prefilter feature concerning units, month names and the like could be simplified enormously. An alternative to the final prefilter feature no. 1 could be:
seqmatchlc(m1h,X)&& member(X,[Mark, Meter, Prozent, Sekunden,
...]). Morever, this way, the list of keywords can be augmented easier.
A possible list could be expressed by: m1a.f0, i.e. a list of part-of-speech tags corresponding to each word in m1 . Then, one can check whether a markable contains a proper name
somewhere, rather than just checking the head for a proper name.
• Another way for making a feature set clearer is to use macros, that could be defined in another
file. This way, complex features become readable, typing errors (i.e. missing parentheses)
can be prevented and thus, creating complex features would be encouraged.
• A drawback of the relational database model is that there is no way to check the context of
a markable. For instance, the words before or after a markable. Broscheit et al. (2010b)
implement a feature that checks whether a markable (or both markables) is (are) inside a
quoted speech. A possible way for checking such a situation could be an examination within
a window of n words before and after the markable, whether there is a quotation mark. The
benefit of this feature is an appropriate check for person match. If there is a shift from direct
speech to indirect speech or vice versa, a person mismatch does not indicate disreference.
For implementing such a feature, the relational database model has to provide information
about the context of a markable, i.e. what n tokens are before and after the markable.
Problems in the annotation of the dataset: One issue that caused trouble was the annotation of
the used dataset of TüBa-D/Z. One case was the markable annotation of non-referring markables
like the es-pronoun (cf. (4.3.6)), which contradicts the idea of true markables. As there are no
clues for deciding whether an es-pronoun is referring or not (given the local perspective and the
current annotations), a revised annotation addressing the question of pronoun reference may help.
However, the most problematic case was the annotation of attributive possessive pronouns like
sein∼ or ihr∼. Here, every grammatical attribute was set to unknown as there are no gold annotation for them in the original SemEval-2010 task dataset (cf. (4.3.10)). Moreover, the annotation
of attributive pronouns in SemEval-2010 grammatically agrees with the corresponding NP-head,
133
rather than with the antecedent. Thus, an annotation of this kind would be useless, if not counterproductive, given that the attributes of the corresponding NP-head and of the antecedent contradict.
One way of solving this problem is to use an heuristic like: the antecedent of ihr∼ can only be
singular/female or plural. This works fine for the possessive pronoun ihr∼ but not for the pronoun
sein∼, for which the antecedent has to be singular/(male or neuter) (cf. feature Agree3 in (5.1.8)).
A future work could investigate this issue or find further solutions for the annotation problem.
Using of external sources of knowledge: As described in (4.4.2), sometimes, it is important to
know some semantic relations between two common nouns, in order to decide whether they could
be coreferent or not. This requires external knowledge, for instance a knowledge base for ontological information like in GermaNet. Given a markable pair <dem Tierm1 , dem Hundm2 >, there is
no linguistic basis for regarding m1 and m2 as coreferent. However, a feature that returns TRUE
for a check of hyponymy or synonymy, could provide evidence for coreference.
Inside the SUCRE system: Apart from the relational database model and the feature definition language there are further issues concerning the SUCRE system in general.
• Klenner and Ailloud (2009) show that a good clustering algorithm is important for coreference resolution with a mention-pair model. They use an ILP (Integer Linear Programming)
framework that enables to use global constraints such as clause-boundedness (cf. (2.5.4))
which is not possible with best-first clustering.
Thus, a further improvement of SUCRE might be the use of such an ILP framework or any
other clustering method that provides the inclusion of global constraints.
• The MUC-B3 -result of SUCRE in SemEval-2010, German, closed, gold is 67.9%. However,
the inital configuation performs worse (namely 61.78%). By removing the first three features
concerning the markable distance in terms of sentences the system achieves 66.44%. This
was defined as new baseline from which the feature research started. By adding the three
features to the final feature set, the score of 73.06% collapses down to 64.02% (cf. (6.6)).
As the distance between two markables is a very important feature, it is not plausible why
a removing of these features improves the performance. This problem has to be solved, in
order to get reliable information and possibly an even better result than the one achieved
within this study.
• A misleading output was detected within the analysis of the false negatives concerning the
reflexive pronoun sich (cf. (4.4.1)). Here, the output should just show disreference links
that are misclassified rather than all possible links that come up by regarding a cluster of
markables. So, the output has to be moved in front of the clustering step. Then, a more
precise analysis of false negatives is possible. This yields useful statistics about the distances
in terms of sentences and the like which might be important for tuning constants in features.
• Another issue that concerns this output problem is that there is no way for evaluating the pure
classification step. Although the input of the classifier (i.e. the feature vectors) are modified
within the feature research in this study, the result is evaluated by scoring the partition, that
comes up after using best-first clustering (e.g. by MUC or B3 ). A more simple way for scoring the classification quality would be to count the number of true positives, false positives
TP
P
and false negatives and compute precision ( T PT+F
P ), recall ( T P +F N ) and f-score. However,
for different feature sets, the sums (T P + F P + F N + T N ) are unequal. So, the output
has to be revised in order to get reliable numbers for T P , F P , . . . and to be able to score the
classification without the aspect of a clustering step inbetween.
134
APPENDIX A
The Stuttgart-Tübingen tag set (STTS)
The part of speech tag set which is used in the TüBa-D/Z corpus, listed in (STTS, 2011).
Word class
Adjectives
Adverbs
part of speech tag
description
example
ADJA
attributive adjective
[das] große [Haus]
ADJD
adverbial/predicative adjective
[er fährt] schnell; [er
ist] schnell
ADV
adverb
schon, bald, doch
PAV
pronominal adverb
dafür, dabei, deswegen
preposition; left part of a circumposition
in [der Stadt]
fused preposition and determiner
im [Haus],
[Sache]
APPO
postposition
[ihm] zufolge
APZR
right part of a circumposition
[von jetzt] an
(in)definite article
der, die, das, ein, eine
APPR
Adpositions
Determiners
APPRART
ART
Cardinal number
CARD
cardinal number
zwei [Männer]
Foreign material
FM
foreign material
“A big fish”
Interjection
ITJ
interjection
mhm, ach, tja
KOUI
infinitival subjunction
um [zu leben]
KOUS
finite subjunction
weil, dass
KON
parataxis/conjunction
und, oder
comparative conjunction
als, wie
Conjunctions
KOKOM
Common noun
NN
common noun
Tisch, Herr
Proper name
NE
proper name
Hans, Hamburg
zur
135
Appendix A. The Stuttgart-Tübingen tag set (STTS)
Word class
part of speech tag
description
example
substituting demonstrative pronoun
dieser, jener
attributive demonstrative pronoun
jener [Mensch]
substituting indefinite pronoun
keiner, man
attributive indefinite pronoun without
determiner
kein [Mensch]
PIDAT
attributive indefinite pronoun
[die]
[Brüder]
PPER
personal pronoun
ich, er, ihm
PPOSS
substituting possessive pronoun
meins, deiner
attributive possessive pronoun
mein [Buch], deine
[Mutter]
substituting relative pronoun
[der Hund ,] der
attributive relative pronoun
[der Mann ,] dessen
[Hund]
PRF
reflexive pronoun
sich, einander
PWS
substituting interrogative pronoun
wer, was
PWAT
attributive interrogative pronoun
welche [Farbe]
PWAV
adverbial interrogative pronoun or relative pronoun
warum, wo
PTKZU
“zu”-particle before infinitive
zu [gehen]
negation particle
nicht
PTKVZ
verbal particle
[er kommt] an
PTKANT
answer particle
ja, nein
particle with adjectives/adverbs
am [schönsten]
TRUNC
truncation
An- [und Abreise]
VVFIN
finite full verb
[du] gehst
VVIMP
imperative full verb
komm [!]
VVINF
infinite full verb
gehen, ankommen
VVIZU
infinite full verb with “zu”-particle
anzukommen
VVPP
full verb as past participle
gegangen
VAFIN
finite auxiliary verb
[du] bist
VAIMP
imperative auxiliary verb
sei [ruhig !]
VAINF
infinite auxiliary verb
werden, sein
VAPP
auxiliary verb as past participle
gewesen
VMFIN
finite modal verb
dürfen
VMINF
infinitival modal verb
wollen
VMPP
modal verb as past participle
gekonnt
PDS
PDAT
PIS
PIAT
Pronouns
PPOSAT
PRELS
PRELAT
PTKNEG
Particles
PTKA
Truncation
Verbs
136
beiden
APPENDIX B
The pseudo language for SUCRE’s link feature definition
Functions and operations for link features that are available for TüBa-D/Z corpus, listed on (IMSWikipedia, 2011):
B.1. Markable keywords
m1: The first markable m1 (usually the antecedent)
m1b: The first word (the beginning) of markable m1 (e.g. the determiner)
m1h: The head word of markable m1 (e.g. the noun)
m1e: The last word (the ending) of markable m1 (usually the head or a post-nominal adjunct)
m2: The second markable m2 (usually the anaphor)
m2b: The first word (the beginning) of markable m2 (e.g. the determiner)
m2h: The head word of markable m2 (e.g. the noun)
m2e: The last word (the ending) of markable m2 (usually the head or a post-nominal adjunct)
Two special markable keywords:
m1a: All words of markable m1 (e.g. <determinerm1b , adjective, nounm1h/e >)
⇒ The only way to get access to other words as m1b, m1h or m1e (e.g. a pre-nominal adjective)
m2a: All words of markable m2 (e.g. <determinerm2b , adjective, nounm2h/e >)
⇒ The only way to get access to other words as m2b, m2h or m2e (e.g. a pre-nominal adjective)
B.2. Attributes
Attributes to the markable keywords m1/m1b/m1e/m1h/m2/m2b/m2e/m2h:
Parts-of-speech: Parts of speech determined by a tagger; here, STTS (appendix A) (e.g. m1h.f0)
Number: The grammatical attribute number: can be unknown, singular, plural or both (e.g. m1h.f1)
Gender: The grammatical attribute gender: can be unknown, neuter, female or male (e.g. m1h.f2)
Case: The grammatical attribute case: can be unknown, nominative, genitive, dative or accusative (e.g.
m1h.f3)
137
Appendix B. The pseudo language for SUCRE’s link feature definition
Person: The grammatical attribute person: can be unknown, first, second, third (e.g. m1h.f4)
Dependency relation: The relation of a markable to its mother in a dependency tree (e.g. m1h.rewtag)
Token number: The token-ID of the respective word in the markable (e.g. m1h.txtpos)
Sentence number: The sentence-ID of the markable (e.g. m1h.stcnum)
Paragraph number: The paragraph-ID of the markable (e.g. m1h.prfnum)
Document number: The document-ID of the markable (e.g. m1h.docnum)
B.3. Arithmetic operations
Given the integers A and B, these operations return an integer:
Addition: A + B
Subtraction: A − B
Multiplication: A ∗ B
Division: A/B
B.4. Arithmetic predicates
Given the integers A and B, these operations return a boolean expression (i.e. TRUE/FALSE resp. 1/0):
Equality: A == B
Inequality: A ! = B
Greater: A > B
Less: A < B
Greater or equal: A >= B
Less or equal: A <= B
B.5. Boolean operations
Given the boolean expressions A and B, these operations return a boolean expression:
Equivalence: A == B
Inequality: A ! = B
Disjunction: A||B
Conjunction: A&&B
Boolean constants: 0, 1
138
B.6. Functions
They have the following semantics:
A
0
0
1
1
B
0
1
0
1
A == B
1
0
0
1
A!=B
0
1
1
0
A||B
0
1
1
1
A&&B
0
0
0
1
Table B.1.: A value table for the defined boolean operators
Further boolean operators can be expressed in terms of the operators above:
• The negation ¬A can be expressed as A == 0.
• The implication A → B can be expressed as ¬A ∨ B (i.e. (A == 0)||B).
B.6. Functions
Functions in the SUCRE link feature definition language return integers or boolean expressions (integers
with the value 1 or 0).
Absolute value: abs(A) returns the absolute value of A (e.g. abs(−5) = 5)
Maximum value: max(A,B) returns the maximum value of A and B (e.g. max(1, 3) = 3)
Minimum value: min(A,B) returns the minimum value of A and B (e.g. min(1, 3) = 1)
Exact string matching: seqmatch(str1,str2) returns TRUE if str1 and str2 are identical,
otherwise FALSE (e.g. seqmatch(Hund,Hund) returns TRUE)
Exact string matching; case-insenstive: seqmatchlc(str1,str2) returns TRUE if str1
and str2 are identical, independent of upper/lower case, otherwise FALSE
(e.g. seqmatchlc(Ich,ich) returns TRUE)
Substring matching: strmatch(str1,str2) returns TRUE if str1 is in str2 or vice versa,
otherwise FALSE (e.g. strmatch(Schäferhund,hund) returns TRUE)
Substring matching; case-insenstive: strmatchlc(str1,str2) returns TRUE if str1 is
in str2 or vice versa, independent of upper/lower case, otherwise FALSE
(e.g. strmatchlc(Schäferhund,Hund) returns TRUE)
Levenshtein distance: editdist(str1,str2) returns the edit distance between str1 and
str2
(e.g. editdist(Hund,Hundes) returns 2)
139
Appendix B. The pseudo language for SUCRE’s link feature definition
140
APPENDIX C
A python script for computing the BLANC-score
#!/usr/bin/env python
# Patrick Ziering
# Date: 10.02.2011
# "pairs" returns a set of non-reflexive unordered pairs for a given set
def pairs(in_set):
return set([tuple(sorted([i,j])) for i in in_set for j in in_set
if i!=j])
# The given example
markables = set(range(1,10)+["A","B","C"])
GOLD = [set(range(1,6)),set([6,7]),set([8,9,"A","B","C"])]
SYS_a = [set(range(1,6)),set([6,7,8,9,"A","B","C"])]
SYS_b = [set([6,7]),set(range(1,6)+[8,9,"A","B","C"])]
SYS_c = [markables]
SYS_d = [set([i]) for i in markables]
# Definition
c_links_true
c_links_pred
d_links_true
d_links_pred
of coreference/disreference link sets
= set([pair for entity in GOLD for pair in pairs(entity)])
= set([pair for entity in SYS_a for pair in pairs(entity)])
= pairs(markables) - c_links_true
= pairs(markables) - c_links_pred
# Combinations of decisions rc, rd, wc, wd:
rc = len(c_links_pred & c_links_true)
rd = len(d_links_pred & d_links_true)
wc = len(c_links_pred - c_links_true)
wd = len(d_links_pred - d_links_true)
# Creating the BLANC-scores
Pc,Rc = float(rc)/(rc+wc),float(rc)/(rc+wd)
F1c = (2*Pc*Rc)/(Pc+Rc)
print "Pc: %f\nRc: %f\nF1c: %f"%(Pc,Rc,F1c)
Pd,Rd = float(rd)/(rd+wd),float(rd)/(rd+wc)
F1d = (2*Pd*Rd)/(Pd+Rd)
print "Pd: %f\nRd: %f\nF1d: %f"%(Pd,Rd,F1d)
print "BLANC: %f"%((F1c+F1d)/2)
141
Appendix C. A python script for computing the BLANC-score
142
APPENDIX D
Upper and lower bounds / evaluation results in (Versley, 2006)
Pmax Rmax
always allow non-resolution
head identity
100.0 54.4
same head
100.0 76.9
uniq_name
100.0 74.3
force resolution
all
27.0
98.7
4gram
31.1
76.6
head identity
52.1
54.4
same_head
49.0
76.9
+agr_num
52.1
76.5
+comp_mod
56.4
71.4
uniq_name
57.1
74.3
+hard_seg(8)
64.9
68.7
+loose_seg(8) 62.8
71.1
include coreferent bridging
no filter
62.3
92.5
+gwn_only
62.3
92.5
filter_ne
61.7
90.1
+gwn only
61.7
90.1
unique_mod
60.7
86.3
+segment
60.6
85.6
+num
60.6
85.6
+gwn
59.8
83.0
+syn_role
59.8
83.0
NE_semdist
59.8
83.0
+pred_arg
59.8
83.0
Pmin
Rmin
Perp
Prec
Recl
0.0
0.0
0.0
0.0
0.0
0.0
1.89
1.98
1.88
62.5
58.3
66.8
38.5
40.5
58.4
0.0
13.3
32.1
33.6
36.3
38.2
40.5
43.8
43.0
0.0
37.5
47.1
59.0
60.4
57.7
61.6
59.0
59.8
23.68
2.28
1.68
1.65
1.62
1.57
1.57
1.61
1.58
1.2
26.3
58.2
51.6
56.0
62.1
62.0
67.8
66.6
4.9
54.7
50.5
69.4
69.7
64.8
68.6
63.2
65.8
14.3
14.3
17.1
17.1
21.2
21.4
21.4
21.7
21.7
21.7
21.7
61.6
61.6
61.6
61.6
61.6
61.6
61.6
61.6
61.6
61.6
61.6
1.42
1.28
1.68
1.31
1.51
1.49
1.49
1.28
1.27
1.27
1.26
62.0
62.0
62.0
62.0
62.0
62.0
62.0
61.7
61.9
61.9
61.9
68.6
68.6
68.6
68.6
68.6
68.6
68.6
69.2
69.5
69.7
70.0
Table D.1.: Upper and lower bounds / evaluation results in (Versley, 2006)
143
Appendix D. Upper and lower bounds / evaluation results in (Versley, 2006)
144
APPENDIX E
All link errors from Chapter 4
E.1. False positives
E.1.1. The second markable is indefinite
(57)
a.
b.
c.
d.
Da dies auch Armin Holz nicht gelungen zu sein scheint ( siehe nebenstehende Rezension )
, bleibt ( Valle-Inclan , der Exzentriker der Moderne )m1 , auch weiterhin ( ein Geheimtip )m2
.
D. N.
Glanzpunkte .[. . . ]
Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt als neues
Ensemblemitglied hätte geben sollen , mehr versprochen .
Kirchner wird nun erst im Herbst mit ( einer Langhoff-Inszenierung )m2 seine Arbeit aufnehmen
.
(ID: 218x250); (Calc-Prob:52)
wieder zu schick .[. . . ]
Diese wenigen atmosphärischen Momente lassen ( ein dichtes , interessantes Stück )m2
erahnen , das aber in dieser Fassung weit unter dem Möglichen inszeniert scheint .
(ID: 208x221); (Calc-Prob:52)
Gott guckt uns nicht zu , ( der )m1 hat der Welt längst den Rücken gekehrt “, läßt der
spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn seiner
Grotske “Wunderworte “erklären .[. . . ]
Laureano ( ein einträgliches Geschäft )m2 hinterläßt , möchte sich so mancher in ihrer
Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laure-
145
Appendix E. All link errors from Chapter 4
ano von Jahrmarkt zu Jahrmarkt geschoben hatte .
146
e.
Das gefiel dem Hund so gut , daß ( er )m1 unmittelbar hinter der Tür Stellung bezog und
Als ( ein Bekannter des Hundehalters )m2 versuchte , die Wohnung zu räumen , wurde er
(ID: 332x336); (Calc-Prob:56)
f.
In letzter Zeit kümmern sich die Besetzer allerdings nicht mehr sonderlich darum , ob
( eine Wohnung )m1 bewohnt ist oder nicht .[. . . ]
Die Rechtslage , die so entstanden ist , spricht so ziemlich allem Hohn , was sich Juristen
je ausgedacht haben :
Bricht jemand in ( eine Wohnung )m2 ein , wird er in Polen , wie überall auf der Welt , mit
bis zu mehreren Jahren Gefängnis bestraft .
(ID: 447x458); (Calc-Prob:83)
g.
Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute
nicht in ( ihre Wohnung )m1 .[. . . ]
bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von wilden Mietern besetzt wurden .
In letzter Zeit kümmern sich die Besetzer allerdings nicht mehr sonderlich darum , ob
( eine Wohnung )m2 bewohnt ist oder nicht .
(ID: 427x447); (Calc-Prob:83)
h.
Bei der Polizei erfuhr ( die alte Dame )m1 , daß es sich bei ihrem Fall nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne .[. . . ]
( Eine andere alte Dame , der gleiches widerfuhr )m2 , mußte einen Monat auf dem örtlichen
(ID: 389x432); (Calc-Prob:83)
i.
Doch hätte ( die )m1 nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können .
Denn das darf man nur , wenn man ( eine Ersatzwohnung )m2 ...
Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen deshalb
nicht .
(ID: 505x508); (Calc-Prob:52)
j.
bürokratischer Verwicklungen Sozialwohnungen leerstanden , die dann von ( wilden
Mietern )m1 besetzt wurden .[. . . ]
man ( wilde Mieter )m2 auf eigene Faust rauswerfe .
(ID: 442x492); (Calc-Prob:52)
k.
( Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde )m1 , kann bis
heute nicht in ihre Wohnung .[. . . ]
Bleibt er und klaut dabei auch noch die Wohnung , bleibt er ungeschoren bis ans Ende
seiner Tage .
Rücken und einer Maske vor dem Gesicht das Weite sucht , ist ( eine Erkenntnis )m2 ,
die auch nach Ansicht des polnischen Bürgerombudsmanns die Auffassungsgabe der polnischen Polizei bei weitem übersteigt .
(ID: 425x477); (Calc-Prob:52)
l.
Warschau ( ( taz )m1 ) - Helena G. hatte Pech , daß es ausgerechnet sie traf .[. . . ]
Als sie zurückkam , war sie ihre Wohnung los .
( Eine sogenannte wilde Mieterin )m2 hatte sich dort eingenistet , die Schlösser ausgewechselt , und so war Helena G. draußen und die Neue drin .
(ID: 368x381); (Calc-Prob:52)
m.
( Dieses )m1 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen .[. . . ]
Der Exmissionstitel ist allerdings vergilbt , weil die städtischen Behörden , eigentlich
zuständig , Helena zu ihrem Recht zu verhelfen , es ablehnten einzugreifen .
Begründung : Nach polnischem Mietrecht dürfe man ( einen Mieter )m2 nur aus der Wohnung entfernen , wenn man in der Lage sei , ihm Ersatzraum zur Verfügung zu stellen .
(ID: 400x412); (Calc-Prob:51)
n.
» Leichtfertig « ist , nach Eberhard Diepgens Ansicht , mit den Informationen über die
IOC-Mitglieder umgegangen worden .
So leichtfertig wie er ( das )m1 dahersagt , so freudig wurde es vom Aufsichtsrat der
Marketing GmbH aufgenommen und dem Geschäftsführer ( ein Strick daraus )m2 gedreht
.
Ein Kopf mußte rollen , damit sich die anderen aus der Schlinge ziehen konnten .
(ID: 643x648); (Calc-Prob:51)
o.
Mit einer Mahnwache vor der Innenverwaltung am Fehrbelliner Platz machten etwa zehn
Leute gestern vormittag auf die Lage ( der Flüchtlinge aus dem Kriegsgebiet )m1 aufmerksam .[. . . ]
In einer Petition forderten sie Innensenator Dieter Heckelmann ( CDU ) auf , den Visumszwang für Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien in Berlin
aufzuheben .
» Heckelmann kann sehr wohl entscheiden , ( Flüchtlinge , die mit dem Flugzeug kommen )m2
, unbürokratisch nach Berlin einreisen zu lassen « , so Christoph Koch , Dozent an der FU
.
(ID: 720x733); (Calc-Prob:83)
p.
Senat soll Visumzwang für ( Kriegsflüchtlinge )m1 aufheben[. . . ]
Leute gestern vormittag auf die Lage der Flüchtlinge aus dem Kriegsgebiet aufmerksam .
In einer Petition forderten sie Innensenator Dieter Heckelmann ( CDU ) auf , den Visumszwang für ( Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien )m2 in Berlin
aufzuheben .
(ID: 702x727); (Calc-Prob:83)
q.
» Schmerzgekrümmt « reagierte Daimler Benz auf den gestrigen taz-Bericht über die Konflikte um die geplante U-Bahn-Linie 3 .
Man sei nicht gegen die U-Bahn , man würde sie im Gegenteil begrüßen , wenn ( man )m1
einen schnellen Ausbau der Linie garantiert bekomme , sagte ( eine
Unternehmenssprecherin )m2 .
Man wolle jedoch keine Vertagung des U-Bahn-Baus bis ins nächste Jahrtausend und auch
keine Vorhalteröhre , wie es der Senat vorschlage .
(ID: 869x872); (Calc-Prob:50)
r.
Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie
147
s.
t.
u.
v.
w.
x.
148
( eine neue Wohnung )m1 schaffen würden .[. . . ]
Von dieser Regelung profitiert bespielsweise auch die Bundesbauministerin Irmgard
Schwaetzer ( FDP ) selbst mit ihren circa 15.000 Mark monatlichen Bruttoeinkommen
und ihrem nicht unerheblichen Steuersatz .
Die Ministerin besaß bis Anfang 1991 ( eine Wohnung in der Bonner Riemannstraße )m2
in Bonn , mit deren Erwerb sie bis zu tausend Mark Steuern im Monat sparen konnte .
(ID: 1092x1103); (Calc-Prob:83)
Zwischen 60 und 70 Prozent der Vermieter , schätzt Hanka Fiedler , wollen nur den Mieter
herausbekommen , um ( die Wohnung , die dann im Preis steigt )m1 , besser verkaufen zu
können .[. . . ]
Und dies alles wird vom Steuerzahler noch bezuschußt .
Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie
( eine neue Wohnung )m2 schaffen würden .
(ID: 1067x1092); (Calc-Prob:83)
Eine leere Eigentumswohnung bringt hingegen unverändert zwischen 4.000 und 5.000
Mark - ( den Mieter )m1 herauszuklagen , lohnt sich da schon .[. . . ]
Und anders geht es auch kaum .
» ( Einen vertragestreuen Mieter )m2 kriegen Sie heutzutage nur über eine Eigenbedarfskündigung raus « , heißt es in Hausbesitzerkreisen .
(ID: 1078x1081); (Calc-Prob:83)
Was mit ( diesen Mietern )m1 geschieht , kann man in West-Berlin ablesen .[. . . ]
Wenn die Kündigung nicht zieht , macht der Vermieter eben nächtlichen Telefonterror ,
schickt Bauarbeiter in Haus , kündigt Mieterhöhungen an oder veranstaltet allwöchentliche
Besichtigungstouren von potentiellen Käufern durch die Wohnung .
Außerdem , erzählt Frau Fiedler , gebe es häufig westdeutsche Hausbesitzer , die
( renitenten Mietern )m2 drohten , nach Berlin zu ziehen und dann eben Eigenbedarf anzumelden
.
(ID: 985x1042); (Calc-Prob:83)
Nur zehn Prozent der umgewandelten Wohnungen sind an die dort wohnenden Mieter
verkauft worden , ein Drittel der Mieter hat ( die Wohnung )m1 verlassen müssen , viele
Viele Mieter ziehen entweder vorzeitig entnervt aus oder lassen sich auf einen Vergleich
ein .
Denn Vermieter , die ( eine Wohnung )m2 freibekommen wollen , lassen sich einiges einfallen , um die Mieter herauszuekeln , berichtet Frau Fiedler .
(ID: 992x1022); (Calc-Prob:83)
Nur zehn Prozent der umgewandelten Wohnungen sind an die dort wohnenden Mieter
verkauft worden , ( ein Drittel der Mieter )m1 hat die Wohnung verlassen müssen , viele
Denn Eigentumswohnungen werden , so der Ring Deutscher Makler in seiner neuesten
Bilanz , hauptsächlich für eigene Wohnzwecke und weniger stark von Anlegern gekauft .
( Ein Drittel der Mieter )m2 muß gehen
(ID: 991x1007); (Calc-Prob:83)
Nun steht es still - und alle Welt wundert sich , daß ( es )m1 sich nicht von der Stelle bewegt hat .
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ( ein Wunder )m2 , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWFStandardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen ,
y.
z.
wie Boris Jelzin den Kohls und Bushs klarmachen konnte .
(ID: 8196x8204); (Calc-Prob:50)
Sieben der 24 Mannschaften werden nach ( dieser Saison )m1 ihre regionalen Oberligen
verstärken .[. . . ]
Eben .
Wir stehen vor ( einer großen Saison )m2 .
(ID: 8706x8726); (Calc-Prob:83)
Er sei zwar nicht dafür , alles zu konservieren , aber » ein bißchen von ( dem )m1 , was
gewesen ist , sollten wir erhalten « .[. . . ]
Berlin sei schließlich der einzige Ort in der Welt , wo » die Historie an jeder Ecke noch
atmet « .
( Eine Feststellung , der David Cornell wohl auch zustimmen würde )m2 .
(ID: 1443x1453); (Calc-Prob:51)
E.1.2. Wrong assignment of a relative pronoun
(58)
a.
b.
c.
d.
Der neben Garcia Lorca bedeutendste spanische Dramatiker des 20. Jahrhunderts wurde
für das deutschsprachige Theater spät entdeckt .
Erst in den 70er Jahren entstanden Übersetzungen und ( Inszenierungen )m1 , ( die )m2
jedoch für die grell-grausigen aber poetischen Stücke keine überzeugenden Lösungen fanden .
) , bleibt Valle-Inclan , der Exzentriker der Moderne , auch weiterhin ein Geheimtip .
esperpentos heraus , Schauerpossen , die die von Leidenschaft und Gewalt deformierte
Gesellschaft wie durch einen Zerrspiegel betrachten .
Zu ( diesem Genre )m1 gehören neben den Wunderworten ( ( die )m2 im Original als Tragikomödie
.
Es sind sperrige , sprachgewaltige Grotesken , die Mystik und Mythen karikieren und eine
erhebliche Fortschrittsskepsis ausdrücken .
.
(ID: 242x243); (Calc-Prob:51)
Bonn ( dpa ) - Unter dem Motto “Volkswagen für Volksvertreter “hat der Jenaer SPDBundestagsabgeordnete Christoph Matschie eine Umrüstung der Fahrbereitschaft des Bundestages gefordert .
Die knapp 100 Autos umfassende Flotte bestehe fast ausschließlich aus BMW- und
149
e.
f.
g.
h.
i.
j.
150
( Mercedes-Limousinen )m1 , ( die )m2 durch umweltverträglichere Fahrzeuge ersetzt werden sollten .
Um ein Signal für umweltbewußtes Handeln zu setzen , sollten die Abgeordneten auf den
Diesel-Golf umsteigen , der 5,5 Liter Kraftstoff auf 100 Kilometer benötige .
(ID: 307x309); (Calc-Prob:51)
An diesem Zustand hat sich seither nichts geändert .
Bei der Polizei erfuhr die alte Dame , daß ( es )m1 sich bei ihrem Fall nicht um ein Vergehen handele , ( welches )m2 von Amts wegen verfolgt werden könne .
(ID: 390x394); (Calc-Prob:53)
Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende ( kommunaler
Sozialmieter )m1 , ( die )m2 ihre Zahlungen eingestellt haben - Kündigung droht ihnen
deshalb nicht .
.
(ID: 512x514); (Calc-Prob:51)
Die Realität noch viel grauenerregender , als es sich der wildeste Hammerwerfer in seinen
perversesten Träumen ersinnen könnte ?
Die Briten Vyv Simson und Andrew Jennings haben ihre Version vom olympischen Märchen
unter dem Titel Geld , Macht und ( Doping )m1 auch schon als Buch veröffentlicht ,
( was )m2 dem IOC-Präsidenten Juan Antonio Samaranch wenig gefallen hat .
Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur ,
daß er die ihm anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein
Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .
(ID: 552x554); (Calc-Prob:51)
Der Aufsichtsrat der Olympia Marketing GmbH ernannte ihn am Donnerstag abend überraschend zum Geschäftsführer der Firma .
( Er )m1 löst auf dem Posten Nikolaus Fuchs ab , ( der )m2 wegen der Intimdatei über
IOC-Mitglieder seinen Hut nehmen mußte .
Fuchs ist geschäftsführender Gesellschafter der Bossard Consultants , die die Datenmasken als » Spielmaterial « für die Olympia GmbH gefertigt hatte .
(ID: 787x790); (Calc-Prob:53)
hinter der Mauer ausgehalten hat « , einen gewaltigen Trumpf in der Hand , wie er erst
kürzlich wieder feststellen konnte : ( Ein schwedischer Gast , den er durch den Ostteil
führte )m2 , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen .
(ID: 1430x1439); (Calc-Prob:53)
hinter der Mauer ausgehalten hat « , ( einen gewaltigen Trumpf )m2 in der Hand , wie er
erst kürzlich wieder feststellen konnte : Ein schwedischer Gast , den er durch den Ostteil
führte , sei von den Einschußlöchern an den Häusern » ganz fasziniert « gewesen .
k.
l.
m.
n.
o.
(ID: 1430x1433); (Calc-Prob:53)
Sabine Schröder , Sprecherin der Hotel- und Gaststätteninnung , glaubt , daß die Mauer »
ein Selbstläufer « gewesen und in den vergangenen zwei Jahren zu wenig in die Werbung
für Berlin investiert worden sei .
Hoffnung setzt ( sie )m1 auf die geplante » Tourismus GmbH « , ( die )m2 noch in diesem
Jahr eingerichtet werden soll .
Hauptstütze wird mit 60 bis 70 Prozent die Privatwirtschaft sein - den Rest trägt der Senat
.
(ID: 1308x1309); (Calc-Prob:51)
Mit dem routinierten Blick auf die Uhr , » es ist 1.30 h und der Flug zum nächsten Festival startet bereits in 5 Stunden « - energiegeladen , unaufhörlich gute Laune verbreitend ,
selbst wenn seine Witze kaum Eckkneipenniveau erreichen , lädt Paquito in der Tradition
der Afro-Cuban-Entertainment-Schule seines Altmeisters Dizzy Gillespie zum Abend der
offenen Tür : pure Kommunikation der wirksamen Art .
Im Gepäck hat er den 22jährigen Pianisten Ed Simon , ( der )m1 ( einem verliebten Jungen )m2
gleicht und mit flüchtigen Seitenblicken auf seinen Mentor den Club zum brodeln bringt .
» Ich bin sehr jung und glaube , daß ich mich noch selbst finden muß « , sagt der introvertierte Tastenromantiker venezuelanischer Herkunft , der zugleich Mitglied von Bobby
Watson’s Post-Motown Bop Band Horizon ist , Herbie Mann begleitet und mit Kevin Eubanks oder der M-Base Gruppe um Greg Osby funkt , » je unterschiedlicher die Musik ist
, die ich mache , desto offener werde ich « .
(ID: 1563x1564); (Calc-Prob:53)
Mit dem routinierten Blick auf die Uhr , » es ist 1.30 h und der Flug zum nächsten Festival startet bereits in 5 Stunden « - energiegeladen , unaufhörlich gute Laune verbreitend ,
selbst wenn seine Witze kaum Eckkneipenniveau erreichen , lädt Paquito in der Tradition
der Afro-Cuban-Entertainment-Schule seines Altmeisters Dizzy Gillespie zum Abend der
offenen Tür : pure Kommunikation der wirksamen Art .
Im Gepäck hat ( er )m1 den 22jährigen Pianisten Ed Simon , ( der )m2 einem verliebten
Jungen gleicht und mit flüchtigen Seitenblicken auf seinen Mentor den Club zum brodeln
bringt .
» Ich bin sehr jung und glaube , daß ich mich noch selbst finden muß « , sagt der introvertierte Tastenromantiker venezuelanischer Herkunft , der zugleich Mitglied von Bobby
Watson’s Post-Motown Bop Band Horizon ist , Herbie Mann begleitet und mit Kevin Eubanks oder der M-Base Gruppe um Greg Osby funkt , » je unterschiedlicher die Musik ist
, die ich mache , desto offener werde ich « .
(ID: 1561x1563); (Calc-Prob:53)
In erster Instanz hatte das Amtsgericht Kreuzberg das Räumungsbegehren im März abgewiesen
.
In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut dem ein Untermieter , ( dem )m1 eine Wohnung von ( einem
gewerblichen Zwischenmieter )m2 vermietet wurde , vollen Kündigungsschutz genießt .
Das Amtsgericht erkannte an , daß diese Rechtsauffassung für sämtliche gewerbliche Untermietsverhältnisse , die Wohnmietverhältnisse sind , gelte .
(ID: 11545x11547); (Calc-Prob:53)
.
In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut ( dem )m1 ein Untermieter , dem ( eine Wohnung )m2 von
151
p.
q.
r.
s.
einem gewerblichen Zwischenmieter vermietet wurde , vollen Kündigungsschutz genießt
.
(ID: 11544x11546); (Calc-Prob:52)
.
In der Zwischenzeit war ein Grundsatzurteil des Bundesverfassungsgerichts für Bauherrenmodelle ergangen , laut ( dem )m1 ( ein Untermieter , dem eine Wohnung von einem
gewerblichen Zwischenmieter vermietet wurde )m2 , vollen Kündigungsschutz genießt .
(ID: 11544x11548); (Calc-Prob:52)
Und in der Tat läßt sich schwer vorstellen , wie das bisherige Land Brandenburg als
umgebender Rand einer wachsenden Metropole eine eigenständige Existenz weiterführen
könnte .
Auch für die Stadt dürfte eine Konstruktion , ( die )m1 ( eine einheitliche Planung für den
Stadtkern , den Stadtrand , die nähere und die weitere Umgebung )m2 möglich macht , im
wesentlichen Vorteile haben .
Überhaupt nicht bedacht wurde bisher das Binnenverhältnis , das sich bei einer einfachen
Zusammenlegung der beiden bisherigen Bundesländer zwischen der Metropole Berlin und
dem neuen Bundesland Brandenburg-Berlin ergeben würde .
(ID: 11728x11732); (Calc-Prob:53)
Ein » Zweckverband Berlin und Umland « ist auf die Dauer in Sachen Verkehr , Energieversorgung , Wohnungsbau und Wirtschaftsplanung in jedem Fall erforderlich .
Wenn die 17 äußeren Bezirke des bisherigen Berlins nicht zu Berlin , sondern mit Potsdam , Nauen , Oranienburg , Bernau , Strausberg , Königs Wusterhausen , Zossen und
anderen zum Umland zählten und entsprechenden Einfluß hätten , wäre das den Interessen
aller Bürgerinnen und Bürger bestimmt dienlicher als die heutige Konstruktion , ( die )m1
gegenüber den Gemeinden des Umlandes nur auf ( ein Diktat der Metropole )m2 hinauslaufen würde .
Wenn die Ausdehnung der Stadtgebiete und die Zentralisierung der Verwaltungen Kennzeichen des Fortschritts sind , warum sind dann Offenbach und Hanau , Rüsselsheim und
Eschborn , Kronberg und Oberursel noch nicht längst in Frankfurt eingemeindet ?
(ID: 12076x12080); (Calc-Prob:52)
Es gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes nicht
herausstellt .
Allein schon vom Blick auf die Landkarte , in ( der )m1 die große Stadt Berlin mitten im
Land Brandenburg liegt , macht überdeutlich , daß es in Zukunft ( eine sehr enge
wirtschaftliche , verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden
Länder )m2 geben wird .
(ID: 12130x12138); (Calc-Prob:53)
E.1.3. Relative proximity in context
(59)
152
a.
Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt der spanische Dichter Ramon del Valle-Inclan einen seiner Helden gleich zu Beginn ( seiner )m1
Grotske “Wunderworte “erklären .
b.
c.
d.
e.
f.
Frömmigkeit ( seiner )m2 Bewohner nicht verloren haben .
Laureano ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano
von Jahrmarkt zu Jahrmarkt geschoben hatte .
zu ( seinen )m1 Schauspielern auf die Bühne kam , unter die wenigen Bravo-Rufe auch
lautstarke Unmutsbekundungen .
Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich ( sein )m2 Debüt als
neues Ensemblemitglied hätte geben sollen , mehr versprochen .
.
(ID: 232x245); (Calc-Prob:53)
kann .
(ID: 156x179); (Calc-Prob:53)
In der Inszenierung vn Armin Holz in den Kammerspielen des Deutschen Theaters scheint
dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden .
Als habe ( er )m1 von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse “zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit ,
( sich )m2 erst einmal sinnhaft vorzustellen .
kann .
(ID: 156x163); (Calc-Prob:52)
Als ( die Witwe Juana la Reina )m1 plötzlich auf offener Straße stirbt und ihrem irren
Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte ( sich )m2 so mancher in
ihrer Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren
Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte .
Die Totenklage der trauernden Familie mischt sich dann auch schnell mit munteren Jubelgesängen .
Das gefiel dem Hund so gut , daß ( er )m1 unmittelbar hinter der Tür Stellung bezog und
153
Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde ( er )m2
(ID: 332x338); (Calc-Prob:83)
154
g.
Hamburg ( ap ) - Ein zwei Jahre alter Schäferhund namens “Prinz “hat im Hamburger
Stadtteil Altona eine Wohnung besetzt .
( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag ( sein )m2 zukünftiges Heim gezeigt
.
(ID: 325x328); (Calc-Prob:51)
h.
Bei der Polizei erfuhr die alte Dame , daß es sich bei ( ihrem Fall )m1 nicht um ein Vergehen handele , welches von Amts wegen verfolgt werden könne .[. . . ]
Da die Stadt keinen habe , dürfe sie das Urteil auch nicht exekutieren .
Helena , ( deren Fall )m2 inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis
heute nicht in ihre Wohnung .
(ID: 393x423); (Calc-Prob:83)
i.
Rücken und einer Maske vor dem Gesicht das Weite sucht , ist eine Erkenntnis , die auch
nach Ansicht des polnischen Bürgerombudsmanns die Auffassungsgabe der polnischen
Polizei bei weitem übersteigt .
Inzwischen klagt ( dieser )m1 beim Obersten Gerichtshof , ( dessen )m2 Richter vorsichtshalber auch gleich in Urlaub gefahren sind .
man wilde Mieter auf eigene Faust rauswerfe .
(ID: 484x485); (Calc-Prob:53)
j.
Eine andere alte Dame , der gleiches widerfuhr , mußte einen Monat auf dem örtlichen
(ID: 428x435); (Calc-Prob:53)
k.
Da die Stadt keinen habe , dürfe ( sie )m1 das Urteil auch nicht exekutieren .[. . . ]
Helena , deren Fall inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis heute
nicht in ihre Wohnung .
( Sie )m2 hat dabei noch Glück gehabt .
(ID: 420x428); (Calc-Prob:53)
l.
Bei der Polizei erfuhr die alte Dame , daß es sich bei ihrem Fall nicht um ( ein Vergehen
handele , welches von Amts wegen verfolgt werden könne )m1 .[. . . ]
( Dieses )m2 gab ihr recht und verurteilte die wilde Mieterin dazu , die Wohnung zu verlassen .
(ID: 396x400); (Calc-Prob:52)
m.
Doch hätte die nicht gezahlt , hätte ( Helena G. )m1 sie auch nicht rauswerfen können
.[. . . ]
Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ( ihre )m2 Zahlungen eingestellt haben - Kündigung droht ihnen
deshalb nicht .
(ID: 506x515); (Calc-Prob:51)
n.
Aufgrund des gleichen Paragraphen gibt es in Warschau inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ( ihnen )m1
deshalb nicht .[. . . ]
.
Jemanden vor die Tür setzen dürfen ( die )m2 allerdings auch nicht .
(ID: 518x523); (Calc-Prob:50)
o.
Als ob das irgend etwas mit Sport zu tun hätte , daß Hans Anton ein paar Jährchen
in faschistischer Uniform rumgekaspert ist und ( seine )m1 Briefe mit “es grüßt mit erhobenem Arm “unterschrieben hat .
Und die Geschichte mit dem inzwischen verblichenen Adidas-Chef Horst Dassler , der als
erster erkannt hatte , daß man umso mehr Turnschuhe verkauft , je größer der Einfluß auf
das IOC ist , und er deshalb ( seinen )m2 Spezl Juan 1980 an die Spitze des Vereins boxte
?
Na bittschön , Vollbeschäftigung in Herzogenaurach , und irgendwelche Schuhe müssen
die Sportler ja anziehen !
(ID: 574x583); (Calc-Prob:53)
p.
An finanzieller Potenz und Aktionsradius wird sie ( ihr )m2 bald überlegen sein .
Damit vollzieht sich , was Geschäftsführer Nawrocki immer wollte : Die Unternehmerschaft wird zunehmend das Sagen haben , die öffentliche Hand hat allenfalls für die
notwendige Infrastruktur zu sorgen .
(ID: 673x680); (Calc-Prob:53)
q.
Der Fuchs hat seine Schuldigkeit getan , ( der Fuchs )m1 kann gehn .[. . . ]
» Leichtfertig « ist , nach Eberhard Diepgens Ansicht , mit den Informationen über die
IOC-Mitglieder umgegangen worden .
So leichtfertig wie ( er )m2 das dahersagt , so freudig wurde es vom Aufsichtsrat der
Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht .
(ID: 638x642); (Calc-Prob:52)
r.
Daß ( Nawrocki )m1 von dieser bigotten Inszenierung profitiert , ist weder sein Verdienst
An finanzieller Potenz und Aktionsradius wird sie ihr bald überlegen sein .
(ID: 668x673); (Calc-Prob:51)
s.
Doch ( der )m1 ist schon längst eingetreten , denn die Olympia-Gerontokraten werden
kaum verzeihen , daß öffentlich wurde , worauf eine jede Bewerbungsstrategie fußt : daß
sie korrumpierbar sind .[. . . ]
Daß » sexuelle Neigungen « zur Zielpalette der » persönlichen Ansprache « gehören ,
mag dem die Krone aufsetzen , doch lenkt dieser Umstand eher von der Normalität dieser
Bestechlichkeit ab .
Daß Nawrocki von dieser bigotten Inszenierung profitiert , ist weder ( sein Verdienst )m2
(ID: 657x671); (Calc-Prob:50)
t.
( Der Aufsichtsrat der Olympia Marketing GmbH )m1 ernannte ihn am Donnerstag abend
überraschend zum Geschäftsführer der Firma .
Er löst auf dem Posten Nikolaus Fuchs ab , der wegen der Intimdatei über IOC-Mitglieder
( seinen )m2 Hut nehmen mußte .
155
u.
v.
w.
x.
y.
z.
156
(ID: 782x793); (Calc-Prob:51)
Die Alternativplanung des Senats sei teurer und schwieriger .
( Die AL-Abgeordnete Michaele Schreyer )m1 erklärte , daß ( sich )m2 das Land Berlin
damals nicht die entsprechenden Rechte gesichert habe , sei eine bewußte Entscheidung
des damaligen Bürgermeisters Momper gewesen .
OBERBAUMBRÜCKE
(ID: 881x882); (Calc-Prob:51)
» So ein Käufer weiß ja , worauf ( er )m1 sich einläßt « , begründet dies Nagel .
Kündigt der Vermieter und nutzt nachher die Wohnung nicht selbst , muß ( er )m2 nachweisen , daß die Kündigung nicht mißbräuchlich war - bisher liegt die Beweislast beim
Mieter .
Dann wird eventuell Schadensersatz fällig .
(ID: 1149x1154); (Calc-Prob:83)
Nur die FDP und ( ihre )m1 Ministerin wehren sich mit Händen und Füßen dagegen .
Der Mieterschutz , so verkündeten Irmgard Schwaetzer und ( ihre )m2 FDP-Kollegin Sabine
Leuthheuser-Schnarrenberger aus dem Justizressort , reiche aus .
Das sieht man in Berlin anders .
(ID: 1129x1136); (Calc-Prob:83)
In der Folge steigen auch ( die Transportkosten )m1 um das 16- bis 22fache .[. . . ]
Nur die von Monopolen geprägte Industriestruktur wird so statisch bleiben wie sie ist , und
damit auch das Preisdiktat , das lediglich von den Planbehörden direkt zu den Monopolbetrieben verschoben wurde .
Das Schwindelgefühl könnte bei den G-7-Herren nach dem Absteigen vom Gipfelkarussell zurückkehren , wenn ( sie )m2 weitere Fakten fest in den Blick nehmen - wie die Zunahme der Tauschgeschäfte auf 60 bis 70 Prozent des Geschäftsvolumens aller Betriebe
oder das um elf Prozent sinkende Bruttosozialprodukt .
(ID: 8325x8338); (Calc-Prob:52)
Daß sich die Nachfolgerepubliken der UdSSR bereits im Oktober zügig auf eine Neudefinition der Rubelzone und die Aufteilung der Altschulden einigen werden , dürfte von den
russischen Realitäten weit abgehobenes IWF-Wunschdenken sein .
Und Stufe drei des Plans , in ( der )m1 endlich der sechs Milliarden Dollar teure RubelStabilisierungsfonds zum Einsatz kommen soll , ist qua Programm auf den St. Nimmerleinstag verschoben : Voraussetzung sei , so Camdessus , daß die wirtschaftliche Entwicklung ( sich )m2 stabilisiere .
Die jedoch schlingert auf Abwärtskurs .
(ID: 8295x8305); (Calc-Prob:52)
.
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß ( eine neue Einsicht )m1
dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m2 mit einem IWFStandardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen ,
(ID: 8205x8206); (Calc-Prob:52)
E.1.4. Reflexive pronouns and non-subjects
(60)
a.
b.
c.
d.
e.
f.
g.
Man hatte ( sich )m1 am Ende ( einer erfolgreichen Spielzeit )m2 von dieser letzten Premiere im Deutschen Theater , mit der übrigens Ignaz Kirchner ursprünglich sein Debüt
als neues Ensemblemitglied hätte geben sollen , mehr versprochen .
.
(ID: 239x240); (Calc-Prob:51)
Und die Behörden reagieren darauf genau so indolent wie im Fall Helena G.
Die Rechtslage , die so entstanden ist , spricht ( so ziemlich allem )m1 Hohn , was ( sich )m2
Juristen je ausgedacht haben :
Bricht jemand in eine Wohnung ein , wird er in Polen , wie überall auf der Welt , mit bis
zu mehreren Jahren Gefängnis bestraft .
(ID: 452x455); (Calc-Prob:53)
Uaaaaaah !
Die Realität noch viel grauenerregender , als ( es )m1 ( sich )m2 der wildeste Hammerwerfer in seinen perversesten Träumen ersinnen könnte ?
Die Briten Vyv Simson und Andrew Jennings haben ihre Version vom olympischen Märchen
unter dem Titel Geld , Macht und Doping auch schon als Buch veröffentlicht , was dem
IOC-Präsidenten Juan Antonio Samaranch wenig gefallen hat .
(ID: 541x542); (Calc-Prob:52)
Die Herren der Ringe , ARD , Do. , 23 Uhr
Haben nun also Edwin Kleins Bitterer Sieg mit einem wohligen Gruseln , aber doch in der
festen Überzeugung studiert , es handele ( sich )m1 um ( einen Roman )m2 , mithin um
Fiktion !
Und was müssen wir nun mitbekommen ?
(ID: 535x536); (Calc-Prob:51)
» Heckelmann kann sehr wohl entscheiden , Flüchtlinge , die mit dem Flugzeug kommen
, unbürokratisch nach Berlin einreisen zu lassen « , so Christoph Koch , Dozent an der FU
.
Der Innensenator solle ( sich )m1 bei der Innenministerkonferenz für ( eine Aufhebung des
Visumszwangs an den deutschen Grenzen )m2 einsetzen .
Außerdem fordert die Initiative einen sofortigen Abschiebestopp .
(ID: 738x742); (Calc-Prob:51)
Die Alternativplanung des Senats sei teurer und schwieriger .
Die AL-Abgeordnete Michaele Schreyer erklärte , daß ( sich )m1 das Land Berlin damals
nicht die entsprechenden Rechte gesichert habe , sei ( eine bewußte Entscheidung des
damaligen Bürgermeisters Momper )m2 gewesen .
OBERBAUMBRÜCKE
(ID: 882x886); (Calc-Prob:51)
Wirtschaft .
Beim Haushaltsdefizit orientierten ( sich )m1 die amtlichen Statistiker noch an der GaidarCamdessus-Vereinbarung : Es soll ( sich )m2 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen .
Gegenwärtig jedoch liegt es bei 17 Prozent .
(ID: 8248x8252); (Calc-Prob:83)
157
h.
i.
j.
k.
l.
m.
158
Das Münchner Gipfelkarussell hat sich mit hoher Geschwindigkeit gedreht .
Nun steht es still - und alle Welt wundert ( sich )m1 , daß es ( sich )m2 nicht von der Stelle
bewegt hat .
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm
die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den
Kohls und Bushs klarmachen konnte .
(ID: 8195x8197); (Calc-Prob:83)
Es wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt .
Der Kern , um ( den )m1 das Karussell ( sich )m2 drehte , löst sich somit auf .
Hat es den Münchner Rummel überhaupt gegeben ?
(ID: 8376x8378); (Calc-Prob:52)
Allein der Weg dahin ist weit und mühsam und dauert ganz offensichtlich länger als jene
“wenigen Wochen “, die Camdessus für die Regelung der Eigentums- und Investitionsfragen vorsieht .
Daß ( sich )m1 die Nachfolgerepubliken der UdSSR bereits im Oktober zügig auf ( eine
Neudefinition der Rubelzone )m2 und die Aufteilung der Altschulden einigen werden ,
dürfte von den russischen Realitäten weit abgehobenes IWF-Wunschdenken sein .
Und Stufe drei des Plans , in der endlich der sechs Milliarden Dollar teure RubelStabilisierungsfonds zum Einsatz kommen soll , ist qua Programm auf den St. Nimmerleinstag verschoben : Voraussetzung sei , so Camdessus , daß die wirtschaftliche Entwicklung sich stabilisiere .
(ID: 8284x8289); (Calc-Prob:51)
.
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht
dennoch aufspringen konnte : die Erkenntnis , daß ( sich )m1 mit ( einem IWFStandardprogramm )m2 die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen
, wie Boris Jelzin den Kohls und Bushs klarmachen konnte .
(ID: 8206x8207); (Calc-Prob:51)
Offenbar schien es nicht im Interesse Radio Bremens zu sein , deutlich zu machen , warum
die Lesben / Frauen so zahlreich ihren Unmut kundtaten .
Die Sendung mutete sich als eine hysterische Stimmungsmache an , in der ( sich )m1 Frau
Roggendorf im Vorfeld der ganzen Affäre verbal zu verteidigen suchte und die Lesben /
Frauen gleich mit Buttersäure um ( sich )m2 warfen , um ihr Dasein als “kleine radikale
Minderheit “unter Beweis zu stellen .
Der benannte Buttersäureanschlag war eine Reaktion auf eine Veranstaltung unter der
Leitung von Egbert Richter im “Ambrosia “, “Sexualität und Wohngemeinschaft “.
(ID: 2717x2725); (Calc-Prob:83)
Offenbar schien es nicht im Interesse Radio Bremens zu sein , deutlich zu machen , warum
die Lesben / Frauen so zahlreich ihren Unmut kundtaten .
Die Sendung mutete ( sich )m1 als eine hysterische Stimmungsmache an , in der ( sich )m2
Frau Roggendorf im Vorfeld der ganzen Affäre verbal zu verteidigen suchte und die Les-
n.
ben / Frauen gleich mit Buttersäure um sich warfen , um ihr Dasein als “kleine radikale
Minderheit “unter Beweis zu stellen .
Der benannte Buttersäureanschlag war eine Reaktion auf eine Veranstaltung unter der
Leitung von Egbert Richter im “Ambrosia “, “Sexualität und Wohngemeinschaft “.
(ID: 2715x2717); (Calc-Prob:83)
Sie alle versuchen ihren eigenen Weg zu gehen , aber am Schluß reißt der Strudel der
Maueröffnung ihnen den Boden unter den Füßen weg .
Es gibt keine DDR mehr , in ( der )m1 man ( sich )m2 einrichten , oder für die man sich
engagieren kann .
Der schmerzhafte Entschluß zur Flucht , der Vertrauensbruch mit denen , die blieben , hat
plötzlich keinen Sinn mehr .
(ID: 1841x1842); (Calc-Prob:52)
E.1.5. Problems with substring-matches
(61)
a.
b.
c.
d.
e.
f.
niemanden mehr durchließ .[. . . ]
Herrchen wollte ( den Hundefänger )m2 holen .
(ID: 331x345); (Calc-Prob:52)
(ID: 331x335); (Calc-Prob:52)
Um so mehr , als man das Absurde an dieser Praxis noch auf die Spitze treiben kann .
Im Fall Helena G. verurteilte das Gericht ( die wilde Mieterin )m1 zur Zahlung von ( Miete )m2
.
Doch hätte die nicht gezahlt , hätte Helena G. sie auch nicht rauswerfen können .
(ID: 502x503); (Calc-Prob:52)
In ( letzter Zeit )m1 kümmern sich die Besetzer allerdings nicht mehr sonderlich darum ,
ob eine Wohnung bewohnt ist oder nicht .[. . . ]
Einer von ihnen erklärte ( einer Zeitung )m2 schon mal anonym , es sei durchaus rechtens
, wenn man wilde Mieter auf eigene Faust rauswerfe .
(ID: 444x490); (Calc-Prob:52)
faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit ( erhobenem Arm )m1
“unterschrieben hat .[. . . ]
Schließlich stehen sie im Dienst einer großen und gerechten Sache , dem Bankkonto des
IOC .
Also , wenn wir die Briten richtig verstanden haben wollen , handelt es sich bei Juan und
seinen 94 Komplizen aus ( dem Lausanner Marmorpalast )m2 um die korrupteste Bande
auf Gottes Erdboden .
(ID: 577x623); (Calc-Prob:52)
Das kathartische Schauspiel wurde inszeniert , um , wie es so schön heißt , weiteren
Schaden von ( der Bewerbung )m1 abzuwenden .
159
g.
h.
i.
j.
k.
l.
160
Doch der ist schon längst eingetreten , denn die Olympia-Gerontokraten werden kaum
verzeihen , daß öffentlich wurde , worauf ( eine jede Bewerbungsstrategie )m2 fußt : daß
sie korrumpierbar sind .
Daß » sexuelle Neigungen « zur Zielpalette der » persönlichen Ansprache « gehören ,
mag dem die Krone aufsetzen , doch lenkt dieser Umstand eher von der Normalität dieser
Bestechlichkeit ab .
(ID: 656x659); (Calc-Prob:52)
( Senat )m1 soll Visumzwang für Kriegsflüchtlinge aufheben[. . . ]
Leute gestern vormittag auf die Lage der Flüchtlinge aus dem Kriegsgebiet aufmerksam .
In einer Petition forderten sie ( Innensenator Dieter Heckelmann ( )m2 CDU ) auf , den
Visumszwang für Kriegsflüchtlinge aus allen Teilen des ehemaligen Jugoslawien in Berlin
aufzuheben .
(ID: 701x724); (Calc-Prob:52)
Allerdings sind im Wirtschaftsplan des Jahres 1992 der Olympia GmbH unter dem Haushaltstitel » Agenturleistungen « für » ( Bewerbungsstrategie )m1 « 1,9 Millionen Mark veranschlagt .[. . . ]
An der Marketing GmbH sind neun namhafte Unternehmen beteiligt .
Nawrocki , der immer privat organisierten Olympischen Spielen das Wort geredet hat ,
sieht durch seine Ernennung das unternehmerische Engagement bei ( der Berliner
Bewerbung )m2 aufgewertet .
(ID: 828x854); (Calc-Prob:52)
Wie Nawrocki gestern erklärte , erhält Fuchs keine Abfindung , da ( sein Vertrag )m1 regulär am 15. August ausläuft .[. . . ]
Nawrocki selbst erhält für seinen Doppeljob kein zusätzliches Salär .
( Sein Geschäftsführervertrag mit der Marketing GmbH )m2 ist unbefristet .
(ID: 814x836); (Calc-Prob:52)
Der Aufsichtsrat der Olympia Marketing GmbH ernannte ihn am Donnerstag abend überraschend zum ( Geschäftsführer der Firma )m1 .[. . . ]
Wie der Aufsichtsratsvorsitzende Peter Weichhardt erklärte , sei man mit ( dem
Geschäftsführerwechsel )m2 der Auffassung des Regierenden Bürgermeisters Diepgen
gefolgt , daß mit den Intimdaten » leichtfertig « umgegangen worden sei .
(ID: 786x802); (Calc-Prob:52)
( Der Aufsichtsrat der Olympia Marketing GmbH )m1 ernannte ihn am Donnerstag abend
überraschend zum Geschäftsführer der Firma .[. . . ]
Wie ( der Aufsichtsratsvorsitzende Peter Weichhardt )m2 erklärte , sei man mit dem Geschäftsführerwechsel der Auffassung des Regierenden Bürgermeisters Diepgen gefolgt , daß mit
den Intimdaten » leichtfertig « umgegangen worden sei .
(ID: 782x801); (Calc-Prob:52)
Der IWF hält standardprogrammgemäß den Blick fest geheftet auf das Haushaltsdefizit ,
die Stabilisierung der Währung und ( freie Preise )m1 .[. . . ]
Unregelmäßige Lieferungen von Bauteilen und Rohstoffen , so die russischen Statistiker ,
werden die Exporte Rußlands um 17 bis 22 Prozent drücken .
( Die Energiepreise )m2 werden nicht - wie der IWF fordert - freigegeben , aber um das
30fache erhöht .
(ID: 8273x8321); (Calc-Prob:52)
m.
n.
o.
p.
q.
r.
s.
Sobald der Schwindel ihrer rasanten Karussellfahrt nachläßt und der verschwommene
Blick auf die Welt wieder klarer wird , werden auch die Kanten ( der Vereinbarung
zwischen dem russischen Premierminister Jegor Gaidar und dem IWF-Exekutiv-Direktor
Michel Camdessus )m1 Kontur gewinnen .[. . . ]
Wirtschaft .
Beim Haushaltsdefizit orientierten sich die amtlichen Statistiker noch an ( der GaidarCamdessus-Vereinbarung )m2 : Es soll sich 1992 auf deutlich weniger als die festgeschriebenen fünf Prozent , nämlich 2,3 Prozent des Bruttosozialprodukts , belaufen .
(ID: 8240x8250); (Calc-Prob:52)
Das Münchner Gipfelkarussell hat sich mit ( hoher Geschwindigkeit )m1 gedreht .[. . . ]
.
Bei ( der hohen Drehgeschwindigkeit )m2 - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue
Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWFStandardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen ,
(ID: 8192x8199); (Calc-Prob:52)
Baumann ( Leverkusen ) in ( seinem )m2 deutschen Rekordlauf am 6. Juni in Sevilla .
(ID: 8433x8441); (Calc-Prob:53)
Erstmals seit ( langer Zeit )m1 schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei 1,98 m .[. . . ]
Mit 10,06 Sekunden gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde das
Nachsehen .
Mit ( der gleichen knappen Zeitspanne )m2 unterlag im 200-m-Lauf der Afrikaner dem
Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde 20,10 Sekunden
benötigte .
(ID: 8427x8466); (Calc-Prob:52)
Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit 22,18 Sekunden gegen
ihre Landsfrau Juliet Cuthbert durch .
Dagegen mußte sich im 100-Meter-Sprint ( der Männer )m2 überraschend der OlympiaZweite Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser
Saison geschlagen geben .
(ID: 8433x8456); (Calc-Prob:52)
Der Satz der 27jährigen ist der weiteste Sprung ( einer Frau )m1 in diesem Jahr .[. . . ]
Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über 200 Meter
.
( ihre Landsfrau Juliet Cuthbert )m2 durch .
(ID: 8407x8454); (Calc-Prob:52)
Der Satz der 27jährigen ist der weiteste Sprung einer Frau in ( diesem Jahr )m1 .[. . . ]
161
.
Die große Verliererin ( des WM-Jahres )m2 aus Jamaika setzte sich mit 22,18 Sekunden
gegen ihre Landsfrau Juliet Cuthbert durch .
(ID: 8409x8448); (Calc-Prob:52)
162
t.
Dazu hat die Mutter eines dreijährigen Sohnes ( ihren Deutschen Rekord )m1 eingestellt
.[. . . ]
Baumann ( Leverkusen ) in ( seinem deutschen Rekordlauf am 6. Juni in Sevilla )m2 .
(ID: 8413x8444); (Calc-Prob:52)
u.
In der ersten Runde des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main trifft
( die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf )m1
Das Eröffnungsspiel ( der Mannschaftsweltmeisterschaft der Damen mit K.O.-System )m2
bestreitet Anke Huber am Montag um 11.30 Uhr auf dem Centre Court .
PRESS-SCHLAG
(ID: 8587x8591); (Calc-Prob:52)
v.
Das Internationale Olympische Komitee und das Nationale Olympische Komitee Jugoslawiens haben sich darauf geeinigt , daß die Sportler aus Serbien und Montenegro unter der
Bezeichnung “Mannschaft der Freundschaft “sowie der Olympia-Fahne und der olympischen Hymne an ( den Start )m1 gehen .[. . . ]
Der 800-Meter-Läufer hatte geklagt , er sei beim Ausscheidungslauf von einem Mitläufer
getreten worden und deshalb so lahm gewesen .
Auch dem Weltmeister im 10.000-Meter-Lauf , Moses Tanui , ist im nachhinein ( der
Olympia-Start )m2 genehmigt worden .
(ID: 8502x8544); (Calc-Prob:52)
w.
Die Idee , in Irakisch-Kurdistan Bildungsarbeit zu leisten , entstand im ( vergangenen Jahr )m1
, als ein Asta-Hilfskonvoi nach dem Ende des Golfkriegs in das Kriegsgebiet gereist war
.[. . . ]
Auch die Berliner Landesstelle für Entwicklungszusammenarbeit der Senatsverwaltung
für Wirtschaft signalisierte Bereitschaft , 30.000 Mark zu übernehmen , sofern die Restfinanzierung gesichert sei .
Das Kurdistan-Komitee hofft nun , das restliche Geld möglichst schnell zusammenzubekommen , damit der Schulbetrieb auch tatsächlich zum ( Winterhalbjahr )m2 aufgenommen
werden kann .
(ID: 12356x12419); (Calc-Prob:52)
x.
» Wir werden hierbleiben und unser Dorf und ( unsere Häuser )m1 wiederaufbauen . «[. . . ]
Alle drei Dörfer sind auf einer Straße zu erreichen und bereits wieder an die Wasserversorgung angeschlossen .
Ein Teil ( der Wohnhäuser )m2 wurde bereits wiederaufgebaut .
(ID: 12293x12385); (Calc-Prob:52)
y.
Aber diesmal ist es keine Koketterie , ( keine » Flucht nach vorn )m1 « .[. . . ]
Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl es nicht
ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle so oder
ähnlich erlebt haben .
Die Tränen in ( der bewegend einfach gefilmten Fluchtszene )m2 sind so echt wie die Dokumentaraufnahmen , die dazwischengeschnitten werden .
(ID: 1865x1877); (Calc-Prob:52)
z.
In ( keiner anderen Stadt unseres Kontinents )m1 muß die Feuerwehr so oft raus wie an
der Spree .[. . . ]
D.N.T.T. gibt sich redlich Mühe , das Publikum zu schockieren .
Das sei “die Aufgabe des Theaters in der rauhen und reizüberfluteten Welt von 1992 “- so
ein Mitglied des Ensembles gegenüber ( der Stadtzeitschrift ’ Zitty ‘ , die das Spektakel
mit veranstaltet )m2 .
(ID: 6316x6407); (Calc-Prob:52)
E.1.6. “Es“(“it“) as expletive pronoun in German
(62)
a.
b.
c.
d.
e.
So leichtfertig wie er das dahersagt , so freudig wurde ( es )m1 vom Aufsichtsrat der Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht .[. . . ]
Ein Kopf mußte rollen , damit sich die anderen aus der Schlinge ziehen konnten .
Das kathartische Schauspiel wurde inszeniert , um , wie ( es )m2 so schön heißt , weiteren
Schaden von der Bewerbung abzuwenden .
(ID: 644x654); (Calc-Prob:83)
Damit vollzieht sich , was Geschäftsführer Nawrocki immer wollte : Die Unternehmerschaft wird zunehmend das Sagen haben , die öffentliche Hand hat allenfalls für die
notwendige Infrastruktur zu sorgen .
( Das Beispiel Fuchs )m1 zeigt , daß Unternehmer auch mit ihresgleichen nicht zimperlich
umgehen , wenn ( es )m2 ihren Interessen entspricht .
Nawrocki hat sich bislang für olympische Verhältnisse erstaunlich gut gehalten , doch nun
sitzt er auf zwei Schleuderstühlen .
(ID: 688x690); (Calc-Prob:52)
Und anders geht ( es )m1 auch kaum .
» Einen vertragestreuen Mieter kriegen Sie heutzutage nur über eine Eigenbedarfskündigung raus « , heißt ( es )m2 in Hausbesitzerkreisen .
Und dies alles wird vom Steuerzahler noch bezuschußt .
(ID: 1080x1084); (Calc-Prob:83)
Außerdem , erzählt Frau Fiedler , gebe ( es )m1 häufig westdeutsche Hausbesitzer , die
renitenten Mietern drohten , nach Berlin zu ziehen und dann eben Eigenbedarf anzumelden
.[. . . ]
Vor allem bei Mietern , die gerade eine Mieterhöhung oder eine Modernisierung verweigert haben , ist das der Fall .
Hanka Fiedler : » Wenn ( es )m2 schriftliche Unterlagen über solche Rechtsstreitigkeiten
gibt , hat der Mieter vor Gericht gute Karten nachzuweisen , daß der Eigenbedarf nur
vorgeschoben ist . «
(ID: 1039x1051); (Calc-Prob:83)
Nun steht es still - und alle Welt wundert sich , daß ( es )m1 sich nicht von der Stelle bewegt hat .[. . . ]
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes 24 Milliarden Dollar schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWF-Standardprogramm
die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen , wie Boris Jelzin den
Kohls und Bushs klarmachen konnte .
Rußlands Präsident war deshalb wohl wirklich “sehr zufrieden mit dem Treffen “der G-7Regierungschefs aus den USA , Japan , der Bundesrepublik , Frankreich , Großbritannien
, Italien und Kanada - auch wenn ( es )m2 nur die konkrete Zusage gab , die erste IWFKredittranche von einer Milliarde Dollar im August zu überweisen .
163
(ID: 8196x8227); (Calc-Prob:83)
164
f.
( Es )m1 wurde , wie der russische Wirtschaftsstar Gregori Jawlinski anläßlich des Gipfels
erinnerte , bereits 1990 und 1991 Michail Gorbatschow als Kreditlinie eingeräumt .[. . . ]
Der Kern , um den das Karussell sich drehte , löst sich somit auf .
Hat ( es )m2 den Münchner Rummel überhaupt gegeben ?
(ID: 8372x8381); (Calc-Prob:53)
g.
Besonders beeindruckt habe sie ( das Selbstbewußtsein )m1 und der Lebenswille der kurdischen Bevölkerung .[. . . ]
» Einen derartigen Grad an Zerstörung kann man sich überhaupt nicht vorstellen .
( Es )m2 ist der helle Wahnsinn . «
(ID: 12394x12400); (Calc-Prob:52)
h.
Mit all diesen obskuren Äußerungen fällt Frau Rogendorf nicht nur den Lesben / Frauen
auf den Wecker , die ( ihr )m1 Leben selbstbestimmt und -bewußt leben , sondern auch all
den Frauen / Lesben , die sich für die Abschaffung des § 218 einsetzen .
Wenn “eine Gesellschaft sich nur über das Kind entwickeln kann “, wie ( es )m2 Frau
Roggendorf behauptet , ist zu fragen , wie die Gesellschaft mit den ungewollten Kindern
klarkommt , für die wir Frau Roggendorf hiermit dankbar die Adoptionsurkunde ausstellen
wolllen .
Amen .
(ID: 2775x2787); (Calc-Prob:50)
i.
Aber diesmal ist ( es )m1 keine Koketterie , keine » Flucht nach vorn « .
Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl ( es )m2
nicht ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle
so oder ähnlich erlebt haben .
Die Tränen in der bewegend einfach gefilmten Fluchtszene sind so echt wie die Dokumentaraufnahmen , die dazwischengeschnitten werden .
(ID: 1863x1871); (Calc-Prob:83)
j.
Der Regisseur bezeichnet ihn als Film von der Straße für die Straße , und ( es )m1 ist ein
Kompliment für ihn , wenn man die Darsteller » unbeholfen « nennt .
Aber diesmal ist ( es )m2 keine Koketterie , keine » Flucht nach vorn « .
Alle Mitwirkenden leben noch ihr Leben , sprechen ihren Jargon , und obwohl es nicht
ihre persönliche Geschichte ist , die erzählt wird , spürt man , daß sie alle Gefühle so oder
ähnlich erlebt haben .
(ID: 1859x1863); (Calc-Prob:83)
k.
( Es )m1 gibt keine gesellschaftliche Kraft , die die Vorteile eines gemeinsamen Landes
nicht herausstellt .
Allein schon vom Blick auf die Landkarte , in der die große Stadt Berlin mitten im
Land Brandenburg liegt , macht überdeutlich , daß ( es )m2 in Zukunft eine sehr enge
wirtschaftliche , verkehrsmäßige , kulturelle , bildungsmäßige Verflechtung der beiden
Länder geben wird .
(ID: 12125x12135); (Calc-Prob:53)
l.
Weit und breit gibt ( es )m1 keine Bildungsmöglichkeit , ein Desaster nicht nur für die
Kinder , sondern auch für die kurdische Kultur und Sprache .
Gäbe ( es )m2 in Kani Balav eine Schule , würde dort Ende September das neue Schuljahr
beginnen .
Um diese Utopie zu verwirklichen , reiste die Berlinerin Ulrike Hoffmann vier Wochen
durch die irakisch-kurdischen Berge .
(ID: 12325x12331); (Calc-Prob:83)
m.
Am 14. 7. um 20 Uhr im Tempodrom , In den Zelten , Tiergarten
Umsonst und draußen ist ( es )m1 immer voll , da braucht ( es )m2 keine Werbung .
Deshalb hier nur ein paar Anmerkungen zu Rico Rodriguez , weil er mehr ist als nur ein
Musiker , der auch noch ganz gut ins diesjährige Konzept der » Heimatklänge « paßt .
(ID: 12700x12701); (Calc-Prob:83)
n.
Aber so geht ( es )m1 nicht : Das Urteil des Verwaltungsgerichtes Bremen hat auf den
Wangen der Herren Wilhelm und Jachmann rote Streifen hinterlassen .[. . . ]
Man wird jetzt laut überlegen müssen , ob dieses Amt nicht langsam aufgelöst werden
sollte .
Nicht nur , daß ( es )m2 kaum noch lohnende Feindbilder gibt .
(ID: 13419x13427); (Calc-Prob:83)
o.
Noch wesentlich mehr Wohnungen wird ( es )m1 nach dem Urteil pro Jahr treffen , meint
Hartmann Vetter , Geschäftsführer des Berliner Mietervereins .[. . . ]
Zum einen liegen Tausende von Anträgen auf Umwandlung auf Halde .
Inzwischen können auch Wohnungen in Ost-Berlin umgewandelt und verkauft werden ,
soweit ( es )m2 die ungeklärten Eigentumsverhältnisse zulassen , so daß Vetter mit bis zu
10.000 Umwandlungen im Jahr rechnet .
(ID: 964x975); (Calc-Prob:83)
p.
Damit wurde eine laufende Umwandlungswelle gestoppt : Über 85.000 Westberliner Altbauwohnungen waren bis zu diesem Zeitpunkt umgewandelt worden , circa 5.000 bis
6.000 waren ( es )m1 im Jahr .
Noch wesentlich mehr Wohnungen wird ( es )m2 nach dem Urteil pro Jahr treffen , meint
Hartmann Vetter , Geschäftsführer des Berliner Mietervereins .
Zum einen liegen Tausende von Anträgen auf Umwandlung auf Halde .
(ID: 961x964); (Calc-Prob:83)
q.
In zwei Jahren werde ( es )m1 tausend Filme über diese historische Umbruchszeit geben ,
und viel professionellere dazu .[. . . ]
Aber der Laienfilmer Worm scharte seine Freunde , Kollegen und Stammgäste um sich ,
drehte mit einfachsten Mitteln trotzdem seinen Film und behielt recht .
Bis heute gibt ( es )m2 keine vergleichbare Auseinandersetzung , zumindest nicht in Spielfilmform .
(ID: 1789x1801); (Calc-Prob:83)
r.
Wieviel Hoffnungen , Wünsche , Träume , Utopien gab ( es )m1 ?[. . . ]
Als Mario Worm , ein Ostberliner Kneipenwirt , schon 1989 mit der Idee , das alles in
einem Film festzuhalten , hausieren ging , riet man ihm ab .
In zwei Jahren werde ( es )m2 tausend Filme über diese historische Umbruchszeit geben ,
und viel professionellere dazu .
(ID: 1783x1789); (Calc-Prob:83)
s.
Sicher , » Keine Gewalt « ist ein Laienfilm , auf S-VHS gedreht , mit schlechtem Ton ,
technisch und dramturgisch oft unzulänglich .
Der Regisseur bezeichnet ihn als Film von der Straße für die Straße , und ( es )m1 ist
( ein Kompliment für ihn )m2 , wenn man die Darsteller » unbeholfen « nennt .
Aber diesmal ist es keine Koketterie , keine » Flucht nach vorn « .
(ID: 1859x1861); (Calc-Prob:50)
t.
In dem Leitantrag des Vorstandes , der zur Zeit erarbeitet wird , wird ( es )m1 Forderungen zur Bekämfung der Fluchtursachen sowie nach eine Einwanderungsgesetz und einer
Beschleunigung des Asylverfahrens geben .
Der innenpolitische Sprecher der CDU , Ralf Borttscheller , nannte ( es )m2 gestern positiv , wenn sich Wedemeier von der “starren Haltung “der SPD absetze .
165
Borttscheller .
(ID: 1999x2008); (Calc-Prob:83)
u.
Dabei gab ( es )m1 laut Vorstandsmitglied Heiner Erling eine eindeutige Mehrheit für den
Erhalt des Artikels 16 in seiner jetzigen Form .
In dem Leitantrag des Vorstandes , der zur Zeit erarbeitet wird , wird ( es )m2 Forderungen zur Bekämfung der Fluchtursachen sowie nach eine Einwanderungsgesetz und einer
Beschleunigung des Asylverfahrens geben .
Der innenpolitische Sprecher der CDU , Ralf Borttscheller , nannte es gestern positiv ,
wenn sich Wedemeier von der “starren Haltung “der SPD absetze .
(ID: 1988x1999); (Calc-Prob:83)
v.
Jetzt sitzt sie auf einem der Holzstühle , wippt das Kind auf den Knien .
“Daß ( es )m1 einen Wickelraum gibt , ist ( ein Gerücht )m2 . “
Sie habe ihn jedenfalls noch nicht gesehen .
(ID: 3813x3815); (Calc-Prob:50)
E.1.7. Units, currencies, month names, weekdays and the like
(63)
166
a.
Die Ministerin besaß bis Anfang 1991 eine Wohnung in der Bonner Riemannstraße in
Bonn , mit deren Erwerb sie ( bis zu tausend Mark Steuern )m2 im Monat sparen konnte .
Auf großstädtische Verhältnisse umgerechnet nehmen sich solche Summen noch ganz anders aus .
(ID: 1097x1108); (Calc-Prob:83)
b.
Eine leere Eigentumswohnung bringt hingegen unverändert ( zwischen 4.000 und 5.000
Mark )m1 - den Mieter herauszuklagen , lohnt sich da schon .[. . . ]
Über den sogenannten Häuslebauerparagraphen 10e werden Erwerber von Eigentumswohnungen - auch bewohnten - großzügige Steuererleichterungen eingeräumt , als ob sie eine
neue Wohnung schaffen würden .
(ID: 1077x1097); (Calc-Prob:83)
c.
Eine vermietete Eigenumswohnung kostet , so Makler Bendzko , ( zwischen 2.700 und
3.000 Mark den Quadratmeter )m1 .[. . . ]
Diese Preise werden nach dem Urteil mindestens stagnieren , wenn nicht sinken , schätzt
Bendzko .
Eine leere Eigentumswohnung bringt hingegen unverändert ( zwischen 4.000 und 5.000
Mark )m2 - den Mieter herauszuklagen , lohnt sich da schon .
(ID: 1072x1077); (Calc-Prob:83)
d.
Denn das macht Hunderttausende von ( Mark )m1 aus .
Eine vermietete Eigenumswohnung kostet , so Makler Bendzko , ( zwischen 2.700 und
3.000 Mark den Quadratmeter )m2 .
Diese Preise werden nach dem Urteil mindestens stagnieren , wenn nicht sinken , schätzt
Bendzko .
(ID: 1068x1072); (Calc-Prob:83)
e.
( Nur zehn Prozent der umgewandelten Wohnungen )m1 sind an die dort wohnenden Mieter verkauft worden , ein Drittel der Mieter hat die Wohnung verlassen müssen , viele
f.
g.
h.
i.
j.
k.
( Zwischen 60 und 70 Prozent der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den
zu können .
(ID: 988x1062); (Calc-Prob:83)
Aber schon bei der Inflationsrate , die nach Gaidars Plänen bis Ende des Jahres von
monatlich 20 auf ( höchstens zehn Prozent )m1 sinken soll , steigen die Statistiker aus dem
gemeinsamen Boot mit Gaidar aus .
Sie errechneten ( 30 bis 35 Prozent monatliche Inflationsrate )m2 .
Der IWF hält standardprogrammgemäß den Blick fest geheftet auf das Haushaltsdefizit ,
die Stabilisierung der Währung und freie Preise .
(ID: 8261x8267); (Calc-Prob:83)
Gegenwärtig jedoch liegt es bei ( 17 Prozent )m1 .
Aber schon bei der Inflationsrate , die nach Gaidars Plänen bis Ende des Jahres von
monatlich 20 auf ( höchstens zehn Prozent )m2 sinken soll , steigen die Statistiker aus dem
gemeinsamen Boot mit Gaidar aus .
Sie errechneten 30 bis 35 Prozent monatliche Inflationsrate .
(ID: 8255x8261); (Calc-Prob:83)
Das Gipfelkarussell ist abgebaut / Sein Kern , das Hilfsprogramm für Rußland , verschwindet zusehends / Der aufgeschobene Schuldendienst hat Rußland bis heute ( 2,5 Milliarden
Dollar )m1 gekostet[. . . ]
.
Bei der hohen Drehgeschwindigkeit - immer um den Mittelpunkt jenes ( 24 Milliarden
Dollar )m2 schweren Hilfspakets für Rußland - ist es eher ein Wunder , daß eine neue
Einsicht dennoch aufspringen konnte : die Erkenntnis , daß sich mit einem IWFStandardprogramm die Schwierigkeiten Rußlands nicht Schlag auf Schlag lösen lassen ,
(ID: 8188x8200); (Calc-Prob:83)
Hürden )m1 hinnehmen .[. . . ]
Seine 48,18 Sekunden reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen .
Über ( 800 Meter )m2 setzte sich der Kenianer William Tanui mit 1:43,62 Minuten gegen
den Weltjahresbesten Johnny Gray ( USA , 1:44,19 ) durch .
(ID: 8476x8481); (Calc-Prob:83)
benötigte .[. . . ]
( Seine 48,18 Sekunden )m2 reichten nicht , um Kevin Young ( USA , 47,97 ) zu schlagen
.
(ID: 8473x8478); (Calc-Prob:83)
Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über ( 200 Meter )m1
.[. . . ]
Mit der gleichen knappen Zeitspanne unterlag im 200-m-Lauf der Afrikaner dem Weltmeister Michael Johnson ( USA ) , der für die halbe Stadionrunde 20,10 Sekunden benötigte
.
167
Hürden )m2 hinnehmen .
(ID: 8447x8476); (Calc-Prob:83)
168
l.
Mit ( 10,06 Sekunden )m1 gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde
das Nachsehen .
benötigte .
(ID: 8461x8473); (Calc-Prob:83)
m.
Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit ( 22,18 Sekunden )m1
gegen ihre Landsfrau Juliet Cuthbert durch .[. . . ]
Dagegen mußte sich im 100-Meter-Sprint der Männer überraschend der Olympia-Zweite
Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser Saison
geschlagen geben .
Mit ( 10,06 Sekunden )m2 gab ihm der Nigerianer Olapade Adeniken um 1/100-Sekunde
das Nachsehen .
(ID: 8452x8461); (Calc-Prob:83)
n.
Damit war er ( über fünf Sekunden )m1 schneller als der hoffnungsvolle Ohrringträger Dieter Baumann ( Leverkusen ) in seinem deutschen Rekordlauf am 6. Juni in Sevilla .[. . . ]
.
Die große Verliererin des WM-Jahres aus Jamaika setzte sich mit ( 22,18 Sekunden )m2
gegen ihre Landsfrau Juliet Cuthbert durch .
(ID: 8439x8452); (Calc-Prob:83)
o.
Der Keniate Yobes Ondieki lief über ( 5.000 Meter )m1 in 13:03,58 Minuten eine neue
Jahresweltbestleistung .[. . . ]
Baumann ( Leverkusen ) in seinem deutschen Rekordlauf am 6. Juni in Sevilla .
Eine starke Rückkehr feierte die Weltmeisterschafts-Dritte Merlene Ottey über ( 200 Meter )m2
.
(ID: 8435x8447); (Calc-Prob:83)
p.
Lausanne ( dpa ) - 17 Tage vor Eröffnung des Olympiaspektakels in Barcelona ist Heike
Drechsler bereits in weltbester Flugform : Mit ( 7,48 Meter )m1 segelte sie beim GrandPrix-Meeting der Leichtathleten am Mittwoch abend in Lausanne nur um vier Zentimeter
am Weltrekord von Galina Tschistjakowa ( 7,52 ) vorbei .[. . . ]
Erstmals seit langer Zeit schaffte die Vielspringerin und Aktivistin in Sachen Dopingbekämpfung die 2-m-Marke nicht und scheiterte bei 1,98 m .
Der Keniate Yobes Ondieki lief über ( 5.000 Meter )m2 in 13:03,58 Minuten eine neue
Jahresweltbestleistung .
(ID: 8396x8435); (Calc-Prob:83)
q.
Heike Hochsprung-Henkel wurde geschlagen , Heike Drechsler mit neuer Weltjahresbestleistung nach Barcelona
Lausanne ( dpa ) - 17 Tage vor Eröffnung des Olympiaspektakels in Barcelona ist Heike
Drechsler bereits in weltbester Flugform : Mit ( 7,48 Meter )m1 segelte sie beim GrandPrix-Meeting der Leichtathleten am Mittwoch abend in Lausanne nur um ( vier Zentimeter )m2
am Weltrekord von Galina Tschistjakowa ( 7,52 ) vorbei .
Der Satz der 27jährigen ist der weiteste Sprung einer Frau in diesem Jahr .
(ID: 8396x8402); (Calc-Prob:52)
r.
Die Endrunde besteht aus zwei Spielen , die am ( 12. und 19. Juli )m1 ausgetragen werden
.[. . . ]
TENNIS
In der ersten Runde des Federation Cup vom 12. bis ( 19. Juli )m2 in Frankfurt / Main
trifft die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi
Graf auf Außenseiter Neuseeland .
(ID: 8566x8581); (Calc-Prob:83)
s.
In Bonn werden die Stimmen immer lauter , die ein Ende der Berliner Olympia-Bewerbung
für ( das Jahr 2000 )m1 voraussagen .[. . . ]
WECHSEL
Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFRNationalspieler wurde vom FC St. Pauli Hamburg zunächst für ( ein Jahr )m2 an Vorwärts
Steyr ausgeliehen .
(ID: 8506x8574); (Calc-Prob:83)
t.
1
Nachdem der Olympiasieger von Seoul , Paul Ereng , sich in der vergangenen Woche
nicht für Barcelona hatte qualifizieren können , hat am ( Mittwoch )m1 eine Sportkommission in Nairobi entschieden , daß Ereng doch mit darf .[. . . ]
Flamengo Rio de Janeiro hat sich als erstes Team für die Endrunde um die brasilianische
Fußball-Meisterschaft qualifiziert .
Sie besiegte am ( Mittwoch )m2 die Burschen aus Santos mit 3:1 .
(ID: 8533x8553); (Calc-Prob:83)
u.
Wer an die jüngsten Abstiegsdramen in ( der Ersten Liga )m1 denkt , der weiß , was für
ein süßes Versprechen das Gedränge um Platz 17 birgt .[. . . ]
Das Beste am ganzen Programm aber bleibt seine epische Anlage : Bis über finale Siege
und Niederlagen entschieden ist , werden 49.680 Minuten gespielt sein .
Welcher Erstligist kann da noch singen “( Niemals Zweite Liga )m2 “?
(ID: 8710x8724); (Calc-Prob:83)
v.
In dieser Hinsicht ist ( die Liga )m1 doch wieder zweigeteilt , diesmal allerdings im besten
, weil dramaturgisch spannendsten Sinne : Wer nicht um den Aufstieg spielt , der spielt
gegen den Abstieg .[. . . ]
Sieben der 24 Mannschaften werden nach dieser Saison ihre regionalen Oberligen verstärken .
Wer an die jüngsten Abstiegsdramen in ( der Ersten Liga )m2 denkt , der weiß , was für
ein süßes Versprechen das Gedränge um Platz 17 birgt .
(ID: 8698x8710); (Calc-Prob:83)
E.1.8. Problems with the alias feature
(64)
a.
1 This
Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : Der frühere CSFRNationalspieler wurde vom ( FC St. Pauli Hamburg )m1 zunächst für ein Jahr an Vorwärts
Steyr ausgeliehen .[. . . ]
TENNIS
In der ersten Runde ( des Federation Cup vom 12. bis 19. Juli in Frankfurt / Main )m2 trifft
die an Nummer eins gesetzte deutsche Mannschaft um Wimbledon-Siegerin Steffi Graf
(ID: 8573x8583); (Calc-Prob:50)
example does not show a clear disreference as both events could have happened on the same Wednesday (“Mittwoch”).
169
b.
c.
d.
e.
f.
WECHSEL
Ivo Knoflicek hat in Österreich endlich einen neuen Verein gefunden : ( Der frühere CSFRNationalspieler )m1 wurde vom ( FC St. Pauli Hamburg )m2 zunächst für ein Jahr an Vorwärts Steyr ausgeliehen .
Leihgebühr : 100.000 Mark .
(ID: 8572x8573); (Calc-Prob:50)
Vorweg : ( Sieben spannende Seiten Widmungen an die Menschen , die Torsten Schmidt
als Junkies knipste )m1 :[. . . ]
S. P.
Ausstellung : bis 6. 8. im Schlachthof ; Buch : “Ich bin einmalig , und daß ich lebe , das
freut mich . Menschen in der Drogenszene “, Rasch und Röhring Verlag , ( 29.80 DM )m2
(ID: 2927x2954); (Calc-Prob:50)
Naheliegende Frage : Warum wehrt sie sich dann so vehement gegen ( ein generelles Verbot
für Zigarettenreklame )m1 ?[. . . ]
Warum wird überhaupt noch geworben , wenn es so nutzlos ist ?
Acht von zwölf Gesundheitsministern ( der EG )m2 haben sich im November des Vorjahres für ein generelles Verbot von Zigarettenreklame ausgesprochen , vier ( die Minister
der Bundesrepublik , Großbritanniens , Dänemarks und der Niederlande ) waren dagegen
.
(ID: 7826x7828); (Calc-Prob:50)
So konnte den Käufern ein imaginärer Wert des Unternehmes von ( drei Millionen Mark )m1
vorgespiegelt werden , der sich bei näherem Hinsehen auf im Grunde unverkäufliche Altbestände bezog .[. . . ]
Noch kurioser sind die Vorgänge um den Theaterverlag “Henschel-Schauspiel “, neben
dem Leipziger “Reclam-Verlag “, der mittlerweile an den Alteigentümer , das heißt die
Reclam GmbH & Co KG im schwäbischen Ditzingen , zurückgegeben wurde , war dies
das einzige ostdeutsche Editionshaus , in dem Mitarbeiter und Autoren gleich Anfang
1990 die Geschicke selbst in die Hand nahmen , um das Überleben des kritisch engagierten
Projekts zu sichern .
Nach dem Vorbild des Verlags der Autoren gründeten 68 Bühnenautoren , Übersetzer
und Mitarbeiter eine GmbH , in der der Henschel-Buchverlag als ehemaliges Mutterhaus
lediglich mit einer Sacheinlage vertreten war , nämlich den alten Textbüchern im damaligen Wert von ( 60.000 DM )m2 .
(ID: 9919x10017); (Calc-Prob:50)
Die Räumung , die am Mittwoch abend gegen 19 Uhr erfolgte , wurde von AL , ( PDS )m1
und dem BUND verurteilt .[. . . ]
Eine Öffnung der Oberbaumbrücke für den Autoverkehr werde die Klimabelastung verschärfen - insbesondere den Sommersmog .
Nach der Räumung am Mittwoch war es zu ( zwei Demonstrationen gekommen , die sich
gegen die bisherige Planung des Senats richteten , die Oberbaumbrücke für den
Autoverkehr zu öffnen )m2 - statt Individualverkehr soll die Tram fahren .
(ID: 11404x11464); (Calc-Prob:50)
E.1.9. First markable begins with “kein“
(65)
170
a.
» ( Keine Gewalt )m1 « war ein Slogan der großen Novemberdemonstrationen .[. . . ]
Im puren , physischen Sinn war es auch eine weitgehend » gewaltlose « Revolution .
( Die andere Gewalt , die Gewalt der Geschichte , des Alltags , unserer Gefühle und
Vorurteile )m2 , entlud sich dagegen , und sie entlädt sich immer noch .
(ID: 1881x1892); (Calc-Prob:83)
b.
c.
d.
e.
f.
g.
h.
Der schmerzhafte Entschluß zur Flucht , der Vertrauensbruch mit denen , die blieben , hat
plötzlich ( keinen Sinn mehr )m1 .[. . . ]
» Keine Gewalt « war ein Slogan der großen Novemberdemonstrationen .
Im ( puren , physischen Sinn )m2 war es auch eine weitgehend » gewaltlose « Revolution
.
(ID: 1850x1884); (Calc-Prob:83)
“Druckräume sind ( keine Lösung )m1 “[. . . ]
Dadurch werden dann Schulhöfe und Spielplätze wieder frei , und vielleicht findet sich
dann auch der Wohnraum .
( Eine wirklich billige und bequeme Lösung für den Senat )m2 .
(ID: 2800x2838); (Calc-Prob:83)
Zum Vergleich werden 30 Kinder aus der Region Plön ( Schleswig-Holstein ) untersucht ,
in der es ( keine Atomanlagen )m1 gibt .[. . . ]
Parallel dazu wird nach Ablagerungen langlebiger Radionukleide in Mensch und Natur
gesucht .
Bisher sind aufgrund der amtlichen Messungen keine besonderen radioaktiven Belastungen aus ( den Atomanlagen in Krümmel und Geesthacht )m2 bekannt .
(ID: 3493x3508); (Calc-Prob:83)
Weil die Sachbearbeiterin Lindner - nach Rücksprache mit der Geschäftsleitung - dem
homosexuellen Buchhändler und Konzertveranstalter Hasso Müller-Kittnau ( 39 ) und
seinem Lebensgefährten in Saarbrücken ( keine Wohnung )m1 vermieten wollte , geriet die
noble Weltfirma unter Beschuß : Der Schwulenverband in Deutschland ( SVD ) forderte
am Mittwoch Lesben und Schwule auf , bei Vertragsabschlüssen mit Versicherungen den
“Diskriminierungsfall “bei der Allianz zu berücksichtigen .
Auf Nachfrage erklärte gestern die Sachbearbeiterin Lindner , daß Müller-Kittnau und
seinem Freund ( die Wohnung in Saarbrücken )m2 nicht verweigert worden sei , weil es
sich um ein homosexuelles Paar gehandelt habe - “obgleich in dieser Wohnanlage sehr
konservative Mieter wohnen “.
Vielmehr habe Müller-Kittnau von der Anmietung der umgehend neu zu belegenden Wohnung Abstand genommen , weil er erst zu einem späteren Zeitpunkt habe einziehen wollen
.
(ID: 5614x5633); (Calc-Prob:83)
( Keine Allianz fürs Leben )m1 [. . . ]
Versicherungskonzern diskriminiert Schwule / Homosexuelles Paar als Mieter abgelehnt
Frankfurt / Main ( taz ) - Bei ( der Allianz Grundstücks AG in Karlsruhe )m2 ist die Belegschaft
verunsichert .
(ID: 5594x5602); (Calc-Prob:83)
( Keine Zahlen über rassistische Straftaten )m1 [. . . ]
Von Bernd Siegler
Nürnberg ( taz ) - Während das Bundeskriminalamt ( BKA ) angesichts der Vielzahl von
rassistisch motivierten Straftaten betont , daß es “keinen Grund zur Entwarnung “gebe ,
weigert sich die Bundesregierung hartnäckig , ( monatliche Zahlen über solche Straftaten
und daraus resultierende Ermittlungsverfahren )m2 zu veröffentlichen .
(ID: 6526x6544); (Calc-Prob:83)
Der Text am Rand erklärt dem Betrachter , daß Prostituierte ( kein Recht auf Klage )m1
haben , wenn ein Freier sie um die Bezahlung für erbrachte Dienstleistungen prellt .[. . . ]
Sie halten sich an der Hand oder im Arm und könnten genausogut Hans und Sabine von
gegenüber sein .
Dazu gibt es knappe Sprüche , die aus dem Mund eines Werbetexters stammen könnten :
» Er hat ( ein Recht auf seine Lust )m2 und Sie Lust auf ihr Recht « oder : » Er ist potent
171
und Sie kompetent « .
(ID: 13240x13265); (Calc-Prob:83)
E.1.10. Disagreement in gender and number
(66)
a.
b.
c.
d.
e.
172
kann .
(ID: 156x179); (Calc-Prob:53)
.
(ID: 242x243); (Calc-Prob:51)
Der zweieinhalbstündige Theaterabend in den Kammerspielen des Deutschen Theaters
blieb dann auch entsprechend unentschieden .
( Viele Gedanken )m1 , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt ,
wie zum Beispiel die Inzest-Anspielung zwischen Pedro und ( seiner )m2 Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) .
(ID: 188x193); (Calc-Prob:51)
Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ihrem irren Sohn Laureano ein einträgliches Geschäft hinterläßt , möchte sich ( so mancher )m1 in ( ihrer )m2
Verwandtschaft gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt zu Jahrmarkt geschoben hatte .
Helena , ( deren Fall )m1 inzwischen von einigen Zeitungen aufgegriffen wurde , kann bis
heute nicht in ( ihre )m2 Wohnung .
(ID: 423x426); (Calc-Prob:51)
f.
g.
h.
i.
j.
k.
l.
Der Exmissionstitel ist allerdings vergilbt , weil die städtischen Behörden , eigentlich
zuständig , Helena zu ( ihrem )m1 Recht zu verhelfen , es ablehnten einzugreifen .
Begründung : Nach polnischem Mietrecht dürfe man einen Mieter nur aus der Wohnung
entfernen , wenn man in der Lage sei , ( ihm )m2 Ersatzraum zur Verfügung zu stellen .
(ID: 406x416); (Calc-Prob:50)
Wirtschaftsunternehmen aufgebaut hat , kreiden ihm ( die Briten )m1 an .
Auch ( seine )m2 dunkelbraune Vergangenheit in Francos Unrechtsstaat paßt ihnen nicht .
faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem
Arm “unterschrieben hat .
(ID: 564x565); (Calc-Prob:51)
Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .
Auch ( seine )m1 dunkelbraune Vergangenheit in Francos Unrechtsstaat paßt ( ihnen )m2
nicht .
faschistischer Uniform rumgekaspert ist und seine Briefe mit “es grüßt mit erhobenem
Arm “unterschrieben hat .
(ID: 565x569); (Calc-Prob:50)
So leichtfertig wie er ( das )m1 dahersagt , so freudig wurde es vom Aufsichtsrat der
Marketing GmbH aufgenommen und dem Geschäftsführer ein Strick daraus gedreht .
Ein Kopf mußte rollen , damit sich ( die anderen )m2 aus der Schlinge ziehen konnten .
Das kathartische Schauspiel wurde inszeniert , um , wie es so schön heißt , weiteren
Schaden von der Bewerbung abzuwenden .
(ID: 643x651); (Calc-Prob:52)
Denn ( Vermieter , die eine Wohnung freibekommen wollen )m1 , lassen sich einiges einfallen , um die Mieter herauszuekeln , berichtet Frau Fiedler .[. . . ]
Zwischen 60 und 70 Prozent ( der Vermieter )m2 , schätzt Hanka Fiedler , wollen nur den
zu können .
(ID: 1023x1061); (Calc-Prob:83)
( ihre )m1 Landsfrau Juliet Cuthbert durch .[. . . ]
Dagegen mußte sich im 100-Meter-Sprint der Männer überraschend der Olympia-Zweite
Linford Christie ( Großbritannien ) vor rund 12.000 Zuschauern erstmals in dieser Saison
geschlagen geben .
Mit 10,06 Sekunden gab ( ihm )m2 der Nigerianer Olapade Adeniken um 1/100-Sekunde
das Nachsehen .
(ID: 8453x8462); (Calc-Prob:50)
In Bonn werden die Stimmen immer lauter , die ein Ende der Berliner Olympia-Bewerbung
für das Jahr 2000 voraussagen .
“( Berlin 2000 )m1 ist für ( mich )m2 eigentlich schon tot “, meinte der sportpolitische
Sprecher der SPD-Bundestagsfraktion , Wilhelm Schmidt , nach der peinlichen Affäre um
die Daten-Sammlung über Mitglieder des IOC .
Schmidt behauptet , daß ihm konkrete Informationen über den bevorstehenden Rücktritt
173
m.
von Nikolaus Fuchs vorliegen würden .
(ID: 8509x8510); (Calc-Prob:51)
Mit all diesen obskuren Äußerungen fällt Frau Rogendorf nicht nur den Lesben / Frauen
auf den Wecker , die ( ihr )m1 Leben selbstbestimmt und -bewußt leben , sondern auch all
den Frauen / Lesben , die sich für die Abschaffung des § 218 einsetzen .
Wenn “eine Gesellschaft sich nur über das Kind entwickeln kann “, wie es Frau Roggendorf behauptet , ist zu fragen , wie die Gesellschaft mit den ungewollten Kindern klarkommt
, für die ( wir )m2 Frau Roggendorf hiermit dankbar die Adoptionsurkunde ausstellen wolllen .
Amen .
(ID: 2775x2791); (Calc-Prob:50)
E.2. False negatives
E.2.1. Reflexive pronouns with non-subjects or considerable sentence
distance
(67)
a.
b.
c.
d.
e.
174
Der 71jährige Ober-Olympier kommt aber auch wirklich nicht gut weg : Nicht nur , daß
er die ( ihm )m1 anvertrauten Coubertinschen Ideale verraten und auf ihren Trümmern ein
Wirtschaftsunternehmen aufgebaut hat , kreiden ihm die Briten an .[. . . ]
Der Spanier-Hansl kümmert sich wenigstens .
Um ( sich )m2 und auch um seine Kumpels .
(ID: 558x595); (Calc-Prob:0)
Er appellierte an ( Kinkel )m1 , ( sich )m2 für die politischen Häftlinge einzusetzen .
SEITE 8
(ID: 5030x5031); (Calc-Prob:7)
Auch beim Nato-Verbündeten Türkei , in die ( Außenminister Klaus Kinkel )m1 in der
kommenden Woche reisen wird , werde unvermindert gefoltert .[. . . ]
Er appellierte an Kinkel , ( sich )m2 für die politischen Häftlinge einzusetzen .
(ID: 5023x5031); (Calc-Prob:0)
Zwar hatte ( er )m1 zu den 30 bis 40 Jugendlichen gehört , die in der Novembernacht
“Neger aufklatschen “wollten .
Nur erinnern konnte er ( sich )m2 gestern nicht .
Kamp behandelte ihn wie schon andere Zeugen während der Verhandlung : Er schickte
ihn unter Polizeibewachung zum Strafsitzen und Nachdenken in ein kleines Kabuff .
(ID: 5138x5144); (Calc-Prob:0)
( Hawkins )m1 bietet solides Rhythm & Blues-Entertainment und hat vor allem nicht den
Kontakt zur Realität verloren .
Daß der ganze Voodoo-Zauber heutzutage kein kleines Kind mehr zum Schwitzen bringt
, hat er gut erkannt , stellt sich deshalb konsequent neben ( sich selbst )m2 und gibt dem
teilweise mitgealterten Publikum eine gute Zeit .
Wenn er in einem quietschbunten Plüschanzug auf die Bühne kommt und den mit einem
Totenschädel verzierten Stock auf die Bretter knallt , hat er stets ein Augenzwinkern zur
Hand , das gleichsam Screaming Jay Hawkins durch Screaming Jay Hawkins selbst kommentiert .
(ID: 12673x12682); (Calc-Prob:0)
f.
20 Millionen für ( drei ältere rockmusizierende Herrschaften )m1 [. . . ]
Nichts hätte konsequenter sein können .
Die drei rockmusizierenden älteren Herrschaften von Genesis verbanden ( sich )m2 mit
den autofabrizierenden älteren Herren von Volkswagen , um fürderhin für gegenseitige
Belebung des Geschäfts zu sorgen .
(ID: 12555x12610); (Calc-Prob:0)
g.
Was soll man von einer Band halten , die sich nach einem italienischen Glasbläsermeister
benennt , aber ( sich )m1 konsequent der Vermatschung aller verfügbaren jamaikanischen
Musiken widmet ?[. . . ]
Kann man nur großartig finden , wenn sie Messer Banzani heißen .
Die sechs Herren aus Leipzig haben den Eisernen Vorhang anscheinend recht durchlässig
erlebt , sonst könnten sie wohl kaum heute bereits so versiert Ska , Reggae , Dub und
Ragga adaptieren , ohne ( sich )m2 lächerlich zu machen .
(ID: 12562x12576); (Calc-Prob:0)
E.2.2. Semantic Relations between the markables
(68)
a.
Glanzpunkte .
Diese wenigen atmosphärischen Momente lassen ein dichtes , interessantes Stück erahnen
, das aber in ( dieser Fassung )m2 weit unter dem Möglichen inszeniert scheint .
Gegen Ende zerfaselte der Spannungsbogen immer deutlicher , die Schauspieler agierten
zusehends einfallsloser , weil dem Regisseur anscheinend die Einfälle ausgegangen waren
.
(ID: 218x223); (Calc-Prob:10)
b.
Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m1 .[. . . ]
( Herrchen )m2 wollte den Hundefänger holen .
(ID: 340x344); (Calc-Prob:4)
c.
gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers .[. . . ]
(ID: 335x344); (Calc-Prob:4)
d.
( Der 24jährige Besitzer )m1 hatte dem Tier am Vortag sein zukünftiges Heim gezeigt
.[. . . ]
(ID: 325x344); (Calc-Prob:7)
e.
gebissen und flüchtete ins Wohnzimmer zur Gattin ( des Besitzers )m2 .
(ID: 335x340); (Calc-Prob:10)
f.
Der 24jährige Besitzer hatte dem Tier am Vortag ( sein zukünftiges Heim )m1 gezeigt .[. . . ]
175
g.
h.
i.
j.
k.
l.
176
Als ein Bekannter des Hundehalters versuchte , ( die Wohnung )m2 zu räumen , wurde er
(ID: 329x337); (Calc-Prob:5)
Edgar ist neu in ( der Stadt )m1 , aber schon jetzt Stammgast in rund 200 Kneipen und
Cafes von Altona bis Pöseldorf .[. . . ]
Doch Bromeck hat Größeres im Sinn , die Vorinvestitionen von “weit über 100000 Mark
“sollen sich bald lohnen : Edgar auf der Karte ist Platzhalter für potente Kunden , die
derzeit eifrig akquiriert werden .
Denn wie in Kopenhagen etwa der Bier-Riese Tuborg seine Image-Werbung längst vom
dortigen Edgar-auf-der-Karte-Vorbild Gocard drucken und vertreiben läßt , sollen auch
in ( Hamburgs )m2 Kneipen demnächst Werbepostkarten von hiesigen Unternehmen zur
Gratis-Mitnahme einladen .
(ID: 4822x4893); (Calc-Prob:3)
Edgar ist neu in ( der Stadt )m1 , aber schon jetzt Stammgast in rund 200 Kneipen und
Cafes von Altona bis Pöseldorf .[. . . ]
Es meldet sich eine Werbeagentur “Bromeck “.
“Die Idee mit den Gratispostkarten stammt aus Kopenhagen “, erklärt der bereits werbegewiefte studierte Jurist Christian Meckenstock , der mit seiner Kollegin Nana Bromberg
eigens eine Agentur gründete , um die Kampagne Edgar auf der Karte von einem Büro im
PPS-Bunker aus in ( Hamburg )m2 und auf Sylt zu starten .
(ID: 4822x4853); (Calc-Prob:3)
Daraufhin weigerte sich ( Daimler-Benz )m1 , ihn nach Abschluß seiner Ausbildung als
Schlosser zu übernehmen - der Artikel sei nämlich ein “Bekenntnis zur Gewalt “.
Es sei zu befürchten , daß der junge Mann in bestimmten Situationen auch im Betrieb
Gewalt befürworten werde , argumentierte ( das Unternehmen )m2 .
Das Bundesarbeitsgericht teilte den Standpunkt und wies die Klage auf Einstellung ab .
(ID: 6010x6022); (Calc-Prob:7)
So konnte ( den Käufern )m1 ein imaginärer Wert des Unternehmes von drei Millionen
Mark vorgespiegelt werden , der sich bei näherem Hinsehen auf im Grunde unverkäufliche Altbestände bezog .[. . . ]
Schließlich brauchten die bayerischen Jungunternehmer auch nur für zehn Arbeitsplätze
Beschäftigungsgarantien und Finanzierungszusagen für ganze 18 Monate abzugeben ( bei
trickreich vollzogenem rückwirkendem Verkauf zum 1. Januar 1992 sogar nur für zwölf
Monate ) , während man von der Belegschaft , die den Verlag übernehmen wollte , Finanzierungsgarantien für vier bis fünf Jahre verlangt hatte .
Aber selbst mit diesen 18 Monaten scheinen ( die neuen Eigentümer )m2 überfordert zu
sein , denn ihr heimatliches Kalender-Unternehmen wirft längst nicht soviel ab , wie Volk
und Welt momentan an jährlichen Defiziten einfährt .
(ID: 9917x9950); (Calc-Prob:9)
Frieda Mermet heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer
Bergkaff ist sie .
bringt , hockt ( die Holde )m2 tatsächlich im Dichterolymp .
(ID: 13029x13033); (Calc-Prob:9)
In ( der Inszenierung vn Armin Holz in den Kammerspielen des Deutschen Theaters )m1 scheint
dieser bittere Kern hinter viel Regie-Schnickschnack wieder zu verschwinden .[. . . ]
Diese wenigen atmosphärischen Momente lassen ein dichtes , interessantes Stück erahnen
, das aber in ( dieser Fassung )m2 weit unter dem Möglichen inszeniert scheint .
(ID: 153x223); (Calc-Prob:10)
m.
So kann ( die Groteske )m1 nicht zu ihrer Wirkung kommen , kann nichts von der Form
ins Formlose umschlagen , vom Maßvollen ins Sinnlose kippen .[. . . ]
wieder zu schick .
(ID: 178x208); (Calc-Prob:4)
n.
Der Bürgersmann Pedro beweint zudem noch seine eheliche Ehre , die ( seine Angetraute )m1
gerade mit dem sittenlosen Gaukler Septimo verspielt .[. . . ]
Als habe er von Anfang an klarstellen wollen , daß hier lediglich eine “Schauperposse
“zu sehen ist , läßt Armin Holz den Figuren des Stücks gar keine Gelegenheit , sich erst
einmal sinnhaft vorzustellen .
Septimo ( Ulrich Haß ) verhöhnen lassen , ( seine Frau Mari-Gaila )m2 und Schwester
Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende
Streit über den lukrativen Karren nicht mehr bizarr , sondern nur noch folgerichtig erscheinen kann .
(ID: 135x173); (Calc-Prob:17)
o.
Als ein Bekannter des Hundehalters versuchte , die Wohnung zu räumen , wurde er gebissen und flüchtete ins Wohnzimmer zur Gattin des Besitzers .
(ID: 326x331); (Calc-Prob:5)
p.
Hamburg ( ap ) - ( Ein zwei Jahre alter Schäferhund namens “Prinz )m1 “hat im Hamburger Stadtteil Altona eine Wohnung besetzt .
(ID: 322x326); (Calc-Prob:3)
q.
Aufgrund des gleichen Paragraphen gibt es in ( Warschau )m1 inzwischen Tausende kommunaler Sozialmieter , die ihre Zahlungen eingestellt haben - Kündigung droht ihnen
deshalb nicht .
( Die Stadt )m2 hat inzwischen sogar schon private Schuldenjäger beauftragt , die Mieten
einzutreiben .
Jemanden vor die Tür setzen dürfen die allerdings auch nicht .
(ID: 511x519); (Calc-Prob:5)
r.
Das anfängliche Entsetzen , das viele beim Lesen ( der düsteren Texte von Garcia Marquez )m1
empfinden , haben die Schauspieler und Schauspielerinnen rasch überwinden .[. . . ]
Natürlich gebe es in den Texten reichlich Todesbilder , meint Cristina Tetzner , eine der
SchauspielerInnen , “aber sie sind auch gespickt mit Bildern der Sinnlichkeit “.
Und ihr Szenen-Partner Henner Schneider entdeckte in ( den Geschichten )m2 trotz der
177
ihm sehr fremden Denkweise sogar humorvolle Stellen .
(ID: 2174x2190); (Calc-Prob:0)
E.2.3. Both markables contain a common, possibly appositive proper
name
(69)
a.
b.
c.
d.
e.
f.
178
( Ramon Valle-Inclan )m1 [. . . ]
) , bleibt ( Valle-Inclan , der Exzentriker der Moderne )m2 , auch weiterhin ein Geheimtip
.
( Ramon Valle-Inclan )m1
( Der Spanier Ramon Maria Valle-Inclan ( 1866 - )m2 1939 ) war schon äußerlich ein Bürgerschreck : langer Bart , schwarze Kleidung und ein amputierter Arm .
Nach eigenen Angaben entstammte er einer alten Adelsfamilie .
Bernd Stempel als Pedro Gailo muß in seinem viel zu kleinen Anzug und einem zweifarbigen Rauschebart über die Bühne stelzen und sich von ( der überzeichneten Zuhälterfigur
Septimo ( Ulrich )m1 Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester Marica agieren gleich zu Beginn so exaltiert , daß einem der später so wüst entbrennende Streit
kann .[. . . ]
Musik : Irmin Schmidt .
Mit Corinna Harfouch , Bernd Stempel , ( Ulrich Haß )m2 , Margit Bendokat , Elsa GrubeDeister , Ulrike Krumbiegel , Horst Lebinsky , Markus Gertken u. a.
(ID: 171x267); (Calc-Prob:0)
Deutschen Theater , mit der übrigens ( Ignaz Kirchner )m1 ursprünglich sein Debüt als
neues Ensemblemitglied hätte geben sollen , mehr versprochen .
( Kirchner )m2 wird nun erst im Herbst mit einer Langhoff-Inszenierung seine Arbeit
aufnehmen .
Er wird wissen , warum .
(ID: 244x248); (Calc-Prob:17)
Es meldet sich ( eine Werbeagentur “Bromeck )m1 “.[. . . ]
Der Wahlhamburger Kevin Bauer entwarf das Edgar-Logo : Der Mann mit der Pfeife und
dem Balken über den Augen .
Rund 200000 ihrer , im weitesten Sinne , Kunstpostkarten hat ( Bromeck )m2 in den ersten
drei Wochen der Aktion unters Volk gebracht .
(ID: 4840x4867); (Calc-Prob:3)
“Die Idee mit den Gratispostkarten stammt aus Kopenhagen “, erklärt ( der bereits
werbegewiefte studierte Jurist Christian Meckenstock )m1 , der mit seiner Kollegin Nana
Bromberg eigens eine Agentur gründete , um die Kampagne Edgar auf der Karte von
einem Büro im PPS-Bunker aus in Hamburg und auf Sylt zu starten .
Der Name Edgar , großzügig abgeleitet vom Begriff der “Advertising Card “, ist , alte
Werbeweisheit , “als Figur ausbaufähig “, ( so Meckenstock )m2 .
Der Wahlhamburger Kevin Bauer entwarf das Edgar-Logo : Der Mann mit der Pfeife und
dem Balken über den Augen .
(ID: 4844x4858); (Calc-Prob:12)
g.
h.
i.
j.
k.
l.
Im November 1990 sollen sie ( den Angolaner Amadeu Antonio )m1 so zusammengeschlagen haben , daß der 28jährige starb .[. . . ]
Sie wehrte sich beredt gegen die Verteidiger , die ihrer Darstellung nicht glauben wollten
.
Durch ihre Schilderung sind die Vorgänge , die ( Amadeu Antonio )m2 das Leben kosteten
, noch weniger klar als vorher .
(ID: 5116x5211); (Calc-Prob:5)
Die Fässer sollen zur Konditionierung aus ( dem Atommüllager Gorleben )m1 in eine Lagerhalle nach Duisburg-Wanheim transportiert werden .[. . . ]
Keiner will die Mol-Fässer haben .
Gleichzeitig versuchte die Essener Gesellschaft für Nuklearservice ( GNS ) die 1.000
Atommüllfässer aus ( Gorleben )m2 zur Verarbeitung nach Duisburg zu bringen , konnte
der Gewerbeaufsicht aber nie die erforderlichen Papiere präsentieren .
(ID: 6075x6104); (Calc-Prob:6)
Die fehlende Nachwirkung Bleis dürfte ihren Grund darin haben , daß ( der Schriftsteller Blei )m1
keinen wiedererkennbaren Stil , keinen durchgängigen Erzählton besitzt : weder den
ironisch-epischen eines Thomas Mann , den essayistisch-analytischen eines Robert Musil
noch den skeptisch-melancholischen eines Joseph Roth .[. . . ]
Ein Autor , der sich , wenn auch in parodierender Absicht , so eng an die stilistischen und
thematischen Vorgaben anderer Autoren hält , läuft Gefahr , sich selbst zu verlieren .
Eine Gefahr , der ( Blei )m2 ironisch ins Auge sah : “Der Blei “, schreibt Blei im Bestiarium , “ist ein Süßwasserfisch , der sich geschmeidig in allen frischen Wassern tummelt
und seinen Namen mhd. bli , ahd. blio = licht , klar von der außerordentlich glatten und
dünnen Haut trägt , durch welche die jeweilige Nahrung mit ihrer Farbe deutlich sichtbar
wird .
(ID: 9103x9159); (Calc-Prob:19)
So enthält das Bestiarium einen Exkurs - eine andere Lieblingsform Bleis - zur politischen Romantik , der sich als wortgetreue Wiedergabe eines Kapitels aus der Politischen
Romantik von Carl Schmitt herausstellt .
Bleis Essay über ( den Poeten , Maler und Giftmörder Thomas Griffith Wainewright ( )m1
1794-1852 ) hingegen ist eine parodierende Paraphrase von Oscar Wildes Lobeshymne
auf ( Wainewright )m2 in Pen , Pencil and Poison .
Das Oeuvre Bleis ist voll von literarischen Längs- und Querbezügen , die - mit oder ohne
Autorennennung - in die eigene literarische Produktion mit eingearbeitet wurden .
(ID: 9137x9140); (Calc-Prob:6)
Daran ist dann offensichtlich auch der Versuch von Schmidt-Braul gescheitert ,
( den Luftfahrtunternehmer Dornier )m1 für die Übernahme zu gewinnen .
( Silvius Dornier )m2 wußte aus seinen zähen Verhandlungen mit Daimler-Benz , denen er
einen Großteil seiner Aktien verkauft hatte , daß auch bei der Treuhand etwas rauszuschlagen wäre .
Wenn sie schon die teure Immobilie einsackte , sollte sie wenigstens noch den Verlag
entschulden und etwas Geld für die Anschubfinanzierung locker machen .
(ID: 9868x9870); (Calc-Prob:7)
Als mir Anfang des Jahres ( Martin Flug )m1 sein Manuskript Treuhand-Poker - Die Mechanismen des Ausverkaufs auf den Verlagstisch legte , schien mir manches recht überzogen
, und ich bat ihn , mir die haarsträubendsten Geschichten mit Dokumenten zu belegen , da
ich wenig Lust verspürte , gleich nach Erscheinen verklagt zu werden .[. . . ]
Doch in jedem einzelnen Fall konnte er mich von der Sauberkeit seiner Recherche überzeugen , und die Tatsache , daß bis heute - drei Monate nach der Erstauslieferung - keine Einstweiligen Verfügungen bei uns herniedergegangen sind , scheinen ihm zusätzlich recht zu
179
geben .
Heute würde ich wahrscheinlich nicht mehr so skeptisch fragen , denn das , was ich in den
letzten Wochen in meiner unmittelbaren Umgebung , der Ostberliner Verlagsszene , erlebt
habe , stellt ( Flugs )m2 Report noch um einiges in den Schatten .
(ID: 9647x9680); (Calc-Prob:9)
180
m.
“Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich Bundeskanzler Helmut Kohl auf seiner Pressekonferenz mit ( dem russischen Präsidenten Boris
Jelzin )m1 am Mittwoch in München verteidigen .
Er hatte ( seinen neuen “Duzfreund “Boris )m2 nämlich aus Versehen als sowjetischen
Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen .
WAFFENSCHMUGGEL IM KLEINEN RAHMEN
(ID: 10091x10097); (Calc-Prob:8)
n.
Gurke des Tages : ( Helmut Kohl )m1
“Das ist die alte Übung , Herr Präsident , auch wir müssen umlernen “, mußte sich
( Bundeskanzler Helmut Kohl )m2 auf seiner Pressekonferenz mit dem russischen Präsidenten Boris Jelzin am Mittwoch in München verteidigen .
Er hatte seinen neuen “Duzfreund “Boris nämlich aus Versehen als sowjetischen Präsidenten bezeichnet und wurde daraufhin auch prompt von ihm unterbrochen .
(ID: 10084x10089); (Calc-Prob:27)
o.
Da hat er das wildgemusterte Hemd und die Schirmmütze » von irgendwoher , wo es Elefanten gibt « , abgelegt , den Blues Max auf den Bügel gehängt und genießt erst mal als
( Max Lässer )m1 ein kühles Pils .[. . . ]
Seit drei Jahren füllen die beiden in der Schweiz als » Blues Max « große Säle , erzählen
ihre mal melancholischen , mal ironischen , auch traurigen Geschichten von Onkel Hermann , der einfach gar nie den Blues hat , vom armen Kasimir Benz oder von dem Mann
, der seinen Himmel auf Erden nicht mehr finden konnte .
» Manchmal weinen die Leut’ , odr ! « weiß ( der Blues Max )m2 .
(ID: 12946x12965); (Calc-Prob:13)
p.
( Frieda Mermet )m1 heißt seine Angebetete , und eine einfache Wäscherin in einem Schweizer
Bergkaff ist sie .
bringt , hockt die Holde tatsächlich im Dichterolymp .
(ID: 13022x13029); (Calc-Prob:12)
q.
( Westend )m1 Be Thy Name[. . . ]
Kaum gehn die Bauarbeiten mal voran im Kulturzentrum Walle , da haben sich seine
glücklichen Alten schon einen Namen für das Balg ausgesucht , weil ’s ja “Kulturwerkstatt
für ArbeitnehmerInnen “nicht im Ernst heißen kann .
Der Trägerverein , also u.a. DGB und Kultursenatorium , haben sich nach langem Ringen
und einem Wettbewerb zum ( Namen “Westend )m2 “entschlossen .
(ID: 13383x13401); (Calc-Prob:6)
r.
Wunderworte , 1920 geschrieben und in Deutschland zuvor erst zweimal aufgeführt , ist
ein sogenanntes “Esperpento “( zu deutsch : Schauerposse ) ( des Spaniers Ramon del
Valle-Inclan )m1 , eine Tragi-Komödie , ein deformiertes Zerrbild der Wirklichkeit .[. . . ]
Viele Gedanken , Blitzlichter einer Idee wurden angerissen , aber nicht ausgeführt , wie
zum Beispiel die Inzest-Anspielung zwischen Pedro und seiner Tochter Simonina ( verläßlich gut : Ulrike Krumbiegel ) .
( Ramon del Valle-Inclans )m2 Katholizismuskritik ist bei Armin Holz so stark in den Hintergrund gedrängt , daß das Kreuz , als es auf die Bühne geschleppt wird , kaum mehr als
s.
t.
u.
v.
w.
ein weiterer Gag sein kann .
(ID: 140x196); (Calc-Prob:6)
( Der Bürgersmann Pedro )m1 beweint zudem noch seine eheliche Ehre , die seine Angetraute gerade mit dem sittenlosen Gaukler Septimo verspielt .[. . . ]
Bernd Stempel als ( Pedro Gailo )m2 muß in seinem viel zu kleinen Anzug und einem
zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester
(ID: 130x164); (Calc-Prob:12)
Und während ( Bruder Pedro )m1 und Schwester Marica noch darüber feilschen , wer den
“Karren “nun an welchen Tagen bekommen soll , ist es für Pedros Frau Mari-Gaila schon
ausgemacht , daß sie mit Laureano auf und davon gehen wird .[. . . ]
Bernd Stempel als ( Pedro Gailo )m2 muß in seinem viel zu kleinen Anzug und einem
zweifarbigen Rauschebart über die Bühne stelzen und sich von der überzeichneten Zuhälterfigur Septimo ( Ulrich Haß ) verhöhnen lassen , seine Frau Mari-Gaila und Schwester
(ID: 113x164); (Calc-Prob:12)
Gott guckt uns nicht zu , der hat der Welt längst den Rücken gekehrt “, läßt ( der spanische
Dichter Ramon del Valle-Inclan )m1 einen seiner Helden gleich zu Beginn seiner Grotske
“Wunderworte “erklären .[. . . ]
Der Bürgersmann Pedro beweint zudem noch seine eheliche Ehre , die seine Angetraute
gerade mit dem sittenlosen Gaukler Septimo verspielt .
Wunderworte , 1920 geschrieben und in Deutschland zuvor erst zweimal aufgeführt , ist
ein sogenanntes “Esperpento “( zu deutsch : Schauerposse ) ( des Spaniers Ramon del
Valle-Inclan )m2 , eine Tragi-Komödie , ein deformiertes Zerrbild der Wirklichkeit .
Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ( ihrem irren Sohn Laureano )m1
ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft
gerne vor den Karren spannen , in dem die Verblichene ihren irren Laureano von Jahrmarkt
zu Jahrmarkt geschoben hatte .[. . . ]
Denn soviel ist klar : Wer ( Laureano )m2 versorgt , besitzt im armen Galizien allemal
eine einträgliche Geldquelle .
Als die Witwe Juana la Reina plötzlich auf offener Straße stirbt und ( ihrem irren Sohn Laureano )m1
ein einträgliches Geschäft hinterläßt , möchte sich so mancher in ihrer Verwandtschaft
gerne vor den Karren spannen , in dem die Verblichene ( ihren irren Laureano )m2 von
181
x.
y.
z.
182
Jahrmarkt zu Jahrmarkt geschoben hatte .
Einmal ins ( Ritz )m1
Paris ( dpa ) - Einer amerikanischen Touristin sind im ( berühmten Hotel “Ritz “an der
Place Vendome )m2 in Paris Schmuckstücke im Schätzwert von 2,5 Millionen Franc (
knapp 750.000 Mark ) gestohlen worden .
Es soll sich um drei Diebe gehandelt haben .
(ID: 346x351); (Calc-Prob:12)
Als ( Mario Worm , ein Ostberliner Kneipenwirt )m1 , schon 1989 mit der Idee , das alles
in einem Film festzuhalten , hausieren ging , riet man ihm ab .[. . . ]
In zwei Jahren werde es tausend Filme über diese historische Umbruchszeit geben , und
viel professionellere dazu .
Aber ( der Laienfilmer Worm )m2 scharte seine Freunde , Kollegen und Stammgäste um
sich , drehte mit einfachsten Mitteln trotzdem seinen Film und behielt recht .
(ID: 1784x1792); (Calc-Prob:35)
( Der innenpolitische Sprecher der CDU , Ralf Borttscheller )m1 , nannte es gestern positiv , wenn sich Wedemeier von der “starren Haltung “der SPD absetze .
( Borttscheller )m2 .
“Das ehrt ihn . “
(ID: 2007x2013); (Calc-Prob:12)
Bibliography
Ernst Althaus, Nikiforos Karamanis, and Alexander Koller. Computing locally coherent discourses. In
Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL ’04,
pages 399–406, Barcelona, Spain, 2004. Association for Computational Linguistics.
Giuseppe Attardi, Stefano Dei Rossi, and Maria Simi. Tanl-1: Coreference resolution by parse analysis
and similarity clustering. In Proceedings of the 5th International Workshop on Semantic Evaluation,
SemEval ’10, pages 108–111, Los Angeles, California, 2010. Association for Computational Linguistics.
Amit Bagga and Breck Baldwin. Algorithms for scoring coreference chains. In Proceedings of the First
International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pages 563–566, 1998.
Egon Balas, Fred Glover, and Stanley Zionts. An additive algorithm for solving linear programs with
Zero-One variables. Operations Research, 13(4):517–549, 1965.
Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. In Proceedings of the 43rd
Annual IEEE Symposium on Foundations of Computer Science, pages 238–247, 2002.
E.T. Bell. Exponential numbers. The American Mathematical Monthly, 41(7):411–419, August 1934.
Eric Bengtson and Dan Roth. Understanding the value of features for coreference resolution. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 294–303,
Honolulu, Hawaii, October 2008. Association for Computational Linguistics.
Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. A maximum entropy approach to
natural language processing. Comput. Linguist., 22:39–71, March 1996.
Gosse Bouma and Anne marie Mineur. Coreference related linguistic phenomena. MIT Press, 2006.
Thorsten Brants. Tnt: A statistical part-of-speech tagger. In Proceedings of the sixth conference on
Applied natural language processing, ANLC ’00, pages 224–231, Seattle, Washington, 2000. Association for Computational Linguistics.
Samuel Broscheit, Massimo Poesio, Simone Paolo Ponzetto, Kepa Joseba Rodriguez, Lorenza Romano,
Olga Uryupina, Yannick Versley, and Roberto Zanoli. Bart: A multilingual anaphora resolution system. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages
104–107, Los Angeles, California, 2010a. Association for Computational Linguistics.
Samuel Broscheit, Simone Paolo Ponzetto, Yannick Versley, and Massimo Poesio. Extending bart to provide a coreference resolution system for german. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). European Language Resources Association
(ELRA), 2010b.
183
Bibliography
Andre Burkovski, Gunther Heidemann, Hamidreza Kobdani, and Hinrich Schütze. Self organizing maps
in nlp: Exploration of coreference feature space. In Proceedings of the 8th International Workshop on
Self-Organizing Maps, 2011.
Canoo.net. Canoo.net - the pronoun es, Last checked 13. February 2011a. URL http://www.canoo.
net/services/OnlineGrammar/InflectionRules/FRegeln-P/Pron-es.html.
Canoo.net. Canoo.net - possessive pronouns and determiners, Last checked 20. February 2011b.
URL
http://www.canoo.net/services/OnlineGrammar/InflectionRules/
FRegeln-P/Pron-Poss3.html.
Canoo.net. Canoo.net - reflexive verbs, Last checked 14. November 2011c. URL http://www.
canoo.net/services/OnlineGrammar/Wort/Verb/Valenz/Reflexiv.html.
Claire Cardie and Kiri Wagstaff. Noun phrase coreference as clustering. In Proceedings of the 1999 Joint
SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
(EMNLP/VLC1999), pages 82–89, College Park, Maryland, USA, 1999.
Jonathan H Clark and Jose P Gonzalez-brenes. Coreference resolution : Current trends and future directions. Language and Statistics II Literature Review, page 14, 2008.
William W. Cohen. Fast effective rule induction. In Proceedings of the Twelfth International Conference
on Machine Learning, pages 115–123, Tahoe City, California, USA, 1995. Morgan Kaufmann.
Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. Gate: an architecture for
development of robust hlt applications. In Proceedings of the 40th Annual Meeting on Association for
Computational Linguistics, ACL ’02, pages 168–175, Philadelphia, Pennsylvania, 2002. Association
for Computational Linguistics.
Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. TiMBL: Tilburg memory
based learner, version 5.0: reference guide [= ILK Research Group technical report 03-10]. Tilburg
University, Tilburg, 2003.
Arthur P. Dempster. A generalization of bayesian inference. In Ronald R. Yager and Liping Liu, editors,
Classic Works of the Dempster-Shafer Theory of Belief Functions, volume 219 of Studies in Fuzziness
and Soft Computing, pages 73–104. Springer, 2008.
Pradheep Elango. Coreference resolution: A survey. Master’s thesis, University of Wisconsin Madison,
2005.
Thomas Finley and Thorsten Joachims. Supervised clustering with support vector machines. In Proceedings of the International Conference on Machine Learning (ICML), pages 217–224, 2005.
Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296, December 1999.
Johan Hall and Joakim Nivre. A dependency-driven parser for german dependency and constituency representations. In Proceedings of the Workshop on Parsing German, PaGe ’08, pages 47–54, Columbus,
Ohio, 2008. Association for Computational Linguistics.
Sven Hartrumpf. Coreference resolution with syntactico-semantic rules and corpus statistics. In Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7, ConLL
’01, pages 17:1–17:8, Toulouse, France, 2001. Association for Computational Linguistics.
Erhard W. Hinrichs, Katja Filippova, and Holger Wunsch. What treebanks can do for you: Rule-based
and machine-learning approaches to anaphora resolution in German. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT 2005), pages 77–88, Barcelona, Spain, 2005a.
184
Bibliography
Erhard W. Hinrichs, Sandra Kübler, and Karin Naumann. A unified representation for morphological,
syntactic, semantic, and referential annotations. In Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, CorpusAnno ’05, pages 13–20, Ann Arbor, Michigan, 2005b.
Association for Computational Linguistics.
Lynette Hirschman and Nancy Chinchor. Muc-7 coreference task definition (version 3.0). In Proceedings
of the 7th Message Understanding Conference (MUC-7), 1997.
J Hobbs. Resolving pronoun references, pages 339–352. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1986.
IMS-Wikipedia. Ims-wikipedia sukre, Last checked 02. March 2011. URL http://wiki.ims.
uni-stuttgart.de/Sukre.
Thorsten Joachims. Making large-scale support vector machine learning practical, pages 169–184. MIT
Press, Cambridge, MA, USA, 1999.
Wiltrud Kessler. Analysis and Visualization of Coreference Features. Diploma thesis, University of
Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Germany, April 2010.
Manfred Klenner and Étienne Ailloud. Optimization in coreference resolution is not needed: a nearlyoptimal algorithm with intensional constraints. In Proceedings of the 12th Conference of the European
Chapter of the Association for Computational Linguistics, EACL ’09, pages 442–450, Athens, Greece,
2009. Association for Computational Linguistics.
Hamidreza Kobdani and Hinrich Schütze. Sucre: A modular system for coreference resolution. In
Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 92–95,
Los Angeles, California, 2010a. Association for Computational Linguistics.
Hamidreza Kobdani and Hinrich Schütze. Feature engineering of coreference resolution in multiple
languages. Manuscript, University of Stuttgart, 2010b.
Hamidreza Kobdani, Hinrich Schütze, Andre Burkovski, Wiltrud Kessler, and Gunther Heidemann. Relational feature engineering of natural language processing. In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10, pages 1705–1708, Toronto,
ON, Canada, 2010. ACM.
T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:
59–69, 1982.
Beata Kouchnir. A machine learning approach to German pronoun resolution. In Proceedings of the
Annual Meeting on Association for Computational Linguistics 2004 workshop on Student research,
ACLstudent ’04, Barcelona, Spain, 2004. Association for Computational Linguistics.
Xiaoqiang Luo. On coreference resolution performance metrics. In Proceedings of the conference on
Human Language Technology and Empirical Methods in Natural Language Processing, pages 25–32,
Vancouver, British Columbia, Canada, 2005. Association for Computational Linguistics.
Andrew Mccallum and Ben Wellner. Toward conditional models of identity uncertainty with application
to proper noun coreference. In Proceedings of the IJCAI Workshop on Information Integration on the
Web. MIT Press, 2003.
Andrew Mccallum and Ben Wellner. Conditional models of identity uncertainty with application to noun
coreference. In Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada,
2004.
185
Bibliography
Joseph F Mccarthy and Wendy G Lehnert. Using decision trees for coreference resolution. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1050–1055,
Montreal, Quebec, Canada, 1995.
A. McEnery, I. Tanaka, and S. Botley. Corpus annotation and reference resolution. In Proceedings
of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted
Texts, ANARESOLUTION ’97, pages 67–74, Madrid, Spain, 1997. Association for Computational
Linguistics.
Christoph Müller and Michael Strube. Multi-level annotation of linguistic data with MMAX2. In Sabine
Braun, Kurt Kohn, and Joybrato Mukherjee, editors, Corpus Technology and Language Pedagogy:
New Resources, New Tools, New Methods, pages 197–214. Peter Lang, Frankfurt a.M., Germany,
2006.
Vincent Ng. Supervised noun phrase coreference research: the first fifteen years. In Proceedings of the
48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 1396–1411,
Uppsala, Sweden, 2010. Association for Computational Linguistics.
Vincent Ng and Claire Cardie. Improving machine learning approaches to coreference resolution. In
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 104–
111, Philadelphia, PA, USA, 2002.
Cristina Nicolae and Gabriel Nicolae. Bestcut: A graph algorithm for coreference resolution. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP
’06, pages 275–283, Sydney, Australia, 2006. Association for Computational Linguistics.
J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
Altaf Rahman and Vincent Ng. Supervised models for coreference resolution. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing, pages 968–977, Singapore„ 2009.
William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American
Statistical Association, 66(336):846–850, 1971.
Marta Recasens and Eduard Hovy. Blanc: Implementing the rand index for coreference evaluation.
Natural Language Engineering, 17:485–510, 2010.
Marta Recasens, Lluís Màrquez, Emili Sapena, M. Antònia Martí, Mariona Taulé, Véronique Hoste,
Massimo Poesio, and Yannick Versley. Semeval-2010 task 1: Coreference resolution in multiple
languages. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10,
pages 1–8, Los Angeles, California, 2010. Association for Computational Linguistics.
Michael Schiehlen. Optimizing algorithms for pronoun resolution. In Proceedings of the 20th international conference on Computational Linguistics, COLING ’04, pages 515–521, Geneva, Switzerland,
2004. Association for Computational Linguistics.
Helmut Schmid. Improvements in part-of-speech tagging with an application to German. In Proceedings
of the EACL SIGDAT-Workshop, pages 47–50, Dublin, 1995.
Helmut Schmid and Florian Laws. Estimation of conditional probabilities with decision trees and an
application to fine-grained pos tagging. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING ’08, pages 777–784, Manchester, United Kingdom, 2008.
Association for Computational Linguistics.
186
Bibliography
SfS Universität Tübingen. TüBa-D/Z Release 6, Last checked 30. December 2010. URL http://
www.sfs.uni-tuebingen.de/tuebadz.shtml.
Wojciech Skut and Thorsten Brants. A maximum-entropy partial parser for unrestricted text. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 143–151, Montreal, Quebec, 1998.
Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544, 2001.
Veselin Stoyanov, Nathan Gilbert, Claire Cardie, and Ellen Riloff. Conundrums in noun phrase coreference resolution: Making sense of the State-of-the-Art. In Proceedings of the Joint Conference of the
47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint
Conference on Natural Language Processing of the Asian Federation of Natural Language Processing
(ACL-IJCNLP 2009), pages 656–664, Suntec, Singapore, August 2009. Association for Computational
Linguistics.
Michael Strube, Stefan Rapp, and Christoph Müller. The influence of minimum edit distance on reference resolution. In Proceedings of the ACL-02 conference on Empirical methods in natural language
processing - Volume 10, EMNLP ’02, pages 312–319, Philadelphia, PA, USA, 2002. Association for
Computational Linguistics.
STTS. The stuttgart-tübingen-tag-set, Last checked 02. March 2011. URL http://www.ims.
uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html.
Roland Stuckardt. Three algorithms for competence-oriented anaphor resolution. In Proceedings of the
5th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2004), pages 157–163, Sao
Miguel/Azores, 2004.
Yannick Versley. Parser evaluation across text types. In Proceedings of the 4th Workshop on Treebanks
and Linguistic Theories (TLT 2005), Barcelona, Spain, 2005.
Yannick Versley. A constraint-based approach to noun phrase coreference resolution in German newspaper text. In Proceedings of the Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS), Constance, Germany, 2006.
Yannick Versley, Simone Paolo Ponzetto, Massimo Poesio, Vladimir Eidelman, Alan Jern, Jason Smith,
Xiaofeng Yang, and Alessandro Moschitti. Bart: A modular toolkit for coreference resolution. In
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08),
Marrakech, Morocco, may 2008. European Language Resources Association (ELRA).
Renata Vieira and Massimo Poesio. An empirically based system for processing definite descriptions.
Comput. Linguist., 26:539–593, December 2000.
Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. A model-theoretic
coreference scoring scheme. In Proceedings of the 6th conference on Message understanding, MUC6
’95, pages 45–52, Columbia, Maryland, 1995. Association for Computational Linguistics.
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM,
21:168–173, January 1974.
Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. ACE 2005 Multilingual
Training Corpus. Linguistic Data Consortium, Philadelphia, 2006.
Ralph Weischedel, Sameer Pradhan, Lance Ramshaw, Martha Palmer, Nianwen Xue, Mitchell Marcus,
Ann Taylor, Craig Greenberg, Eduard Hovy, Robert Belvin, and Ann Houston. OntoNotes Release
2.0. Linguistic Data Consortium, Philadelphia, 2008.
187
Bibliography
Xiaofeng Yang, Guodong Zhou, Jian Su, and Chew Lim Tan. Coreference resolution using competition
learning approach. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 176–183, Sapporo, Japan, 2003. Association for Computational
Linguistics.
Desislava Zhekova and Sandra Kübler. Ubiu: A language-independent system for coreference resolution.
In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 96–99,
Los Angeles, California, 2010. Association for Computational Linguistics.
188

Feature Engineering for Coreference Resolution in German

Transcription

Similar documents

How-To-Do M2Web Feature Inhaltsverzeichnis

Untitled - Austrian Film Commission

Triton

Kein Fummeln im Auto

MAVI 2.0 – Konzept und Möglichkeiten

press kit - Reverse Angle

KAPITEL I DIE EINLEITUNG

pocket-napkins - Baekkelund by Mank

E15 2015

Der Beistelltisch PA03 ALEX bildet mit seiner

Der hocker st04 backenzahn™ ist bereits ein klassiker des

double paged folder - low resolution