Verb Sense Disambiguation Tiago Manuel Paulo Travanca
Transcription
Verb Sense Disambiguation Tiago Manuel Paulo Travanca
Verb Sense Disambiguation Tiago Manuel Paulo Travanca Advisors: Prof. Nuno Mamede and Prof. Jorge Baptista Instituto Superior T´ecnico Spoken Language Systems Laboratory – INESC-ID Lisboa Rua Alves Redol 9, 1000-029 Lisboa, Portugal tiago.travanca@ist.utl.pt Abstract. This work addresses the problem of Verb Sense Disambiguation (VSD) for the European Portuguese. VSD is a sub-problem of the Word Sense Desambiguation (WSD) problem, which tries to identify the sense of a polysemic word used in a given sentence. Two approaches to VSD were considered. The first, rule-based, approach makes use of the lexical, syntactic and semantic descriptions of the verb senses present in a lexical resource (ViPEr). The second approach uses machine learning with a set of features commonly used in WSD. Both approaches were tested in several scenarios to determine the impact of different features and different combinations of methods. The baseline accuracy of 84%, resulting from the Most Frequent Sense (MFS) for each verb lemma, was both surprisingly high and hard to surpass. Still, both approaches provided some improvement over this value. The best combination of the three techniques yielded an accuracy of 87.2%, a gain of 3.2% above the baseline. 1 Introduction Nowadays, Natural Language Processing (NLP) has many applications: search engines (semantic/topic search rather than word matching), automated speech translation, automatic summarization, among others. Still, one problem that all NLP systems have to deal with is ambiguity, i.e. the fact that a certain expression can be interpreted in more than one way. Ambiguity appears at different stages in the processing of a text or a sentence, such as, tokenization, sentence-splitting, part-of-speech (POS) tagging, syntactic parsing and semantic processing. This work addresses semantic ambiguity of verbs. Considering that the sentence has already been parsed and, even if its syntactic analysis (parse tree) is unique and correct, some words may feature more than one meaning for the grammatical category they were tagged with. For example, sentences (1a) and (1b) show how the verb to count, used in the same position on both sentences, means enumerate on the first sentence while it stands for rely on the second: (1a) I am counting the number of students (1b) I am counting on you to help me 2 In these examples, the meaning of count in (1a) results from its directtransitive construction and plural object, while in (1b) the preposition introducing the second argument and the human trait of that same argument of count determine the verb’s sense. To tackle this problem, two modules were developed: one, rule-based, using a set of linguistic resources, specifically built for Portuguese (ViPEr); and another, based on machine learning, which makes use of a set of features, commonly adopted for the WSD problem, to correctly guess the verb sense. Both modules were integrated in a fully-fledged NLP system, STRING [1]. 2 Related Work 2.1 Lexical Resources Many works on WSD [2,3,4] rely on existing linguistic resources as catalogues of word senses (and their relations). For English, one can cite WordNet [5], FrameNet [6,7,8] and VerbNet [9]. For (European) Portuguese, though other resources have been built they are not yet freely available to the general public. We thus considered using ViPEr [10], which features a classification based on the syntactic-semantic properties of the most frequently occurring verbs. In the particular case of VSD, the most important features to be considered are formal features, such as the syntactic patterns of the sentence forms, and the distributional constraints resulting from the selectional restrictions of the verb on its arguments. Besides the resource here mentioned, there are other known sources with linguistic information on verbal constructions of Portuguese. Among others, the most relevant (and recent) are described in [11] and [12], each developed in a different theoretical framework. However, none of these works have a machine readable form, thus, they cannot be automatically processed. 2.2 Word Sense Disambiguation Approaches An overview of the most used algorithms and features for WSD was conducted, based on the systems evaluated at the SensEval31 [13] All-Words and Lexical Sample tasks (2004). The most commonly used techniques [14] were the Naive Bayes algorithm, Decision Lists, Vector Space Model (with cosine similarity function) and Support Vector Machines; other methods, such as Maximum Entropy models, Adaptative boosting and Latent Semantic Analysis, were also used by some systems. The majority of systems proposed at SensEval3 divided the features used to describe the instances into four different groups, as follows: – Collocations: n-grams (usually bigrams or trigrams) around the target word, composed by the lemma, word-form and part-of-speech tag of each word. – Syntactic dependencies: dependencies connecting the surrounding words to the target word. Typically, the relations of subject, object and modifier are used. 1 http://www.senseval.org/ 3 – Surrounding context: words in a defined window size are collected and used in a bag-of-words approach. Usually, the word form and lemma are extracted; – Knowledge-Based information: some systems make use of information such as WordNet domains, or FrameNet syntactic patterns, for example. The results obtained by the best systems at the SensEval3 conference for the English All-Words and the Lexical Sample tasks were 65.1% and 72.9% precision, respectively. 3 Rule-based Disambiguation Mappings Verb Lexicon Rules Rule Sorting Differences Rule Generation Verb List Difference Finder ViPEr Pre Processing The large number of verbs involved and the complexity of the rule generation task required the construction a specific module for rule generation, whose architecture is illustrated in Figure 1. Disamb. Rules Feature Priority Fig. 1. Rule Generation Module Architecture The first step in the rule generation process, the Pre-Processing, takes as its input the lexical resource information (ViPEr) and converts the linguistic information to an internal structure that represents each verb as a set of meanings. Each meaning, in turn, is represented as a collection of features and their possible values. This module is also responsible for producing the lexicon file (the verb lemmas and their meanings) used by the grammar of XIP [15], the parsing module of STRING. After the parsing is complete, the system passes the processed information to the next module: the Difference Finder. This module compares the different meanings of the same lemma and generates a difference for each argument found different between each pair of meanings. The rule generation module is the last transformational step in the system. The module transforms differences into rules, with each difference usually generating two rules, one for each verb meaning contained in that difference. This step requires an additional parametrization file, mappings, to make the bridge between the features used in ViPEr and those provided by XIP. 4 Finally, the system sorts the rules according to the feature/trait being tested in each of them, since XIP executes the rules in the order they are present in the file. This order favours, firstly, the properties related to structural patterns in sentences, followed by several semantic/selectional restrictions, according to their strength. The base features considered by the system are the selectional restrictions on the verb’s arguments (N0 to N3 ) and their respective prepositions, the property vse, denoting intrinsically reflexive constructions (e.g. queixar-se ‘complain’) and transformational properties regarding the most common two types of passives (with auxiliary verbs ser and estar ‘be’). Additionally, two other properties were added to ViPEr during this work and are used by the rule generation system. The first, vdic property, indicates which verb meanings (usually only one per lemma) allows the verbum dicendi construction pattern, with subject inversion. Then, the aNi=lhe feature, indicating the possibility of a dative pronominalization for some of the verb arguments. 4 Machine Learning Disambiguation The architecture of the supervised classification approach, presented in Figure 2, uses MegaM [16] as the machine learning module, which is based on maximum entropy models [17]. The typical feature extraction and classification modules of supervised classification systems were both implemented inside XIP (that is, inside STRING), to allow XIP rules to be built afterwards to solve other NLP tasks, such as semantic role labeling. In the training phase, STRING takes annotated data (sentences) and extracts the features describing each verb instance and its respective manually annotated verb class (label). Then, MegaM uses that information to build a classification model for each verb lemma. In the prediction phase, STRING assigns a verb class (label) to each verb instance in the raw data, using the models built by MegaM. Label STRING Annotated Data MegaM Classification Model Features STRING Raw Data Labelled Data Fig. 2. Supervised Machine Learning Architecture for VSD using STRING 5 4.1 Training Corpus Due to the time required for manual annotation of the training data, only some verbs were addressed by this approach during this project. The verbs chosen were: explicar, falar, ler, pensar, resolver, saber and ver. This choice was based on the higher number of instances left to disambiguate that these lemmas exhibited after the rule-based testing. Further analysis revealed that these verbs also formed an heterogeneous group, with different MFS percentage values, different number of senses and different ambiguity classes. 4.2 Features The features used by the machine learning module follow those presented in section 2.2, and can be organized into three groups, as follows: – Local features, extracted by the system in a window of size 3 around the target verb, resulting in a total of 6 tokens, with their respective indexes (-3 to +3). Information regarding the POS tag and lemma of each token was considered. – Syntactic features, regarding constituents with a direct relation with the verb. The dependencies used in the system are those of SUBJ (subject), CDIR (direct complement) and MOD (modifier). The POS tag and lemma of the head word of each node in these relations is extracted, together with the dependency name. – Semantic features are extracted for the head words of the nodes that have a direct relation with the verb, as they act as the selectional restrictions on the verb arguments. Each semantic trait found in the head word of the node is also accompanied by the dependency name. The semantic features considered by the system are those defined in ViPEr. Additional features that do no fit exactly into any of those categories were also used. These features concern the subject inversion pattern of verbum dicendi constructions and information regarding the case of clitic personal pronouns. 5 Evaluation To evaluate the two approaches presented in the previous sections, a manually annotated corpus had to be built and validated. The corpus chosen for this evaluation contains around 250 thousand words with, 38,827 verbs manually disambiguated, from which 21,368 (about 55%) are full verbs. From the full verb constructions, 13,030 correspond to ambiguous verbs. The most frequent sense (MFS) for each verb lemma was adopted as the baseline against which the developed modules were compared and evaluated. Since the corpus was the only available annotated data, it had to be used both for counting (to determine the MFS) and for evaluation. To avoid biased results, the common 10-fold cross-validation method was used. An accuracy of 84% was achieved by MFS. This is related to the conservative approach underlying the ViPEr classification, leading to an average lower number of senses per verb when compared to other lexical resources. 6 5.1 Rules+MFS This experiment tested three different combinations of the rule-based approach and the MFS (baseline). In this case, the MFS classifier was applied after the rule-based module to the remaining non-fully disambiguated instances. In the first experiment, the disambiguation rules were applied to all the verbs, regardless of their number of senses per lemma, obtaining an accuracy of 79.15%, below the baseline of 84%. For the second scenario, a finer grained combination was performed, with rule-based disambiguation being applied only to verbs with 8 or more meanings. This resulted in 84.41% accuracy, which was 0.41 percentage points over the baseline. The last experiment, considered the results of the previous one, but also included other verb lemmas that had shown positive results with rules, but had fewer meanings. The final result obtained in the rule-based disambiguation experiments reached the 86.46% accuracy (2.46 percentage points above the baseline). 5.2 Machine Learning Disambiguation Accuracy The first machine learning disambiguation experiment tested the impact of using bias, a special feature calculated by the system during the training step, which indicates the deviation of the model towards each class. Comparing the two scenarios (with and without bias) against the previously presented methods showed that, whenever a verb has a high MFS, it is difficult for another approach to surpass it, and only ML-Bias can equal the MFS accuracy. All verbs with low MFS exhibited some improvement in the results (using ML-NoBias) when compared to the MFS baseline. Still, for the verb pensar, ML-NoBias is outperformed by the combination of rules and MFS. 100 90 80 70 60 50 40 30 20 10 0 explicar falar MFS ler Rules+MFS pensar ML-NoBias resolver ML-Bias saber ver Rules+ML Fig. 3. Machine Learning Results Comparison Looking at the set of classes associated with each lemma, both resolver and ver share the same ambiguity set (06, 32C), while pensar has more senses (06, 08, 7 16, 32C, 35R, 36R). This suggests that ML might not be able to deal with verbs that have many senses or that a specific set of classes is better disambiguated by this technique, though further testing encompassing more verbs should be performed to verify this finding. Afterwards, another scenario was considered, to test the performance of ML as a complementary technique to the rule-based disambiguation, similarly to the scenario that combined rules and MFS, presented in section 5.1. Globally, adding ML as a complementary technique to rules proved to be worse than just using ML for the majority of the verbs here studied. This suggests that the instances being correctly classified by the rules are a subset of the instances correctly disambiguated by ML, and that rules are incorrectly classifying the remaining instances, which ML would otherwise guess correctly. This experiment showed that rules and ML were not very complementary, and that choosing ML alone provides the best improvement to the system. For the cases where Rules+ML surpasses ML alone, other approaches provide better results. Further testing of the ML scenarios should be performed to confirm these findings, namely, using annotated data regarding other verb lemmas and trying out other combination of methods and other scenarios. 5.3 Rules+MFS+ML With all the previous information regarding each verb lemma, a final combination using all the techniques was done, choosing the best method for each lemma. This last experiment obtained the best overall results, settling a new maximum accuracy of 87.2% for the system, improving 3.2 percentage points above the baseline. Finally, a performance test was conducted to assess the impact of the newly added modules to the global performance of STRING. This test revealed a minimal increase in the global time required for processing (around 0.05%, for 5,000 sentences). 6 Conclusions This work addressed the problem of verb sense disambiguation for the European Portuguese. Two approaches were considered when tackling this problem: a ruled-based, and a machine learning disambiguation. The results obtained from the baseline (MFS) were surprisingly high (about 84% accuracy) and proved to be hard to surpass. The integration of both modules provided better results to the system, achieving a final score of 87.2% accuracy, an improvement of 3.2% above baseline. Certainly, there is still much work to be done in VSD, and the system here presented can be improved. Nevertheless, we would like to think that some valid contribution has been made to the domain. The integral work here presented is available at http://www.inesc-id.pt/pt/ indicadores/Ficheiros/7279.pdf 8 References 1. Mamede, N.J., Baptista, J., Diniz, C., Cabarr˜ ao, V.: STRING: An Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese. In: Propor 2012, Demo Session (2012) Paper available at http://www.propor2012. org/demos/DemoSTRING.pdf 2. Buscaldi, D., Rosso, P., Masulli, F.: The upv-unige-CIAOSENSO WSD system. In Mihalcea, R., Edmonds, P., eds.: Senseval-3: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, ACL (2004) 77–82 3. Shi, L., Mihalcea, R.: Open Text Semantic Parsing using FrameNet and WordNet. In: Demonstration Papers at HLT-NAACL 2004. HLT-NAACL–Demonstrations ’04, Stroudsburg, PA, USA, ACL (2004) 19–22 4. Brown, S.W., Dligach, D., Palmer, M.: VerbNet class assignment as a WSD task. In: Proceedings of the 9th International Conference on Computational Semantics. IWCS ’11, Stroudsburg, PA, USA, ACL (2011) 85–94 5. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: WordNet: An On-line Lexical Database. International Journal of Lexicography 3 (1990) 235–244 6. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet Project. In: Proceedings of the 36th Annual Meeting of the ACL and 17th COLING - Volume 1. ACL ’98, Stroudsburg, PA, USA, ACL (1998) 86–90 7. Johnson, C.R., Fillmore, C.J., Wood, E.J., Ruppenhofer, J., Urban, M., Petruck, M.R., Baker, C.F.: The FrameNet Project: Tools for Lexicon Building (2001) 8. Ruppenhofer, J., Ellsworth, M., Petruck, M.R., Johnson, C.R., Scheffczyk, J.: FrameNet II: Extended Theory and Practice. International Computer Science Institute, Berkeley, California (2006) Distributed with the FrameNet data. 9. Kipper, K., Dang, H.T., Schuler, W., Palmer, M.: Building a Class-Based Verb Lexicon using TAGs. In: TAG+5 5th Workshop on Tree Adjoining Grammars and Related Formalisms, Paris, France (2000) 10. Baptista, J.: ViPEr: A Lexicon-Grammar of European Portuguese Verbs. In: 31st International Conference on Lexis and Grammar, Nov´e Hrady, Czech Republic (2012) 10–16 11. Borba, F.: Dicion´ ario Gramatical de Verbos do Portuguˆes Contemporˆ aneo do Brasil. UNESP, S˜ ao Paulo (1990) 12. Busse, W.: Dicion´ ario Sint´ actico de Verbos Portugueses. Almedina, Coimbra (1994) 13. Mihalcea, R., Edmonds, P., eds.: Senseval-3: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, ACL (2004) 14. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques. 3rd edn. Morgan Kaufmann, Amsterdam (2011) 15. Ait-Mokhtar, S., Chanod, J., Roux, C.: Robustness beyond shallowness: incremental dependency parsing. Natural Language Engineering 8(2/3) (2002) 121–144 16. Daum´e, H.: Notes on CG and LM-BFGS Optimization of Logistic Regression. Paper available at http://pub.hal3.name#daume04cg-bfgs, implementation available at http://hal3.name/megam/ (2004) 17. Berger, A.L., Pietra, S.A.D., Pietra, V.J.D.: A Maximum Entropy approach to Natural Language Processing. Computational Linguistics 22 (1996) 39–71