Praharsha Pandarinath Sirsi
Seminar: Knowledge Representation and Domain Ontologies
Presented by: Praharsha Sirsi

In linguistics, semantic analysis is the process of relating syntactic structures, from the levels of phrases, clauses, sentences and paragraphs to the level of the writing as a whole, to their language-independent meanings.

The Mikrokosmos (µK) Machine Translation System is a knowledge-based machine translation (KBMT) system. It defines a methodology for representing the meaning of natural language texts in a language-neutral interlingual format called a text meaning representation (TMR).
› Knowledge-based machine translation (KBMT)
› Unifies the use of microtheories
› Uses the text meaning representation (TMR) to derive meaning in a neutral interlingual format
› Main focus is on lexical-semantic dependencies

The TMR is:
› a language-neutral description
› deliberately syntax-neutral
› a record of lexical-semantic dependencies, stylistic factors, discourse relations, speaker attitudes, and other pragmatic factors

[Diagram: Subject, Verb (category: verb), Object (category: noun)]
Verbs that require only two arguments, a subject and a single direct object, are called monotransitive. The AGENT and THEME slots will be filled by the TMRs of the corresponding constituents:
› AGENT: ‘grupo Roche’ (var1)
› THEME: ‘Dr. Andreu’ (var2)
Additional information from the ACQUIRE TMR will be filled in here. For example, ‘a traves de’ is treated as a phrasal entry and will add an INSTRUMENT slot.

The SYN-STRUC zone interacts with the meaning pattern in the SEM zone. The meaning of a phrase or clause is determined by the semantics of its components (Principle of Compositionality).

The SEM zone can add language-specific semantic constraints. For example, the English verb ‘to taxi’, as in ‘the jet taxied to the gate’, maps into GROUND-CONTACT-MOTION, but further specifies that its INSTRUMENT must have AIRCRAFT semantics. However, it is not possible to define every conceivable mapping.

Semantic analysis combines the knowledge contained in the ontology and lexicon and applies it to the current input to produce output TMRs:
› Retrieve the appropriate semantic constraints for each possible word sense
› Test each of the constraints in context
› Use the SEM zones of the word senses to find the best possible combination of constraints
› Apply other microtheories to the core TMRs
› Construct the output TMRs

Gather all possible word-sense mappings using the lexicon entries for each of the words. For each word sense, the SYN-STRUC zone must be examined. Then examine the SEM zone of each word sense to construct a list of constraints.

Constraints can arise from five sources. These constraints ask about the fillers: what kind of concepts can this filler modify with the given slot?
Example: HAMMER, when used as the filler for an INSTRUMENT slot, usually modifies some sort of BUILD event.

Each slot restricts what its DOMAIN and RANGE can be. Example: the AGENT slot requires its DOMAIN to be an EVENT and its RANGE to be HUMAN, whereas a THEME slot requires an EVENT for the DOMAIN but can have any OBJECT or EVENT for its RANGE.

There are default values for the slots in the ontology. These constraints are always very general, but they can still help eliminate wrong attachments and word meanings.

The lexicon entry of ‘a-traves-de’, depending on the meaning used, will add either a LOCATION slot or an INSTRUMENT slot to the TMR:
› Location slot
› Instrument slot
The slot will be filled by the TMR that results from ‘compania’, which maps into either a CORPORATION or a SOCIAL-EVENT.

Ontological graph search function: the function determines relevant paths between two concepts and returns a score based on their degree of closeness. Example: check-onto-con(ACQUIRE EVENT) returns a score of 1.0 (out of 1.0), since ACQUIRE is a type of EVENT. However, check-onto-con(ORGANIZATION HUMAN) returns a score of 0.9 along with the path (ORGANIZATION HAS-MEMBER HUMAN). This indicates that ORGANIZATION can stand in the place of HUMAN because it has HUMAN members.

Each combination of word senses activates the applicable constraints, which are combined into a total score for the combination. The combination with the best total score is chosen.

Results for the example sentence:
› Will choose INSTRUMENT, as LOCATION would require ‘adquirir’ to be a physical object
› Will choose LOCATION, as TEMPORAL would require ‘espana’ to be a temporal object
› The choice is not yet defined; resolving it requires additional ontological information or statistical information
› Will choose ACQUIRE, as LEARN would require ‘Dr. Andreu’ to be information
› Will choose ORGANIZATION, as HUMAN cannot be the theme of ACQUIRE

Finding constraints and using them to generate valid TMRs requires a lot of processing, and many computational techniques were being developed to make this processing efficient.
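The ontological graph search and combination scoring described above can be sketched in Python as follows. This is a minimal illustration, not the Mikrokosmos implementation: the toy ontology, the constraint tables, the 0.9 penalty per non-IS-A relation and the decision to multiply constraint scores are all assumptions, chosen so that the hypothetical `check_onto_con` reproduces the two scores quoted above. For brevity only ‘adquirir’ and ‘a-traves-de’ are treated as ambiguous, and ‘compania’ is fixed to CORPORATION.

```python
import heapq
import itertools
import math

# Toy ontology (hypothetical, NOT the Mikrokosmos ontology):
# each concept maps to a list of (relation, target concept) edges.
ONTOLOGY = {
    "EVENT": [],
    "ACQUIRE": [("IS-A", "EVENT")],
    "LEARN": [("IS-A", "EVENT")],
    "OBJECT": [],
    "PHYSICAL-OBJECT": [("IS-A", "OBJECT")],
    "HUMAN": [("IS-A", "PHYSICAL-OBJECT")],
    "MENTAL-OBJECT": [("IS-A", "OBJECT")],
    "INFORMATION": [("IS-A", "MENTAL-OBJECT")],
    "SOCIAL-OBJECT": [("IS-A", "OBJECT")],
    "ORGANIZATION": [("IS-A", "SOCIAL-OBJECT"), ("HAS-MEMBER", "HUMAN")],
    "CORPORATION": [("IS-A", "ORGANIZATION")],
}

HOP_PENALTY = 0.9  # assumed score factor for each non-IS-A relation on a path


def check_onto_con(source, target):
    """Return (score, path) for the closest path from source to target.

    IS-A steps are free; every other relation multiplies the score by
    HOP_PENALTY. Dijkstra's algorithm minimises the number of non-IS-A hops.
    Returns (0.0, []) when the target is unreachable.
    """
    heap = [(0, [source])]
    best_cost = {source: 0}
    while heap:
        cost, path = heapq.heappop(heap)
        concept = path[-1]
        if concept == target:
            return HOP_PENALTY ** cost, path
        if cost > best_cost.get(concept, math.inf):
            continue  # stale heap entry
        for relation, nxt in ONTOLOGY.get(concept, []):
            new_cost = cost + (0 if relation == "IS-A" else 1)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                heapq.heappush(heap, (new_cost, path + [relation, nxt]))
    return 0.0, []


# Candidate senses for the two ambiguous words (hypothetical lexicon data).
SENSES = {
    "adquirir": ["ACQUIRE", "LEARN"],
    "a-traves-de": ["INSTRUMENT", "LOCATION"],
}
AGENT_FILLER = "ORGANIZATION"  # 'grupo Roche'
THEME_FILLER = "ORGANIZATION"  # 'Dr. Andreu'
PP_FILLER = "CORPORATION"      # 'compania', fixed to one sense for brevity

# Hypothetical constraints: RANGE of THEME per event sense, and
# (DOMAIN, RANGE) of the slot that 'a-traves-de' adds.
THEME_RANGE = {"ACQUIRE": "OBJECT", "LEARN": "INFORMATION"}
PP_SLOT = {"INSTRUMENT": ("EVENT", "OBJECT"),
           "LOCATION": ("PHYSICAL-OBJECT", "OBJECT")}


def score_combination(event, pp_slot):
    """Multiply the scores of every constraint this combination activates."""
    domain, rng = PP_SLOT[pp_slot]
    checks = [
        (event, "EVENT"),                    # AGENT/THEME DOMAIN must be an EVENT
        (AGENT_FILLER, "HUMAN"),             # AGENT RANGE must be HUMAN
        (THEME_FILLER, THEME_RANGE[event]),  # THEME RANGE depends on the sense
        (event, domain),                     # DOMAIN of the added PP slot
        (PP_FILLER, rng),                    # RANGE of the added PP slot
    ]
    return math.prod(check_onto_con(src, dst)[0] for src, dst in checks)


def best_combination():
    """Score every sense combination and return (combination, best score)."""
    combos = itertools.product(SENSES["adquirir"], SENSES["a-traves-de"])
    scored = [(score_combination(e, p), (e, p)) for e, p in combos]
    score, combo = max(scored)
    return combo, score


print(check_onto_con("ACQUIRE", "EVENT"))       # (1.0, ['ACQUIRE', 'IS-A', 'EVENT'])
print(check_onto_con("ORGANIZATION", "HUMAN"))  # (0.9, ['ORGANIZATION', 'HAS-MEMBER', 'HUMAN'])
print(best_combination())                       # (('ACQUIRE', 'INSTRUMENT'), 0.9)
```

Multiplying the scores lets any violated constraint (score 0.0) veto a combination, which is why LEARN and LOCATION drop out here; a weighted sum would be a softer alternative.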
The analyzer uses an opportunistic, ‘bulletin-board’ processing scheme. This scheme makes use of the following computational techniques.

A central difficulty in natural language processing is the complex interplay of constraints. Choosing one particular sense of a word may seem locally optimal, but it may create problems elsewhere. The analyzer therefore uses dependency-directed analysis, which systematically tracks dependencies and can:
› propagate related constraints forward automatically,
› automatically detect inconsistent solutions, and
› be used in failure processing to determine the cause of failures and suggest recoveries.

Statistical data is used to determine the most likely senses of the input words. These senses are tested first, and if a result that ‘satisfies’ is obtained, processing ends. This approach is extended to every aspect of processing, even failure recovery.

Failures can arise from:
› spelling errors
› errors in syntactic analysis
› a lexicon and/or ontology that is erroneous
› a lexicon and/or ontology that lacks needed information
› the analyzer making incorrect decisions

Mikrokosmos tries to deal with these problems by:
› using the dependency analysis to see why failures occurred
› checking for metonymic/metaphoric language
› positing gaps (ellipsis) if slot fillers are missing
› changing the syntactic analysis, including trying different attachments
› relaxing thresholds
› ordering possible recoveries using a sophisticated ‘best-first’ approach

If the basic semantic constraints cannot fully disambiguate, Mikrokosmos will:
› use collocational preferences stored in the lexicon
› use statistical methods to determine the most likely meanings
› allow the ambiguity to remain; subsequent clauses combined with coreferences might resolve the problem
› apply attachment rules such as ‘referential success’ and/or ‘minimal attachment’
› use ‘expectations’ to moderate. For instance, in the current example, if one of the ‘adquirir’ senses expected an INSTRUMENT slot (which ‘a-traves-de’ adds), favor that sense.
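The statistically guided, satisficing search and threshold relaxation described above can be sketched as a best-first enumeration of sense combinations. This is a minimal sketch under stated assumptions: the function names `best_first_combinations` and `analyze`, the priors, the constraint scores, the default threshold, the relaxation factor and the floor are all hypothetical, made-up values for illustration, and lowering a single threshold is only a simplified stand-in for the richer recovery strategies listed above.

```python
import heapq
import math


def best_first_combinations(senses):
    """Yield (combination, joint prior) in descending order of joint prior.

    senses maps each word to a list of (sense, prior) pairs. Combinations
    are expanded lazily from a heap, so the most likely ones come first.
    """
    words = list(senses)
    ranked = [sorted(senses[w], key=lambda sp: -sp[1]) for w in words]

    def prior(idx):
        return math.prod(ranked[i][j][1] for i, j in enumerate(idx))

    start = (0,) * len(words)
    heap = [(-prior(start), start)]
    seen = {start}
    while heap:
        neg_prior, idx = heapq.heappop(heap)
        combo = {w: ranked[i][j][0] for i, (w, j) in enumerate(zip(words, idx))}
        yield combo, -neg_prior
        # Successors: move one word to its next most likely sense.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prior(nxt), nxt))


def analyze(senses, constraint_score, threshold=0.8, relax=0.5, floor=0.1):
    """Return (first combination that satisfies the threshold, threshold used).

    If nothing satisfies the threshold, relax it and search again (a simple
    stand-in for failure recovery). Returns (None, threshold) once the floor
    is passed, i.e. the ambiguity is left unresolved.
    """
    while threshold >= floor:
        for combo, _prior in best_first_combinations(senses):
            if constraint_score(combo) >= threshold:
                return combo, threshold
        threshold *= relax
    return None, threshold


# Made-up priors and constraint scores, for illustration only.
SENSES = {
    "adquirir": [("ACQUIRE", 0.7), ("LEARN", 0.3)],
    "a-traves-de": [("LOCATION", 0.6), ("INSTRUMENT", 0.4)],
}
SCORES = {
    ("ACQUIRE", "LOCATION"): 0.0,
    ("ACQUIRE", "INSTRUMENT"): 0.9,
    ("LEARN", "LOCATION"): 0.0,
    ("LEARN", "INSTRUMENT"): 0.0,
}


def toy_score(combo):
    return SCORES[(combo["adquirir"], combo["a-traves-de"])]


print(analyze(SENSES, toy_score))
# ({'adquirir': 'ACQUIRE', 'a-traves-de': 'INSTRUMENT'}, 0.8)
```

Here the priors alone would favor LOCATION, so ACQUIRE+LOCATION is tested first; it fails the constraints, and the search stops at the next most likely combination that satisfies them instead of scoring every possibility.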