Praharsha Pandarinath Sirsi

Seminar: Knowledge Representation and Domain Ontologies
Presented by: Praharsha Sirsi
In linguistics, semantic analysis is the process of relating syntactic structures, from the levels of phrases, clauses, sentences and paragraphs to the level of the writing as a whole, to their language-independent meanings.
• The Mikrokosmos (µK) Machine Translation System is a knowledge-based machine translation (KBMT) system.
• It defines a methodology for representing the meaning of natural language texts in a language-neutral interlingual format called a text meaning representation (TMR).

Knowledge-Based Machine Translation (KBMT)
• Unifies the use of microtheories
• Uses the Text Meaning Representation (TMR) to derive the meaning in a neutral interlingual format
• Main focus is on lexical-semantic dependency

Language-neutral description
• Deliberately syntax-neutral
• Lexical-semantic dependencies
• Stylistic factors, discourse relations, speaker attitudes, and other pragmatic factors

[Figure: lexicon entry diagram with the labels Subject, Object, Category: Verb, Category: Noun]
Verbs that require only two arguments, a subject and a single direct object, are called monotransitive.
The agent and theme will be filled by the TMRs relating to:
AGENT: ‘grupo Roche’ (var1)
THEME: ‘Dr. Andreu’ (var2)
Additional information for the ACQUIRE TMR will be filled in here. For example, ‘a traves de’, treated as a phrasal entry, will add an INSTRUMENT slot.





The SYN-STRUC zone provides an interaction with the meaning pattern from the SEM zone.
The meaning pattern for a phrase or clause is determined by the semantics of its components (Principle of Compositionality).
The SEM zone can add language-specific semantic constraints.
For example, the English verb ‘to taxi’, as in ‘the jet taxied to the gate’, maps into GROUND-CONTACT-MOTION, but further specifies that its INSTRUMENT must have AIRCRAFT semantics.
However, it is not possible to define every conceivable mapping.
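
The ‘to taxi’ example above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the Mikrokosmos lexicon format: the `LexEntry` class, the toy `IS_A` table, and the concepts JET, TRUCK and VEHICLE are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Toy IS-A hierarchy (hypothetical; only for this illustration).
IS_A = {
    "JET": "AIRCRAFT",
    "AIRCRAFT": "VEHICLE",
    "TRUCK": "VEHICLE",
    "VEHICLE": "OBJECT",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass
class LexEntry:
    word: str
    concept: str  # ontology concept this word sense maps to
    # slot name -> concept the filler must be a kind of (SEM zone constraint)
    slot_constraints: dict = field(default_factory=dict)

    def accepts(self, slot, filler):
        """Check a candidate filler against the language-specific constraint."""
        required = self.slot_constraints.get(slot)
        return required is None or is_a(filler, required)


taxi = LexEntry("taxi", "GROUND-CONTACT-MOTION", {"INSTRUMENT": "AIRCRAFT"})
print(taxi.accepts("INSTRUMENT", "JET"))    # True
print(taxi.accepts("INSTRUMENT", "TRUCK"))  # False
```

The constraint lives on the word sense rather than on the ontology concept, which mirrors the point that the SEM zone adds restrictions specific to one language.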






Combining the knowledge contained in the ontology and lexicon and applying it to the current input to produce output TMRs
Retrieve the appropriate semantic constraints for each possible word sense
Test each of the constraints in context
Use the SEM zones of the word senses to find the best possible combination of constraints
Apply other microtheories to the core TMRs
Construct the output TMRs

Gather all possible word-sense mappings using the lexicon entries for each of the words.

For each word sense, the SYN-STRUC zone must be examined.
Now examine the SEM zone of each word sense to construct a list of constraints.
• Constraints can arise from 5 sources

These constraints ask about the fillers
• What kind of concepts can this filler modify with the given slot?
• Example: HAMMER, when used as the filler for an INSTRUMENT slot, usually modifies some sort of BUILD event

The slot restricts what its DOMAIN and RANGE can be
• Example: the AGENT slot requires its DOMAIN to be an EVENT and its RANGE to be HUMAN, whereas a THEME slot requires an EVENT for the DOMAIN but can have any OBJECT or EVENT for its RANGE
• There are default values for the slots in the ontology
• These constraints are always very general, but they can still help eliminate wrong attachments and word meanings
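
The DOMAIN and RANGE restrictions above can be checked with a few lines of code. This is a minimal sketch, assuming a toy IS-A table: the `PARENT` and `SLOTS` tables and the `slot_allows` helper are hypothetical and only encode the AGENT and THEME restrictions stated in the text.

```python
# Toy IS-A links (hypothetical subset of an ontology).
PARENT = {
    "ACQUIRE": "EVENT",
    "HUMAN": "OBJECT",
    "CORPORATION": "OBJECT",
}

# DOMAIN and RANGE restrictions as described in the text.
SLOTS = {
    "AGENT": {"domain": ["EVENT"], "range": ["HUMAN"]},
    "THEME": {"domain": ["EVENT"], "range": ["OBJECT", "EVENT"]},
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def slot_allows(slot, head, filler):
    """True if `head` fits the slot's DOMAIN and `filler` fits its RANGE."""
    spec = SLOTS[slot]
    domain_ok = any(is_a(head, d) for d in spec["domain"])
    range_ok = any(is_a(filler, r) for r in spec["range"])
    return domain_ok and range_ok


print(slot_allows("AGENT", "ACQUIRE", "HUMAN"))        # True
print(slot_allows("AGENT", "ACQUIRE", "CORPORATION"))  # False under this strict check
print(slot_allows("THEME", "ACQUIRE", "CORPORATION"))  # True
```

This check is strictly yes or no, so it rejects a CORPORATION as AGENT; a graded check is needed before an organization can stand in for a human.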

The lexicon entry of ‘a-traves-de’, depending on the meaning used, will add either a LOCATION slot or an INSTRUMENT slot to the TMR.
[Figure labels: Location Slot, Instrument Slot]
The slot will be filled by the TMR that results from ‘compania’, which maps into either a CORPORATION or a SOCIAL-EVENT.




Ontological graph search function
The function determines relevant paths between two concepts and returns a score based on their degree of closeness.
Example: check-onto-con(ACQUIRE EVENT) returns a score of 1.0 (out of 1.0) since ACQUIRE is a type of EVENT.
However, check-onto-con(ORGANIZATION HUMAN) returns a score of 0.9 along with the path (ORGANIZATION HASMEMBER HUMAN). This indicates that ORGANIZATION can stand in the place of HUMAN because it has HUMAN members.
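
A graph search of this kind might look roughly like the sketch below. It is not the actual check-onto-con implementation: the toy `EDGES` table, the rule that IS-A links cost nothing while every other relation multiplies the score by 0.9, and the `max_depth` parameter are all assumptions, chosen only so that the two examples above come out as 1.0 and 0.9.

```python
# Toy ontology: concept -> list of (relation, neighbour). Hypothetical data.
EDGES = {
    "ACQUIRE": [("IS-A", "EVENT")],
    "ORGANIZATION": [("IS-A", "OBJECT"), ("HASMEMBER", "HUMAN")],
    "HUMAN": [("IS-A", "OBJECT")],
}

IS_A_WEIGHT = 1.0   # assumed: following an IS-A link does not lower the score
OTHER_WEIGHT = 0.9  # assumed: any other relation lowers the score


def check_onto_con(source, target, max_depth=4):
    """Return (score, path) for the best-scoring path, or (0.0, []) if none."""
    best_score, best_path = 0.0, []
    stack = [(source, 1.0, [source])]
    while stack:
        node, score, path = stack.pop()
        if node == target:
            if score > best_score:
                best_score, best_path = score, path
            continue
        edges_used = (len(path) - 1) // 2  # path alternates concept, relation
        if edges_used >= max_depth:
            continue
        for relation, neighbour in EDGES.get(node, []):
            if neighbour in path:  # do not revisit a concept (avoids cycles)
                continue
            weight = IS_A_WEIGHT if relation == "IS-A" else OTHER_WEIGHT
            stack.append((neighbour, score * weight, path + [relation, neighbour]))
    return best_score, best_path


print(check_onto_con("ACQUIRE", "EVENT"))
print(check_onto_con("ORGANIZATION", "HUMAN"))
```

Returning the path together with the score matters because later processing can use the path, for example to recognise that an organization is standing in for its human members.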


Each combination of word senses activates the applicable constraints, which are combined into a total score for the combination.
The combination with the best total score is chosen.
Will choose INSTRUMENT, as LOCATION would require ‘adquirir’ to be a physical object.
Will choose LOCATION, as TEMPORAL would require ‘espana’ to be a temporal object.
The choice is not yet defined (additional ontological information or statistical information is needed).
Will choose ACQUIRE, as LEARN would require ‘Dr. Andreu’ to be information.
Will choose ORGANIZATION, as HUMAN cannot be the theme of ‘ACQUIRE’.
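
The selection of the best combination can be sketched as an exhaustive search over sense combinations. This is a simplified illustration, not the Mikrokosmos analyzer: the numeric penalties are invented, and multiplying constraint scores into a total is an assumption, since the slides only say that the scores are combined.

```python
from itertools import product

# Candidate senses for three of the words in the running example.
SENSES = {
    "adquirir": ["ACQUIRE", "LEARN"],
    "a-traves-de": ["INSTRUMENT", "LOCATION"],
    "Dr. Andreu": ["ORGANIZATION", "HUMAN"],
}

# (test, score) pairs; a constraint contributes its score when its test fires.
# All numbers are hypothetical.
CONSTRAINTS = [
    # LEARN would require its theme to be information.
    (lambda c: c["adquirir"] == "LEARN", 0.1),
    # LOCATION would require 'adquirir' to be a physical object.
    (lambda c: c["a-traves-de"] == "LOCATION", 0.2),
    # HUMAN cannot be the theme of ACQUIRE.
    (lambda c: c["adquirir"] == "ACQUIRE" and c["Dr. Andreu"] == "HUMAN", 0.1),
]


def total_score(combo):
    """Multiply the scores of all constraints activated by this combination."""
    score = 1.0
    for fires, value in CONSTRAINTS:
        if fires(combo):
            score *= value
    return score


def best_combination(senses):
    """Enumerate every sense combination and return the best-scoring one."""
    words = list(senses)
    combos = (dict(zip(words, choice)) for choice in product(*senses.values()))
    return max(combos, key=total_score)


best = best_combination(SENSES)
print(best, total_score(best))
```

Exhaustive enumeration grows exponentially with the number of ambiguous words, which is why efficient processing techniques are needed in practice.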
Finding constraints and using them to generate valid TMRs requires a lot of processing, and many computational techniques were being developed for efficient processing.
• The analyzer utilizes an opportunistic, ‘bulletin-board’ processing scheme
• This scheme makes use of the following computational techniques



A difficulty in natural language processing is the complex interplay of constraints. Choosing one particular sense of a word may seem locally optimal, but it may create problems elsewhere.
The analyzer makes use of dependency-directed analysis, which systematically tracks dependencies and can
› propagate related constraints forward automatically,
› automatically detect inconsistent solutions, and
› be used in failure processing to determine the cause of failures and suggest recoveries
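
The bookkeeping behind dependency tracking can be illustrated with a very small sketch. It only records which word-sense choices each derived result rests on; the `DependencyTracker` class and the result strings are hypothetical, and real dependency-directed analysis also propagates constraints, which is not shown here.

```python
class DependencyTracker:
    """Records which word-sense choices each derived result depends on."""

    def __init__(self):
        self.supports = {}  # result -> frozenset of (word, sense) choices

    def record(self, result, depends_on):
        self.supports[result] = frozenset(depends_on)

    def culprits(self, failed_result):
        """Choices that could be revised to recover from a failed result."""
        return self.supports.get(failed_result, frozenset())

    def affected_by(self, choice):
        """Results that must be rechecked if `choice` is retracted."""
        return {r for r, deps in self.supports.items() if choice in deps}


tracker = DependencyTracker()
tracker.record(
    "THEME of ACQUIRE is HUMAN",
    [("adquirir", "ACQUIRE"), ("Dr. Andreu", "HUMAN")],
)
tracker.record(
    "INSTRUMENT of ACQUIRE is CORPORATION",
    [("adquirir", "ACQUIRE"), ("compania", "CORPORATION")],
)

# If the first result violates a constraint, these are the candidate causes.
print(sorted(tracker.culprits("THEME of ACQUIRE is HUMAN")))
# Retracting the ACQUIRE sense would invalidate both recorded results.
print(sorted(tracker.affected_by(("adquirir", "ACQUIRE"))))
```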



Statistical data is used to determine the most likely senses of the input words.
These senses are tested first, and if a result that ‘satisfies’ is obtained, processing ends.
This approach is extended to every aspect of processing, even failure recovery.
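
Testing the most likely senses first and stopping at the first acceptable result can be sketched as follows. The sense probabilities are invented, and ranking combinations by the product of their probabilities is an assumption. For brevity the sketch enumerates and sorts every combination up front, which a real best-first search would avoid.

```python
from itertools import product
from math import prod

# Hypothetical sense probabilities, e.g. estimated from a corpus.
PRIORS = {
    "adquirir": {"ACQUIRE": 0.8, "LEARN": 0.2},
    "compania": {"CORPORATION": 0.7, "SOCIAL-EVENT": 0.3},
}


def best_first(priors, satisfies):
    """Try readings from most to least likely; stop at the first that passes.

    Returns (reading, number_of_readings_tested), or (None, n) if all fail.
    """
    words = list(priors)
    combos = product(*(priors[w].items() for w in words))
    ranked = sorted(combos, key=lambda c: prod(p for _, p in c), reverse=True)
    for tested, combo in enumerate(ranked, start=1):
        reading = {w: sense for w, (sense, _) in zip(words, combo)}
        if satisfies(reading):
            return reading, tested
    return None, len(ranked)


# Stand-in for the semantic constraint check (hypothetical acceptance test).
reading, tested = best_first(PRIORS, lambda r: r["adquirir"] == "ACQUIRE")
print(reading, tested)
```

When the most likely reading passes, only one combination is ever tested, which is where the saving over exhaustive scoring comes from.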
Failures can arise from
› spelling errors
› errors in syntactic analysis
› an erroneous lexicon and/or ontology
› a lexicon and/or ontology that lacks needed information
› incorrect decisions made by the analyzer
• MikroKosmos tries to deal with these problems by:
› using the dependency analysis to see why failures occurred
› checking for metonymic/metaphoric language
› positing gaps (ellipsis) if slot fillers are missing
› changing the syntactic analysis, including trying different attachments
› relaxing thresholds
› ordering possible recoveries using a sophisticated ‘best first’ approach


If the basic semantic constraints cannot fully disambiguate, then MikroKosmos will
› use collocational preferences stored in the lexicon
› use statistical methods to determine the most likely meanings
› allow the ambiguity to remain; subsequent clauses combined with coreferences might resolve the problem
› apply attachment rules such as ‘referential success’ and/or ‘minimal attachment’
› use ‘expectations’ to moderate. For instance, in the current example, if one of the ‘adquirir’ senses expected an INSTRUMENT slot (which ‘a-traves-de’ adds), favor that sense.