Untitled
Outline
1. The gap: Natural language and Semantic Web data
2. The bridge: Ontology lexica
3. Building bridges
   • Hand-crafting ontology lexica
   • Learning ontology lexica
   • Limitations

1. The gap: Natural language and Semantic Web data

The Semantic Web
From the human-readable web of documents to the machine-readable web of data.

[Figure: RDF graph connecting Alan Turing, London and England through the properties birthPlace, birthDate (1912-06-23), capital and urbanPopulation (9 787 426)]

SELECT ?x WHERE { dbpedia:Hallasan dbpedia-owl:elevation ?x . }

(Hallasan, elevation, 1,950 m)
• Hallasan is 1,950 m high.
• Hallasan rises to 1,950 m.
• The altitude of Hallasan is 1,950 m.

(Interstate 10, routeStart, Santa Monica)
(Interstate 10, routeEnd, Jacksonville)
• Interstate 10 links Santa Monica to Jacksonville.
• Interstate 10 connects Santa Monica with Jacksonville.

(Bruce Lee, child, Shannon Lee)
(Shannon Lee, child, Wren Keasler)
(Wren Keasler, gender, Female)
• Wren Keasler is the granddaughter of Bruce Lee.

(Bruce Lee, child, Shannon Lee)
(Shannon Lee, child, Wren Keasler)
(Bruce Lee, gender, Male)
• Bruce Lee is the grandfather of Wren Keasler.

2. The bridge: Ontology lexica

dbpedia-owl:elevation rdfs:label "elevation"@en , "Höhe"@de , "hoogte"@nl .

Such labels provide only a string per language. Ontology lexica capture rich linguistic information about how ontology elements are verbalized in a particular language:
• word forms
• part of speech
• subcategorization
• meaning

lemon (Lexicon Model for Ontologies), http://lemon-model.net
lemon provides a meta-model for describing ontology lexica with RDF.

Semantics by reference: the meaning of lexical entries is specified by pointing to elements in an ontology.
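To make semantics by reference concrete, here is a minimal Python sketch under stated assumptions: the LexicalEntry class, the toy lexicon and the "The <noun> of <subject> is <value>." template are hypothetical illustrations, not part of lemon or of the DBpedia lexicon. Each entry records only a written form, a part of speech and the URI of the ontology element it refers to, so verbalizing a triple becomes a lookup of entries by property URI.

```python
from dataclasses import dataclass

# Namespace behind the dbpedia-owl: prefix used in the SPARQL query above.
DBO = "http://dbpedia.org/ontology/"


@dataclass(frozen=True)
class LexicalEntry:
    written_rep: str     # canonical written form, e.g. "elevation"
    part_of_speech: str  # e.g. "noun"
    reference: str       # URI of the ontology element that gives the entry its meaning


# Hypothetical toy lexicon (illustrative entries, not the DBpedia lexicon).
LEXICON = [
    LexicalEntry("elevation", "noun", DBO + "elevation"),
    LexicalEntry("altitude", "noun", DBO + "elevation"),
    LexicalEntry("team", "noun", DBO + "team"),
]


def entries_for(uri: str) -> list[LexicalEntry]:
    """Return every lexical entry whose sense refers to the given ontology element."""
    return [e for e in LEXICON if e.reference == uri]


def verbalize(subject: str, prop: str, value: str) -> list[str]:
    """Verbalize one triple with the hypothetical template 'The <noun> of <subject> is <value>.'"""
    return [
        f"The {e.written_rep} of {subject} is {value}."
        for e in entries_for(prop)
        if e.part_of_speech == "noun"
    ]


for sentence in verbalize("Hallasan", DBO + "elevation", "1,950 m"):
    print(sentence)
```

Both toy entries refer to dbpedia-owl:elevation, so the lookup returns two verbalizations of the Hallasan triple. Producing sentences such as "Hallasan rises to 1,950 m." would additionally need frames and argument mappings, which is what the lemon model below provides.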
The lemon model (core)
[Figure: the lemon core model. Classes: Lexicon (language: String), LexicalEntry (with subclasses Word, Phrase and Part), LexicalForm (writtenRep: String), LexicalSense, and the ontology. Properties: entry, form, canonicalForm, otherForm, abstractForm, sense/isSenseOf, prefRef, altRef, hiddenRef, reference/isReferenceOf.]

The lemon model (argument mapping)
[Figure: lemon argument mapping. A LexicalEntry has a LexicalSense (sense/isSenseOf; subsense; context, condition and definition; reference/isReferenceOf to the ontology) and a Frame (synBehavior). A Frame has Arguments (synArg), which can carry a marker (SyntacticRoleMarker). The sense is linked to the arguments through semArg, subjOfProp, objOfProp and isA. The figure also shows propertyDomain and propertyRange.]

Example: play
• play : LexicalEntry, partOfSpeech = verb
• canonical form: Form with writtenRep = "play"@en
• sense: LexicalSense with reference <http://dbpedia.org/ontology/team>
• synBehavior: IntransitivePPFrame with a subject x : Argument and a prepositionalObject y : Argument
• marker of y: Word with canonical form writtenRep = "for"@en
• argument mapping: the sense has subjOfProp x and objOfProp y

Thus "x plays for y" expresses the triple (x, team, y).
Example: connect
• connect : LexicalEntry, partOfSpeech = verb
• canonical form: Form with writtenRep = "connect"@en
• sense: LexicalSense with two subsenses, one with reference dbpedia:routeStart and one with reference dbpedia:routeEnd
• synBehavior: TransitivePPFrame with a subject, a directObject and a prepositionalObject (the prepositional object carries the marker …)
• argument mapping: the subject is subjOfProp of both subsenses; the direct object is objOfProp of the routeStart subsense, and the prepositional object is objOfProp of the routeEnd subsense

3. Building bridges: Hand-crafting ontology lexica

lemon lexicon for DBpedia 3.9 (https://github.com/cunger/lemon.dbpedia)
• 354 classes (98 %)
• 300 properties (17 %: all properties with 10 000 or more occurrences)
• languages: English, Spanish, German

lemon design pattern library (https://github.com/jmccrae/lemon.patterns)
The play entry from the example above can be written as a single pattern:
StateVerb("play", dbpedia-owl:team, propSubj = Subject, propObj = prepositionalObject("for"))

Lemon Assistant

3. Building bridges: Learning ontology lexica

Idea 1. For each predicate P in a data repository (e.g. DBpedia), collect the pairs of entities (S, O) connected through P.
Example: spouse
• (Audrey Hepburn, Mel Ferrer)
• (Albert Einstein, Mileva Maric)
• …

Idea 2.
Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O:
• Mileva Maric, the future wife of Albert Einstein, was the only woman among the six students in the mathematics and physics section.
• Einstein was married to Maric for 16 years.

Idea 3. For all retrieved sentences, the natural language pattern connecting both entities is a potential lexicalization of P:
• S, the future wife of O
• S, wife of O
[Figure: dependency path S --appos--> wife --prep--> of --pobj--> O]

• M-ATOLL https://github.com/ag-sc/matoll/
• BOA http://aksw.org/Projects/BOA.html

3. Building bridges: Limitations

[Figure: property combinations routeStart + routeEnd and child + child + gender Female]

Learning adjective lexicalizations
• gender: Female
• nationality: "Australian"
• religion: "Buddhist"

• lemon http://lemon-model.net
• W3C community group Ontology Lexica https://www.w3.org/community/ontolex/
• DBpedia lexicon https://github.com/cunger/lemon.dbpedia
• lemon design pattern library https://github.com/jmccrae/lemon.patterns
• M-ATOLL https://github.com/ag-sc/matoll/
• BOA http://aksw.org/Projects/BOA.html