Information Extraction and the Semantic Web
Lecture 3: Knowledge representation continued

There is optional supplementary material available.

Thanks for your feedback!
The feedback was overwhelmingly positive.
[Figure: feedback results]
Proposals for improvement:
• get bigger room
• fix blackboard
• speak more slowly: will try
• make it a 6 CP class: yes, next time
• move exam: Participate in Doodle!

"What is important?"
The important things are the Def's and Tasks. Here is a summary so far:

Relation loves:
Person     Person     Intensity
Irma       Mr.Bean    1.0
Mr.Bean    Irma       0.227
(Annotations in the figure: classes, relation, tuples/facts, entities, literals.)

"It's confusing"
Philosophy is always confusing :-(
It is much more honest to have the question without knowing the answer, than to have the answer without knowing the question.
If confused, stick to the Def's and Tasks! Ignore the Digressions.

More Feedback
• Make a break: will disrupt class :-(
• Air conditioning: agreed
• Is there additional material? Not for the exam. But see references at the end of each lecture.
• Do I have to get all the Def's right? Not verbatim, but you should be able to explain the concepts in your own words.
• Can you show applications? See Lecture 1

"Show practical examples"
Everything we have seen so far is implemented 1:1 in real KBs:
[Screenshots: Mr. Bean in YAGO; Atkinson in YAGO; types of Atkinson; subclasses]

"Make a quiz"
Sure! We will make the "Tasks" as quizzes! Please bring paper and a pen (+ 1 brain)!

Overview
• Names/Labels
• Meta Entities
• Reification
• Types of Knowledge
• Canonicity
• WordNet

Reminder: Relations
Binary relations can be represented as
• relation: drives ⊆ person × car, drives(MrBean, Mini1000)
• table: drives, with columns Person and Car and the row MrBean, Mini1000
• graph: an edge labeled drives from MrBean to Mini1000
• triple store: Subject MrBean, Relation drives, Object Mini1000

Task: Functions
Which of the following relations are functional?
[Figure: graph with edges labeled r1, r2, r3]

Def: Name
A name (also: label) of an entity is a human-readable string attached to that entity.
The entity is called the meaning of the name.
[Figure: an entity and its name "Mr. Bean"]

Label
"Label" is a binary relation that holds between an entity and its name.
[Figure: an entity with a label edge to "Mr. Bean"]

Def: Synonymy
If an entity has multiple names, the names are called synonymous.
(The adjective for the names is "synonymous", each name is a "synonym", the phenomenon is called "synonymy")
[Figure: one entity with the labels "The King" and "Elvis"]

Def: Ambiguity
If a name is attached to multiple entities, the name is called ambiguous.
(The adjective for the names is "ambiguous", the phenomenon is called "ambiguity")
[Figure: two entities that both have the label "The King"]

Task: Names
List some entities with their names, some ambiguous names, and some synonyms.

Tuples and classes
A relation tuple contains entities. Can it also contain classes?
sings(Alizee, PopSongs)
(Alizee is an entity. PopSongs is a class?)

Def: Class Entities
A class entity is an entity that represents the class.
The class entity "PopSongs" represents
popSongs = {MoiLolita, IslaBonita, ...}
"PopSongs" is an entity and "popSongs" is a set of entities.
See: RDFS Vocabulary

Example: Class Entity
The class entity "Marianne" represents the class "French People":
frenchPeople = {Hollande, Piaf, Alizee, ...}

The Class Class
"Class" is the class of all class entities.
class = {Cars, Cities, Rivers, ...}
This class can appear in relations:
likes ⊆ person × class
likes = {<Alizee, Tattoos>, <Monroe, Shoes>, ...}

SubClassOf
SubClassOf is a binary relation on class entities, which contains <x,y> if the class represented by x is a subclass of the class represented by y.
subClassOf ⊆ class × class
subClassOf(Singers, Persons)
subClassOf(Cars, Vehicles)

Type
Type is a binary relation, which contains <x,y> if x is an instance of the class represented by y.
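
The class entities and the relations subClassOf and type can be sketched in a few lines of Python. This is only an illustration under stated assumptions: facts are stored as (subject, relation, object) triples, the fact type(Alizee, Singers) is added for the example, and the helper names `superclasses` and `types_of` are hypothetical. The sketch also assumes that an instance of a class is an instance of all its superclasses, which follows from reading classes as sets.

```python
# Facts as (subject, relation, object) triples. Class entities such as
# "Singers" are ordinary entities that stand for a set of instances.
TRIPLES = {
    ("Singers", "subClassOf", "Persons"),
    ("Cars", "subClassOf", "Vehicles"),
    ("Singers", "type", "Class"),   # class entities are instances of "Class"
    ("Persons", "type", "Class"),
    ("Alizee", "type", "Singers"),  # illustrative instance fact (assumption)
}


def superclasses(cls, triples):
    """Return all classes reachable from cls by following subClassOf edges."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "subClassOf" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def types_of(entity, triples):
    """Return the direct types of entity plus all superclasses of those types."""
    direct = {o for s, r, o in triples if s == entity and r == "type"}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(cls, triples)
    return direct | inherited


print(sorted(types_of("Alizee", TRIPLES)))   # ['Persons', 'Singers']
print(sorted(types_of("Singers", TRIPLES)))  # ['Class']
```

Note that "Singers" appears both as the object of a type fact and as the subject of one: a class entity can take part in relations like any other entity.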
type ⊆ entity × class
type(Alizee, Singer)
type(Elvis, livingPeople)

Task: Class entities
Draw a knowledge graph with the relations subClassOf and type.

Def: Relation Entity
A relation entity is an entity that represents the relation.
The relation entity "Likes" represents
likes = {<Alizee, Tattoos>, ...}
Now, "Likes" is an entity (as opposed to "likes", which is a set of pairs).

Def: Dom
"dom" is a relation on binary relation entities and class entities, which contains <x,y> if the domain of the relation represented by x is the class represented by y.
dom ⊆ relation × class
dom(BornInCity, Person)

Def: Ran
"ran" is defined like "dom" and identifies the range of a relation.
ran ⊆ relation × class
ran(BornInCity, City)

Task: Meta Relations
Draw a knowledge graph with the relations dom and ran. Can dom and ran appear as nodes?

Digression: Class & Relation
A fact can be modeled as a class or as a relation.
[Figure: modeled with classes, an entity has type woman and type singer; modeled with relations, it has gender female and job singer]

n-ary facts as binary facts
Every n-ary fact can be represented as binary facts.
n-ary: drives, with columns Person, Car, Destination and the row MrBean, Mini1000, Nice
binary: person(MrBeanVacation, MrBean), car(MrBeanVacation, Mini1000), destination(MrBeanVacation, Nice)

Def: Event Entity
An event entity represents an n-ary fact.
[Figure: the event entity MrBeanVacation with the edges person to MrBean, car to Mini1000, destination to Nice]

Task: Event Entities
Draw a knowledge graph for the following facts.
Irma loves Mr. Bean since 1955. Mr. Bean drives with Irma to the cinema. Irma and Mr. Bean watch "Titanic". The movie is about the trip of the ship "Titanic" from Europe to New York.
(There may be multiple solutions)

Binary relations are flexible
n-ary relations enforce the presence of all arguments:
born, with columns Person, City, Year and the row Atkinson, Consett, 1955
Binary relations don't:
[Figure: the same facts as separate binary relations, with the year 1955]

Binary vs n-ary
Binary and n-ary relations can represent the same facts.
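
The translation of an n-ary fact into binary facts around an event entity can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `to_binary_facts`, the event name `AtkinsonBirth`, and the use of `None` for an unknown argument are assumptions made for the example.

```python
def to_binary_facts(event, arguments):
    """Represent one n-ary fact as binary facts attached to an event entity.

    `arguments` maps role names (the columns of the n-ary relation) to
    values. Roles whose value is None are simply left out, which is the
    flexibility that an n-ary table row does not offer.
    """
    return [
        (event, role, value)
        for role, value in arguments.items()
        if value is not None
    ]


# The n-ary fact drives(MrBean, Mini1000, Nice) from the slides.
vacation = to_binary_facts(
    "MrBeanVacation",
    {"person": "MrBean", "car": "Mini1000", "destination": "Nice"},
)

# A birth fact with an unknown year (hypothetical event entity name).
birth = to_binary_facts(
    "AtkinsonBirth",
    {"person": "Atkinson", "city": "Consett", "year": None},
)

for fact in vacation + birth:
    print(fact)
```

The vacation yields three binary facts, while the birth yields only two, because the missing year produces no fact at all.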
binary:
• more relations
• less arity
• more flexibility
n-ary:
• fewer relations
• more arity
• more control

Task: Representation
Represent the following statement in 4 different ways:
Rowan Atkinson is an actor who plays Mr Bean.

Def: Reified statement
A reified statement is an entity that represents a statement. This phenomenon is called reification.
[Figure: a reified statement entity that represents the statement "Alizee completed Danse Classique Dance School"]

Reification Vocabulary
statement = set of reified statements
subject ⊆ statement × entity
predicate ⊆ statement × relation
object ⊆ statement × entity
[Figure: the reified statement s41 with subject Alizee, predicate completed, object Dance School]

Example: Reification
hopes(Pierre, s42)
subject(s42, Alizee)
predicate(s42, type)
object(s42, single)
Simplified notation:
hopes(Pierre, type(Alizee, single))
The represented statement itself is not necessarily in the KB!

Task: Reification
Write down a knowledge base with some reified facts. Can you reify facts that have reified arguments?

Structured Information
There are several types of structured information:
• spatial
• factual
• taxonomic
• lexical
• multilingual
• phrasal
• meta
• common-sense
• multimodal
• epistemic
• temporal

Factual Knowledge
Factual knowledge concerns relationships between entities.
Example text: Alizee is a French singer from Ajaccio, Corsica.
hasNationality(Alizee, French)
isFrom(Alizee, Ajaccio)
isLocatedIn(Ajaccio, Corsica)

Taxonomic Knowledge
Taxonomic knowledge concerns classes.
Example text (AlizeeAmerica): Alizee, a pop singer from France, has sold more than 5 million records.
[Figure: the classes Singers and Pop Singers, with Alizee as an instance]

Lexical Knowledge
Lexical knowledge concerns labels, words, and properties of words.
Example text: Also known by her nickname "Lili", Alizée started dancing early.
Source: Wikipedia/Alizee
[Figure: the entity has nickname "Lili", label "Alizee", and fullName "Alizée Jacotey"]
(Please also look at the technical content of this slide)

Multilingual Knowledge
Multilingual knowledge concerns labels in different languages.
"France", "Fronkraisch", "??????", "??????????????", "Francio"

Phrasal Knowledge
Phrasal knowledge is about small groups of words that stand together as a conceptual unit.
[Screenshots: GoogleDef; PATTY]

Temporal Knowledge
Temporal knowledge concerns the time of events and facts.
Example text: Alizée started dancing early in her life, and by age four was already proficient. A year later, she was enrolled in Ajaccio's Ecole de Danse, and trained there until she was 15. In 1995, at the age of 11, she won a competition of a French airline. The aircraft was subsequently named after her.

Event        Time
born         1984
dance        early
proficient   Age 4
enroll       Age 5
train        Age 5-15
won          Age 11
won          1995

Spatial Knowledge
Spatial knowledge concerns the location of events, entities, and facts.

Meta Knowledge
Meta knowledge is knowledge about knowledge.
[Screenshot: structured knowledge of NELL about Alizee]
Meta knowledge about "Alizee is a celebrity" says where, when, and how this fact was found.

Common Sense Knowledge
Common sense knowledge concerns facts of ordinary sensible understanding.
daughter(x, y) ⇒ gender(x, female)
mother(x, y) ⇒ loves(x, y)
mother(x, y) ⇒ ∃z : father(z, y)
[Screenshots: Myspace; Wiktionary]

Multimodal Knowledge
Multimodal knowledge concerns information that is not written text: audio, video, haptic, etc.
[Figure: Alizee with the edges hasPicture, sings (MoiLolita.mp3), and appearsIn (Youtube)]

Epistemic Knowledge
Epistemic knowledge concerns beliefs.
Source: Hellocoton
believes(Hellocoton, divorced(Alizee, Jeremy))

Def: Canonic Entities
An entity is canonic in a KB, if there is no other entity in the KB that represents the same real-world object.
[Figure: the entity Alizee produced Gourmandises; the entity A. Jacotey produced Psychédélices]
(not canonical)
(Here, we distinguish exceptionally between an entity in the KB, and the real world object. This distinction is correct, but rarely necessary otherwise, which is why this lecture does not make the difference)

Def: Canonic Relations
A relation is canonic in a KB, if there is no other relation in the KB that represents the same real-world relation.
[Figure: Alizee produced Gourmandises; Alizee hasProduced Psychédélices (not canonical)]
(Here, we distinguish exceptionally between a relation in the KB, and the real world relation. This distinction is correct, but rarely necessary otherwise, which is why this lecture does not make the difference)

Use of Canonicity
Canonicity is essential for
• counting
• answering queries
• constraint satisfaction
[Figure: Alizee produced Gourmandises; Alizee hasProduced Psychédélices (not canonical)]

Canonicity and Names
A canonic entity can have multiple names.
[Figure: the entity Alizee has the labels "Alizee" and "A. Jacotey"; the relation produced has the labels "produced" and "has produced"; Alizee produced Gourmandises and Psychédélices]

Canonicity is not easy
[Text snippets, cut off in the source:]
Jacotey is considered one of the "100 Se women of the world".
The singer said i that Alizee is married, but she lives sep

Example: Non-Canonicity
"Tell me about Alizée"
[Screenshot: TextRunner results, annotated: near duplicates; not Alizee; not an entity]

Example: Canonicity
[Screenshot: YAGO]

Example: Non-Canonicity
"Who built the pyramids?"
[Screenshot: TextRunner results, annotated: correct; not bad; less likely; duplicate; useful]

Example: Canonicity
No answer to "Who built the pyramids"
[Screenshot: YAGO]

Canonicity as Trade-Off
non-canonic:
• easier to extract
• less easy to use
• more noise
• more data
canonic:
• difficult to extract
• easy to use
• less noise
• less data

Digression: Reality
We model reality by a representation.
[Figure: Alizée and Monroe, modeled with person and female]

Digression: Reality
Our identifiers are arbitrary names.
[Figure: the same model with the identifiers A14, A15, A16, A17]

Digression: Reality
Can we reconstruct reality from our model?
[Figure: the identifiers A14, A15, A16, A17 with question marks]

Digression: Reality
Most likely no: A Chinese dictionary is a model of the world...
...yet, by reading it, you cannot learn Chinese.

References
RDF Primer