Dictionaries
Vocabulary management and SKOS
Putting Business in the Lead
Jan Voskuil (Taxonic)
September 5th, 2014, Leipzig
SEMANTiCS 2014

Introduction
Jan Voskuil
Taxonic (co-founder)
Consultancy in Semantic Technology
“SKOS is used for findability, but should be used also for vocabulary management in organizations. Business owns the dictionary, not IT”
• What are dictionaries and what for?
• SKOS: Tooling and benefits
• Practicalities

Dienst Justitiële Inrichtingen (DJI)
Custodial Institutions Agency
• Ca. 10,000 employees
• Ca. 70,000 inmates per year
• Ca. 50 facilities
• Four groups of detainees
  – Adult detainees
  – Juvenile offenders
  – Patients in forensic care
  – Foreign nationals

Dictionaries: Benefits
• Knowledge management
• Quality of information
• Manageability
  – If your systems contain 100K+ of attribute names, then they contain unstructured information (Dave McComb)
• Findability
  – Document (DMS)
  – Data (DBMS)
• Exchangeability

How many key words are enough?
• Zipf’s Law
• 5000 words are enough to understand 95% of any corpus. For the other 5% you need to know the other 200,000 words
Source: Tiberius and Schoonheim, A Frequency Dictionary of Dutch, 2014
[Chart callouts: frequency of the most frequent word; frequency of the second most frequent word]
Pocket dictionary: 5K
General dictionary: 100K
Lexicographic dictionary: 1M+

The Real World
• What is the correct definition of x?
• Who decides this?
• My project introduces new terms, how can I get these accepted?

Dictionary: Owner
  – Begrippenwoordenboek DJI: Dept X
  – Begrippenlijst Project Y: Project Y
  – Mega Glossary: ICT-Dept
Information chain dictionaries
  – Ketenwoordenboek Strafrecht: JustID
  – Ketenwoordenboek Vreemdelingen: JustID
  – Justitiethesaurus: WODC
Data Dictionaries
  – Gegevenswoordenboek MITS: ICT-Dept
  – Datadictionary Tulp MIR: ICT-Dept
  – …

It just does not work!
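The coverage figure on the Zipf's Law slide can be checked against an organization's own texts before settling on the size of a "pocket dictionary". The sketch below is only an illustration: the function names `coverage_curve` and `terms_needed` and the toy term counts are invented here and do not come from the presentation or from DJI data. It ranks terms by frequency and reports how many of the top-ranked terms are needed to cover a target share of all term occurrences.

```python
from collections import Counter
from itertools import accumulate


def coverage_curve(counts: Counter) -> list[float]:
    """Share of all occurrences covered by the top-k terms, for k = 1..n."""
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)
    return [running / total for running in accumulate(ranked)]


def terms_needed(counts: Counter, target: float) -> int:
    """Smallest number of top-ranked terms whose occurrences reach the target share."""
    for k, share in enumerate(coverage_curve(counts), start=1):
        if share >= target:
            return k
    return len(counts)


# Toy term occurrences (hypothetical, for illustration only).
tokens = (
    ["detainee"] * 50 + ["cell"] * 25 + ["facility"] * 15
    + ["transfer"] * 6 + ["furlough"] * 3 + ["arrestant"] * 1
)
counts = Counter(tokens)

# Number of most frequent terms needed to cover 95% of all occurrences.
print(terms_needed(counts, 0.95))
```

Run on a real document collection, the same two functions give a rough, data-driven starting point for how many key words the lean dictionary should contain.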
OLD SITUATION → NEW SITUATION
• Various lists → Single source of truth
• Various versions → Single source of truth
• Word documents → Intranet (Internet)
• Distribution per mail → Intranet (Internet)
• Endless discussions → Clear-cut governance
• Responsibility of IT dept or project → Ownership by the business

Some How To’s
• Keep the dictionary lean and mean
  – Create a “pocket dictionary”
  – Example: 1200 key words
• Governance: be pragmatic
• Ownership within the business!
• Use clear, explanatory descriptions
  – Language of the work force
  – Avoid legal speak!
• Dictionary maintenance is a continuous process!
  – Release cycle
  – One major, four minor releases per year
  – Major release is approved by senior executives

Why SKOS is so great: just enough semantics
• Semantic relations
  – Compare one-dimensional lists
• A LIMITED number of STANDARDIZED semantic relations
• Intuitive, easy to understand
• “GENERALIZED CLASSIFICATION”
• Only most relevant info
  – Broader, Narrower, Related Term
  – Semantics is sufficiently vague
  – Ideal for “pidginization”
  – Use is far broader than Class Diagrams, ERDs and ontologies
[Diagram labels]
  – Justitiabele (“Detainee”), narrower: Adult detainee, Juvenile offender, Patient in forensic care, Foreign national
  – Sex, narrower: Male, Female, Unknown, Undisclosed
  – Other concepts shown: Criminal Law, Penal Institution

Why SKOS is so great: tooling

Tooling: PoolParty Thesaurus Manager

End User View

SKOS is an Open Standard: Project Linking

http://vocabulary.wolterskluwer.de
prefLabel: Unfallverhütung (accident prevention)
• From Wolters Kluwer: Alternative labels, Broaders, Narrowers, Related terms
• Other thesauri on the web: From DBPedia, From lod.gesis.org, From eurovoc.org

DJI and the POLICE have very different meanings for the word ARRESTANT
DO:
> RESPECT DIFFERENCES BETWEEN ORGANIZATIONS
> MAKE LEXICOGRAPHIC DIFFERENCES EXPLICIT USING LINKED THESAURI
DON’T:
> TRY MAKING ALL ORGANIZATIONS USE EXACTLY THE SAME LANGUAGE

Conclusion and next step: Linking Thesauri to Datamodels
• Datamodels: not owned by business
  – too detailed
  – too complex
  – NO ownership at the strategic level
• Thesauri
  – Relatively abstract
  – Relatively simple
  – Ownership by the business
• SKOS bridges the gap
  – With datamodels in RDF, the gap can be bridged!

THESAURUS AND DOMAINMODELS: SCENARIO 1
THESAURUS
  – Resources: voc:4862, eurovoc:C877, skos:Concept, “Detention Facility”
  – Properties: rdf:type, skos:prefLabel, skos:broader, skos:exactMatch, skos:definition
  – Labels: “Penitentiary Institution”, “Penal Institution”@en, “място за лишаване от свобода”@bg
  – Definition: “A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”
Link between thesaurus and domain model: owl:sameAs? skos:exactMatch?
DOMAIN MODEL | Data dictionary
  – Resources: :penitentiaryInstitution, :pi_Dordrecht, :inmate#9818763
  – Properties: rdf:type, :isRegisteredAt, :cell
  – Value: “B.23.a”

THESAURUS AND DOMAINMODELS: SCENARIO 2
THESAURUS
  – Resources: eurovoc:C877, skos:Concept, “Detention Facility”
  – Properties: rdf:type, skos:prefLabel, skos:exactMatch
  – Labels: “Penitentiary Institution”, “Penal Institution”@en, “място за лишаване от свобода”@bg
  – Definition: “A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”
DOMAIN MODEL | Data dictionary
  – Resources: :penitentiaryInstitution, :pi_Dordrecht, :inmate#9818763
  – Properties: rdf:type, :isRegisteredAt, :cell
  – Value: “B.23.a”

jan.voskuil@taxonic.com
www.taxonic.com
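The two thesaurus-to-domain-model scenarios above can be sketched with plain subject-predicate-object tuples. This is one possible reading of the flattened diagrams, not the author's model: the exact edges are reconstructed, the choice of skos:exactMatch for the link in Scenario 1 is an assumption (the slide leaves owl:sameAs versus skos:exactMatch as an open question), and Scenario 2 is read here as the domain-model class itself being typed as a skos:Concept. The helper `preferred_labels` is hypothetical, and no RDF library is used.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
Triple = tuple[str, str, str]

# Data shared by both scenarios (edges reconstructed from the slides, see lead-in).
DOMAIN_MODEL: set[Triple] = {
    (":pi_Dordrecht", "rdf:type", ":penitentiaryInstitution"),
    (":inmate#9818763", ":isRegisteredAt", ":pi_Dordrecht"),
}

# Scenario 1: thesaurus concept and domain-model class are separate resources,
# joined by an explicit link (skos:exactMatch is an assumed choice).
scenario_1: set[Triple] = DOMAIN_MODEL | {
    ("voc:4862", "rdf:type", "skos:Concept"),
    ("voc:4862", "skos:prefLabel", '"Penitentiary Institution"'),
    ("voc:4862", "skos:exactMatch", "eurovoc:C877"),
    ("voc:4862", "skos:exactMatch", ":penitentiaryInstitution"),
}

# Scenario 2 (assumed reading): the class itself doubles as the concept.
scenario_2: set[Triple] = DOMAIN_MODEL | {
    (":penitentiaryInstitution", "rdf:type", "skos:Concept"),
    (":penitentiaryInstitution", "skos:prefLabel", '"Penitentiary Institution"'),
    (":penitentiaryInstitution", "skos:exactMatch", "eurovoc:C877"),
}

LINK_PROPERTIES = {"skos:exactMatch", "owl:sameAs"}


def preferred_labels(graph: set[Triple], instance: str) -> list[str]:
    """Business labels for an instance: via its class, or via a concept linked to that class."""
    classes = {o for s, p, o in graph if s == instance and p == "rdf:type"}
    linked = {s for s, p, o in graph if p in LINK_PROPERTIES and o in classes}
    concepts = classes | linked
    return sorted(o for s, p, o in graph if s in concepts and p == "skos:prefLabel")


for graph in (scenario_1, scenario_2):
    print(preferred_labels(graph, ":pi_Dordrecht"))
```

Under these assumptions both scenarios let a data record reach the business-owned label. The difference is whether the thesaurus and the data model share one resource or keep two resources connected by a link.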