2-02-1 DOCLE Report - University of Adelaide
2-02-1 GP Vocabulary Project – Stage-2

DOCLE Report

Don Walker
April 2004

The Docle System Report

Acknowledgements: The GP Vocabulary Project is being funded by the Department of Health and Ageing (DHA) through the General Practice Computing Group (GPCG).

Publication Date: April 2004

Contact person:
Dr Don Walker
Department of General Practice
University of Adelaide
South Australia 5005
Email: donald.walker@adelaide.edu.au

FOREWORD

The "GP Vocabulary Project – phase-2" links (or “maps”) a subset of GP terms to the concepts of various systems. The links are made via the terms of the concepts. Thus “ICD10-AM codes” are linked via their “index terms”.

This is a report about the DOCLE system. It covers its loading into the Poly-browser and Authoring Tool (PAT) and its subsequent examination, assessment and analysis as specified for the GP Vocabulary Project – stage-2.

Additional information is available on the web-page of the "GP Vocabulary Project".
It may be found via the home-page of the Department of General Practice at the University of Adelaide:
http://www.generalpractice.adelaideuni.org/
http://www.generalpractice.adelaideuni.org/nav/res_nav/current/it.shtml#vocab

CONTENTS

1 Introduction
  1.1 The GP Vocabulary Project stage-2
  1.2 The Docle System
2 Liberties taken
3 Definitions pertinent to the Docle System
4 The supplied data tables
  4.1 Terms All
  4.2 Reason for encounter
  4.3 Primary Key - Secondary Key
    4.3.1 Data sample
  4.4 Genus to Species
    4.4.1 Data Sample
  4.5 Species to Genus
    4.5.1 Data Sample
5 Preparation
  5.1 Expand the “Terms-All” data
  5.2 Identification of “Atomic Concepts”
  5.3 Creation of a “Unique Docle List”
  5.4 Creation of “Preferred-Terms”
  5.5 Creation of Concept and Operator usage
  5.6 Creation of other “Terms” data
  5.7 Creation of CODE data
  5.8 Creation of LIST data
  5.9 Creation of the Hierarchy
    5.9.1 Discussion
      5.9.1.1 Level-0 - the “Phylum”
      5.9.1.2 Level-1 - the “Genus”
      5.9.1.3 Level-2 – the “Species”
      5.9.1.4 Docle Orphan Concepts
      5.9.1.5 Cyclic relationships in the Genus-to-Species table
      5.9.1.6 Poly-hierarchies
      5.9.1.7 Cyclic relationships in a poly-hierarchy
      5.9.1.8 The “Type” of a Hierarchical relationship
    5.9.2 Creation of the Hierarchy Table
  5.10 Creation of Non-hierarchical Relationships
    5.10.1 Relationship-types in SNOMED
    5.10.2 Creation of the Non-hierarchical relationships
      5.10.2.1 The Docle Operators
      5.10.2.2 Docle Non-Hierarchical Relations
  5.11 Allocation of “Unique Identifiers”
6 Loading the Docle System into the Poly-Browser
  6.1 Docle’s Appearance in the Poly-Browser
7 Further Reading
8 Analysis
  8.1 Supplied data
  8.2 Content
  8.3 Machine located contents
    8.3.1 Finding GP Vocabulary terms
    8.3.2 Terms in SNOMED CT that were also in other systems
  8.4 Manually located contents
    8.4.1 Comparing machine matches and manual linking
  8.5 Relevance
  8.6 Ease of use and functionality
  8.7 Cost and availability
  8.8 Strengths and implications
  8.9 Weaknesses, consequences and solutions
9 Concluding remarks
10 Creator’s Comments

1 Introduction

This document describes the inclusion of the DOCLE System into the poly-browser and authoring tool (PAT) and its subsequent examination, assessment and analysis. The work was part of the GP Vocabulary Project – stage 2.

1.1 The GP Vocabulary Project stage-2

The objectives of stage-2 of the GP Vocabulary Project are to:

1. Trial the mapping of the GP Vocabulary project to other coding systems in use within Australia (which include where feasible: DOCLE, CATCH, ICD-10-AM, ICPC2-Plus, and SNOMED-CT).
2. Analyse the structure of these terminologies to determine how well they relate to each other to assess their suitability for general practice and other parts of the health sector; and
3. Inform consideration of the process and costs involved in completing a map from the GP Vocabulary to a standard reference terminology and classifications.

To achieve this, the pilot will need to use appropriate methodologies and tools which could be applied to an eventual large scale completion of the linking exercise.

1.2 The Docle System

The Docle System is the creation of an active Victorian general practitioner - Dr Kuang Oon. It is used as the controlled vocabulary in the medical record system called “Medical Director”.

2 Liberties taken

Some liberties had to be taken with the Docle System. These were necessary to enable its loading into the Poly-browser.
The liberties included the following…

o Some Docle operators were changed to “Docle operator concepts”. These are discussed below.
o The question-mark character (“?”) was treated as a new concept named “possible”.

3 Definitions pertinent to the Docle System

The Docle System is somewhat unconventional. The following definitions and explanations may help initial understanding of it…

Term / Definition and Explanation

Docle
    A “Docle” is the name given to the unique identifier used by the Docle System. It generally consists of the first four letters of the name of the concept it represents. For example the docle for “chest” is “ches”. The identifier is thus usually understandable.

Atomic Concept
    “Atomic concepts” are those that cannot be reduced further. They are combined by “docle operators” to create “compound concepts”. For example “Chest” and “pain” are two atomic concepts that form the compound concept of “Chest pain”. Its docle is the docle for “chest”, plus the operator for “apropos”, plus the docle for “pain”; the resulting docle is “ches@pain”.

Atomic Docle
    An “atomic docle” is a single docle that identifies an “atomic concept”. It does not contain an operator. “ches” and “pain” are each “atomic docles”.

Docle Operator
    The “docle operator” is the single punctuation character that joins atomic concepts to form “compound concepts”. In the above “chest pain” example it is the “@” character meaning “apropos”.

Compound Concept
    A “Compound concept” is one that is made up of two or more “atomic concepts”.

Compound Docle
    A “compound docle” is a docle that uniquely identifies a compound concept. It contains two or more atomic docles joined by operators.

Genus
    “Genus” is the name given by the Docle System to the top level of its hierarchical concepts. The level above (“phylum”) has not been implemented. A genus concept is the parent of children that are called its “species”. (“In the Docle System, a genus is a parent “metaclass” of its species. It carries the “^” stigma.” (Kuang Oon))

Species
    “Species” is the name given by the Docle System to the second level of its hierarchical concepts. They are the children of their parent “genus”.

Orphan concepts
    “Orphan concepts” are those that are unclassified. They therefore have no parents.

Primary key
    The docle “primary-key” is an expression of a docle in “long-hand”. For example the primary key for the concept “Chest pain” is “chest@pain”. “Preferred terms” and “secondary keys” may be derived from the primary-key by using the “docle computer algorithm”.

Secondary key
    The docle “secondary-key” is the short-hand form of the primary-key docle. It is derived by applying the “Docle algorithm” to the primary-key.

Tertiary key
    The docle “tertiary key” is the “alias” or “alternative description” for a docle concept. Aliases represent concept “synonyms”.

Docle algorithm
    The “Docle algorithm” converts the “Docle primary-key” to the “Docle secondary-key” and can generate “preferred terms” from the “Docle primary-key”. To generate secondary-keys, the Docle algorithm acts along the following lines:
    For a one-word name: the first four characters are used.
    For a two-word name: the first four characters of the first word plus the first character of the second word.
    For a name of three or more words: the first characters of each word, concatenated.

Table 1: Terms and definitions used to describe the Docle System

4 The supplied data tables

The Docle System was supplied in five text tables that are described below. The tables were…

1. Terms All
2. Reason For Encounter (RFE) subset
3. Primary-Key to Secondary-Key
4. Genus to Species
5. Species to Genus

4.1 Terms All

The “Terms All” table contained two fields – one was the “name” of the term and the other its “docle”. Mixed amongst them were primary-keys, ICD codes, and aliases. There were 26,704 records.
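The secondary-key rules listed under “Docle algorithm” in Table 1 can be sketched in a few lines of Python. This is a minimal illustration only, not the Docle System's own implementation. It assumes that words within a name are delimited by case (as in “birthControl”), that every non-alphanumeric character is an operator to be copied through unchanged, and that the result is lower-cased. The function names are hypothetical. Primary keys containing a hyphen (for example “incl-usions”, which the supplied data shortens to “inclusio”) follow a rule not described here and are not handled.

```python
import re

# Assumption: words inside a long-hand name are delimited by case,
# e.g. "birthControl" -> "birth", "Control".
WORD = re.compile(r"[A-Za-z][a-z0-9]*")
# Assumption: a run of letters and digits is one name; every other
# character is treated as a Docle operator and left untouched.
NAME = re.compile(r"[A-Za-z0-9]+")


def shorten_name(name):
    """Apply the one-, two- and three-or-more-word rules to a single name."""
    words = WORD.findall(name)
    if len(words) <= 1:
        short = name[:4]                          # one word: first four characters
    elif len(words) == 2:
        short = words[0][:4] + words[1][:1]       # four characters plus one
    else:
        short = "".join(w[0] for w in words)      # initials of every word
    return short.lower()


def secondary_key(primary_key):
    """Shorten every name in a primary key, keeping the operators in place."""
    return NAME.sub(lambda m: shorten_name(m.group(0)), primary_key)


if __name__ == "__main__":
    for key in ("chest@pain", "preventiveCare@osteoPorosis",
                "infection<herpesSimplexVirus"):
        print(key, "->", secondary_key(key))
```

Under these assumptions the sketch reproduces several of the supplied pairs, such as “immunisation@japaneseEncephalitis” to “immu@japae”.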
A sample of the “Terms All” data is shown below…

Name                                            Docle
uterinePolypSurgery                             surg.uter@polyp
icd9@453.3@=@rvt                                rvt
icd9@453.0@=@bcs                                bcs
icd9@451@=@infl.vein                            infl.vein
arterialBlood@baseExcess                        arteb@basee
Anti viral prescription                         pres@anti@vira
icd9@451.19@=@dvt                               dvt
arterialBleeding                                blee@arte
infection<ascaris@lumb-ricoides                 infe<asca@lumbrico
Autoantibodies - Glomerular basement membrane   autoa@gbm
Brain Injury                                    inju.brai
deficiency@proteinS@acquired                    defi@prots@acqu
biopsyCervix                                    surg.cerv@biop
infection<arenaVirus                            infe<arenv
Autoantibodies - Gastric parietal cell          autoa@gpc
cervix@dilatation                               cerv@dila
uterineHypoplasia                               hypoplas.uter
Brain cyst                                      cyst.brai
dysplasticBarrettsEsophagus                     infl.esop@barr@dysplast
infection<arboVirus                             infe<arbov
Autoantibodies - Extractable nuclear antigen    autoa@ena
uterineFibromyoma                               fibroid
icd10@M85.89@=@tietd                            tietd
X-ray - Discography                             xr@discogra
xxxKaryotype                                    xxxk
Brain CT                                        ct.brai

Table 2: A sample of the “Terms All” data supplied

4.2 Reason for encounter

The Reason for Encounter (RFE) subset file was a subset of the Terms-All data. It contained the “Terms” and docles embedded in reporting text (“seek view” and “&ctx@seek@view[...]”). There were 18,920 RFE records. Some 116 “new entries” were not included in the “Terms-All” data file; they were added during preparation of the data.
An example of the RFE subset is shown below…

Name/Primary Key                                          Docle/Secondary Key
seek view possibleDiabetesMellitus                        &ctx@seek@view[&ctx@?,diabm]
seek view suspectDiabetesMellitus                         &ctx@seek@view[&ctx@?,diabm]
seek view possibleHIVCarrier                              &ctx@seek@view[&ctx@?,infe<hiv]
seek view possibleHumanImmunodeficiencyVirusInfection     &ctx@seek@view[&ctx@?,infe<hiv]
seek view queryHIVInfection                               &ctx@seek@view[&ctx@?,infe<hiv]
seek view diabetesMellitusControlled                      &ctx@seek@view[&ctx@eval[diabm],outx[,good,cont]]
seek view hyperlipidemiaControlled                        &ctx@seek@view[&ctx@eval[hype@lipi],outx[,good,cont]]
seek view hyperTensionControlled                          &ctx@seek@view[&ctx@eval[hypet],outx[,good,cont]]
seek view helicobacterPyloriEradication                   &ctx@seek@view[&ctx@eval[infe<helib@pylori],outx[,cure]]
seek view cardioversionFailed                             &ctx@seek@view[&ctx@eval[ppoc@cardiove],outx[,fail]]
seek view familyHistoryBowelCancer                        &ctx@seek@view[&ctx@fh[carc.bowe@larg]]
seek view familyHistoryCarcinomaBowel                     &ctx@seek@view[&ctx@fh[carc.bowe@larg]]
seek view systemReviewCardiovascular                      &ctx@seek@view[&ctx@hx[hx@cvs]]
seek view diabetesMellitusNilFamilyHistory                &ctx@seek@view[&ctx@no,fh@diabm]
seek view noFamilyHistoryDiabetesMellitus                 &ctx@seek@view[&ctx@no,fh@diabm]
&ctx@seek@view[abdomen@indigestion]                       &ctx@seek@view[abdo@indi]
seek view Abdomen - Indigestion                           &ctx@seek@view[abdo@indi]
seek view dyspepsia                                       &ctx@seek@view[abdo@indi]
seek view Epigastric discomfort                           &ctx@seek@view[abdo@indi]
seek view epigastricDiscomfort                            &ctx@seek@view[abdo@indi]
seek view indigestion                                     &ctx@seek@view[abdo@indi]
&ctx@seek@view[abdomen@murmur]                            &ctx@seek@view[abdo@murm]
seek view Abdomen - Murmur                                &ctx@seek@view[abdo@murm]
seek view abdomenBruit                                    &ctx@seek@view[abdo@murm]
seek view Abdominal bruit                                 &ctx@seek@view[abdo@murm]
seek view Bruit - abdomen                                 &ctx@seek@view[abdo@murm]
seek view bruitAbdomen                                    &ctx@seek@view[abdo@murm]
&ctx@seek@view[abdomen@pain&ctx@ill,undiagnosed]          &ctx@seek@view[abdo@pain&ctx@ill,undi]
seek view undiagnosedAbdomenPain                          &ctx@seek@view[abdo@pain&ctx@ill,undi]
&ctx@seek@view[abdomen@pain.epigastrium]                  &ctx@seek@view[abdo@pain.epig]
seek view Abdominal pain - Epigastric                     &ctx@seek@view[abdo@pain.epig]
seek view Epigastric pain                                 &ctx@seek@view[abdo@pain.epig]
seek view epigastriumPain                                 &ctx@seek@view[abdo@pain.epig]
seek view Pain - epigastrium                              &ctx@seek@view[abdo@pain.epig]
seek view painEpigastrium                                 &ctx@seek@view[abdo@pain.epig]

Table 3: A sample of the “Reason For Encounter” data supplied

4.3 Primary Key - Secondary Key

The “Primary Key - Secondary Key” table was supplied at the request of the author. It was needed to enable “preferred terms” to be identified. The table contained two fields. The “primary-key” field contained the “long hand” version of the docle, while the “secondary-key” field contained the shorter docle. There were 5,961 records.

4.3.1 Data sample

A sample of the data is shown below…

Primary_Key                            Docle
mri.bone                               mri.bone
skin@bruising@umbilicus                skin@brui@umbi
surgery.jejunum                        surg.jeju
cerebralPalsy                          cerep
immunisation@japaneseEncephalitis      immu@japae
hemoGlobin@h@incl-usions               hemog@h@inclusio
feeling=                               feel=
infection<herpesSimplexVirus           infe<hsv
ct.spine@neck                          ct.spin@neck
edema.lung                             edem.lung
immunisation@infl-uenzae               immu@influenz
coma@hypo@glucose@pre                  coma@hypo@gluc@pre
aneurysm.aorta@thorax                  aneu.aort@thor
inflammation.larynx                    infl.lary
osteoArthritis.ankle                   ostea.ankl
lesion.subthalamicNucleus              lesi.subtn
injury&ctx@ill,pres-ent                inju&ctx@ill,present
lesion.substantiaNigra                 lesi.subsn
osteoArthritis                         ostea
preventiveCare@osteoPorosis            prevc@ostep
preventiveCare@lung                    prevc@lung
preventiveCare@hyperTension            prevc@hypet
edema                                  edem
dislocation.hip@congenital             disl.hip@cong
coma@hypo@glucose                      coma@hypo@gluc
feeling*l                              feel*l
birthControl                           birtc
cerebralEdema                          ceree
eczema@vein                            ecze@vein
autoHemolysis@test                     autoh@test
fracture.vertebra@t12@crus-h           frac.vert@t12@crush
injury&ctx@ill,powe-red@non            inju&ctx@ill,powered@non
inflammation.lacrimalDuct              infl.lacrd

Table 4: A sample of the “Primary key - Secondary key” data supplied

Notes:

o Case is used to delimit words in phrases, e.g. “preventiveCare” and “birthControl”.
o 101 docle Secondary Keys were not unique, i.e. the table contained aliases. A few examples are shown below…

Primary key                     Secondary key
Mittleschmerz                   mitt
mittelschmerz                   mitt
nsu                             nsu
nonSpecificUrethritis           nsu
OCD                             ocd
obsessiveCompulsiveDisorder     ocd
Pneumonitis                     pneu
pneumonia                       pneu
tremor                          trem
trembly                         trem

Table 5: A few examples of the 101 non-unique secondary-keys

4.4 Genus to Species

The “Genus to Species” table contained 851 records, each a single variable-length text field containing tab-delimited docles. The first data item of each line was the “Genus”. Subsequent items in each line were the “Species” associated with that “Genus”. The “^” character was used to denote a genus docle.
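The record layout just described (a genus docle marked with “^”, followed by tab-delimited species docles) can be read with a short routine such as the one below. It is a sketch under stated assumptions rather than the procedure actually used for the project: the function name is hypothetical, and the choice to report a genus that reappears among its own species separately, instead of keeping or silently dropping it, is an illustrative design decision.

```python
def parse_genus_line(line):
    """Split one "Genus to Species" record into its parts.

    Returns (genus, species, self_listed) where self_listed is True when
    the genus docle also appears in its own species list.
    """
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if not fields:
        raise ValueError("empty record")
    first = fields[0]
    # The "^" marker denotes a genus docle; strip it to get the bare docle.
    genus = first[:-1] if first.endswith("^") else first
    rest = fields[1:]
    species = [docle for docle in rest if docle != genus]
    return genus, species, genus in rest


if __name__ == "__main__":
    record = "adhe.labi^\tadhe.labi\tfusi.labi\n"
    print(parse_genus_line(record))
```

Keeping the self-reference as a separate flag leaves the decision about such rows to a later step, so the parser itself does not discard any supplied data.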
4.4.1 Data Sample

A sample of the data is shown below (one record per line)…

&ctx@hx[hx@cvs]^ &ctx@hx[hx@cvs]
abdo@pain^ abdo@pain.umbi abdo@pain@chro abdo@pain@rebo abdo@pain.hypog abdo@pain@guar abdo@pain.epig abdo@pain@touc abdo@rigi abdo@pain abdo@pain.ruq@colic abdo@pain.lif abdo@pain.leftl abdo@pain.rif abdo@pain&ctx@ill,undi abdo@pain.luq abdo@pain.ruq abdo
abdo@swel^ abdo@swel.leftl abdo@swel.rif abdo@swel.lif abdo@swel.luq abdo@swel.ruq abdo@swel@puls abdo@swel@mass abdo@swel.umbi abdo@swel.righl abdo@swel.epig abdo@swel abdo@swel.hypog
abdo@upse^ abdo@pain.umbi abdo@pain@chro abdo@pain@rebo abdo@pain.hypog abdo@pain@guar abdo@pain.epig abdo@pain@touc abdo@rigi abdo@upse abdo@pain abdo@pain.lif abdo@rumb abdo@pain.leftl abdo@pain.rif abdo@pain&ctx@ill,undi abdo@pain.luq abdo@pain.ruq abdo
abor^ abor@thre abor@part abor@miss abor@recu abor abor@drug abor@seps abor@inev
absc.brai^ absc.brai<enta@histolyt absc.subd absc.brai
absc.live^ absc.live<enta@histolyt absc.live
absc.lung^ absc.lung<enta@histolyt absc.lung
absc.skin^ absc.skin furu absc.skin@mult
absc^ absc.brai absc.gallb absc.epid absc.lung absc.bartg absc.live absc.pleu absc.subd absc@perin absc.panc absc.brai<enta@histolyt absc.live<enta@histolyt absc absc@extrd absc.lung<enta@histolyt absc@peria absc@subp absc@ischr absc.pelv absc.kidn absc.br
abus@alco^ encep@wern infl.stom@alco withdraw@alco abus@alco hepa@alco
abus@laxa^ melanosi.colo abus@laxa
abus@subs^ abus@drug@iv abus@volas abus@opia withdraw@drug prob@alco abus@coca abus@laxa abus@benzd withdraw@benzd withdraw@opia withdraw@synd impa@visi&ctx@sequ<,abus@toba abus@toba abus@amph abus@halla abus@subs abus@gambling abus@alco abus@lysea abus@dr
abus@toba^ abus@toba impa@visi&ctx@sequ<,abus@toba
accu@drug^ accu@drug overdose@drug
acidosis^ acidosis@renat acidosis@resp acidosis acidosis@diabm acidosis@metab acidosis@metab@aniog@norm acidosis@metab@aniog@incr
acne^ acne&ctx@sequ<,drugr acne acne&ctx@ill,wors
adhe.fallt^ adhe.fallt
adhe.labi^ adhe.labi fusi.labi
adjud^ adjud

Table 6: A sample of the “Genus to Species” data supplied

4.5 Species to Genus

The “Species to Genus” table contained 3,126 records, each a single variable-length text field containing tab-delimited docles. The first data item of each line was the “Species”. Subsequent items in each line were the “genera” associated with the “Species”. The “^” character was used to denote the genus docles.

4.5.1 Data Sample

A sample of the data is shown below (one record per line)…

abdo@indi systr@gasti^ abdo@upse^
abdo@pain systr@gasti^ abdo@pain^ pain^ abdo@upse^
abdo@pain&ctx@ill,undi abdo@pain^ pain^ abdo@upse^
abdo@pain.epig abdo@pain^ pain^ abdo@upse^
abdo@pain.hypog abdo@pain^ pain^ abdo@upse^
abdo@pain.leftl abdo@pain^ pain^ abdo@upse^
abdo@pain.lif abdo@pain^ pain^ abdo@upse^
abdo@pain.luq abdo@pain^ pain^ abdo@upse^
abdo@pain.rif abdo@pain^ pain^ abdo@upse^
abdo@pain.righl abdo@pain^ pain^ abdo@upse^
abdo@pain.ruq abdo@pain^ pain^ abdo@upse^
abdo@pain.ruq@colic abdo@pain^ pain^
abdo@pain.umbi abdo@pain^ pain^ abdo@upse^
abdo@pain@acut abdo@pain^ pain^ abdo@upse^
abdo@pain@chro abdo@pain^ pain^ abdo@upse^
abdo@pain@guar abdo@pain^ pain^ abdo@upse^
abdo@pain@rebo abdo@pain^ pain^ abdo@upse^
abdo@pain@touc abdo@pain^ pain^ abdo@upse^
abdo@rigi abdo@pain^ pain^ abdo@upse^
abdo@rumb abdo@upse^
abdo@swel abdo@swel^
abdo@swel.epig abdo@swel^
abdo@swel.hypog abdo@swel^
abdo@swel.leftl abdo@swel^
abdo@swel.lif abdo@swel^

Table 7: A sample of the “Species to Genus” data supplied

5 Preparation

The supplied data files were loaded into a database environment, then processed, converted and manipulated so that the form of their data suited the structure of the poly-browser, into which they were to be imported. The broad stages are outlined below…

5.1 Expand the “Terms-All” data

The “Terms-All” data was imported. The original record count was 26,704.
About 116 additional terms were added from the “Primary Key - Secondary Key” and “Reason for Encounter” data. Atomic Concepts that were implied but not specified were also added. After merging some 2,776 of these, the Terms-All list grew to 29,596 records.

5.2 Identification of “Atomic Concepts”

“Atomic Concepts” were extracted from “docles” and their “primary keys”. The total number was 3,828. Those that had not been explicitly stated in the supplied files numbered 2,776. They were created and added to the above Terms-All data.

5.3 Creation of a “Unique Docle List”

From the expanded Terms-All data, a “Unique Docle List” was created. 8,640 unique docles were identified. Flags indicating “Atomic Concepts” were added.

5.4 Creation of “Preferred-Terms”

An arbitrary “preferred term” was selected for each record in the above “Unique Docle List”. This was done by:

1) Deriving the name from the structure of the “primary key” …or failing that…
2) Taking the longest name that was otherwise extracted or derived.

5.5 Creation of Concept and Operator usage

The “top-2-preceding” and “top-2-following” operators used with each unique atomic docle were recorded. This was done more out of interest than necessity.
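One way such a tally could be produced is sketched below. It is illustrative only and is not the procedure used for the project. It assumes that every non-alphanumeric character in a docle is an operator, that atomic docles are runs of letters and digits, and that the labels “Starting” and “Ending” stand for the absence of a preceding or following operator. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: an atomic docle is a run of letters/digits; any other single
# character is an operator.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]")
ATOM = re.compile(r"[A-Za-z0-9]+")


def operator_usage(docles):
    """Count, per atomic docle, the operators found just before and after it."""
    preceding = defaultdict(Counter)
    following = defaultdict(Counter)
    for docle in docles:
        tokens = TOKEN.findall(docle)
        for i, token in enumerate(tokens):
            if not ATOM.fullmatch(token):
                continue  # operators are only counted relative to atoms
            before = tokens[i - 1] if i > 0 else "Starting"
            after = tokens[i + 1] if i + 1 < len(tokens) else "Ending"
            preceding[token][before] += 1
            following[token][after] += 1
    return preceding, following


def top_two(counter):
    """Return the two most frequent operators with their counts."""
    return counter.most_common(2)


if __name__ == "__main__":
    pre, post = operator_usage(["abdo@pain", "abdo@pain.epig", "ches@pain"])
    print(top_two(pre["pain"]), top_two(post["pain"]))
```

Percentages like those reported for each concept would follow by dividing each count by the total number of occurrences of that atomic docle.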
The most used atomic concepts (in the Docle System) and their top-two-adjacent operators are shown in the table below…

[Table 8 columns: Derived_Name, Total_Use_In_Compounds, PreOperator_TopType, PreOperator_TopPerCent, PreOperator_NextType, PreOperator_NextPerCent, PostOperator_TopType, PostOperator_TopPerCent, PostOperator_NextType, PostOperator_NextPerCent. Only the first two columns are legible in this transcription.]

Derived_Name      Total_Use_In_Compounds
Surgery           753
Ctx               405
Infection         361
Ill               355
Injury            323
Ppoc              178
S                 136
Pain              133
Fracture          108
Prescription      98
Skin              94
Extirpation       92
Urine             89
Xr                86
X Ray             86
Neoplasm          81
Lesion            70
Abdomen           65
Inflammation      63
Mc                58
Carcinoma         57
Breast            50
Spine             49
Eye               47
Class             46
Knee              46
Abscess           45
Bladder           44
Chest             44
Serology          44
Hyper             43
Swelling          43
Allergy           42
Problem           42
Immunisation      40
Ear               37
Vagina            37
Sh                35
Uterus            35
Deficiency        34
Neck              34
Hypo              33
Lung              33
Vein              33
Kidney            32
Us                32
Biopsy            31
Feeling           31
Hip               30
Joint             30

Table 8: A sample of the most used atomic concepts (in the Docle System) and their top-two-adjacent operators

5.6 Creation of other “Terms” data

The various terms that describe each concept were extracted from the “Terms-All” table and placed in a “Term” table. These comprised the “preferred terms” and their “synonyms” – i.e. Docle System “aliases”.

5.7 Creation of CODE data

The ICD9 and ICD10 codes were extracted from the “Terms-All” data table. From these a “code-table” was created. To the code-table were added the Docles. Thus, the ICD codes and docles were retained and were related to their original concept. There were 1,550 ICD9 codes and 1,505 ICD10 codes.

5.8 Creation of LIST data

To preserve derived data of potential interest, two “Lists” of concepts were created.
They were: (1) List of Atomic Concepts, and (2) List of Reason for Encounter (RFE) concepts.

5.9 Creation of the Hierarchy

Before describing the creation of the “hierarchy table” used by the poly-browser, a brief discussion about the Docle System hierarchies may be helpful.

5.9.1 Discussion

The Docle System was described as having three levels of classification: “Phylum”, “Genus” and “Species”. These are discussed below…

5.9.1.1 Level-0 - the “Phylum”

Unfortunately the first level of the Docle System has not been created “because the system has been constructed from the bottom-up rather than from the top-down” (Dr Kuang Oon). Documentation [1] provided with the Docle System suggested that the entries might be…

a) Medical Administration
b) Symptoms Signs
c) Diagnostic Non Imaging
d) Diagnostic Imaging
e) Procedures Process of Care
f) Therapeutics
g) TAMTAP- (Thinking About Medical Thinking and Practice)
h) Reason for encounter
i) Clinical Domains
j) Context

[1] “Unitary Health Language – DocleScript” by Dr Yeong Kuang Oon of Docle Systems P/L, 29 Darryl St, Scoresby, Vic 3179 Australia 03-97638935

The absence of this level meant that the top level of the Docle system was an unorganised list of more than 700 concepts.

5.9.1.2 Level-1 - the “Genus”

The first level of the Docle System hierarchy was called the “Genus”. It was an unorganised list of 719 unique docles. A Genus concept had the “^” character added to its docle in the “Genus to Species” and “Species to Genus” tables provided. Each genus had one or more children (or “Species”).

5.9.1.3 Level-2 – the “Species”

The second level of the system was the “Species”. Each species had one (or more) genus as its parent. The species numbered 2,588 unique concepts.

5.9.1.4 Docle Orphan Concepts

There were many concepts that had not been classified. They were neither Genus nor Species. In the poly-browser they were allocated the parent “Docle Orphan Concept”.
They numbered 5,587.

5.9.1.5 Cyclic relationships in the Genus-to-Species table

If a level-1 (“Genus”) concept is also in its level-2 (“Species”), then the hierarchy is illogical and “cyclic”. In other words, a “parent” must not be its own child.

In the Genus-to-Species table supplied there were 798 “apparent cyclic relationships”.

Note: The “apparent cyclic relationships” occurred because the parent genus is a “metaclass” in the Docle System, where a child (species) may be its own parent metaclass.

24 Species Docles and 13 Genus Docles were absent from the supplied “Terms-All” table. They thus had no “term”.

In the Species-to-Genus table supplied there were 811 “apparent cyclic relationships”. All Species Docles occurred in the supplied “Terms-All” table; however, 13 Genus Docles were absent and therefore had no “term”.

The expanded data contained in the Genus-to-Species and Species-to-Genus tables were expected to be the same, but this was not so. Some 415 relationships had to be added to the Genus-to-Species table from the Species-to-Genus table. The resulting merged table contained 3,566 relationships, of which 24 Species Docles and 13 Genus Docles were absent from the supplied “Terms-All” table.

All “apparent cyclic relationships” were avoided.

5.9.1.6 Poly-hierarchies

227 level-1 (Genus) concepts were also children (Species) of other level-1 concepts.

279 level-2 (Species) concepts were also parents (Genera) of other level-2 concepts.

These relationships form a poly-hierarchy (i.e. concepts within multiple hierarchies). A poly-hierarchy best reflects medical concepts in a computer based terminology. It is used by systems such as SNOMED and MeSH. On the other hand, single hierarchies tend to be used by classification systems. ICD9 and ICD10 are somewhat hybrid. They allocate different codes for the same concept in different hierarchical contexts (e.g. “dagger codes”).
Note: A “Linnean hierarchy” is used by the creator of the Docle System to depict its structure. However, Linnaeus described and used a classification hierarchy in which items occurred only once.

5.9.1.7 Cyclic relationships in a poly-hierarchy

A poly-hierarchy has the potential to be cyclic. This occurs when an ancestor or parent is also a child or descendent. An example and explanation is shown below…

(Generation 1 is at the left margin, generation 2 is indented once, generation 3 is indented twice.)

A
    B
        E, [A], F
    D
    C
        G, B, H
B
    E
    A
        [B], D, C
    F
C
    G
    B
        E, A, F
    H

Diagram 1: An example of “cyclic poly-hierarchies”

An example of cyclic poly-hierarchies is shown in the above diagram. Consider three first-generation concepts called A, B, and C. Let us suppose their children (the second generation) are “B, D, C”, “E, A, F” and “G, B, H” respectively. In this example, the first generation also exists in the second. Consequently each has its own descendents, forming a third generation. Those shown in square brackets are cyclic, as they are their own ancestor.

Thirty poly-hierarchy cyclic relationships were identified in the docle data supplied. The offending relationships were deleted. The ten concepts involved were…

Hypo gluco Corticoid
Hypo mineralo corticoid
In Sufficiency adrenal Gland
Occupational Lung Disease
Pneumoconiosis
Pulmonary Fibrosis
S creatinine Urea Electrolytes
S electrolytes
Withdrawal drug
Withdrawal syndrome

Table 9: The concepts that had cyclic poly-hierarchy relationships

5.9.1.8 The “Type” of a Hierarchical relationship

Hierarchical relationships are of the “type” of relationship called “is-a”. A child “is a ….” of its parent. The characteristics of an ancestor in such a hierarchy also apply to its descendents, i.e. they may be “inherited”.

5.9.2 Creation of the Hierarchy Table

Hierarchical relations (“is-a” type relationships) define parents and children in the poly-browser.
They are often described as “vertical relationships”. They were implied in the supplied Docle System tables called “Genus to Species” and “Species to Genus”. These were used to derive the hierarchical relationships of the Docle System in the poly-browser.

The hierarchical relationships are stored in a “Hierarchical Table” in the poly-browser. It has two essential fields: (1) the Concept, and (2) its Parent. In computing terms it is a “directed acyclic graph”. An overview of the hierarchical relationships is shown below…

Top Polybrowser Concept
    Other Systems…
    DOCLE System (3)
        DOCLE Operator Concept (8)
            Operator Concept 1
            Operator Concept 2
            Operator Concept 3…
        DOCLE Classified Concept (719)
            Genus Concept 1
                Species Concept 1
                Species Concept 2
                Species Concept 3…
            Genus Concept 2
                Species Concept 4
                Species Concept 5
                Species Concept 6…
            Genus Concept 3…
        DOCLE Unclassified Orphan Concept (5587)
            Orphan Concept 1
            Orphan Concept 2
            Orphan Concept 3…

Diagram 2: Overview of the hierarchical relationships at the top of the Docle tree

5.10 Creation of Non-hierarchical Relationships

5.10.1 Relationship-types in SNOMED

A concept may be related to other concepts with “types” of relationships other than an “is-a” type. These form “non-hierarchical relationships”. An example of the relationship types used by SNOMED is shown below…

Undefined; Approach; Associated morphology; Direct device; Episodicity; Has definitional manifestation; Interprets; Onset; Priority; Severity; Access instrument; Has specimen; Recipient category; Specimen source morphology; Subject of information; Is a (i.e. hierarchical type); Associated etiologic finding; Causative agent; Direct morphology; Finding site; Has focus; Method; Part of; Procedure site; Temporally follows; Component; Indirect morphology; Specimen procedure; Specimen source topography; Access; Associated finding; Course; Direct substance; Has active ingredient; Has intent; Occurrence; Pathological process; Revision status; Using; Has interpretation; Laterality; Specimen source identity; Specimen substance

Table 10: An example of the relationship types used by SNOMED

5.10.2 Creation of the Non-hierarchical relationships

As explained above, non-hierarchical relationships occur in a system when a concept is related to another with a type of relationship that is other than “is-a”. They are often referred to as “lateral relationships”.

5.10.2.1 The Docle Operators

The Docle System contained compound docles that were constructed from atomic docles joined by “operators”. The operators, their meaning and their explanation are shown below…

| Operator | Meaning | Example of Use |
|---|---|---|
| @ | APROPOS | detachment@retina is read as detachment apropos / associated with retina. |
| . | LOCATED AT | tuberculosis.kidney (tube.kidn) is read as tuberculosis located at kidney. |
| > | LEADING TO | back@pain>buttock (back@pain@butt) is read as backpain radiating to buttock |
| < | DUE TO | pneumonia<virus (pneu<viru) |
| : | DESCRIBED AS | pain:dull is read as pain described as dull. |
| % | QUANTIFICATION | breast@lump,%2cm means lump at breast 2 cm in size. |
| / | INCREASED | chest@pain/swallow (ches@pain/swal) reads as chest pain increases with swallowing. |
| \ | DECREASED | chest@pain\food (ches@pain\food) reads as chest pain decreased with food. |
| = | NORMAL | wcc= reads as white cell count is normal |
| * | ABNORMAL | fbe* |
| *l | ABNORMAL low | wcc*l means whiteCellCount abnormal low |
| *h | ABNORMAL High | wcc*h means whiteCellCount abnormal high |
| ! | History of | Surg! means Surgery history (of) |
| & | Code Shear character | &ctx@ill is the illness contextual organizer that can be sheared off |
| , | contextual modifier | ,%2cm is value 2 cm |

Table 11: The Docle System operators, their meaning and their explanation

It was noted that despite the above approach, some use was made of docles (i.e. concepts) that appeared to have the same function as the above operators. They included…

| Docle | Name |
|---|---|
| abno | abnormal |
| decr | decreased |
| incr | increased |
| norm | normal |

Table 12: Docle System concepts that appeared to have the same function as “Docle Operators”

5.10.2.2 Docle Non-Hierarchical Relations

Non-hierarchical relationships were derived from the content and syntax of the docle. To enable the Docle System to co-exist with other systems in the poly-browser some modifications were made. These involved treating some docle-operators as concepts. They were thus created as “DOCLE operator concepts” and are shown in the table below…

| Docle Operator | Concept Name |
|---|---|
| / | Increased (Docle operator) |
| \ | Decreased (Docle operator) |
| = | Normal (Docle operator) |
| * | Abnormal (Docle operator) |
| *l | Abnormal low (Docle operator) |
| *h | Abnormal high (Docle operator) |
| ? | Possible |
| ! | History of |

Table 13: “Docle Operators” that were converted to “Concepts”

When these were used the “Relation Characteristic” in the poly-browser became “Qualifier”. (The “Relation Characteristic” is a refinement incorporated into the poly-browser and used by SNOMED.)

When loaded into the poly-browser, the non-hierarchical “relationship-types” used by the Docle System (and derived from docle-operators) were…

| Relationship-Type | Docle Operator Prefix |
|---|---|
| *Docle Starting | ` |
| Docle Apropos | @ |
| Docle Located at | . |
| Docle Leading to | > |
| Docle Due to | < |
| Docle Described as | : |
| Docle Quantification | % |
| Docle Context Modification | , |

* The first relationship-type was created as a generic solution to undefined “starting atomic concepts”.
Table 14: The non-hierarchical “relationship-types” used by the Docle System (and derived from docle-operators)

Docle expressions following the “&” (ampersand) or “,” (comma) operator were allocated the “Relation Characteristic” of “Additional”, indicating contextual information.

5.11 Allocation of “Unique Identifiers”

Docle “Concepts” and “Terms” were given unique identifiers to be used by the poly-browser. Numbers started at 6,020,001.

6 Loading the Docle System into the Poly-Browser

The prepared data described above was imported into the poly-browser. When loaded, the poly-browser ran the following tests and operations:

o Test for cyclic hierarchical relationships
o Test for redundant siblings
o Count of children
o Build enhanced keywords
o Rebuild Poly-browser lexicon

6.1 Docle’s Appearance in the Poly-Browser

The appearance of the Docle System in the Poly-browser is shown in the following pictures. A few SNOMED examples are included for comparison…

Picture 1: A list of all the Docle concepts
Picture 2: The top of the Docle hierarchy
Picture 3: The Docle “Operator” concept hierarchy
Picture 4: The hierarchy for “Abscess”
Picture 5: The first hierarchy for “Abscess brain” - it has three parents.
Picture 6: The second hierarchy for “Abscess brain” - it has three parents.
Picture 7: The third hierarchy for “Abscess brain” - it has three parents.
Picture 8: The terms for “Abortion Missed”
Picture 9: The relationships and codes for “Abortion Missed”
(Note: Because the poly-browser uses the “@” character as its “wild-card” when searching data, for technical reasons all Docles appear with their “@” characters changed to the “~” character.)
Picture 10: The terms for “Missed abortion” in the SNOMED system
Picture 11: The relationships and codes for “Missed abortion” in the SNOMED system
Picture 12: The relationships and codes for “Chest pain” in the Docle system
(Note: Because the poly-browser uses the “@” character as its “wild-card” when searching data, for technical reasons all Docles appear with their “@” characters changed to the “~” character.)
Picture 13: The relationships and codes for “Chest pain” in the SNOMED system

7 Further Reading

The following references were supplied by Dr Kuang Oon:

Oon, Y. K. ‘Docle - the coding scheme which comes with a free medical belief system’. HIC-APAMI 1997. Conference proceedings of the Health Informatics Society of Australia. Sydney, Australia. McGhee S et al (ed), 1997.

Oon, Y. K. ‘The Gelati Syndrome’. HIC 2000. Conference proceedings of the Health Informatics Society of Australia. Adelaide, Australia. Pradhan M et al (ed), 2000.

Oon, Y. K. ‘DocleScript-unitary health language’. Conference proceedings of the NCCH 7th Biennial Conference 2001. Potts Point, Sydney, 2001.

Oon, Y. K. ‘A unified theory of medical informatics based on multi and distinct semiotic forms’. Proceedings of the NCCH 8th Biennial Conference 2003. Melbourne, 2003.

Oon, Y. K. ‘The emoticon charged Docletalk interface language’. Proceedings RACGP/HIC conference 2003. Sydney, 2003.

Oon, Y. K. ‘Docletalk/DocleScript- the language and its implementation’. Docle Systems, 29 Darryl St, Scoresby, Vic 3179, Australia. Book to be released late 2003.

8 Analysis

An analysis of the DOCLE System revealed the following…

8.1 Supplied data

Tables: The supplied tables have been described in detail earlier. They contained a somewhat bizarre mixture of data that was not easy to understand.

Documentation: Documentation of the data structure was absent.
Processing required: The processing required to load the supplied data into the poly-browser was substantial. Major parsing and re-building of the supplied data tables was required before they were able to be managed.

8.2 Content

Size: The size of the files was small.

Scope: The scope of the data was limited but presumably it is appropriate for general practice.

Detail: Detail was limited, as would be expected in a small system designed for and used by general practitioners. However, presumably it is adequate as it is widely used.

Complexity: Because the system is a poly-hierarchy with semantic interrelationships it is more complex than most.

Pre-coordination: The level of pre-coordination in DOCLE was low. In the author’s opinion, the number of words in the concept descriptions offers a comparative measure of pre-coordination. Table 15 below shows the average word count of the terms that describe the DOCLE concepts, for each quartile from the most words to the fewest. Other systems have been included for comparison.

Average word count in descriptions:

| Description Type | Most words 1/4 | 2nd 1/4 | 3rd 1/4 | Least words 1/4 |
|---|---|---|---|---|
| CATCH Terms | 5.6 | 3.3 | 2.3 | 1.3 |
| DOCLE Terms | 4 | 3.6 | 2 | 1.4 |
| ICD-10-AM Terms | 10.4 | 6.2 | 4.4 | 2.8 |
| ICPC-2 Rubric | 6.3 | 4.29 | 3.36 | 2.12 |
| ICPC-2 Plus Interface term-concept | 4.8 | 3.05 | 2.14 | 1.7 |
| SNOMED-CT Terms | 8 | 4.7 | 3.4 | 2.0 |

Table 15: The average words used in Terms might give a measure of pre-coordination

Readiness for use: The descriptions and DOCLE identifiers are used widely in a current GP system. However, the system is not ready for general use, particularly regarding its hierarchies and its mappings to the ICD10AM classification system. It would require additional work before the potential of the system could be realised.

Concept identifiers: Docles are the concept identifiers. They have meaning and contain the semantic interrelationship data. In a compound concept, they therefore identify the contained atomic concepts and their relationships.
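As an illustration of how a docle identifies its atomic concepts and their relationships, the following Python sketch splits a compound docle on the operators listed in Table 14. It is a minimal sketch under stated assumptions, not the parser used for the poly-browser: the names `RELATIONSHIP_TYPES` and `parse_docle` are hypothetical, the operators that were converted to concepts (Table 13) and the “&” operator are not handled, and an operator that is followed directly by another operator is simply skipped.

```python
import re

# Operator-to-relationship-type mapping, taken from Table 14.
RELATIONSHIP_TYPES = {
    "@": "Docle Apropos",
    ".": "Docle Located at",
    ">": "Docle Leading to",
    "<": "Docle Due to",
    ":": "Docle Described as",
    "%": "Docle Quantification",
    ",": "Docle Context Modification",
}

# The capturing group keeps each operator in the list returned by split().
_OPERATOR_PATTERN = re.compile(r"([@.><:%,])")


def parse_docle(docle):
    """Return (relationship type, atomic docle) pairs for a compound docle."""
    tokens = _OPERATOR_PATTERN.split(docle)  # atom, operator, atom, operator, ...
    relations = []
    if tokens[0]:
        # The leading atom has no operator prefix, so it gets the generic type.
        relations.append(("Docle Starting", tokens[0]))
    for operator, atom in zip(tokens[1::2], tokens[2::2]):
        if atom:  # an empty atom means two operators were adjacent
            relations.append((RELATIONSHIP_TYPES[operator], atom))
    return relations


# Example taken from Table 11.
example = parse_docle("back@pain>buttock")
```

For the Table 11 example “back@pain>buttock” the sketch yields “back” as the starting atom, “pain” related by “Docle Apropos” and “buttock” related by “Docle Leading to”.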
Concept descriptions: Main descriptions (preferred terms) were supplied only after a specific request. Synonyms seemed plentiful.

Hierarchies: The system has a poly-hierarchy. It is largely undeveloped. It seems to contain semantic “is-a” relationships. These are of the usual type designed for navigation, where a child of a concept may be its parent plus an additional attribute, e.g. “Back pain” is a child of “pain”.

Interrelationships: The Docle identifiers define the semantic interrelationships between concepts. All compound (pre-coordinated) concepts are related to their atomic components.

Associated data: There was no associated data supplied.

Mappings: A limited and simple “mapping” to ICD9 and ICD10AM is provided.

Supporting features: “Docle script” was described in the documents listed for further reading. Its usefulness, functions, and robustness were not able to be estimated.

System maintenance: There are several aspects to system maintenance. They include…

(a) Organisation: The organisation supporting the system is one person.
(b) People: Dr Oon alone maintains the system, including the creation of new terms.
(c) Customers: The system has only one significant customer, Health Communication Network (HCN), which is the vendor of the “Medical Director” software. HCN in turn supports many end-user general practitioners.
(d) Content: GPs ask HCN to include various concepts. Dr Oon creates new concepts and adds terms according to requests received from HCN.
(e) Computer tools: Dr Oon has developed his own computer tools to help in the creation and maintenance of the system’s contents. Details of these tools are not known.
(f) Validation methods: Presumably some validation software is used. Finding only a few cyclic relationships in the hierarchy would suggest this. The fact that any were found perhaps suggests that the tools could be improved.
(g) Updates: The system appears to be updated regularly in accordance with the needs of HCN and its users.

8.3 Machine located contents

8.3.1 Finding GP Vocabulary terms

The report “Target System Analysis Using Term Matching Techniques” contains the results of looking for the GP Vocabulary terms in DOCLE. The pie-graph in Figure 1 below compares the number of GP Vocabulary terms that were found to be “similar” in the five systems examined.

[Figure 1 pie-graph values: CATCH 961 (8%); ICPC 1540 (13%); SNOMED-CT 4915 (39%); ICD10AM 1657 (14%); DOCLE 3186 (26%)]

Figure 1: Pie-graph of the number of terms that were “similar” to those of the GP-Vocabulary test terms, for each target system.

The GP Vocabulary terms were machine matched to the terms in the target systems. The results were then analysed according to the frequency of use of the GP terms in the field. Matches were either “similar to” (blue-lower), “contained the GP term” (red-middle), or they were “unmatched” (yellow-upper). (See the graphs in Figure 2 below).

[Figure 2: three stacked-bar panels titled “The most used 400 GP Vocabulary terms”, “The most used 3000 GP Vocabulary terms” and “All 11,583 valid GP Vocabulary terms”, each showing CATCH, DOCLE, ICD10AM, ICPC2Plus and SNOMEDCT on a vertical axis from 0% to 100%]

Figure 2: Matches according to the source-terms frequency of use in the field. Matches that were “similar to” are in blue at the bottom, “contained the GP term” are red in the middle and “non-matching” are yellow at the top.

8.3.2 Terms in SNOMED CT that were also in other systems

An analysis was done of the terms that were the same (or “similar”) in SNOMED CT and CATCH, DOCLE, and ICPC2Plus. Matching ICD10AM-NCCH terms in SNOMED CT has been omitted because:

1.
ICD10AM-NCCH terms were very numerous (335,451), being lexical variants and computer constructs from the index of the book. A very large number did not match. The results were deceptively bad at the term level, and less so at the concept level [2]. This project was focussed on linking “Terms”, not “Concepts”.

2. SNOMED CT was building mapping tables and rules to enable automatic coding of its interface concepts into ICD10. This work is complex and in progress.

[2] Using a 23% sample of the 335,451 ICD10AM terms, 5.9% were found to be "similar" to TERMS in SNOMED. When the CONCEPTS belonging to these terms were considered, the match was 20%.

Matching was on the same basis as that described in the report titled “Target System Analysis using Term Matching Techniques”. The terms that were considered “similar” were those that had any of the following matching characteristics…

• Exact Match between terms
• All the Same-words matched
• All Words less "Stop-words" matched
• Equivalent Term matched
• Same-words less "Exclude-words" matched

The result of the analysis is shown in Table 16 below. DOCLE terms were divided into “atomic” and “other terms”, while ICPC2-Plus terms were divided into their “natural language terms” and “other terms”.

| Systems whose terms were sought in SNOMED CT terms | Total TERM count | Total CONCEPT count | "Similar" TERMS in SNOMED | % of "Similar" TERMS | CONCEPTS of the "Similar" terms | % of CONCEPTS that were "Similar" | "Non-similar" TERMS to be mapped or added | "Non-similar" CONCEPTS to be mapped or added |
|---|---|---|---|---|---|---|---|---|
| CATCH - All | 10605 | 6249 | 2375 | 22.4 | 1356 | 21.7 | 8230 | 4893 |
| DOCLE - All | 23352 | 8652 | 6111 | 26.2 | 4340 | 50.2 | 17241 | 4312 |
| DOCLE - Atoms | 5233 | 3736 | 2803 | 53.6 | 2321 | 62.1 | 2430 | 1415 |
| DOCLE - Others | 18119 | 4916 | 3308 | 18.3 | 2019 | 41.1 | 14811 | 2897 |
| ICPC2Plus - All | 12689 | 7161 | 3695 | 29.1 | 2603 | 36.3 | 8994 | 4558 |
| ICPC2Plus - Nat. Lang. | 5528 | 5528 | 1482 | 26.8 | 1482 | 26.8 | 4046 | 4046 |
| ICPC2Plus - Other | 7161 | 7161 | 2213 | 30.9 | 2213 | 30.9 | 4948 | 4948 |
| Totals | 21987 | 16334 | | | | | 34465 | 13763 |

Table 16: Analysis of the Terms and Concepts from CATCH, DOCLE and ICPC2Plus that are the same (or “similar”) in SNOMED CT

8.4 Manually located contents

Table 17 below shows the total number of manually created links made between the GP Vocabulary term subsets and the terms of the target systems. The percentage of the total number of attempted links is also shown.

| Target System | 620 chronic problems: Total Links | % of 620 | 500 random stratified terms: Total Links | % of 500 | 1120 combined terms: Total Links | % of 1120 |
|---|---|---|---|---|---|---|
| CATCH | 213 | 34 | 153 | 31 | 366 | 33 |
| DOCLE | 354 | 57 | 274 | 55 | 628 | 56 |
| ICD10AM | 190 | 31 | 236 | 47 | 426 | 38 |
| ICPC | 189 | 30 | 210 | 42 | 399 | 36 |
| SNOMEDCT | 540 | 87 | 430 | 86 | 970 | 87 |

Table 17: The results of manually-linking 1,120 GP terms to the terms of the target systems

8.4.1 Comparing machine matches and manual linking

Figure 3 below shows the results of (a) machine-matching and (b) manually-linking 1,120 GP terms to the terms of the target systems.

[Figure 3: bar graph comparing “Machine Match” and “Manual Link” counts for SNOMED-CT, ICPC, ICD10AM, DOCLE and CATCH, on an axis from 0 to 1000]

Figure 3: Graph of the machine-matched and manually-linked terms (from the combined “620 Chronic Problems” and “500 Stratified Random” subset) that were “similar” for each of the target systems

8.5 Relevance

To general practitioners: The relevance of the Docle System to general practitioners has several aspects:

(a) Current use: Current use by GPs is very large, as it is the terminology of the most used medical record system in Australia.
(b) Content: The content is presumably relevant as it has been created for GPs by GPs.
(c) Decision support: There has been some use of the system in decision support software. The Medical Director (MD) software has some drug-disease alerts. The management of diabetes and asthma via the MD interface are current projects sponsored by GPCG and DHAC. At this stage, and for the above projects, it might be assumed the system is adequate.
(d) Research: The Docle System has been “tried in the field” by HCN, GPs, and Divisions of GP. However, it is believed there has been no research that specifically tested and compared the Docle System for research use.

To Healthcare generally: The relevance of the Docle System to wider healthcare domains has several aspects:

(a) Current use: Current use is limited largely to general practice.
(b) Content: Its contents would probably not suit some healthcare fields.

8.6 Ease of use and functionality:

The ease of use and functionality of a system includes the following aspects…

(a) Programmers: The ease of use and functionality of the system for programmers are a function of…
o Supplied data formats: The bizarre data formats, and their need to be parsed and processed, could make the use of the system problematic, unless a simple list of terms with associated Docles was all that was utilised.
o System maintenance and support: The system seems well supported.

(a) Decision support: The ease of use and functionality of the system for builders of decision support subsystems are a function of…
o Content: The required concepts must be in the system. This depends on system support. The system seems well supported.
o Relationships: Compound concepts need to be related to their atomic components. The system is one of the very few that have this feature.
o Hierarchies: The Docle System hierarchies are very deficient. Those that exist are designed for navigation or data aggregation. However decision support often refers to a concept and all its descendents.
This can be problematic if descendents have been created by the inclusion of post-coordinated concepts comprising the parent (or ancestor) plus additional attributes (e.g. “Pain in the ankle” being a child of “Pain”).
The system rates well except for its hierarchies.

(b) Primary users: The ease of use and functionality of the system for its primary users (namely general practitioners) are a function of…
o Adequate concept scope, detail and maintenance
o Plentiful synonyms, acronyms and abbreviations
o Few inappropriate terms
o The relationships between atomic-concepts and their compound-concept
o Adequate and appropriate hierarchies
o Good interface design
The system rates well for the first four. The hierarchies could be improved. The interface design is not a direct issue for the terminology.

(c) Secondary users: The ease of use and functionality of the system for secondary users (e.g. statisticians and researchers) are a function of…
o Adequate content detail
o The existence of atomic relationships on compound concepts
o Mappings to required classification systems
o Appropriate hierarchies
The system provides all the above; however, mappings and hierarchies may be deficient.

8.7 Cost and availability

The cost and availability are unknown. However it is imagined that the system is of low cost and is readily available.

8.8 Strengths and implications

The DOCLE System has the following strengths and implications…

Strength: Because of its inherent design, atomic parts (concepts) are always created for all compound concepts (i.e. pre-coordinated concepts).
Implication: Utilisation by decision support and ad hoc searching is made easier. Pre- and post-coordinated concepts can be better equated.

Strength: A poly-hierarchy is provided.
Implication: Concepts exist in several positions in hierarchies as they do in reality. The location and management of concepts and data is made easier.

Strength: Its content is apparently suited to GPs.
Implication: The system is suitable for general practice medical records.

Strength: It is in wide use by GPs in the “Medical Director” software application.
Implication: The system has been widely “field tested” over a significant time, so it fulfils current needs.

Strength: It is used, to a limited extent, in some decision support applications.
Implication: The system seems suited to real decision support software applications.

Strength: The cost is presumed to be low and its availability high.
Implication: There should be low market resistance based on cost.

8.9 Weaknesses, consequences and solutions

The following table summarises the system’s weaknesses, points out their consequences, and offers possible solutions…

Weakness: Content is limited to the domain of General Practice.
Consequence: It may not suit the wider health domains.
Possible solution: The content of the system could be increased to cover other domains. This is a matter of funding.

Weakness: Support is limited to one individual.
Consequence: Support resources and reliability are limited.
Possible solution: Increase funding to expand support.

Weakness: Internationally it is a non-standard system.
Consequence: International decision support, reporting, application and research software would tend to be incompatible with the Docle System.
Possible solution: Build and maintain translation tables from the Docle System to an internationally recognised system, or… Build and promote the Docle System so it is adopted internationally.

Weakness: Delivered tables are poorly organised and deficient in content.
Consequence: The system is difficult to utilise fully in application programs.
Possible solution: Reorganise the data into tidy and predictable formats.

Weakness: Many atomic concepts are not specified.
Consequence: There will be problems for decision support applications and the interchangeability of pre- and post-coordinated concepts.
Possible solution: Create and maintain atomic concepts. This was done when preparing Docle to be loaded into the poly-browser.

Weakness: Concept identifiers have meaning.
Consequence: Dr Oon believes this is a “strength”. However it contravenes terminology desiderata standards. The result is that medical thinking and terms are frozen.
Possible solution: Concepts could be identified by numbers. They need not be seen. Docles could exist if required, but rather than being the identifiers they would be abstracted by one level. They could then be changed. These things were done prior to loading into the poly-browser.

Weakness: Relationship identifiers are limited to a few specific punctuation characters.
Consequence: There are potentially many more types of relationships than the punctuation characters that are available.
Possible solution: Create a file of “relationship types” and give each relationship type a unique numeric identifier. Authoring software would be required (or need to be updated) to manage the additional table of relationships. These things were done prior to loading into the poly-browser.

Weakness: Relationships are a mixture of semantic types and “statuses”.
Consequence: Unusual relationships are developed.
Possible solution: Convert the unusual relationships to those that are more conventional. This was done when preparing Docle to be loaded into the poly-browser.

Weakness: Hierarchies are few.
Consequence: There will be problems with data aggregation, decision support, reporting and research.
Possible solution: Fund the rebuilding and expansion of the hierarchies.

Weakness: Hierarchies allow children to result from the introduction of additional attributes. They are therefore designed for navigation and data aggregation.
Consequence: Navigation hierarchies can cause problems for the interchangeability between pre- and post-coordinated concepts and for decision support and data matching rules that refer to “a concept and all its descendents”.
Possible solution: Rebuild the hierarchies using appropriate rules.

Weakness: Cyclic relationships occurred.
Consequence: Cyclic relationships are illegal and illogical. They lead to endless loops in software.
Possible solution: Improve the authoring software.
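The cyclic relationships listed as the last weakness above (and discussed in section 5.9.1.7) can be located mechanically. The Python sketch below is a minimal illustration under stated assumptions, not the test that the poly-browser or any Docle authoring tool actually runs: the function name `find_cyclic_relationships` is hypothetical, and the sample data is the A, B, C example of Diagram 1. It reports every (parent, child) relationship in which the parent can be reached again from the child.

```python
from collections import defaultdict


def find_cyclic_relationships(relationships):
    """Return the (parent, child) pairs that lie on a cycle."""
    children = defaultdict(set)
    for parent, child in relationships:
        children[parent].add(child)

    def reachable(start, target):
        # Iterative depth-first search from start, looking for target.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        return False

    # A relationship is cyclic when its child is also an ancestor of its parent.
    return {(p, c) for p, c in relationships if reachable(c, p)}


# The example of Diagram 1: A has children B, D, C; B has E, A, F; C has G, B, H.
diagram_1 = [
    ("A", "B"), ("A", "D"), ("A", "C"),
    ("B", "E"), ("B", "A"), ("B", "F"),
    ("C", "G"), ("C", "B"), ("C", "H"),
]
cyclic = find_cyclic_relationships(diagram_1)
```

For the Diagram 1 data the sketch reports A-B and B-A, and also A-C and C-B, because the path A, C, B, A closes a second loop one generation beyond the three drawn in the diagram. Deciding which of the reported relationships to delete remains an editorial judgement that the sketch does not attempt.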
9 Concluding remarks

The Docle System is a small unconventional system that is widely used by general practitioners. It is the creation of one person who alone continues to support it. Its contents have been created by GPs for GPs and so are likely to be adequate and appropriate for that domain. The system is one of the very few to be created from atomic concepts. It therefore lends itself to decision support, flexible data collection and research. It has many weaknesses; all but one could be overcome. The exception is that, from an international perspective, it is a non-standard system, and this may not be solvable.

10 Creator’s Comments

The creator of the DOCLE System, Dr Kuang Oon, was invited to comment on this report. His comments were as follows:

Paragraph 4.3 titled “Primary Key - Secondary Key” on page 10

Regarding issue of the 101 erroneous primary keys - this has been subsequently determined to be spurious and attributed to a faulty export from the Docle database. A subsequent file release from Docle systems had the problems eliminated. Kindly amend polybrowser to reflect the latest Docle release I sent you. In the above faulty examples, the spurious tertiary keys are allocated primary key status they do not deserve. The HCN-MD Docle files do not have these 101 erroneous codes. The corrected primary-secondary Docle file release I sent you reflects this aspect. Kindly amend table above/add new table to reflect the latest and corrected Docle release I sent you. Peter McIsaac, in his letter (2003), has indicated that this is no problem, and this episode has previously evoked some correspondence amongst us. As this is a serious matter that has arisen from a careless "clerical error", I hope you will elect not to make too much an issue of it. Besides incorporation of corrected data will only enhance use of the polybrowser.
Paragraph 8.2 titled “Content” on page 30

The word bizarre is pejorative ...perhaps better stated as "innovative" or at worst "idiosyncratic" which "on initial inspection evokes mental resistance".

Complexity: Because the system is a poly-hierarchy with semantic interrelationships it is more complex than most.

The embodiments of the codes are human readable and in that sense it is an easier to use coding system from the programmers viewpoint.

Paragraph 8.6 titled “Ease of use and functionality:” on page 36

(a) Decision support: The ease of use and functionality of the system for builders of decision support subsystems are a function of
• Content: The required concepts must be in the system. This depends on system support. The system seems well supported.
• Relationships: Compound concepts need to be related to their atomic components. The system is one of the very few that have this feature.
• Hierarchies: The Docle System hierarchies are very deficient. Those that exist are designed for navigation or data aggregation. However decision support often refers to a concept and all its descendents. This can be problematic if descendents have been created by the inclusion of additional attributes, The system rates well except for its hierarchies….

Docle is committed to populate every Docle concept in a Linnean framework with Object medica as the root object and with levels of phylum, class, order, family, genus and species. It is a work in progress.

Paragraph 9 titled “Concluding remarks” on page 39

If the spirit moves...please append at the end of conclusion:

Medical coding is undergoing rapid evolution. For decision support, detailed and specific information about the patient is required. For example we may need to code for a patient with rheumatoid arthritis treated with gold injections and need to retrieve this information outside the medical record context.
For this type of coding - we will have combinatorial explosion using static reference coding by multiplying the number of syndromes versus the number of treatment modalities. We also need to code for medical heuristics. The solution to this problem is a compositional or propositional coding system. This is where Docle scales particularly well to a health language.

Let's hope for progress in 2004 and best wishes to you.

Kuang Oon