Semantic Big Data – from Dingoes to Drysdale
e-Research Lab, School of ITEE, UQ
Jane Hunter <j.hunter@uq.edu.au>
The University of Queensland

Research Projects
Eco-informatics
◦ Automatic Analysis of Animal Accelerometry Data
◦ OzTrack
◦ Springbrook Wireless Sensor Network Analysis
◦ 3D Coal Seam Gas Water Quality Atlas
◦ Automated Online Reef Report Cards
Biomedical-informatics
◦ Skeletome
Digital Humanities
◦ 20th Century Paint – art conservation
◦ 3D Semantic Annotations – museum artefact classification
◦ Post-war Qld Architecture – oral history archive
E-Social Sciences
◦ Indigenous Housing

Semantic Annotation of Animal Accelerometry Data
◦ Animal-attached accelerometers monitor animal movement and behavior
◦ Tri-axial data streams
◦ Large volumes of complex data
◦ Lack of visualization and analysis tools, and limited share-ability
◦ Lack of analysis and pattern-identification services
◦ Free-ranging wild animal behavior: lack of ground-truth data …
◦ Endangered species, feral pests, production livestock

User-Driven Requirements
Step 1 – Upload data
Step 2 – Activity recognition (e.g. running, walking, resting)
Step 3 – Analysis and visualization: walking, running, resting, sleeping, feeding -> animal health, energy consumption, food/water requirements

Objectives
A web-based semantic annotation and activity recognition system to enable biologists to:
◦ share tri-axial accelerometer data
◦ visualize and analyze tri-axial accelerometer data
◦ share expert knowledge
Help scientists understand the movement and behavior of animals.
Use surrogate/domestic animals (and video) to train a classifier -> automatically tag rare, wild and feral animals.

User Interface – Tagging
[Screenshot of the SAAR plot-video interface and the annotation interface]

User Interface – Automated Results
[Screenshot of the SAAR interface with human activity identification results]

Evaluation
Tested on a range of species (different sizes and gaits):
◦ Australian dingo (Canis lupus dingo)
◦ Eurasian badger (Meles meles)
◦ Bengal tiger (Panthera tigris tigris)
◦ African cheetah
  (Acinonyx jubatus)
◦ American alligator (Alligator mississippiensis)
◦ Hairy-nosed wombat (Lasiorhinus krefftii)
◦ Eastern grey kangaroo (Macropus giganteus)
◦ Short-beaked echidna (Tachyglossus aculeatus)
Activities: running, walking, standing, sitting, lying (sternal recumbency)
The dog classification module was tested on the full range of species.

Results
High scores (80–90%) when SL:SH is between 2 and 3 (SL = spine length, SH = spine height)

Benefits
◦ Subscribers log in to an online service
◦ Leverage community expertise to tag training sets
◦ Develop libraries of classifiers for different species
◦ Apply domestic-species classifiers to wild species, e.g. dogs -> dingoes, foxes; birds -> bats; horses -> camels
◦ Classifiers improve over time as more data is uploaded
Socio-economic and health benefits:
◦ livestock productivity – assess health, energy/food needs
◦ reduce the spread of feral pests and viruses
◦ management and conservation of threatened species

OzTrack
[Figure: Overlay of camel tracks on vegetation]

Semantic Sensor Networks
Lianli Gao (UQ), Michael Bruenig (CSIRO)
CSIRO Sensor Network: 125 sensor nodes measuring
• Air temperature
• Humidity
• Wind speed
• Leaf wetness

Semantic Fire Weather Index
Calculate the FWI from wind speed, relative humidity and temperature.
Limitations:
- Widely distributed sensors (tens of km apart)
- Updated once per day
- Urgent need for data with higher spatiotemporal resolution

System Architecture
Combine SPARQL inference rules with Inverse Distance Weighting (IDW) to calculate accurate spatial distributions.
[Figure: Comparison with BoM FWIs]

Skeletome
A community-driven knowledge curation platform for skeletal dysplasias:
◦ Rare diseases
◦ Affect the development of the human skeleton
◦ Complex medical issues
◦ Caused by genetic abnormalities
Capture, integrate, correlate and analyse clinical, radiographic, phenotypic and genetic data.
[Figure: Verne Troyer – Cartilage-Hair Hypoplasia (RMRP); Peter Dinklage, Danny DeVito – Achondroplasia (FGFR3); Multiple Epiphyseal Dysplasia (MED), most common disorder (COL9A2, COL9A3, COMP, MATN3)]

Challenges
Hundreds of different types
◦ 440 types
  in 40 groups
Difficult to diagnose and treat
Few medical publications
Doctors rely on:
◦ existing patient data
◦ expert knowledge

Requirements
◦ Common terminology
◦ Data integration
◦ Data quality control
◦ Knowledge extraction and transfer
◦ Privacy
◦ Expertise sharing

The Platform
Patient Archive, Knowledge Base, Bone Dysplasia Ontology, Reasoning

Knowledge Base of Disorders
◦ Written abstracts
◦ Linked genes
◦ ISDS grouping
◦ X-rays
◦ Phenotypes
◦ Inline editing

Patient Sharing
Sharing patients with multiple doctors:
- X-rays
- Clinical summaries
- Genetic reports
Anonymizes patient data

Discussing a Patient
◦ Inline commenting
◦ Text posts with PubMed integration
◦ Community diagnoses

Entity Term Extraction
◦ Phenotype extraction
◦ Diagnosis extraction

Reasoning across Knowledge Base
• Analyze diagnoses, phenotypes, genotypes
  - across patients and publications
• Extract/infer new
  relationships
  - Disease <-> phenotypes <-> genotypes
  - Capture provenance, certainty, temporality, severity, polarity

Aboriginal Housing Crisis
Aboriginal communities:
◦ Inferior housing
◦ Inferior neighbourhoods
◦ Low home ownership
◦ More live in public housing
◦ Greater overcrowding and homelessness
◦ Move house more frequently
-> Adverse impact on health, well-being and education
[Dockery A.M., Ong R., Colquhoun S., Li J., Kendall G. (2013), "Housing and children's development and well-being: evidence from Australian data", AHURI Final Report No. 201, March 2013]

[Map: Remote – Central Desert; Regional – Dubbo, Mt Isa; Metropolitan – Woodridge, Redfern]

Plan: housing policies/strategies
Implement: housing programs/investments/actions
Monitor/measure: regional & cultural factors; crowding & homelessness; quality-of-life indicators; socio-economic indicators; targets
Compare against targets: What works? What doesn't?
Adapt: regional needs analysis; housing programs tailored to the local context

Regional/Cultural Factors
[Figure: Australian Census 2011]

Challenges
◦ Inaccurate data
◦ Anonymized data – post-code level of geography
◦ What are the optimum data sources/indicators for successful housing programs?
◦ For a given region, what are the most significant factors that must be considered to satisfy the housing needs of the local Indigenous community?
◦ What are the optimum governance structures that combine
  ◦ economies of scale, and
  ◦ localized approaches informed by Aboriginal Community Councils?
Data Sources
Quantitative data:
◦ ABS data on Aboriginal health and welfare, population and housing;
◦ AURIN – "Social and Economic Indicators for Indigenous Communities";
◦ IRSEO – Index of Relative Indigenous Socioeconomic Outcomes;
◦ data from the State/Territory Housing Departments, ICHOs and Community Councils.
Qualitative data:
◦ LSIC – Longitudinal Study of Indigenous Children;
◦ 2002 and 2008 National Aboriginal and Torres Strait Islander Social Survey (NATSISS);
◦ HILDA (Household, Income and Labour Dynamics in Australia) Survey.
Publications:
◦ AHURI and FaHCSIA reports;
◦ Australian Policy Online.
Map sources:
◦ past ATSIC boundaries data (wards and regions) (Geoscience Australia);
◦ current FaHCSIA regions for Indigenous Coordination Centres;
◦ AIATSIS Aboriginal Australia Map.

Indigenous Housing Ontology
◦ Housing policies, housing programs
◦ Housing types
◦ Property management
◦ Tenancies
◦ Regional demographics
◦ Quality-of-life indicators
  ◦ Quantitative – ABS data
  ◦ Qualitative – surveys/interviews
◦ Targets
  ◦ Reduce overcrowding by 50% by 2020
  ◦ Reduce homelessness by 50% by 2020
  ◦ Improve QoL indicators by 30%

Mapping Interface + R/Matlab Services
- Choose datasets, region and start/end times
- Understand the impact of regional, cultural and socio-economic factors on Aboriginal housing programs
- Link housing data to quality-of-life indicators

Commonalities
◦ Unstructured Data, Structured Data, Marked-up Data
◦ Multi-variate, 3D/4D, Spatio-temporal, Dynamic, Streaming, Textual
◦ Integrated Data
◦ Ontology Registries, Data Quality, Machine Learning, Statistical Analysis, Inferencing Rules, Experts
◦ KnowledgeBase (Scalable RDF Triple Stores/RDF Graphs): Curated Knowledge, Training Corpuses, Case Studies, Annotations
◦ Application Services: Diagnosis, Classification, Decision Support, Modelling

Contact
Jane Hunter <j.hunter@uq.edu.au>
eResearch Lab at the University of Queensland
http://www.itee.uq.edu.au/~eresearch
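The Semantic Fire Weather Index slides describe combining SPARQL inference rules with Inverse Distance Weighting (IDW) to estimate spatial distributions between sensor nodes that are tens of kilometres apart. A minimal Python sketch of the IDW step alone follows. It assumes planar (x, y) coordinates in km and the commonly used power parameter p = 2; the function names `idw_estimate`/`idw_grid` and the readings are hypothetical, and neither the SPARQL rules nor the FWI formula itself is shown.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# One sensor reading: (x_km, y_km, value). Planar coordinates are an assumption.
Reading = Tuple[float, float, float]


def idw_estimate(readings: Iterable[Reading], x: float, y: float,
                 power: float = 2.0) -> float:
    """Estimate the field value at (x, y) by Inverse Distance Weighting.

    Each reading is weighted by 1 / d**power, where d is the planar distance
    from the sensor to (x, y). If (x, y) coincides with a sensor, that
    sensor's value is returned unchanged.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in readings:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sensor: no interpolation needed
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("at least one reading is required")
    return weighted_sum / weight_total


def idw_grid(readings: Iterable[Reading], xs: Sequence[float],
             ys: Sequence[float], power: float = 2.0) -> List[List[float]]:
    """Evaluate IDW on a regular grid: one row per y, one column per x."""
    points = list(readings)  # materialise so it can be re-read for every cell
    return [[idw_estimate(points, x, y, power) for x in xs] for y in ys]


if __name__ == "__main__":
    # Hypothetical wind-speed readings (km/h) from three widely spaced nodes
    nodes = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
    print(round(idw_estimate(nodes, 2.5, 0.0), 2))
    for row in idw_grid(nodes, xs=[0.0, 5.0, 10.0], ys=[0.0, 5.0]):
        print([round(v, 2) for v in row])
```

With two nodes reading 10 and 20 at (0, 0) and (10, 0), the midpoint (5, 0) is estimated at 15, while (2.5, 0) is estimated at 11 because the nearer node carries nine times the weight. A larger power makes estimates more local; p = 2 is a common default, not a value taken from the slides. For sensors spread over tens of kilometres, great-circle distances would be more appropriate than the planar `math.hypot` used here.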