Challenges in Geospatial Data Handling and Modeling
K S Rajan
Lab for Spatial Informatics, International Institute of Information Technology Hyderabad
rajan@iiit.ac.in
Garuda-NKN Partners Meet, July 25-26, 2013, Bangalore

IIIT-H
• Established in 1998 – a new star in the Indian technical education scenario
• Ranked 7th among Tech-Schools in India (DataQuest, 2008; among Tech Universities in South Asia, 2009)
• A research university
  – ICT (CSE, ECE)
  – Application domains
• Research Centers and NO Departments

GeoSpatial Technologies?
• OpenStreetMap, Bing, Google Earth
• Car navigation
• What does location mean to you?
  – Just a point in space?
  – A clue to what is happening in its neighbourhood
  – Help discover larger spatio-temporal phenomena, e.g., climatic events, health/disease surveillance

Computation – as a Way of Life!!
• Early stages of Computer Science
  – Data storage and handling
  – Mathematical processing
• Mathematical theory – a mainstay of computing theory
• Applications
  – Large data handling
  – Sector-based approaches: Science, BFSI, Census
• Age of the Internet – information handling

Geospatial Technologies
• Location: is it a variable or a constant?
• Geospatial Information Systems – modifying Computer Science
  – Simple map visualization to web-based map mashups
  – DB to Geo/Spatial DB; spatial data mining
  – Analysis: statistical to the evolving field of spatio-statistical tools; BI to Geo-BI
  – Modelling and simulation, e.g., complex climate-social-economic integrated modelling
• Remote Sensing – again, from volumes of data to information: a still-struggling journey

Computing and Domains
• Computing paradigm – in all domains
• Interactions between computing and the domains
  – A one-way street?
  – Or bi-directional?
• Disciplinary and multi-disciplinary interests
  – Opening up new paradigms of computing
• The Indian context: need to move from data to information

Computer Science – GeoICT
• Waking up to the exciting world of
  – Algorithm development – graph theory, etc.
  – Multi-dimensional data
    • Data structures and databases
    • Data mining – spatio-temporal
  – Graphics and visualization
    • 3D
    • GRID-TIN: from either/or to hybrid
  – Parallel computing
  – Information extraction and retrieval
  – Software engineering

Main Functions of GIS
• Mapping & Visualization: CG, visual data manipulation, presentation, automated mapping
• Spatial Analysis: rule- and relation-based analysis, simulations, agent-based modelling
• Spatial Database: data collection & generation, retrieval, editing, updating, building spatio-temporal data relations (data quality), inventory

GIS: From Another Application Area to a Frontier in CS/IT
• GI Systems – primary focus on tools
• GI Science ???
• GI Services

Challenges
• Technology (IT) – enabler?
  – More data (single info layers to mash-ups)
  – More accessibility
  – Evolution of intelligent decision support systems
• Rapid changes in IT infrastructure
• Data & more data
  – Are our data models good enough?
  – How can location-aware technologies seamlessly talk to, say, a Spatial Data Infrastructure?
• World of sensing, data collation, and separating the chaff
Source: Longley, Goodchild, Maguire, Rhind [2001]

Domain-related Challenges
• Use of tools in GIS application areas
  – Convenience of use
  – BUT, limitations to scientific discovery
  – NEED: more and better parameterization
• Cadastral mapping
  – Mismatch between satellite-derived data and land records

VRGeo: Collaborative Mapping Platform (1)
• Crowd-sourcing of spatial data
  – GPS-based inputs
  – Satellite images / raster-based
• Use any WMS data in the background
• Attributes based on needs
  – Local sourcing
• SMS2Map feature
  – Structuring unstructured data
• Centralised Geo-DB; SMS-based input
Developed by: IIIT-H; Hosted by GARUDA @ CDAC

VRGeo @ Davanagere

VRGeo – for Slum Mapping in Pune

Challenges in the Indian Context
• Has to provide for incremental design
  – Can't get agencies to share data till they see value in it
• Language localization
  – From data collection to manipulation to GeoDB management
• Data interoperability
  – Application-driven formats and parameters
  – Unification of the data model will take time
• Data ownership + security
  – Though largely Govt., distributed authorities
  – Map Policy & RS Policy of India

VRGeo – SMS Interface
• An SMS-based mapping initiative
• Working with local organisations
  – WASSAN – for livestock diseases, agri pests, groundwater studies
  – Hyderabad Urban Labs – for water bodies
• A generic framework for ANY theme
• SMS content: 509932vit.ls.co.fmd.500.5.2 (27 chars)
• Re-designing for human disease surveillance (IIPH)

Kriti4SOUL: Citizen Initiative in Integrating Geo-Intelligence
Kriti4SOUL – Lake Monitoring
• Mobile-based data capture
  – Near-real time
  – Geo-tagged images
  – Observations recorded by text / voice
• Centralised geospatial data system
  – Visualization and assimilation
  – Analytical report generation
• Interactive model of g-Governance
Developed by a startup, KAIINOS.com, for SOUL
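The 27-character SMS content above packs a whole field report into one dot-separated string, the kind of compact message the SMS2Map feature turns into a structured record. The slides do not define the fields, so the following Python sketch assumes a hypothetical layout: a first token made of a numeric location code plus an alphabetic tag, followed by six positional values. The names `parse_sms`, `SmsReport`, `location_code`, `tag`, and `f1` to `f6` are illustrative placeholders, not the actual VRGeo message format.

```python
import re
from dataclasses import dataclass

# Hypothetical names for the positional fields after the first token.
# The real VRGeo/SMS2Map schema is not given in the slides.
FIELD_NAMES = ("f1", "f2", "f3", "f4", "f5", "f6")


@dataclass
class SmsReport:
    location_code: str  # assumed: leading digits of the first token
    tag: str            # assumed: trailing letters of the first token
    fields: dict        # remaining tokens keyed by placeholder names


def parse_sms(text: str) -> SmsReport:
    """Turn a compact dot-separated SMS into a structured record (illustrative)."""
    tokens = text.strip().split(".")
    head = re.fullmatch(r"(\d+)([A-Za-z]+)", tokens[0])
    if head is None:
        raise ValueError(f"unrecognised first token: {tokens[0]!r}")
    values = tokens[1:]
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    # Convert all-digit tokens to integers; keep codes such as "fmd" as text.
    typed = [int(v) if v.isdigit() else v for v in values]
    return SmsReport(head.group(1), head.group(2), dict(zip(FIELD_NAMES, typed)))
```

Calling `parse_sms("509932vit.ls.co.fmd.500.5.2")` yields location code `"509932"`, tag `"vit"`, and fields `f1` = `"ls"` through `f6` = `2`, with all-digit tokens converted to integers. A fixed positional layout keeps messages short enough for SMS while still letting the server reject reports with the wrong number of fields.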
VRGeo: Collaborative Mapping Platform (2)
• Further plans
  – GPS device detection and upload (GPS-Babel hacked)
  – Village-level data generation, correction and update
  – Case-study-based semantic standardization
• Challenges
  – Cultural / language aspects
  – Attribute specifications

Can Urban Floods be Modeled as a 3D Dynamic Phenomenon?
• 3D visualization of a phenomenon
  – Natural terrain – water flow
  – Discrete object space?
• Flood spread in an urban environment
  – Data representation – TIN / GRID / Lattice
  – Large near-real time data handling
• GPU-based processing tools
(Figure: flood sequence modelled)
Related areas: Hydrological Modeling, Embarrassingly Parallel Computing, Near-real time Analytics, Computer Vision

Current GIS Mobility Landscape
• GIS computation timeline
  – Data collection on field
  – Data processing off field
• Framework
  – Mobile nodes
  – Communication infrastructure
  – Computational hubs
• Reliability and topological stability questionable
• Centralized computing

Future of GIS and Computing
• Future applications – resource- and computation-hungry
• Relevance of GIS expanding
• Community-based computing challenges current paradigms
• Mobility gaining importance
• Computation amalgamating with mobility

Open Source and GIS
• GRASS GIS – favoured open source distribution
• Well-documented sequential code
• Large body of applications
• Stable performance across sequential platforms
• Starting point for our work

What Parallelization Achieved
GRASS GIS applications – flood modeling
• Mapcalc
  – Fundamental set of applications
  – Building block for multiple applications
  – Embarrassingly parallel application
  – Speedup of 5-6X over a sequential implementation
• Terracost
  – Direct application of the algorithm by Hazel, T., Toma, L., Vahrenhold, J., and Wickremesinghe, R. Terracost: Computing least-cost-path surfaces for massive grid terrains. J. Exp. Algorithmics 12 (Jun. 2008), 1-31.
  – Speedup of 6X over sequential
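Mapcalc is embarrassingly parallel because every output cell of a map-algebra expression depends only on the co-located cells of the input rasters, so a raster can be cut into row blocks that are evaluated independently and concatenated in order. The Python sketch below illustrates that decomposition on plain nested lists; it is not the GRASS `r.mapcalc` implementation, and the per-cell flood-depth expression, `block_rows`, and `workers` are hypothetical choices. It uses threads to stay self-contained; because of CPython's global interpreter lock, a real CPU speedup would need a process pool or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def cell_op(elev: float, water: float) -> float:
    # Hypothetical map-algebra expression: flood depth is the water surface
    # minus terrain elevation, clamped at zero.
    return max(water - elev, 0.0)


def eval_block(elev_rows, water_rows):
    # A block needs only its own rows: no neighbour cells, no shared state.
    return [[cell_op(e, w) for e, w in zip(er, wr)]
            for er, wr in zip(elev_rows, water_rows)]


def mapcalc_parallel(elev, water, block_rows=2, workers=4):
    """Evaluate cell_op over two equally sized rasters, one row block per task."""
    starts = range(0, len(elev), block_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so concatenating
        # the blocks restores the original row order.
        blocks = pool.map(
            lambda s: eval_block(elev[s:s + block_rows], water[s:s + block_rows]),
            starts)
        return [row for block in blocks for row in block]
```

Because the blocks never exchange data, the same split carries over to processes or cluster nodes without change. Operations whose output depends on neighbouring cells, such as least-cost-path surfaces, need overlapping halo rows or a different decomposition.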
Advantages of Amalgamation of HPC and Mobility
• Real time computation
• On-field analysis of data
• Reduction of response time
• Suitable for disaster prevention and management
  – Communication not a bottleneck
  – Efficient and fast response
  – Real time updates possible
• Can nurse expansion of the role of GIS

LSI Slide Analysis

Mining Spatio-Temporal Invariant Core (MiSTIC): Study of Rainfall Patterns in Monsoonal India
• The reference set of focal points determines the number of cores that should be identified. Analysis has been done for 7 reference focal points. For each of the 56 years, the set of valid focal points is detected and zones are created for each of them.
• For the analysis in this study, only a subset of the data, the contiguous landmass of mainland India with non-extreme climatic behavior (Central and Peninsular India), is considered.
Figure 7: Twenty-five zones created, each marked by a different color, for each of the twenty-five focal points (highlighted in dark brown) for all of India in 1991. The color bar gives the corresponding zone ID.
• For this analysis, a point is considered frequent in a core if it has occurred at the same place within that core for more than three years (i.e., min_sup = 5%) out of 56 years.
• The following table gives the conditions used to classify cores as CHD/CLD/CND:

Core type | min_freq (T = 56 years) | max_pruneTS (T = 56 years)
CHD | >= 60% (~34) | <= 10% (~6)
CLD | >= 25% (~14) and < 60% (~33) | > 10% (~7) and <= 33% (~19)
CND | < 25% (<= 13) | > 33% (>= 20)

Ref focal point | Core type | Core size | Max freq (%) | #NF years | #NS years | Classification (CHD/CLD/CND)
P1 | CC | 5 | ~32% | 4 | 13 | CLD
P1 | CR | 5 | ~32% | 7 | 5 | CLD
P2 | CC | 3 | 50% | 1 | 2 | CHD
P2 | CR | 3 | 50% | 0 | 0 | CHD
P3 | CC | 1 | ~28.5% | 6 | 34 | CND
P3 | CR | 6 | ~32% | 9 | 9 | CLD
P4 | CC | 3 | ~52% | 1 | 7 | CLD
P4 | CR | 3 | ~59% | 0 | 0 | CHD
P5 | CC | 2 | 62.5% | 5 | 10 | CLD
P5 | CR | 3 | ~64% | 12 | 0 | CLD
P6 | CC | 4 | ~39% | 3 | 9 | CLD
P6 | CR | 5 | ~39% | 3 | 1 | CHD
P7 | CC | 2 | ~8% | 17 | 31 | CND
P7 | CR | 5 | ~34% | 8 | 6 | CLD

MiSTIC – Summary
• The detection of these core regions, especially the CHD cores, can help detect phenomena that exhibit highly localized occurrences over time.
• Changes in climatic patterns over long periods may be discovered by observing whether a given region has changed from, say, CHD to CLD or to CND, when analyzed over decadal time periods.
• For the monsoonal rainfall phenomena, this work observes that CR is a better indicator of the core regions. This could be attributed to the dynamic nature of the monsoonal rainfall in India.

Disease Occurrence Patterns using MiSTIC: Study of Salmonellosis in Florida
Salmonellosis Disease in Florida / New York [1994-2010]
• Map shows disease "hot-spots"
• Valuable insight into disease prevalence
• 12 out of 19 counties are rural (non-Metropolitan Statistical Areas)
• Sanitation-related factors
  – 18 out of 21 cores are in either rural (12) or urban-rural fringe (6) zones
(Figure labels: CC Focal Polygons; New York Metropolitan Statistical Areas)

Zoning-based Analysis: An Alternative Spatial Correlation Model?
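The CHD/CLD/CND conditions tabulated for the rainfall study amount to a small rule-based classifier. The Python sketch below encodes them with both inputs as fractions of T = 56 years. The slides do not define how `min_freq` and `max_pruneTS` are computed, so they are taken as given inputs here, and pairs that satisfy none of the three rows are returned as `None`; that fallback is an assumption of this sketch, not stated MiSTIC behavior.

```python
from typing import Optional


def classify_core(min_freq: float, max_prune_ts: float) -> Optional[str]:
    """Apply the CHD/CLD/CND conditions (both inputs as fractions of T).

    Returns None when the pair matches none of the three table rows; how
    MiSTIC treats such cases is not stated in the slides (assumption).
    """
    if min_freq >= 0.60 and max_prune_ts <= 0.10:
        return "CHD"
    if 0.25 <= min_freq < 0.60 and 0.10 < max_prune_ts <= 0.33:
        return "CLD"
    if min_freq < 0.25 and max_prune_ts > 0.33:
        return "CND"
    return None


def as_fraction(years: int, total_years: int = 56) -> float:
    # Convert a year count into a fraction of the study period (T = 56 years).
    return years / total_years
```

For example, a core whose min_freq covers 40 of 56 years (about 71%) and whose max_pruneTS covers 3 years (about 5%) falls in the CHD row.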
Spatio-Temporal Data Mining Results from MiSTIC
(Figure: overlay of MSAs & cores)

Results – I
• Zoning – boundaries leading to disease area delineation
(Figures: Waterway Networks – FL; Road Networks – FL; Overlay – FL)

Prediction Results – I (Bay County, FL)

Results – II
• Zoning – boundaries leading to disease area delineation
(Figures: Waterway Networks – NY; Road Networks – NY; Overlay – NY)

Lab for Spatial Informatics, IIIT Hyderabad
• Theory
• Auto Geo-registration
• Classification techniques – time-series – Spatial Data Mining
• Moving Objects – Data & Analysis
• Drought Monitoring
• GML and Geo-Web
• Irrigation Mapping
• Constrained Networks
• Spatial Statistics & Spatial Data Mining
• Spatial Information Extraction, e.g., Roads & Cropping Season
• Parallel Computing & Mobility
• Geo-Spatial Information Systems
• OBIA, Fusion, Change Detection
• Applications
• Remote Sensing
• CANSAT Competition
• Locating News & Geo-context
• Algorithms for Airline Industry
• Tessellations and Mobile Network Planning
• Land Use Modeling
• VRGeo – Collab Mapping
• FOSS4G
• LSIViewer
• Environment / Policy / System Building
For more, see the THESIS on "A Data-driven Framework for Extraction of Spatio-Temporal Manifestations of Dynamic Processes"

Thank You!!