My 2 cents in this Open Data Economy
Transcription
My 2 cents in this Open Data Economy
Big Data for Future Energy and Urban Infrastructures: Challenges and Opportunities Presented at CyberGIS Initiative University of Illinois at Urbana Champaign Budhendra Bhaduri Corporate Research Fellow March 19, 2014 Champaign, IL Overview of presentation • Background • Geospatial data driven computing • Urban dynamics – Modeling population distribution and dynamics • Energy assurance – Energy and transportation – Crop monitoring from streaming data • Challenges and Future directions – My data in Big data – Open data economy Managed by UT-Battelle for the Department of Energy In the Garden of Big and Evil • How BIG is big? – What we can’t handle today • Its not about the needle, but about defining the haystack! • R&D community could – Define the needle (new challenges) – Do better in finding the needle (novel analytics) – Make a bigger haystack (keep adding data) Managed by UT-Battelle for the Department of Energy Geographic data has driven innovation 1790 First US Census. 3.9 million people counted. 1810 1850 Census collects data on manufacturing, quantity, and value of products. Census collects data on taxation, churches, pauperism, and crime. 1890 1896 Punched-card tabulating machines are used to count 63 million people. Tabulating Machine Company (TMC) formed. 1911 1924 TMC becomes C-T-R (Computing-Tabulating-Recording) Company. C-T-R becomes International Business Machines (IBM) Corporation. 1963 1969 First use of the phrase “geographic information system”. First commercial GIS software companies formed. 1972 1985 NASA launches first earth observation satellite (Landsat 1). First GPS satellite launched. 2001 2004 Keyhole Corp. creates dynamic 3D mapping of geographic information. Google Earth initiates the “Web Wide World” and visual discovery. 20?? Intelligent Locational Awareness: Real time access to multidimensional information of locations, states, and environments of entities. Managed by UT-Battelle for the Department of Energy • Beginning of the future.... Managed by UT-Battelle for the Department of Energy Big Data: the battle of ‘V’s www.datasciencecentral.com enterpriseresilienceblog.typepad.com 6 Presentation name Too Big or not too Big? • Our interest focuses on two questions: – What might happen (predictive)? – Why that happens (explanatory)? • More data doesn’t always make it better – Applications are often time sensitive • Big Data should enable both Big and small science Managed by UT-Battelle for the Department of Energy Behavior – A better answer (new knowledge) – A quicker answer or savings in time/energy/money – Enable individuals through organizations Geographic Scale Fidelity/Accuracy • Big Data analytics must provide a clear ROI or value proposition Resolution Time Spatial Distribution of U.S. Household Carbon Footprints Reveals Suburbanization Undermines Greenhouse Gas Benefits of Urban Population Density 8 Managed by UT-Battelle for the U.S. Department of Energy Christopher Jones *† and Daniel M. Kammen *†‡§ †Environ. Sci. Technol., 2014, 48 (2), pp 895–902 KUB_0905 Our energy challenges and solutions are often local to regional Energy savings potential is a macro-level (regional to national) phenomenon driven by individual socioeconomic behavior at the micro-level (local) Homes Work Useful insights will come from characterizing interactions among human, energy, and transportation networks Gas stations (regional) Shop Banks (state) Gas stations (regional) Managed by UT-Battelle for the Department of Energy Cleaners Schools Homes Success of future strategies depends on understanding complexity and consequences of proposed systems in which energy, environment and mobility interests are simultaneously optimized Urban Science and Informatics focus on • Discovering emerging behavior of urban systems over large spatial and temporal scales (at unprecedented resolution) Interactive and Interoperable Visualization • Data streams generated – by individual components (sensors) of infrastructure networks (energy, water, transportation, telecommunication,…) – by users of infrastructure (human network) • Physical, cyber, data, and computational infrastructures – Consistent platform for data, analytics, and simulations as service • Increased ability to monitor, measure and evaluate impacts and benefits from individual to enterprise scale 10 Presentation name Development of High Performance, Scalable Simulations Analysis Models and Tools Development Knowledgebase Creation Dynamic Collection, Integration, Management and Dissemination of Disparate Data Resources Improving knowledge of population dynamics: LandScan Population Road Railroads • Census Polygons • VMAP • 1:100K national railway network • TeleAtlas • Tract-toMultinet tract worker • TIGER; flow • NTAD • BLS quarterly updates Land cover/ land use Slope • Geocover • DTED Academic institutions Prisons Hospitals Business employment • InfoUSA • Department • National Jail • American Education Census of Hospital • Pitney • MODIS • LiDAR Association • HSIP • HSIP Bowes • National • National (AHA) Schools Prisons Land Cover Elevation • Dunn and Bradstreet Data Data (NED) • ESRI (NLCD) • GDT • State GIS Night Day LandScan Global • Spatial resolution of 30 arc seconds (~1km) • Ambient population (average of 24 hours) • Remote sensing based global data modeling and mapping LandScan USA Managed by UT-Battelle for the Department of Energy • Spatial resolution of 3 arc seconds (~90m) • Nighttime and daytime population • Integration of infrastructure and activity databases Imagery • EarthViewer • Terra Server • Google Spatial refinement of LandScan Global Managed by UT-Battelle for the Department of Energy HPC Based Imagery Analysis Moving from Modeling to Mapping Urban Urban Managed by UT-Battelle for the Department of Energy Addis Ababa, Ethiopia 2 Xeon Quad core 2.4GHz CPUs + 4 Tesla GPUs + 48GB Image analyzed (0.3m) 40,000x40,000 pixels (800 sq. km) RGB bands Overall accuracy 93% Settlement class 89% Non-settlement class 94% Total processing time Managed by UT-Battelle for the Department of Energy 27 seconds Settlements are economic indicators Managed by UT-Battelle for the Department of Energy Patterns in overhead imagery Higher Income Middle Income Managed by UT-Battelle for the Department of Energy Lower Income Edge Orientation Distribution Unplanned Settlement 0.025 Probability 0.02 0.015 0.01 0.005 0 -100 -80 -60 -40 -20 0 theta 20 40 60 80 100 Edge Orientations Planned Settlement 0.045 0.04 Peakness in the distribution around edge orientations separated by 90 degrees is a good indicator for planned settlements. Managed by UT-Battelle for the Department of Energy Probability 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 -100 -80 -60 -40 -20 0 theta 20 40 Edge Orientations 60 80 100 Neighborhood mapping: From local interactions to global realizations •Unstructured Settlements •Lowest to lower middle income •Rural migrants •Very loosely structured •Historical ethnic quarters/neighborhoods •Poor residents currently being displaced in some areas with urban development/tourism Damascus, Syria Managed by UT-Battelle for the Department of Energy •Formal Urban Planning •Typical Urban Services •Middle to Upper Income Population density data from open source Managed by UT-Battelle for the Department of Energy PDT Smart Sampling Tool Managed by UT-Battelle for the Department of Energy Urbanization challenges 21 Managed by UT-Battelle for the U.S. Department of Energy KUB_0905 Assessing Population Dynamics • Model based assessments – Activity based – Variable space and time scales • Dynamic tracking of people and vehicle fleet movement from multisensor data – Video – Cell Phones – Social media Managed by UT-Battelle for the Department of Energy Managed by UT-Battelle for the Department of Energy 24 Presentation name 25 Presentation name Critical infrastructure data development We develop and maintain spatially enabled, foundation level data for a number of critical infrastructures for research and operational communities. Hospitals 26 Presentation name Prisons Day-care Centers Rail lines/Rail points Solid Waste Landfills Mobile Home Parks Energy Data Layers U.S. mobile home parks database High within-class variation Need for scalable solutions Total Area: 9.827 million sq. km Covered by ≈ 9.8 Trillion pixels of resolution 1m x 1m 27 Presentation name Automated and scalable detection • Data: NAIP 1m imagery • 8423 samples of 300m x 300m patches • State of TX – Order of magnitude improvement compared to open source compilation Database point Detected point • ~1500 points by manual compilation (9 manweeks) • ~15,000 points by automated detection (1 week) 28 Presentation name Online Detection of Anomaly, Change and Change Point from Space-Time Data Potere, D., Feierabend, N., Bright, E., Strahler, A. “Walmart from Space: A New Source for Land Cover Change Validation” Photogrametric Engineering and Remote Sensing. Vol 74. July 2008. Managed by UT-Battelle for the Department of Energy Wide area biomass monitoring in near real time is becoming a reality • MODIS Tile (4800x4800 pixels) – ~23 million locations/time series – 161 time steps (bi-weekly over 7 years) • FROST: An SGI Altrix ICE 8200 Cluster at ORNL – 128 compute nodes each with 16 virtual cores and 24 GB of RAM • Multicore (multithreaded) and Distributed (message passing) computing strategy Managed by UT-Battelle for the Department of Energy Serial • 41,105 seconds (11.4 hours) Threads (16) • 5,872 seconds (1.6 hours) MPI (96 nodes) • 604 seconds (10 minutes) MPI + Threads • 34 seconds (1536 cores) Bioenergy Knowledge Discovery Framework Facilitate informed decision by providing a means to synthesize, analyze, and visualize vast amounts of information • Integration of ~1500 data and map services; knowledgebase, models, and advanced analytical tools • Dynamic mapping for Billion Ton Update database (45 million records) • Programmatic cost savings and reusability (Energy Geoplatform for Open Data Initiative) Managed by UT-Battelle for the Department of Energy http://bioenergykdf.net 31 The Prosumer generation • Traditional surveys methods are no longer useful – Alarmingly low response rates – Landline phone and mail based – Technology infusion can help • Past methods will not be economically viable • Progress and commoditization of geospatial, cyber, and communication technologies will only increase data production and use – Internet interest groups and social networks – Real time streaming platforms • Crowdsourcing is an effective strategy – Hotline, Tipline, Amber Alert Managed by UT-Battelle for the Department of Energy The unknown unknowns Sustainability Credibility Ownership Privacy Managed by UT-Battelle for the Department of Energy • But is a 50% nonrandom sample better than a 5% random sample? Citizen science is making an impact Level 4 Extreme Citizen Science • Collaborative science-problem definition, data collection, and analysis Level 3 Participatory • Participation in problem definition and Science data collection Level 2 Distributed Intelligence Level 1 Crowdsourcing • Citizens as basic interpreters • Citizens as sensors Courtesy: Dr. Muki Haklay, UCL Managed by UT-Battelle for the Department of Energy Noise mapping around London Heathrow Managed by UT-Battelle for the Department of Energy Resource conservation in Africa Managed by UT-Battelle for the Department of Energy 37 Presentation name Disasters make population data obsolete • Loss and dispersion of population • Capturing population redistribution is critical at many time scales – Earthquake aftershocks (minutesdays) – Hurricanes (weeks to months) – Sea level rise (years to decades) • Space based observation only interprets land cover – Flood damage is often difficult to detect for structures • Is crowdsourcing a strategy? – Active (including self disclosure) – Passive (social media) Managed by UT-Battelle for the Department of Energy Real Time Rome: MIT Senseable City Lab “……Ratti's team obtains its data anonymously from cell phones, GPS devices on buses and taxis, and other wireless mobile devices, using advanced algorithms developed by Telecom Italia, the principal sponsor of the project. These algorithms are able to discern the difference between, say, a mobile phone signal from a user who is stuck in traffic and one that is sitting in the pocket of a pedestrian wandering down the street. Data are made anonymous and aggregated from the beginning, so there are no implications for individual privacy.” Managed by UT-Battelle for the Department of Energy http://radar.oreilly.com/2007/07/real-time-rome-using-cellphone.html Social networking and self disclosure • "Latitude" is being marketed as a tool that could help parents keep tabs on their children's locations, but it can be used for anyone to find anyone else, assuming permission is given.” • “…allow you to share that location with friends and family members, and likewise be able to see friends and family members' locations" • "To protect privacy, Google specifically requires people to sign up for the service. People can share their precise location, the city they're in, or nothing at all." Managed by UT-Battelle for the Department of Energy http://www.cbsnews.com/stories/2009/02/04/earlyshow/leisure/gamesgadgetsgizmos/main4774320.shtml Social networking and self disclosure • “Foursquare just snagged its six millionth member…” • “Foursquare, the social network that allows members to communicate with acquaintances by “checking in” to locations they patronize, is breaking with its own traditions by allowing users to “check in” to the Super Bowl even if they’re not attending the game in person.” Managed by UT-Battelle for the Department of Energy http://blogs.wsj.com/digits/2011/02/06/foursquare-changes-rules-for-super-bowl-tie-in/ Emergency service model feasibility • What if I could subscribe to an on demand service that tracks me? – When I feel I may end up needing assistance • There are a few challenges as well – Defining the nature of the service (what do I exactly get) – Accuracy of the location data – Geographic coverage and continuity of service • Are you willing to instrument our national parks? – Contractual obligations • Under what circumstances are the services guaranteed? Managed by UT-Battelle for the Department of Energy Big Data and democracy Managed by UT-Battelle for the Department of Energy Empowering the individual citizen Managed by UT-Battelle for the Department of Energy Information nutrition? • Information is as critical as food and energy • How much data does one produce and need? • Is there an obesity effect? Managed by UT-Battelle for the Department of Energy Titled playing field? Managed by UT-Battelle for the Department of Energy Democratizing My Data in Big Data • A new sharing paradigm • Privacy is a concern, but has resulted in overcorrected systems • It’s my privacy, so is my data • Individual motivations and incentives are a key driver Managed by UT-Battelle for the Department of Energy I want control and my 2 cents • Enabling individuals to manage their own data • Options and ease of participating in the open data economy • Self-interest can often be key • Technology can help Managed by UT-Battelle for the Department of Energy Crowdsourcing: Points to ponder • Crowdsourced information clearly augment space-based data – Increase density and resolution of data (Gap filling) e.g. NetQuakes – Enhance currency and quality of observation and model data (incidence report, damage qualification, and local knowledge) – The media makes great use of it (CNN iReport, Weather Channel) • Traditional top-down spatial data quality standard doesn’t work – When’s good is good enough (user defined and fit for purpose) • When does crowdsourcing make the system vulnerable? – Reliability of the crowd and crowd fatigue (are there disaster magnitude and frequency thresholds similar to relief funds) – Digital divide, victim crowd, and system overuse – Social, legal, and ethical concerns Managed by UT-Battelle for the Department of Energy Challenges and Opportunities Social • Recruiting the crowd could benefit from high-profile volunteer catalysts • The crowd may not be aware of engagement opportunities • Success may be locally variable because of cultural differences Legal • Expectation of privacy is a variable standard • Legal standards are not clearly defined and understood • Self disclosure could be an effective way to address privacy Ethical Managed by UT-Battelle for the Department of Energy • Does this involve deceptive principles (instrumenting national parks, GPS and battery life)? • Should we promote the crowd as only volunteers? • Self disclosure may come with expectations of service guarantee Oak Ridge National Laboratory: Meeting the challenges of the 21st century Managed by UT-Battelle for the Department of Energy www.ornl.gov