My 2 cents in this Open Data Economy

Transcription

My 2 cents in this Open Data Economy
Big Data for Future
Energy and Urban
Infrastructures:
Challenges and
Opportunities
Presented at
CyberGIS Initiative
University of Illinois at Urbana
Champaign
Budhendra Bhaduri
Corporate Research Fellow
March 19, 2014
Champaign, IL
Overview of presentation
• Background
• Geospatial data driven computing
• Urban dynamics
– Modeling population distribution and dynamics
• Energy assurance
– Energy and transportation
– Crop monitoring from streaming data
• Challenges and Future directions
– My data in Big data
– Open data economy
Managed by UT-Battelle
for the Department of Energy
In the Garden of Big and Evil
• How BIG is big?
– What we can’t handle today
• Its not about the needle, but about defining
the haystack!
• R&D community could
– Define the needle (new challenges)
– Do better in finding the needle (novel analytics)
– Make a bigger haystack (keep adding data)
Managed by UT-Battelle
for the Department of Energy
Geographic data has driven innovation
1790
First US Census. 3.9 million people counted.
1810
1850
Census collects data on manufacturing, quantity, and value of products.
Census collects data on taxation, churches, pauperism, and crime.
1890
1896
Punched-card tabulating machines are used to count 63 million people.
Tabulating Machine Company (TMC) formed.
1911
1924
TMC becomes C-T-R (Computing-Tabulating-Recording) Company.
C-T-R becomes International Business Machines (IBM) Corporation.
1963
1969
First use of the phrase “geographic information system”.
First commercial GIS software companies formed.
1972
1985
NASA launches first earth observation satellite (Landsat 1).
First GPS satellite launched.
2001
2004
Keyhole Corp. creates dynamic 3D mapping of geographic information.
Google Earth initiates the “Web Wide World” and visual discovery.
20??
Intelligent Locational Awareness: Real time access to multidimensional
information of locations, states, and environments of entities.
Managed by UT-Battelle
for the Department of Energy
• Beginning of the future....
Managed by UT-Battelle
for the Department of Energy
Big Data: the battle of ‘V’s
www.datasciencecentral.com
enterpriseresilienceblog.typepad.com
6 Presentation name
Too Big or not too Big?
• Our interest focuses on two questions:
– What might happen (predictive)?
– Why that happens (explanatory)?
• More data doesn’t always make it better
– Applications are often time sensitive
• Big Data should enable both Big and small
science
Managed by UT-Battelle
for the Department of Energy
Behavior
– A better answer (new knowledge)
– A quicker answer or savings in time/energy/money
– Enable individuals through organizations
Geographic
Scale
Fidelity/Accuracy
• Big Data analytics must provide a clear
ROI or value proposition
Resolution
Time
Spatial Distribution of U.S. Household
Carbon Footprints Reveals
Suburbanization Undermines Greenhouse
Gas Benefits of Urban Population Density
8
Managed by UT-Battelle
for the U.S. Department of Energy
Christopher Jones *† and Daniel M. Kammen *†‡§
†Environ. Sci. Technol., 2014, 48 (2), pp 895–902
KUB_0905
Our energy challenges and solutions
are often local to regional
Energy savings potential is a
macro-level (regional to
national) phenomenon driven
by individual socioeconomic
behavior at the micro-level
(local)
Homes
Work
Useful insights will come
from characterizing
interactions among
human, energy, and
transportation networks
Gas stations
(regional)
Shop
Banks
(state)
Gas stations
(regional)
Managed by UT-Battelle
for the Department of Energy
Cleaners
Schools
Homes
Success of future strategies
depends on understanding
complexity and consequences
of proposed systems in which
energy, environment and
mobility interests are
simultaneously optimized
Urban Science and Informatics focus on
• Discovering emerging behavior of urban
systems over large spatial and temporal
scales (at unprecedented resolution)
Interactive and Interoperable Visualization
• Data streams generated
– by individual components (sensors) of
infrastructure networks (energy, water,
transportation, telecommunication,…)
– by users of infrastructure (human network)
• Physical, cyber, data, and computational
infrastructures
– Consistent platform for data, analytics, and
simulations as service
• Increased ability to monitor, measure and
evaluate impacts and benefits from
individual to enterprise scale
10 Presentation name
Development of High Performance,
Scalable Simulations
Analysis Models and Tools Development
Knowledgebase Creation
Dynamic Collection, Integration,
Management and Dissemination of
Disparate Data Resources
Improving knowledge
of population dynamics: LandScan
Population
Road
Railroads
• Census
Polygons
• VMAP
• 1:100K
national
railway
network
• TeleAtlas
• Tract-toMultinet
tract worker
• TIGER;
flow
• NTAD
• BLS
quarterly
updates
Land cover/
land use
Slope
• Geocover
• DTED
Academic
institutions
Prisons
Hospitals
Business
employment
• InfoUSA
• Department • National Jail • American
Education Census
of
Hospital
• Pitney
• MODIS
• LiDAR
Association
•
HSIP
•
HSIP
Bowes
• National
• National
(AHA)
Schools
Prisons
Land Cover Elevation
• Dunn and
Bradstreet
Data
Data (NED) • ESRI
(NLCD)
• GDT
• State GIS
Night
Day
LandScan Global
• Spatial resolution of 30 arc seconds (~1km)
• Ambient population (average of 24 hours)
• Remote sensing based global data modeling and mapping
LandScan USA
Managed by UT-Battelle
for the Department of Energy
• Spatial resolution of 3 arc seconds (~90m)
• Nighttime and daytime population
• Integration of infrastructure and activity databases
Imagery
• EarthViewer
• Terra Server
• Google
Spatial refinement of LandScan Global
Managed by UT-Battelle
for the Department of Energy
HPC Based Imagery Analysis
Moving from Modeling to Mapping
Urban
Urban
Managed by UT-Battelle
for the Department of Energy
Addis Ababa, Ethiopia

2 Xeon Quad core 2.4GHz
CPUs + 4 Tesla GPUs +
48GB

Image analyzed (0.3m)
 40,000x40,000 pixels
(800 sq. km)
 RGB bands

Overall accuracy 93%
 Settlement class 89%
 Non-settlement class
94%

Total processing time
Managed by UT-Battelle
for the Department of Energy
 27 seconds
Settlements are economic indicators
Managed by UT-Battelle
for the Department of Energy
Patterns in overhead imagery
Higher Income
Middle Income
Managed by UT-Battelle
for the Department of Energy
Lower Income
Edge Orientation Distribution
Unplanned Settlement
0.025
Probability
0.02
0.015
0.01
0.005
0
-100
-80
-60
-40
-20
0
theta
20
40
60
80
100
Edge Orientations
Planned Settlement
0.045
0.04
Peakness in the distribution around edge orientations
separated by 90 degrees is a good indicator for planned
settlements.
Managed by UT-Battelle
for the Department of Energy
Probability
0.035
0.03
0.025
0.02
0.015
0.01
0.005
0
-100
-80
-60
-40
-20
0
theta
20
40
Edge Orientations
60
80
100
Neighborhood mapping: From local
interactions to global realizations
•Unstructured
Settlements
•Lowest to lower
middle income
•Rural migrants
•Very loosely structured
•Historical ethnic
quarters/neighborhoods
•Poor residents currently
being displaced in some
areas with urban
development/tourism
Damascus, Syria
Managed by UT-Battelle
for the Department of Energy
•Formal Urban Planning
•Typical Urban Services
•Middle to Upper Income
Population density data from open source
Managed by UT-Battelle
for the Department of Energy
PDT Smart Sampling Tool
Managed by UT-Battelle
for the Department of Energy
Urbanization challenges
21
Managed by UT-Battelle
for the U.S. Department of Energy
KUB_0905
Assessing Population Dynamics
• Model based assessments
– Activity based
– Variable space and time scales
• Dynamic tracking of people and
vehicle fleet movement from
multisensor data
– Video
– Cell Phones
– Social media
Managed by UT-Battelle
for the Department of Energy
Managed by UT-Battelle
for the Department of Energy
24 Presentation name
25 Presentation name
Critical infrastructure data development
We develop and maintain spatially enabled, foundation level data for a number
of critical infrastructures for research and operational communities.
Hospitals
26 Presentation name
Prisons
Day-care
Centers
Rail lines/Rail
points
Solid Waste
Landfills
Mobile Home
Parks
Energy Data
Layers
U.S. mobile home parks database
High within-class variation
Need for scalable solutions
Total Area: 9.827 million sq. km
Covered by ≈ 9.8 Trillion pixels of resolution 1m x 1m
27 Presentation name
Automated and scalable detection
• Data: NAIP 1m imagery
• 8423 samples of 300m x 300m
patches
• State of TX
– Order of magnitude improvement
compared to open source compilation
Database point
Detected point
• ~1500 points by manual
compilation (9 manweeks)
• ~15,000 points by
automated detection (1
week)
28 Presentation name
Online Detection of Anomaly, Change and
Change Point from Space-Time Data
Potere, D., Feierabend, N., Bright, E., Strahler, A. “Walmart from Space: A New Source for
Land Cover Change Validation” Photogrametric Engineering and Remote Sensing. Vol 74. July
2008.
Managed by UT-Battelle
for the Department of Energy
Wide area biomass monitoring in near
real time is becoming a reality
• MODIS Tile (4800x4800 pixels)
– ~23 million locations/time series
– 161 time steps (bi-weekly over 7
years)
• FROST: An SGI Altrix ICE 8200
Cluster at ORNL
– 128 compute nodes each with 16
virtual cores and 24 GB of RAM
• Multicore (multithreaded) and
Distributed (message passing)
computing strategy
Managed by UT-Battelle
for the Department of Energy
Serial
• 41,105 seconds (11.4
hours)
Threads (16)
• 5,872 seconds (1.6 hours)
MPI (96
nodes)
• 604 seconds (10 minutes)
MPI + Threads
• 34 seconds
(1536 cores)
Bioenergy Knowledge
Discovery Framework
Facilitate informed decision by providing a means to synthesize,
analyze, and visualize vast amounts of information
• Integration of ~1500 data and map services; knowledgebase, models,
and advanced analytical tools
• Dynamic mapping for Billion Ton Update database (45 million records)
• Programmatic cost savings and reusability (Energy Geoplatform for
Open Data Initiative)
Managed by UT-Battelle
for the Department of Energy
http://bioenergykdf.net
31
The Prosumer generation
• Traditional surveys methods are no longer
useful
– Alarmingly low response rates
– Landline phone and mail based
– Technology infusion can help
• Past methods will not be economically viable
• Progress and commoditization of geospatial,
cyber, and communication technologies will
only increase data production and use
– Internet interest groups and social networks
– Real time streaming platforms
• Crowdsourcing is an effective strategy
– Hotline, Tipline, Amber Alert
Managed by UT-Battelle
for the Department of Energy
The unknown unknowns
Sustainability
Credibility
Ownership
Privacy
Managed by UT-Battelle
for the Department of Energy
• But is a 50% nonrandom
sample better than a 5%
random sample?
Citizen science is making an impact
Level 4 Extreme
Citizen Science
• Collaborative science-problem
definition, data collection, and analysis
Level 3 Participatory • Participation in problem definition and
Science
data collection
Level 2 Distributed
Intelligence
Level 1
Crowdsourcing
• Citizens as basic interpreters
• Citizens as sensors
Courtesy: Dr. Muki Haklay, UCL
Managed by UT-Battelle
for the Department of Energy
Noise mapping around London Heathrow
Managed by UT-Battelle
for the Department of Energy
Resource conservation in Africa
Managed by UT-Battelle
for the Department of Energy
37 Presentation name
Disasters make population data obsolete
• Loss and dispersion of population
• Capturing population redistribution
is critical at many time scales
– Earthquake aftershocks (minutesdays)
– Hurricanes (weeks to months)
– Sea level rise (years to decades)
• Space based observation only
interprets land cover
– Flood damage is often difficult to
detect for structures
• Is crowdsourcing a strategy?
– Active (including self disclosure)
– Passive (social media)
Managed by UT-Battelle
for the Department of Energy
Real Time Rome: MIT Senseable City Lab
“……Ratti's team obtains its data anonymously from cell phones,
GPS devices on buses and taxis, and other wireless mobile
devices, using advanced algorithms developed by Telecom
Italia, the principal sponsor of the project. These algorithms
are able to discern the difference between, say, a mobile
phone signal from a user who is stuck in traffic and one that
is sitting in the pocket of a pedestrian wandering down the
street. Data are made anonymous and aggregated from the
beginning, so there are no implications for individual privacy.”
Managed by UT-Battelle
for the Department of Energy
http://radar.oreilly.com/2007/07/real-time-rome-using-cellphone.html
Social networking and self disclosure
• "Latitude" is being marketed as a tool that could
help parents keep tabs on their children's locations,
but it can be used for anyone to find anyone else,
assuming permission is given.”
• “…allow you to share that location with friends and
family members, and likewise be able to see
friends and family members' locations"
• "To protect privacy, Google specifically requires
people to sign up for the service. People can share
their precise location, the city they're in, or nothing
at all."
Managed by UT-Battelle
for the Department of Energy
http://www.cbsnews.com/stories/2009/02/04/earlyshow/leisure/gamesgadgetsgizmos/main4774320.shtml
Social networking and self disclosure
• “Foursquare just snagged its six millionth
member…”
• “Foursquare, the social network that
allows members to communicate with
acquaintances by “checking in” to
locations they patronize, is breaking with
its own traditions by allowing users to
“check in” to the Super Bowl even if
they’re not attending the game in person.”
Managed by UT-Battelle
for the Department of Energy
http://blogs.wsj.com/digits/2011/02/06/foursquare-changes-rules-for-super-bowl-tie-in/
Emergency service model feasibility
• What if I could subscribe to an on demand
service that tracks me?
– When I feel I may end up needing assistance
• There are a few challenges as well
– Defining the nature of the service (what do I
exactly get)
– Accuracy of the location data
– Geographic coverage and continuity of service
• Are you willing to instrument our national parks?
– Contractual obligations
• Under what circumstances are the services
guaranteed?
Managed by UT-Battelle
for the Department of
Energy
Big Data and democracy
Managed by UT-Battelle
for the Department of Energy
Empowering the individual citizen
Managed by UT-Battelle
for the Department of Energy
Information nutrition?
• Information is as critical as
food and energy
• How much data does one
produce and need?
• Is there an obesity effect?
Managed by UT-Battelle
for the Department of Energy
Titled playing field?
Managed by UT-Battelle
for the Department of Energy
Democratizing My Data in Big Data
• A new sharing paradigm
• Privacy is a concern, but has
resulted in overcorrected
systems
• It’s my privacy, so is my data
• Individual motivations and
incentives are a key driver
Managed by UT-Battelle
for the Department of Energy
I want control and my 2 cents
• Enabling individuals to manage
their own data
• Options and ease of
participating in the open data
economy
• Self-interest can often be key
• Technology can help
Managed by UT-Battelle
for the Department of Energy
Crowdsourcing: Points to ponder
• Crowdsourced information clearly augment space-based
data
– Increase density and resolution of data (Gap filling) e.g. NetQuakes
– Enhance currency and quality of observation and model data
(incidence report, damage qualification, and local knowledge)
– The media makes great use of it (CNN iReport, Weather Channel)
• Traditional top-down spatial data quality standard doesn’t
work
– When’s good is good enough (user defined and fit for purpose)
• When does crowdsourcing make the system vulnerable?
– Reliability of the crowd and crowd fatigue (are there disaster
magnitude and frequency thresholds similar to relief funds)
– Digital divide, victim crowd, and system overuse
– Social, legal, and ethical concerns
Managed by UT-Battelle
for the Department of Energy
Challenges and Opportunities
Social
• Recruiting the crowd could benefit from high-profile volunteer
catalysts
• The crowd may not be aware of engagement opportunities
• Success may be locally variable because of cultural differences
Legal
• Expectation of privacy is a variable standard
• Legal standards are not clearly defined and understood
• Self disclosure could be an effective way to address privacy
Ethical
Managed by UT-Battelle
for the Department of Energy
• Does this involve deceptive principles (instrumenting national
parks, GPS and battery life)?
• Should we promote the crowd as only volunteers?
• Self disclosure may come with expectations of service guarantee
Oak Ridge National Laboratory:
Meeting the challenges of the 21st century
Managed by UT-Battelle
for the Department of Energy
www.ornl.gov