AWT2/2012: Big Data Hype and Big Data Analytics

What is possible, and do we really need it?
© 2012 IBM Corporation
Sebastian Welter
07 Nov 2012
What is Big Data?
… Big Data Analytics
Google
Amazon
Microsoft
Yahoo
Facebook
Big Data = technology?
NoSQL
A means to an end
Technology alone is not yet Big Data
Big Data = Big Volume?
CERN LHC LCG* = 100 petabytes
Library of Congress = 3 petabytes
Solar Dynamics Observatory = 1.5 terabytes/day
BBC Olympics London = max. 2.8 petabytes/day
*LCG = CERN's Large Hadron Collider Computing Grid
Store
Access
… Analyze!
Facebook = 25 GB/day
Google search index = ??? petabytes
100 TB file backup
10 TB database
Subsecond ERP, warehouse, …
Big Data Analytics
The Big in Big Data Analytics
Volume
Velocity
Variety
Veracity
Analyze Call Detail Records in real time
• Streaming Analytic Accelerators
– CDR dropped call analysis
– Determine VIP customers with service issues: proactive alerts
– CDR adapters: ASN.1, binary, ASCII
– Analytic operators: CDR deduplication, dropped call detection, termination reason, customer importance
– Visualization: real-time KPI dashboard
• Data Warehouse Appliance
– Integrated network, devices, customer, and services model
– Telecom model, KPIs, and KQIs
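The analytic operators named on this slide (CDR deduplication, dropped call detection, customer importance) can be illustrated with a minimal Python sketch. It is not the IBM accelerator: the record layout (`call_id`, `caller`, `termination_reason`), the value `"dropped"` and the VIP set are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class CallDetailRecord:
    # Hypothetical, simplified CDR; real records arrive as ASN.1, binary or ASCII.
    call_id: str
    caller: str
    termination_reason: str  # assumed values: "normal" or "dropped"


def deduplicate(records: Iterable[CallDetailRecord]) -> Iterator[CallDetailRecord]:
    """Pass each call_id through once and drop later duplicates."""
    seen: Set[str] = set()
    for record in records:
        if record.call_id not in seen:
            seen.add(record.call_id)
            yield record


def vip_dropped_calls(
    records: Iterable[CallDetailRecord], vip_callers: Set[str]
) -> Iterator[CallDetailRecord]:
    """Yield deduplicated dropped calls that belong to VIP customers."""
    for record in deduplicate(records):
        if record.termination_reason == "dropped" and record.caller in vip_callers:
            yield record


stream = [
    CallDetailRecord("c1", "alice", "dropped"),
    CallDetailRecord("c1", "alice", "dropped"),  # duplicate delivery of the same call
    CallDetailRecord("c2", "bob", "dropped"),    # dropped, but not a VIP
    CallDetailRecord("c3", "alice", "normal"),
]
alerts = list(vip_dropped_calls(stream, vip_callers={"alice"}))
print([a.call_id for a in alerts])
```

Because the functions are generators, records are processed one at a time as they arrive. A real stream would also need to bound the `seen` set, for example with a time window, which this sketch leaves out.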
What information is hidden in my texts?
Manuals
Error reports
Repair instructions
Fault memory
How do I solve my problem?
Files, memos, emails…
Who? With whom? Why?
Amounts? Organizations?
Analyze all available information
Sensors: >1,000 data points per second per patient
Detect life-threatening infections 24 hours earlier
→ http://www.youtube.com/watch?v=Lwnr5lf-k0o
Pacific Northwest Smart Grid Demonstration Project
Capabilities:
Stream Computing: real-time control system
Deep Analytics Appliance: analyze massive data sets
Demonstrates scalability from 100 to 500K homes while retaining 10 years’ historical data
60k metered customers in 5 states
Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc.
Media analysis
• Analysis of media data
• Example USC: 250-300K tweets analyzed with a 1,700-word dictionary:
– Correct prediction of the box-office results of the 2011 blockbusters
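The slide does not say how the dictionary was applied to the tweets. One common reading is lexicon-based scoring, sketched below under that assumption. The five-word `LEXICON`, the sample tweets and the function names are hypothetical; this is not the USC method.

```python
import re
from typing import Dict, Iterable

# Hypothetical miniature lexicon; the study cited on the slide used about 1,700 words.
LEXICON: Dict[str, int] = {
    "great": 1,
    "love": 1,
    "awesome": 1,
    "boring": -1,
    "awful": -1,
}


def tweet_score(text: str, lexicon: Dict[str, int] = LEXICON) -> int:
    """Sum the lexicon weights of all words in one tweet (unknown words count 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def mean_score(tweets: Iterable[str]) -> float:
    """Average tweet score, usable as a crude per-movie sentiment signal."""
    scores = [tweet_score(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0


tweets = ["Love this movie, awesome cast!", "So boring.", "Seen it twice"]
print([tweet_score(t) for t in tweets])
print(mean_score(tweets))
```

A per-movie average like this could then be compared across titles. Turning it into a box-office forecast would need a fitted model, which the slide does not describe.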
• Reuters publishes the equivalent of 9,000 pages of financial news every day.
• 5 new analyst/research documents are produced on Wall Street every minute.
• Asset managers receive up to 1,000 emails per day.
“The sheer amount of newsflow has made it difficult for people to take positions confidently”
- Adam Margolis, Citi trader
1 - www.financial-domain.info/integrating-qualitative-and-quantitative-information/
2 - IBM Client experience with ForEx traders
• “The German Navy on reconnaissance: IBM Content Analytics (ICA) in fleet operations”
Insights from a simple survey
Any comments or suggestions
Institutional Risk
Extract
Integrate
‘U.S. organizations lose an estimated 7 percent of annual revenues to fraud, representing almost $1 trillion in losses.’
– Association of Certified Fraud Examiners
‘The best way to reduce the amount of data: delete it.’
– Sheila Childs, Research Vice President, Gartner
A Content Big Bang (not a Theory)
Source: IBM market information
Big Data Hype and Big Data Analytics
What is possible, and do we really need it?
→ Yes
Thank you!
Sebastian Welter
Technical Sales Consultant for ECM, BigData Analytics and Watson
swelter@de.ibm.com
Some application examples

Intelligent use of unstructured data
• NATO armed forces use the technology for better surveillance results: detecting links between attacks and evaluating risks
• Watson-like approaches: automotive companies in Germany speed up their after-sales and repair processes by supplying the right solution to the mechanic for a car’s problem
• Intellectual property & research: automatic research and analysis on IP, research and patents (competition, investments, research strategies)
• Enhance quality management: industry customers leverage the raw “voice of customer” as unstructured data for quality early warning

Large scale analytics
• Intelligent transportation: GPS, counts, speeds, travel times, public transport; 250,000 GPS probes/s
• Wind turbine placement: PBs of data, sensors, …; analysis time to 3 days from 3 weeks
• Telco promotions: 100,000 records/sec, 6B/day; 10 ms/decision; 270 TB for deep analytics
• Monitor & analyze ICU monitor data to recognize infections earlier: 120 newborns, 120,000 msg/s; 24 hours faster infection recognition
350B transactions/year
Meter reads every 15 min.
120M – meter reads/month
3.65B – meter reads/day
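The slide does not explain how the three meter-read figures relate. One possible reading is that they are yearly totals for the same meter population at monthly, daily and 15-minute read intervals. The sketch below checks that reading. The population of 10 million meters is an assumption chosen because it reproduces the numbers; it does not appear on the slide.

```python
# Assumption: 10 million meters. This number is NOT on the slide; it is picked
# because it makes the three figures consistent with each other.
METERS = 10_000_000

# Reads per meter per year for each read interval.
READS_PER_METER_PER_YEAR = {
    "monthly": 12,
    "daily": 365,
    "every 15 min": 4 * 24 * 365,  # 4 reads/hour * 24 hours * 365 days
}

yearly_reads = {
    interval: METERS * reads for interval, reads in READS_PER_METER_PER_YEAR.items()
}

for interval, total in yearly_reads.items():
    print(f"{interval:>12}: {total:,} reads/year")
```

Under this assumption the totals come out at 120 million, 3.65 billion and about 350 billion reads per year, which matches the three figures on the slide.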
• Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)
• More data means more accurate and richer models (adding hundreds of variables)
– Vestas wind library at 2.5 PB: to grow to over 6 PB in the near term
– Granularity 27 km x 27 km grids: driving to 9x9, 3x3 to 10 m x 10 m simulations
• Reduced turbine placement identification from weeks to hours
• Perspective: the Vestas wind library, as HD TV, would take 70 years to watch
• Optimize building energy consumption with centralized monitoring and control of building monitoring system
• Automates preventive and corrective maintenance of building corrective systems
• Uses Streams, InfoSphere BigInsights and Cognos
– Log analytics
– Energy bill forecasting
– Energy consumption optimization
– Detection of anomalous usage
– Presence-aware energy mgt.
– Policy enforcement
Sources
- unless taken from IBM presentations -
• Slide 8
– http://www.tink.ch/new/article/2010/04/12/lhc-und-cern-kurz-und-buendig/
– http://blog.zeit.de/open-data/2011/09/15/weltraum-fur-alle/
– http://www.computerworlduk.com/news/infrastructure/3375569/bbc-delivered-petabytes-of-content-in-a-singleolympics-day/
– BBC logo taken from Wikipedia, copyright BBC
– LHC image copyright CERN
• Slide 8 & 10
– Apache Hadoop logo: http://de.wikipedia.org/w/index.php?title=Datei:Hadoop-logo.jpg
– Solar Dynamics Observatory images and NASA logo: copyright NASA