AWT2/2012: Big Data Hype and Big Data Analytics
Big Data Hype and Big Data Analytics
What is possible, and do we really need it?
© 2012 IBM Corporation
Sebastian Welter, 07 Nov 2012

What is Big Data? … Big Data Analytics
Google, Amazon, Microsoft, Yahoo, Facebook

Big Data = Technology?
NoSQL
A means to an end: technology alone does not make Big Data

Big Data = Big Volume?
• CERN LHC LCG* = 100 petabytes
• Library of Congress = 3 petabytes
• Solar Dynamics Observatory = 1.5 terabytes/day
• BBC, London Olympics = max. 2.8 petabytes/day
*LCG = CERN's Large Hadron Collider Computing Grid

Big Data = Big Volume?
Store, access …

… analyze!
• Facebook = 25 GB/day
• Google search index = ??? petabytes
• 100 TB file backup
• 10 TB database
• Subsecond ERP, warehouse, …

Big Data Analytics

The Big in Big Data Analytics
Volume, Velocity, Variety, Veracity

Analyze Call Detail Records in real time
• Streaming Analytic Accelerators
– CDR dropped call analysis
– Determine VIP customers with service issues – proactive alerts
– CDR adapters – ASN.1, binary, ASCII
– Analytic operators – CDR deduplication, dropped call detection, termination reason, customer importance
– Visualization – real-time KPI dashboard
• Data Warehouse Appliance
– Integrated network, devices, customer, and services model
– Telecom model, KPIs, and KQIs

What information is hidden in my texts?
Manuals, error reports, repair instructions, fault memory: How do I solve my problem?
Files, memos, emails…: Who? With whom? Why? Amounts? Organizations?
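The CDR slide names "CDR deduplication" and "dropped call detection" as analytic operators, and "proactive alerts" for VIP customers with service issues, but does not show how such operators fit together. The following is a minimal Python sketch under stated assumptions, not IBM's implementation: the record layout (`call_id`, `caller`, `termination_reason`), the termination code `"DROPPED"`, the dedup window size and the alert threshold are all hypothetical. Deduplication keeps only a bounded window of recently seen call IDs, as a streaming operator must bound its memory.

```python
from collections import Counter, OrderedDict
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set


@dataclass(frozen=True)
class CDR:
    # Hypothetical, simplified call detail record; real CDRs
    # (e.g. ASN.1-encoded) carry many more fields.
    call_id: str
    caller: str
    termination_reason: str  # assumed codes, e.g. "NORMAL" or "DROPPED"


def deduplicate(records: Iterable[CDR], window: int = 10_000) -> Iterator[CDR]:
    """Skip repeated CDRs, remembering only the last `window` call IDs."""
    seen: "OrderedDict[str, None]" = OrderedDict()
    for rec in records:
        if rec.call_id in seen:
            continue  # duplicate delivery of the same call
        seen[rec.call_id] = None
        if len(seen) > window:
            seen.popitem(last=False)  # evict the oldest ID to bound memory
        yield rec


def dropped_call_alerts(records: Iterable[CDR], vip: Set[str],
                        threshold: int = 2) -> List[str]:
    """Return VIP callers whose dropped-call count reaches `threshold`."""
    dropped: Counter = Counter()
    alerts: List[str] = []
    for rec in deduplicate(records):
        if rec.termination_reason != "DROPPED":
            continue
        dropped[rec.caller] += 1
        # Alert once, at the moment the caller crosses the threshold.
        if rec.caller in vip and dropped[rec.caller] == threshold:
            alerts.append(rec.caller)
    return alerts


if __name__ == "__main__":
    stream = [
        CDR("c1", "alice", "DROPPED"),
        CDR("c2", "bob", "NORMAL"),
        CDR("c3", "alice", "DROPPED"),
        CDR("c4", "dave", "DROPPED"),
        CDR("c4", "dave", "DROPPED"),  # duplicate delivery, ignored
    ]
    print(dropped_call_alerts(stream, vip={"alice", "dave"}))  # ['alice']
```

Without deduplication, the duplicated record would push "dave" over the threshold and raise a false alert; this is why the slide lists deduplication ahead of the detection operators.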
Analyze all available information
Sensors: >1000 readings per second per patient
Detect life-threatening infections 24 hours sooner
→ http://www.youtube.com/watch?v=Lwnr5lf-k0o

Pacific Northwest Smart Grid Demonstration Project
Capabilities:
• Stream Computing – real-time control system
• Deep Analytics Appliance – analyze massive data sets
Demonstrates scalability from 100 to 500K homes while retaining 10 years' historical data
60k metered customers in 5 states
Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc.

Media analysis
• Analysis of media data
• Example USC: 250-300K tweets analyzed with a 1,700-word dictionary: correct prediction of the box-office results of the 2011 blockbusters

Reuters publishes the equivalent of 9,000 pages of financial news every day.
5 new analyst/research documents are created on Wall Street every minute.
Asset managers receive up to 1,000 emails per day.
"The sheer amount of newsflow has made it difficult for people to take positions confidently" – Adam Margolis, Citi trader
1 – www.financial-domain.info/integrating-qualitative-and-quantitative-information/
2 – IBM client experience with ForEx traders

• "The German Navy (Bundesmarine) on reconnaissance: IBM Content Analytics (ICA) in fleet operations"

Insights from a simple survey
Any comments or suggestions

Institutional Risk
Extract, Integrate

'U.S. organizations lose an estimated 7 percent of annual revenues to fraud, representing almost $1 trillion in losses.' – Association of Certified Fraud Examiners

'The best way to reduce the amount of data: delete it.' – Sheila Childs, Research Vice President, Gartner

A Content Big Bang (not a Theory)
Source: IBM market information

Big Data Hype and Big Data Analytics
What is possible, and do we really need it? → Yes
Thank you!

Sebastian Welter
Technical Sales Consultant for ECM, BigData Analytics and Watson
swelter@de.ibm.com

Some application examples
Intelligent use of unstructured data
NATO armed forces use the technology for better surveillance results – detecting links between attacks and evaluating risks
Intelligent Transportation
GPS, counts, speeds, travel times, public transport; 250,000 GPS probes/s
Wind Turbine Placement
Watson-like Approaches
Automotive companies in Germany speed up their after-sales and repair processes by supplying the mechanic with the right solution for a car's problem
PBs of data, sensors, …
Analysis time cut from 3 weeks to 3 days
Telco Promotions
Intellectual Property & Research
Automatic research and analysis of IP, research and patents – competition, investments, research strategies
Enhance Quality Management
Industry customers leverage the raw "voice of the customer" as unstructured data for quality early warning
100,000 records/sec, 6B/day
10 ms/decision
270 TB for Deep Analytics
Large scale analytics
Monitor & analyze ICU monitor data to recognize infections earlier
120 newborns – 120,000 msg/s
Infection recognition 24 hours faster

350B transactions/year
Meter reads every 15 min.
120M – meter reads/month
3.65B – meter reads/day

Public wind data is available on 284 km x 284 km grids (2.5° lat/long)
More data means more accurate and richer models (adding hundreds of variables)
– Vestas wind library at 2.5 PB, to grow to over 6 PB in the near term
– Granularity of 27 km x 27 km grids, driving to 9 x 9 km, 3 x 3 km and 10 m x 10 m simulations
Reduced turbine placement identification from weeks to hours
Perspective: watching the Vestas wind library as HD TV would take 70 years

Optimize building energy consumption with centralized monitoring and control of the building monitoring system
Automates preventive and corrective maintenance of building systems
Uses Streams, InfoSphere BigInsights and Cognos
– Log Analytics
– Energy Bill Forecasting
– Energy consumption optimization
– Detection of anomalous usage
– Presence-aware energy mgt.
– Policy enforcement

Sources (unless taken from IBM presentations)
• Slide 8
– http://www.tink.ch/new/article/2010/04/12/lhc-und-cern-kurz-und-buendig/
– http://blog.zeit.de/open-data/2011/09/15/weltraum-fur-alle/
– http://www.computerworlduk.com/news/infrastructure/3375569/bbc-delivered-petabytes-of-content-in-a-singleolympics-day/
– BBC logo taken from Wikipedia, copyright BBC
– LHC image copyright CERN
• Slides 8 & 10
– Apache Hadoop logo: http://de.wikipedia.org/w/index.php?title=Datei:Hadoop-logo.jpg
– Solar Dynamics Observatory images and NASA logo: copyright NASA