Analytics on z Systems
Martin Schneider
Manager, Analytics Center of Excellence
IBM Research and Development, Boeblingen
mdschnei@de.ibm.com

Agenda
• Why analytics?
• Hybrid approach: transactions + analytics
• Support on z Systems
• News from the DB2 Analytics Accelerator
• Mathematical optimization

Gartner 2014 CIO Agenda: analytics ranks first among technology topics
Source: Gartner, “Taming the Digital Dragon: The 2014 CIO Agenda”, Dave Aron | Poh-Ling Lee, January 2014

A new relationship with the customer
• Formerly: “I have a product, and I am looking for a customer for it.”
• Today: “I have a customer. What does he need most?”
Diagram labels: Interaction, Branding, Accounts, Commerce

• “By eliminating analytic latency and data synchronization issues, hybrid transaction/analytical processing will enable IT leaders to simplify their information management infrastructure”
• “This architecture will drive the most innovation in real-time analytics over the next 10 years via greater situation awareness and improved business agility”
Source: Gartner Research Note G00259033, “Hybrid Transaction Analytical Processing Will Foster Opportunities”, 01-2014

The Analytics Landscape
Diagram labels: Advanced Analytics; axes Competitive Advantage and Complexity; levels Descriptive (Reporting), Predictive (Analytics), Prescriptive (Optimization)

Analytics means profit or loss
• 80% of marketers send the same material to all customers
• 6% of companies are “highly satisfied” with the way information is provided for their work
• +7.6% annual increase in customer lifetime value for companies that use engagement analytics
• $226 million estimated loss through fraud in health care
• 16% of tax revenue lost due to noncompliance
• $100 million: size of the typical fine when a bank violates regulations
Remedy: use IT as a business strategy

What makes implementation difficult?
• Complexity
• Latency
• Data duplication
• Cost

The future: hybrid transaction and analytics processing
Transactions:
• completed purchase
• consumed resource
• paid invoice
• submitted reimbursement claim
• updated information
• finished call-center call
Analytics as part of the business process:
• What happened?
• How many, how often, where?
• Which actions are needed?
• What happens if?
• What leads to the best result?

For bringing together analytical and transactional data, IBM z Systems is the ideal platform: secure, scalable, highly available, high-performing.
Diagram: a cycle of Collect (transactional processing, transaction data), Analyze (predictive analytics, historical and predictive data) and Report (historical view), each on a z Systems server.

Operations and analytics coexistence: benchmark configuration
Diagram labels: OLTP transactions; operational analytics; advanced analytics; high concurrency; standard reports; OLAP; complex queries; historical queries; DB2 native processing; coupling facility; DB2 data sharing group; z/OS LPARs with DB2 member 1 and DB2 member 2
Two main use cases:
• Operational priority: keep operational throughput constant while adding analytics load to the system. Data used for analysis can be slightly out of sync with operations.
• Data priority: data used for operations and analytics must be in complete synchronization.
A slight degradation of operational throughput is acceptable.
Diagram labels: IBM DB2 Analytics Accelerator, real-time data ingestion

Operations and analytics coexistence: results
First use case: periodic data synchronization (end-of-business-day data access)
• Thousands of complex analytical queries now integrated with the operational workload
• Operational throughput maintained with no additional mainframe capacity
Second use case: (near-)real-time data access
• Data kept in sync in real time with minimal degradation in transaction ITR (3%)
Chart labels: Baseline, w/Analytics; LPAR 1, LPAR 2, Accelerator

Data platforms for analytics on z Systems
• DB2 with Analytics Accelerator
– Announced yesterday: Version 5.1
• Apache Hadoop
– IBM Open Platform, IBM BigInsights for Apache Hadoop
– Hadoop runs on the mainframe
– Hadoop runs elsewhere and fetches data from the mainframe
• Apache Spark

DB2 Analytics Accelerator Version 5.1
The turbocharger for z Systems analytics
A blend of PureData System for Analytics (powered by Netezza) and z Systems technology that dramatically speeds up complex business analysis, transparently to users.

What is Hadoop?
Hadoop is a new data management framework born out of the need to manage Internet-scale data.
Timeline phases: early research, open-source development momentum, initial success stories, commercialization
Timeline milestones:
• Wins terabyte sort benchmark
• Publishes MapReduce and GFS papers
• Apache open-source MapReduce and HDFS projects created
• Runs 4,000-node Hadoop cluster
• Launches SQL support for Hadoop
• Releases CDH3
• InfoSphere BigInsights launched

IBM InfoSphere BigInsights for Linux on System z
Diagram labels: secure perimeter, z Systems server

IBM InfoSphere System z Connector for Hadoop
Diagram: on the System z mainframe, z/OS data sources (DB2, VSAM, SMF, IMS, logs) are linked through the System z Connector for Hadoop to InfoSphere BigInsights (MapReduce, HBase, Hive, HDFS) on Linux for System z. z/OS runs on System z CP(s); Linux runs under z/VM on IFLs.

Hadoop on your platform of choice
• IBM System z for security
• Power Systems
• Intel servers
• Point-and-click or batch self-service data access
• Lower-cost processing and storage

Now there are two z Systems options for analytics using Hadoop
• IBM InfoSphere BigInsights for Linux on z Systems + z Systems Connector for Hadoop: on-platform analysis of non-relational data on Linux (IFLs)
• IBM InfoSphere BigInsights on Intel-based servers or IBM Power Systems: integrate insights from big data sources to augment mainframe analysis on z/OS
Diagram labels: Request, Integrate

What is Apache Spark?
• Addresses limitations of the Hadoop MapReduce programming model
– No iterative programming, latency issues, ...
• Uses a fault-tolerant abstraction for in-memory cluster computing
– Resilient Distributed Datasets (RDDs)
• Can be deployed on different cluster managers
– YARN, Mesos, standalone
• Supports a number of languages
– Java, Scala, Python, SQL, R
• Comes with a variety of specialized libraries
– SQL, ML, Streaming, Graph
• Enables additional use cases, user roles, and tasks
– E.g. data scientist

What is Apache Spark?
Component stack, top to bottom:
• Languages: Java / Python / Scala / R
• Spark libraries: Spark SQL (relational operators), Spark MLlib (machine learning), Spark GraphX (graph processing), Spark Streaming (real-time streaming)
• Spark Core: general execution engine
• Cluster manager: YARN, Mesos, standalone
• Data abstraction: HDFS / Cassandra / HBase / Parquet / ...

The Analytics Landscape
Diagram labels as on the earlier landscape slide, with QMF added: Advanced Analytics; axes Competitive Advantage and Complexity; levels Descriptive (Reporting), Predictive (Analytics), Prescriptive (Optimization)

DB2 Analytics Accelerator: four usage scenarios
Understand your workload and data: on average, 70% of the data that feeds data warehousing and business analytics solutions originates on the System z platform (financial information, customer lists, personal records, manufacturing…).
Where transaction source data is being analyzed today, and the matching use case:
1. If the data is analyzed on the mainframe: rapid acceleration of business-critical queries
2. If the data is offloaded to a distributed data warehouse or data mart: reduce IT sprawl for analytics
3. If the data is not being analyzed yet: derive business insight from z/OS transaction systems
4. If the analysis is based on a lot of historical data: improve access to historical data and lower storage costs
Benefits:
• Scenario 1: performance improvements and cost reduction while retaining System z security and reliability
• Scenario 2: simplified and consolidated infrastructure, low latency, reliability, security and TCO
• Scenario 3: one integrated, hybrid platform, optimized to run mixed workloads.
Simplicity and time to value.
• Scenario 4: performance improvements and cost reduction

Introducing Accelerator-only Tables in DB2 for z/OS
Creation (DDL) and access remain through DB2 for z/OS in all cases.
• Non-accelerated DB2 table
– Data in DB2 only
• Accelerator table
– Data in DB2 and the Accelerator
• Archive table / partition
– Empty read-only partition in DB2
– Partition data is in the Accelerator only
• Accelerator-only table (AOT)
– “Proxy table” in DB2
– Data is in the Accelerator only
Diagram: Tables 1 to 4 illustrate the four table types; Table 1 appears in DB2 only, Tables 2 to 4 appear in both DB2 and the Accelerator.

Multi-Step Reporting Applications with DB2 for z/OS
Before Accelerator-only tables: source data might reside on the Accelerator already.
Diagram: a reporting application runs a multi-step report (steps 1, 2, ..., n) over Credit Card Transaction History and Customer Summary Mart; each step writes temporary results that feed Reports and Dashboards.

Multi-Step Reporting Applications with DB2 for z/OS
With Accelerator-only tables: temporary objects and processing on the Accelerator.
Diagram: the same multi-step report (steps 1, 2, ..., n) over Credit Card Transaction History and Customer Summary Mart, with the temporary results of every step held on the Accelerator before they feed Reports and Dashboards.

In-Database Transformation
• Improve existing transformation logic in DB2 for z/OS and the Accelerator
– Automate ETL-to-ELT transformation with DataStage Balanced Optimization
– Efficient and fast ELT processing with Accelerator-only tables
• Data mart consolidation
– Deploy existing or new data marts with DB2 for z/OS
– Consolidation benefits: simplification, lower latency, security, …

Traditional Approach: ETL on a Different Platform
Diagram: on the mainframe, transaction processing systems (OLTP) hold Customer Transactions and Customer Data. The table data is copied (FTP) to a distributed DBMS, where ETL logic builds the Customer Transaction Summary and History and the Customer Summary Mart used for analytics.
Disadvantages:
• process driven
• movement of large amounts of data
• aged data for analytics/reporting
• dependent on the performance of the data movement and transformation process
(Diagram label: Unix Server)

Using Accelerator-only Tables and ELT Logic in the Accelerator
Diagram: in DB2 for z/OS with the Accelerator, Customer Transactions and Customer Data from the transaction processing systems (OLTP) are also available on the Accelerator, where ELT logic builds the Customer Transaction Summary and History and the Customer Summary Mart as AOTs for analytics. Caption: data for transactional and analytical processing.
Advantages:
• Simpler to manage
• Better performance and reduced latency
To back up an AOT, its contents can be loaded back into a regular or accelerated DB2 for z/OS table.

Ad-Hoc Analysis
• Data scientist work area
– Data scientists create temporary database objects for ad-hoc analysis
– Access control through a personal database in DB2 for z/OS
– Accelerator-only tables process and store filtered and transformed results of the source transactional data

Data Scientist Work Area
Diagram: Customer Transactions and Customer Data from the transaction processing systems (OLTP) feed one work database per data scientist (John, Jane), each containing Work Area 1 and Work Area 2. Caption: data for transactional and analytical processing.

Integrate more data sources for analytics
• Integrate with data not yet stored in DB2 for z/OS
– DB2 Analytics Accelerator Loader for z/OS adds data from various sources directly into accelerator-only tables
  • IMS
  • Other DBMS
  • Files
– Join accelerator-only tables with other accelerated table data in DB2 for z/OS

Integrate more data sources for analytics
Diagram: next to Customer Transactions and Customer Data from the transaction processing systems (OLTP), the DB2 Analytics Accelerator Loader brings in File A, related data from other sources, and external data; the combined result is used for analytics. Caption: data for transactional and analytical processing.

Extreme ROI
• €20 million: amount by which a major transportation company reduced annual operating costs through better allocation of rolling stock
• $160 million: amount a central securities depository saved financial institutions in one year through faster clearing of securities transactions
• €22 million: amount by which a power system operator reduced annual costs to consumers through better dispatch of generators
• $226 million: amount by which a major hotel chain increased annual revenue by offering the right product to the right customer at the right price

The Science of Better Decisions
Typical decisions and trade-offs:
• Aircraft and crew allocation
• What to build, where and when?
• Risk vs. potential reward
• Inventory cost vs. customer satisfaction
• Cost vs. carbon emission
Optimization helps businesses to create measurable results:
• create the best possible plans
• explore alternatives and understand trade-offs
• respond to changes in business operations
“Optimization: the process of making something as good or effective as possible” (Oxford Advanced Learner’s Dictionary)

Operations Research Optimization (OR Optimization)
• An abstract model with variables and constraints on those variables
• Setting all variables to some values yields a solution (a plan)
• The solution is evaluated against a goal
• A search algorithm (sometimes called engine or solver) looks for the best solution

Steps of optimization
• Create the abstract model, e.g. with the Java/C++/Python API or OPL (Optimization Programming Language)
• Solve that model with a solver/engine
– CPLEX: mathematical programming for linear/quadratic models with float/integer variables
– CPO: constraint programming and constraint-based scheduling with integer variables

Examples
Manufacturing
• Machines producing goods (capacity constraint)
• Prices and capacity for raw material
• Decide how many goods to make on which machines
• Optimize on cost?
• Optimize on time?
• Minimize the missed deadlines of the finished products?
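
The manufacturing example above can be made concrete with a minimal sketch. It follows the slide's recipe (variables, constraints, a goal, a search), but every number, name and the two-machine/two-goods setup is a hypothetical assumption, and the exhaustive search merely stands in for a real solver such as CPLEX, which would be used for anything beyond toy size.

```python
from itertools import product

# Hypothetical data, chosen only for illustration (not from the talk).
MACHINES = ("M1", "M2")
GOODS = ("A", "B")
HOURS_PER_UNIT = {("M1", "A"): 1, ("M1", "B"): 2,
                  ("M2", "A"): 2, ("M2", "B"): 1}
MACHINE_HOURS = {"M1": 6, "M2": 6}    # capacity constraint per machine
MATERIAL_PER_UNIT = {"A": 2, "B": 1}  # kg of raw material per unit
MATERIAL_LIMIT = 14                   # kg of raw material available
MARGIN = {"A": 5, "B": 4}             # profit per unit produced

# One decision variable per (machine, good): how many units to make there.
KEYS = [(m, g) for m in MACHINES for g in GOODS]


def is_feasible(plan):
    """Check the machine-capacity and raw-material constraints."""
    for m in MACHINES:
        used = sum(HOURS_PER_UNIT[m, g] * plan[m, g] for g in GOODS)
        if used > MACHINE_HOURS[m]:
            return False
    material = sum(MATERIAL_PER_UNIT[g] * plan[m, g] for m, g in KEYS)
    return material <= MATERIAL_LIMIT


def profit(plan):
    """The goal used to evaluate a plan."""
    return sum(MARGIN[g] * plan[m, g] for m, g in KEYS)


def solve():
    """Exhaustive search over all integer plans (a stand-in for a solver)."""
    # Every unit needs at least one machine hour, so no variable can
    # exceed the largest machine capacity.
    max_units = max(MACHINE_HOURS.values())
    best = None
    for values in product(range(max_units + 1), repeat=len(KEYS)):
        plan = dict(zip(KEYS, values))
        if is_feasible(plan) and (best is None or profit(plan) > profit(best)):
            best = plan
    return best


best_plan = solve()
print(best_plan, profit(best_plan))
```

Switching the goal from profit to cost or lateness only changes `profit`; the variables and constraints stay the same, which is the separation of model and solver the steps above describe.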
Goods delivery
• A fleet of trucks needs to deliver goods to given destinations every morning
• Find the shortest route (km) for the trucks to visit all customers during the day
Everything has to be measurable; otherwise there is no math-based optimization.

A feel for the problem size and performance
• CPLEX can handle problems of any size (measured by the number of variables and the number of constraints), provided the hardware has enough memory
• Problems with 2 billion variables exist (energy)
• CPO can solve scheduling problems with over 1 million jobs (aircraft manufacturing)
• Performance varies depending on the problem type, market and needs
• Run times range from real time to several weeks

Payments & Settlements: CPLEX on z/OS
Challenge
• Achieve a higher volume of trade settlements at a lower cost to increase liquidity and capital flow in the Eurozone.
• With high trading volumes anticipated, the bank needed to find the optimal set of nightly settlement trades within a short time window.
Solution
• The bank turned to IBM for a solution that combines core optimization technology with institutional business expertise.
Benefits/ROI
• Settling more trades at lower cost to increase liquidity and capital flow.
• Using CPLEX will allow the bank to respond more quickly to new constraints as legislation and customer behavior change.
• The optimized settlement system should free up hundreds of millions of euros' worth of collateral used to back up trades.
Customer Profile
• Major central bank in the EU, charged with implementing the trade settlement modules for T2S, working with three other national central banks on behalf of the European Central Bank.
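
To give a feel for what "finding the optimal set of settlement trades" means, here is a minimal sketch under strong assumptions: only cash is modeled, all balances and trades are invented, and an exhaustive search over subsets replaces the CPLEX model the bank actually used (whose formulation is not described in the talk). It shows why the set matters: no trade below can settle on its own, yet three of them settle together because the payments net out.

```python
from itertools import combinations

# Hypothetical opening cash balances and pending trades
# (payer, payee, amount); none of these numbers come from the talk.
BALANCES = {"A": 10, "B": 5, "C": 0}
TRADES = [("A", "B", 45), ("B", "C", 50), ("C", "A", 40), ("C", "B", 30)]


def net_positions(selected):
    """Cash position of every participant after settling the selected trades."""
    position = dict(BALANCES)
    for payer, payee, amount in selected:
        position[payer] -= amount
        position[payee] += amount
    return position


def can_settle(selected):
    """A set of trades settles only if nobody ends with negative cash."""
    return all(cash >= 0 for cash in net_positions(selected).values())


def best_settlement():
    """Try every subset of trades and keep the feasible one of highest value."""
    best, best_value = (), 0
    for size in range(1, len(TRADES) + 1):
        for subset in combinations(TRADES, size):
            value = sum(amount for _, _, amount in subset)
            if value > best_value and can_settle(subset):
                best, best_value = subset, value
    return list(best), best_value


settled, total = best_settlement()
print(settled, total)
```

The number of subsets doubles with every additional trade, so this brute-force loop is only viable for a handful of trades; at nightly settlement volumes the same variables and constraints would be handed to a solver.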
Batch job scheduling
Challenge
• Shorten the nightly batch window to leave more time for online transactions
• Many mainframe jobs, too many to schedule manually
• Identify bottlenecks in order to invest efficiently
Solution
• Use optimization on top of TWS or other mainframe schedulers
• Treat mainframe capacity as a key scarce resource and improve its utilization rate
• Do more with what you have
Benefits/ROI
• Reduce cost
• Increase throughput
• Increase robustness
Customer Profile
• z Systems client with many batch jobs that need to run during a fixed time window

Resource Optimization on z Systems
Goals:
• Minimize peak usage
• Obtain proof of whether a new machine is needed
• Minimize the reallocation of partitions
• Avoid peak periods before 17:00 (the foreign fund transfer has to finish by 17:00 each day, otherwise a fine is applied)

Thank you!
Martin Schneider
Manager, Analytics Center of Excellence
IBM Research & Development
Schönaicher Str. 220
71032 Boeblingen
Tel. 0170 / 22 100 14
mdschnei@de.ibm.com