HyPer: one DBMS for all
Tobias Mühlbauer, Florian Funke, Viktor Leis, Henrik Mühe, Wolf Rödiger, Alfons Kemper, Thomas Neumann
Technische Universität München
New England Database Summit 2014
http://www.hyper-db.com/
1

One DBMS for all? OLTP and OLAP
Traditionally, DBMSs are optimized either for OLTP or for OLAP.
OLTP:
• high rate of mostly tiny transactions
• high data access locality
OLAP:
• few, but long-running transactions
• scans large parts of the database
• must see a consistent database state during execution
Conflict of interest: traditional solutions like 2PL would block OLTP; isolation is instead achieved by snapshotting.
However: main memory DBMSs enable new options!
2

One DBMS for all: the Brawny Few and the Wimpy Crowd
[Slide shows the first page of the demo paper "One DBMS for all? The Brawny Few and the Wimpy Crowd" (Mühlbauer, Rödiger, Seilbeck, Reiser, Kemper, Neumann; Technische Universität München). The paper presents HyPer, a high-performance hybrid OLTP&OLAP main memory database system optimized for both brawny and wimpy systems: compiling transactions and queries into efficient machine code gives high performance independent of the target platform, with a memory footprint of just a few megabytes despite support for SQL-92, a PL/SQL-like scripting language, and ACID-compliant transactions. Existing small-footprint systems such as IBM DB2 Everyplace, Oracle Lite, BerkeleyDB, SAP Sybase SQL Anywhere, and Microsoft SQL Server CE have either reached end-of-life, are non-relational data stores, or are intended only for synchronization with a remote backend server.]
High-performance DBMSs are traditionally optimized for brawny servers.
Brawny servers:
• predominantly x86-64 architecture
• multiple sockets, many cores
Wimpy devices:
• predominantly ARM architecture
• wide-spread use of SQLite
• energy efficiency very important
Figure 1: Shipments of wimpy processors (predominantly ARMv7, in smartphones and tablets) are outpacing shipments of brawny processors (predominantly x86-64, in PCs), Q2 2012 vs. Q2 2013 [IHS: "Processor Market Set for Strong Growth in 2013, Courtesy of Smartphones and Tablets"].
Question: How to enable high-performance OLTP/OLAP on different architectures (ARM, x86-64, ...)?
3

HyPer: one DBMS for all
Simultaneous (ACID) OLTP and (SQL-92+) OLAP:
• Efficient snapshotting (ICDE 2011)
Platform-independent high performance on brawny and wimpy nodes:
• Data-centric code generation (VLDB 2011)
Recent research:
• ARTful indexing: the adaptive radix tree (ICDE 2013)
• HTM for concurrent transaction processing (ICDE 2014)
• Compaction: hot/cold clustering of transactional data (VLDB 2012)
• Instant Loading: bulk loading at wire speed (VLDB 2014)
• ScyPer: replication and scalable OLAP throughput (DanaC 2013)
• Locality-aware parallel DBMS operators (ICDE 2014)
4

Efficient Snapshotting
HyPer isolates long-running transactions (e.g., OLAP) using virtual memory (VM) snapshots.
• OLTP “owns” the database; for OLAP, only the VM page table is initially copied
• the MMU/OS keep track of changes
• snapshots remain consistent (copy on write)
• very little overhead
Extremely fast execution model; supports OLTP and OLAP simultaneously. (A minimal copy-on-write sketch appears after the ARTful indexing slide below.)
5

Data-centric Code Generation
Main memory is so fast that the CPU becomes the bottleneck.
• the classical iterator model is fine for disks, but not so for main memory: many branches, bad code and data locality
HyPer's data-centric code generation:
• touches data as rarely as possible and keeps it in CPU registers from one pipeline breaker to the next
• prefers tight work loops:
  1. load data into CPU registers
  2. perform all pipeline operations
  3. materialize into the next pipeline breaker
• produces compact code with low overhead, close to the performance of hand-written code
• efficient platform-independent machine code generation using LLVM
[Slide shows an example plan over relations R1, R2, R3 (selections x=7 and y=3, join predicates a=b and z=c, aggregation z; count(*)) decomposed into pipeline fragments. A conceptual sketch of such a fused pipeline appears after the ARTful indexing slide below.]
6

Recent Research

Tentative Execution
Mühe, H.; Kemper, A.; Neumann, T., "Executing Long-Running Transactions in Synchronization-Free Main Memory Database Systems", CIDR, 2013
• HyPer offers outstanding performance for tiny transactions: > 100,000 TPC-C transactions per second per hardware thread
• Long-running transactions that write to the database cripple performance
• Tentative execution: process long-running transactions on a snapshot and “merge” the changes into the main process
[Slide shows a diagram of the main process and a long transaction running on a snapshot.]
8

ARTful Indexing
Leis, V.; Kemper, A.; Neumann, T., "The adaptive radix tree: ARTful indexing for main-memory databases", ICDE, 2013
Adaptive Radix Tree (Trie) indexing:
• efficient general-purpose read-, write-, and update-able index structure
• faster than highly-tuned, read-only search trees
• optimized for modern hardware
• highly space-efficient through adaptive choice of compact node layouts (Node4, Node16, Node48, Node256)
• performance comparable to hash tables (HT), BUT: data is sorted, which enables, e.g., range scans and prefix lookups
[Slide shows the four adaptive node layouts (key bytes, child indexes, and child pointers down to TIDs) and a performance comparison against ART (dense/sparse), GPT (dense/sparse), RB, CSB, k-ary, FAST, and HT.]
9
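
To make the snapshotting slide above concrete, here is a minimal, self-contained sketch of the copy-on-write idea using fork(). It is purely illustrative and not HyPer's actual code; the mmap'ed region and the "balance" string are made-up stand-ins for the database state.

```cpp
// Minimal sketch of virtual-memory snapshotting via fork()/copy-on-write (POSIX).
// Illustrative only: the "database" is a single mmap'ed region.
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const size_t size = 1 << 20;
    char* db = static_cast<char*>(mmap(nullptr, size, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    std::strcpy(db, "balance=100");

    pid_t olap = fork();            // copies the page table, not the data
    if (olap == 0) {                // OLAP process: works on a consistent snapshot
        sleep(1);                   // give the OLTP process time to update
        std::printf("snapshot sees: %s\n", db);   // still "balance=100"
        _exit(0);
    }
    std::strcpy(db, "balance=50");  // OLTP keeps writing; the OS copies touched
                                    // pages lazily (copy on write)
    waitpid(olap, nullptr, 0);
    std::printf("OLTP sees:      %s\n", db);      // "balance=50"
    munmap(db, size);
    return 0;
}
```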
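
For the code-generation slide: conceptually, the code HyPer generates for a scan-filter-aggregate pipeline behaves like the hand-written loop below. This is a sketch in C++ over a hypothetical column-wise relation R1; the real system emits LLVM IR and handles far more general plans.

```cpp
// Conceptual sketch of a data-centrically compiled scan -> filter -> aggregate
// pipeline: one tight loop, attribute values kept in registers, no per-tuple
// virtual calls. HyPer emits LLVM IR, not C++ like this.
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct R1 {                        // hypothetical column-wise relation
    std::vector<int32_t> a, b, x, z;
};

// SELECT z, count(*) FROM R1 WHERE a = b AND x = 7 GROUP BY z
std::unordered_map<int32_t, int64_t> query(const R1& r) {
    std::unordered_map<int32_t, int64_t> groups;   // pipeline breaker (hash table)
    const size_t n = r.a.size();
    for (size_t i = 0; i < n; ++i) {                 // whole pipeline in one loop
        int32_t a = r.a[i], b = r.b[i], x = r.x[i];  // 1. load into registers
        if (a == b && x == 7)                        // 2. apply all predicates
            ++groups[r.z[i]];                        // 3. materialize into breaker
    }
    return groups;
}
```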
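
The node layouts on the ARTful indexing slide can be summarized in a small sketch. This is a simplified illustration of the adaptive idea, not the paper's implementation: only two of the four node types are shown, and path compression and lazy expansion are omitted.

```cpp
// Sketch of adaptive radix tree node layouts and one lookup step.
// Simplified: Node16 and Node48 omitted, no path compression or lazy expansion.
#include <cstdint>

struct Node {
    enum Type : uint8_t { N4, N256 } type;
};

struct Node4 : Node {                 // up to 4 children: sorted key bytes plus
    uint8_t key[4];                   // a parallel array of child pointers
    uint8_t count = 0;
    Node*   child[4] = {};
};

struct Node256 : Node {               // one slot per possible key byte
    Node* child[256] = {};
};

// Descend one level: find the child for key byte b, or nullptr.
Node* findChild(Node* n, uint8_t b) {
    if (n->type == Node::N4) {
        auto* n4 = static_cast<Node4*>(n);
        for (unsigned i = 0; i < n4->count; ++i)   // tiny linear scan
            if (n4->key[i] == b) return n4->child[i];
        return nullptr;
    }
    return static_cast<Node256*>(n)->child[b];     // direct indexing
}
```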

HTM for Concurrency Control
Leis, V.; Kemper, A.; Neumann, T., "Exploiting Hardware Transactional Memory in Main-Memory Databases", ICDE, 2014
• Concurrent transaction processing in main memory DBMSs often relies on user-provided data partitioning (1 thread per partition)
• BUT: what if no partitioning is provided or can be found?
• Hardware transactional memory (HTM) allows for efficient, light-weight optimistic concurrency control without explicit partitions
[Slide shows how a database transaction (conflict detection: read/write sets via timestamps; elided lock: serial execution) is decomposed into HTM transactions (conflict detection: read/write sets in hardware; elided lock: latch) around single tuple accesses that verify/update tuple timestamps; a new timestamp is requested and the safe timestamp recorded at the start, and released/updated at the end. A throughput chart (transactions per second, up to 400,000, vs. multiprogramming level of 1-4 threads) compares partitioned, HTM, serial, and 2PL execution. A sketch of the lock-elision building block appears after the Instant Loading slide below.]
10

Compaction: Hot/Cold Clustering
Funke, F.; Kemper, A.; Neumann, T., "Compacting Transactional Data in Hybrid OLTP&OLAP Databases", VLDB, 2012
Main memory is a scarce resource!
[Figure contrasts the reality of interleaved hot and cold items with ideal clustering and the hot/cold clustering that compaction achieves; compacted cold data becomes immutable, compressed, and optimized for OLAP.]
"Temperature" detection:
• hardware-assisted (MMU)
• almost no overhead!
11

Instant Loading
Mühlbauer, T.; Rödiger, W.; Seilbeck, R.; Reiser, A.; Kemper, A.; Neumann, T., "Instant Loading for Main Memory Databases", PVLDB, 2013, presented at VLDB 2014
[Figure: Instant Loading for data staging: load-work-unload cycles — (1) load CSV from high-speed NAS/DFS or SSD/RAID into main memory (window of interest), (2) work: OLTP and OLAP in a full-featured database, (3) unload as CSV or binary; loading and unloading at wire speed. Paper excerpts on the slide describe generic high-performance parsing, deserialization, and input validation based on SSE 4.2 SIMD instructions and task- and data-parallelization of all phases of loading.]
• Bulk loading of structured text (CSV) data is slow in current DBMSs
• Current loading does not saturate the wire speed of data source and data sink
• Instant Loading allows scalable bulk loading at wire speed by efficient task- and data-parallelization
• 10x faster than Vectorwise/MonetDB, orders of magnitude faster than traditional DBMSs (with the same conversions/consistency checks)
12
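
The SSE 4.2 string instructions mentioned in the paper excerpt above can be used to find CSV delimiters 16 bytes at a time. The following is a hedged sketch of that one primitive only — no quoting, escaping, or deserialization — and the function name nextDelimiter is made up for illustration. It also assumes the input contains no NUL bytes, because the implicit-length string instructions stop at NUL.

```cpp
// Sketch of SSE 4.2-assisted delimiter search, the kind of SIMD primitive an
// Instant-Loading-style CSV parser builds on. Compile with -msse4.2.
#include <nmmintrin.h>   // SSE 4.2 string instructions (_mm_cmpistri)
#include <cstddef>

// Return the offset of the next ',' or '\n' in [data, data+len), or len.
size_t nextDelimiter(const char* data, size_t len) {
    const __m128i delims = _mm_setr_epi8(',', '\n', 0, 0, 0, 0, 0, 0,
                                         0, 0, 0, 0, 0, 0, 0, 0);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        int idx = _mm_cmpistri(delims, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY);
        if (idx < 16) return i + idx;     // delimiter found in this 16-byte block
    }
    for (; i < len; ++i)                  // scalar tail
        if (data[i] == ',' || data[i] == '\n') return i;
    return len;
}
```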
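
Referring back to the HTM slide above: the "elided lock: latch" building block can be sketched with Intel TSX/RTM intrinsics. This is a generic lock-elision pattern, not HyPer's concurrency control; the timestamp-based validation from the slide is not shown, and the latch here is a plain spinlock fallback.

```cpp
// Sketch of lock elision with Intel TSX/RTM: try to run the critical section as
// a hardware transaction; on abort, fall back to a real latch.
// Requires RTM-capable hardware and -mrtm.
#include <immintrin.h>   // _xbegin, _xend, _xabort, _XBEGIN_STARTED
#include <atomic>

std::atomic<bool> fallback_latch{false};

template <typename F>
void elided(F&& criticalSection) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        // Elided path: only *read* the latch; if a writer holds it, abort.
        // Reading it adds the latch to our HTM read set, so a concurrent
        // acquisition by another thread aborts this transaction automatically.
        if (fallback_latch.load(std::memory_order_relaxed)) _xabort(0xff);
        criticalSection();
        _xend();                      // commit: read/write sets validated in hardware
        return;
    }
    // Fallback path: pessimistic latch (e.g., after a capacity or conflict abort).
    while (fallback_latch.exchange(true, std::memory_order_acquire)) { /* spin */ }
    criticalSection();
    fallback_latch.store(false, std::memory_order_release);
}
```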

ScyPer: Replication and Scalable OLAP Throughput
Mühlbauer, T.; Rödiger, W.; Reiser, A.; Kemper, A.; Neumann, T., "ScyPer: Elastic OLAP Throughput on Transactional Data", DanaC, 2013
[Figure 1: elastic provisioning of secondary HyPer nodes — the primary HyPer node multicasts the redo log to on-demand secondary nodes, which use snapshotting for OLAP behind an OLAP load balancer. Figure 3: time savings when replaying 100k TPC-C transactions — execution time in ms for normal execution, logical log replaying (23% faster), and physical log replaying (56% faster).]
• Secondary nodes are needed for high availability anyway — why not use them for scalable OLAP throughput?
• The primary HyPer node processes all transactions and multicasts the redo log of committed transactions to the secondaries (via Ethernet or InfiniBand); standard UDP multicast may drop, duplicate, or reorder messages, so ScyPer uses OpenPGM. Multicasting allows secondaries to be added on demand without increasing network bandwidth usage.
• The redo log has to be persisted to durable storage so that it can be replayed; the undo log can be discarded when a transaction commits.
• We advocate physical log multicasting after commit: replay is not affected by non-determinism in transaction logic or by unforeseeable faults, and there is no need to re-execute expensive transaction logic.
• A single primary suffices for most OLTP workloads: HyPer processes more than 100,000 TPC-C transactions per second in a single thread, which is enough for human-generated workloads even during peak hours, and a ball-park estimate of Amazon's yearly transactional data volume (revenue of $60 billion, an average item price of $15, about 54 B per orderline) yields less than 1/4 TB for the orderlines — the dominant repository in a sales application.
13
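
To illustrate why physical redo replay on the secondaries avoids re-executing transaction logic, here is a deliberately simplified sketch. The record layout (RedoRecord with a relation id, byte offset, and after-image) is hypothetical and not ScyPer's actual log format; applying a record is essentially a memcpy, so replay cost is independent of how expensive the original transaction logic was.

```cpp
// Hypothetical physical redo record and its application on a secondary node.
// Illustrative only; not ScyPer's wire or storage format.
#include <cstdint>
#include <cstring>
#include <vector>

struct RedoRecord {
    uint32_t relation;                // which table the change belongs to
    uint64_t offset;                  // byte position of the modified bytes
    std::vector<uint8_t> afterImage;  // new contents of those bytes
};

// heap[relation] is the (byte-addressable) storage of that relation.
void applyRedo(std::vector<std::vector<uint8_t>>& heap, const RedoRecord& r) {
    auto& rel = heap[r.relation];
    if (rel.size() < r.offset + r.afterImage.size())
        rel.resize(r.offset + r.afterImage.size());          // grow for inserts
    std::memcpy(rel.data() + r.offset, r.afterImage.data(), r.afterImage.size());
}
```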

Distributed Query Processing with HyPer: Data Shuffling and Optimal Partition Assignment
Rödiger, W.; Mühlbauer, T.; Unterbrunner, P.; Reiser, A.; Kemper, A.; Neumann, T., "Locality-Sensitive Operators for Parallel Main-Memory Database Clusters", ICDE, 2014
[Figure: per-partition tuple counts of relations R and S on nodes 0-2 (partitions P0-P7), the resulting optimal partition assignment, and the duration of the network phase.]
• Distributed operators like the join operator need data shuffling
Locality-aware data shuffling:
• can exploit already small degrees of data locality
• does not degrade when data exhibits no co-location
• optimally assigns partitions
• avoids cross traffic through communication scheduling
14
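
As a rough illustration of the "optimally assigns partitions" point above: given a histogram of how many tuples of each hash partition each node already stores, the goal is to place partitions so that as few tuples as possible cross the network. The greedy sketch below only illustrates that objective — the ICDE 2014 paper computes an optimal, load-balanced assignment — and the histogram representation here is a made-up stand-in.

```cpp
// Greedy sketch of locality-aware partition assignment: give each partition to
// the node that already holds most of its tuples, so those tuples stay local.
// (Illustrative only; ignores load balancing and communication scheduling.)
#include <cstddef>
#include <vector>

// histogram[p][n] = number of tuples of partition p currently stored on node n.
// Returns assignment[p] = node that should own partition p.
std::vector<size_t> assignPartitions(
        const std::vector<std::vector<size_t>>& histogram) {
    std::vector<size_t> assignment(histogram.size(), 0);
    for (size_t p = 0; p < histogram.size(); ++p) {
        size_t best = 0;
        for (size_t n = 1; n < histogram[p].size(); ++n)
            if (histogram[p][n] > histogram[p][best]) best = n;
        assignment[p] = best;   // tuples of p already on `best` are not shuffled
    }
    return assignment;
}
```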

http://www.hyper-db.com/
15

TPC-H query 8
[Screenshot of the HyPer WebInterface.]
http://www.hyper-db.com/interface.html
16

TPC-H query 8
[Screenshot of the HyPer WebInterface showing the optimized query plan.]
http://www.hyper-db.com/interface.html
17

TPC-H query 8
WebInterface on TPC-H scale factor 1, non-parallel query execution.
Note: compilation time is independent of the scale factor.
Soon: WebInterface on a brawny machine, TPC-H scale factor 100, parallel query execution, and an instant loading interface for your own files!
http://www.hyper-db.com/interface.html
18

Conclusion
http://www.hyper-db.com/
Inspired by "OLTP through the looking glass" and "The End of an Architectural Era", the HyPer project was started.
HyPer is one DBMS that …
• offers highest performance on both brawny x86-64 platforms and wimpy ARM platforms
• performs comparably to or outperforms VoltDB in TPC-C
• performs comparably to or outperforms Vectorwise in TPC-H
• enables OLAP on the most recent OLTP state
19

References
1) Kemper, A.; Neumann, T., "HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots", ICDE, 2011
2) Neumann, T., "Efficiently compiling efficient query plans for modern hardware", VLDB, 2011
3) Mühe, H.; Kemper, A.; Neumann, T., "Executing Long-Running Transactions in Synchronization-Free Main Memory Database Systems", CIDR, 2013
4) Leis, V.; Kemper, A.; Neumann, T., "The adaptive radix tree: ARTful indexing for main-memory databases", ICDE, 2013
5) Leis, V.; Kemper, A.; Neumann, T., "Exploiting Hardware Transactional Memory in Main-Memory Databases", ICDE, 2014
6) Funke, F.; Kemper, A.; Neumann, T., "Compacting Transactional Data in Hybrid OLTP&OLAP Databases", VLDB, 2012
7) Mühlbauer, T.; Rödiger, W.; Seilbeck, R.; Reiser, A.; Kemper, A.; Neumann, T., "Instant Loading for Main Memory Databases", PVLDB, 2013, presented at VLDB 2014
8) Mühlbauer, T.; Rödiger, W.; Reiser, A.; Kemper, A.; Neumann, T., "ScyPer: Elastic OLAP Throughput on Transactional Data", DanaC, 2013
9) Rödiger, W.; Mühlbauer, T.; Unterbrunner, P.; Reiser, A.; Kemper, A.; Neumann, T., "Locality-Sensitive Operators for Parallel Main-Memory Database Clusters", ICDE, 2014
20