Teradata HU
Transcription
Teradata Architecture, Technology, Scalability, Performance and Vision for Active Enterprise Data Warehousing
Dr. Barbara Schulmeister, Teradata – a Division of NCR, Barbara.Schulmeister@ncr.com, 28 June 2005

Agenda
• History
• Definitions
• Hardware Architecture
• Fault Tolerance and High Availability
• Coexistence
• Operational System
• Tools and Utilities
• Data Distribution
• SQL Parser
• Active Data Warehouse
• Scalability

Teradata Timeline Overview – Born to be parallel! DBC Model 1: the first MPP system!
• 1979: Teradata Corp. founded
• 1984: "Product of the Year" – Forbes
• 1985: First 100 GB system! First beta system shipped at Christmas to Wells Fargo Bank
• 1986: "Fastest Growing Small Company" – INC Magazine; DBC Model 3
• 1988: First 500 GB system! Initial public offering on Wall Street
• 1989: First 700 GB system!
• 1990: First terabyte system! "Fastest Growing Electronic Company" – Electronic Business
• 1992: DBC Model 4; joint venture with NCR for next-generation systems; "Leader in Commercial Parallel Processing" – Gartner Group
• 1994: 3+ TB system!

Teradata Timeline (II)
• 1995: "#1 in MPP" – IDC survey in Computerworld; only vendor to publish multi-user TPC-Ds! DB Expo RealWare Award with Union Pacific: "Data Warehouse Innovations"
• 1996: First vendor to publish a 1 TB TPC-D benchmark! Teradata Version 2 on NCR 3555 SMP; over 500 production data warehouses worldwide!
• 1997: Teradata V2 on WorldMark 4300 and on WorldMark 5100 SMP & MPP; "...only NCR's Teradata V2 RDBMS has proven it can scale…" – Gartner Group; demonstrated the world's largest data warehouse database at 11 TB! DWI VLDB Best Practice Award with AT&T BMD: "Data Warehouse and the Web"; 100 GB TPC-D benchmark leader! 24 TB data warehouse in production! 
Teradata Timeline (III)
• 1998: Teradata V2 ported to Microsoft Windows NT; industry-leading TPC-D benchmarks for all volumes
• 1999: Industry-leading TPC-H at 1 TB and 3 TB; largest data warehouse system (176 nodes, 130 TB of disk); Database Programming and Design Award
• 2000: Teradata attains 99.98% availability
• 2001: IT Award of Excellence; TDWI Solution Provider Best Practices in Data Warehousing; TDWI Leadership in Data Warehousing Award; DM Review World-Class Solution Award for Business Intelligence; IT Times Award; DM Review 100 Award; DM Review Readership Award; Intelligent Enterprise RealWare Award
• 2002–2003: Teradata V2R5; 64-bit Teradata
• 2004: Teradata V2R6
• 2005: Teradata on Linux
…the commitment continues.

Alternative Approaches to Enterprise Analytics
Four common architectures: independent data marts (data mart centric), virtual/distributed/federated access ("leave data where it lies"), the hub-and-spoke data warehouse, and the enterprise data warehouse (centralized, integrated data with direct access).
• Independent data marts
> Pros: easy to build organizationally; limited scope
> Cons: business enterprise view unavailable; redundant data costs; high ETL costs; high application costs; high DBA and operational costs
• Leave data where it lies (virtual, distributed, federated)
> Pros: easy to build technically; no need for ETL; no need for a separate platform; allows easier customization of user interfaces and reports
> Cons: only viable for low-volume access; metadata issues; network bandwidth and join-complexity issues; workload typically placed on the workstation; business enterprise view challenging
• Hub-and-spoke data warehouse
> Pros: single enterprise "business" view; data reusability
> Cons: redundant data costs; high DBA and operational costs; data latency
• Enterprise data warehouse
> Pros: consistency; low cost of ownership
> Cons: requires corporate leadership and vision

A Spectrum of Data Warehouse Architectures
(Diagram: the same four approaches arranged from data-mart centric, through virtual/distributed/federated and hub-and-spoke, to the enterprise data warehouse.)

The goal: any question, on any data, at any time.
Teradata's advocated data warehouse approach for 20 years – since 1984!

Differentiating OLTP and DSS
• The most time-consuming steps in DSS: full scans of big tables; complex joins; aggregation; sorting.
(Chart: frequency of these steps in OLTP versus DSS workloads.)

NCR Server Generations
• Provide customers with growth opportunities and investment protection
> Coexistence is enabled across five generations on BYNET V2/V3:
– NCR 5400E & 5400H Servers
– NCR 4980 & 5380 Servers
– NCR 4950 & 5350 Servers
– NCR 4900 & 5300 Servers
– NCR 485X & 525X Servers

NCR 5400 Server SMP
• 5400E
> 1–4 nodes; BYNET V2
> ESCON & FICON for 3- and 4-node configurations
> Field-upgradeable to 5400H
• Up to 4 nodes within each cabinet, with Ethernet switches, three internal BYNET switches, three UPS modules and server management (3GSM)

NCR 5400 Server MPP
• Continued rapid adoption of the latest Intel technology
> Dual Intel Xeon EM64T 3.6 GHz processors with Hyper-Threading (32-bit and 64-bit capability)
> 800 MHz front-side bus
• Industry-standard form factor
> Up to 10 nodes per cabinet
> Integrated BYNET V3 (provides the capability to physically separate systems by 300–600 meters)
> Integrated server management; N+1 UPS (five UPS modules); dual AC
• Multi-generation coexistence
> Investment protection

Industry CPU Performance per Core
(Chart: relative per-core CPU performance, 2004–2007, for Intel Xeon and Itanium 2 parts from 130 nm through 90 nm to 65 nm dual-core and next-generation architectures including Montecito and Tukwila, IBM Power 4+/5/5+/6, and Sun UltraSPARC III and Rock; gains come from symmetric multithreading (Hyper-Threading) and multi-core roadmap capabilities. Benchmarks: SPECint2000 and SPECint_rate2000, www.spec.org.)

Gartner Product Ranking 2004 (ASEM)
(Chart: product-category scores for Sun Sunfire, NCR Teradata, IBM pSeries, HP ProLiant, HP Integrity, HP 9000 and Fujitsu Primepower.) The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of total possible points and leads the IBM pSeries, at 74.35%, by 44 points or 19%. Source: Gartner 2004 ASEM Report.

NCR Enterprise Storage 6842
• Features
> Two array modules per cabinet
> 56 drives (73 GB, 15K RPM) – greater than 8 terabytes of spinning disk per cabinet
> Dual quad Fibre Channel controllers per array for performance and availability
> Typical configuration is 4 NCR 5400 Server nodes per 3 6842 arrays – 1.2 terabytes of database space per node (RAID 1)
> Supports RAID 1 and RAID 5
> Support for MP-RAS and Microsoft Windows Server 2003 environments

EMC Symmetrix DMX
• Enterprise fit; storage standardization; extended storage life through redeployment
• DMX 1000 M2: 73 GB 15K RPM drives; MPP, supports 1 or 2 nodes per cabinet; RAID-1 only; MP-RAS and Windows; maximum 96 Teradata disks
• DMX 2000 M2: 73 GB 15K RPM drives; MPP, supports 2, 3 or 4 nodes per cabinet; RAID-1 only; MP-RAS and Windows; maximum 192 Teradata disks

Assumption: Compute and Storage Balance
• A balanced configuration is one where the storage I/O subsystem for each 
compute node is configured with enough disk spindles, disk controllers, and connectivity that the disk subsystem can satisfy the CPU demand from that node.
• A supersaturated configuration can also satisfy the CPU demand from that node, although the extra I/O may be underutilized.
> This is useful for investment protection on certain upgrade paths.
• All system configurations discussed in this presentation are based on balanced or supersaturated compute nodes.

Node CPU and Storage I/O Balance
(Chart: query response time versus I/O bandwidth in MB/sec and number of disk drives/storage capacity; the optimum node/storage balance gives the best query response time at about 95% effective node utilization.) Industry-wide, disk drive capacity is increasing at a faster rate than disk drive performance.

Common Upgrades Applied: Grow Raw Data Volume
• Query performance is more than adequate: add more data to all nodes, keeping the current nodes.
• Query response time increases, because you did not add more compute power to support the additional raw data volume.

Typical System Expansion: Linear Growth
• Maintain query performance with more nodes: scale out with Teradata by adding compute nodes, interconnect, storage arrays, and disks (aka "horizontal scalability").
• Query response time remains constant, because compute power grows in proportion to the raw data volume.

Common Upgrades Applied: Grow Query Performance
• Raw data volume is adequate: upgrade to faster CPUs – "scale vertically" with Teradata by increasing compute power.
• Query response time decreases, because you did not add more raw data volume to offset the increase in compute power. 
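The three upgrade scenarios follow from a simple first-order model in which query response time is proportional to raw data volume divided by total compute power. The sketch below is an illustration of the slides' argument, not a Teradata formula, and the data volumes and node counts are hypothetical:

```python
# First-order scaling model: response time ~ data volume / total compute power.
# Illustrative only -- real systems add overheads this model ignores.

def response_time(data_tb: float, nodes: int, node_power: float = 1.0) -> float:
    """Relative query response time for a balanced shared-nothing system."""
    return data_tb / (nodes * node_power)

baseline = response_time(data_tb=10, nodes=4)  # current system

# Grow raw data volume only: response time increases.
assert response_time(data_tb=20, nodes=4) > baseline

# Linear growth (scale out): double data AND nodes -> response time constant.
assert response_time(data_tb=20, nodes=8) == baseline

# Grow query performance (scale up): same data, faster CPUs -> faster queries.
assert response_time(data_tb=10, nodes=4, node_power=2.0) < baseline
```

The same model also covers the combined upgrade: raising both `data_tb` and `nodes * node_power` by different factors moves response time to whatever the service level agreement requires.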
Combo: Upgrade Nodes and Increase Storage per Node
• Adjust query performance and data volume to match the service level agreement.
• Scale to the target query performance and data volume by increasing compute power and adding storage.

Scaling by Reconfiguration and Expansion
• Improve query performance and adjust data volume to match the service level agreement by reducing storage per node and adding more nodes.

Architecture Determines Scalability
• Each CPU uses an independent, direct I/O path to disk.
• All memory accesses are local.
• The interconnect is used only for database messages – no I/O or memory traffic crosses the BYNET fabrics.
(Diagram: many identical units of CPU(s), cache, memory and disk storage, connected only by the interconnect.)

Teradata Shared-Nothing MPP
• Designed for slope-of-1 linear scaling
• Optimized for very high data rates to and from disk
• Excellent performance and efficiency for data warehousing

Teradata MPP Architecture
• Nodes: incrementally scalable to 1,024 nodes; Windows or UNIX
• Storage: independent I/O; scales per node
• BYNET interconnect: dual BYNET fabrics; fully scalable bandwidth
• Connectivity: fully scalable; channel (ESCON/FICON), LAN, WAN
• Server management: one console to view the entire system

Node Software Architecture
Perfectly tuned nodes working in parallel for scalability and availability.
(Diagram: a 4-node MPP clique sharing a disk array; within each SMP node, AMP vprocs AMP1–AMP8 and PE vprocs PE1–PE2 run above the LAN gateway and channel gateway communication interfaces.) 
Teradata Node SW Architecture (SMP)
• The node software stack layers the virtual processors (vprocs) – AMPs (Access Module Processors) and PEs (Parsing Engines) – over the Parallel Database Extensions (PDE) on UNIX or Windows 2000, with the BYNET interface beneath.
• PEs receive the queries and figure out the query plan; each PE vproc contains Session Control, the Parser, the Optimizer and the Dispatcher.
• AMPs interact with the disk arrays and process the data; each AMP vproc performs relational database management and file-system/data management.

The Scalable BYNET Interconnect
Specifically designed for data warehousing workloads.
• Multiple simultaneous point-to-point messaging, plus broadcast messaging
• Bandwidth scales linearly to 1,024 nodes
• Redundant, fault-tolerant network
• Guaranteed message delivery
• The Teradata Optimizer chooses between point-to-point and broadcast messaging to select the most effective communication.

Built-In Integrated Fail Over
• Teradata provides built-in node failover.
> Cost-effective and easy to deploy
• Work migrates to the remaining nodes in the clique.
• System performance degradation of up to 33% in a traditional configuration.

Large Cliques
• Double the number of nodes in a clique, up to 8 nodes sharing Fibre Channel switches and disk arrays.
• Work is distributed across a greater number of nodes.
• Minimizes system performance impact – it may not be noticeable to end users: 86% system performance continuity.

Hot Standby Nodes
• Work is redirected to a hot standby node.
• No system performance impact: 100% system performance continuity.
• The Teradata restart can be postponed to a maintenance window.

Large Clique + Hot Standby Node
• Same performance benefits as a hot standby node.
• Reduced costs for larger system implementations. 
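The continuity figures above can be approximated with a simple capacity model: when one node in an n-node clique fails, its AMPs migrate to the n-1 survivors, so throughput drops to roughly (n-1)/n, while a hot standby node absorbs the work entirely. This is my simplification of the slides' numbers, not a Teradata sizing rule; real degradation also depends on workload and overhead:

```python
# Post-failure capacity model for Teradata-style cliques (simplified sketch).

def surviving_capacity(clique_size: int, hot_standby: bool = False) -> float:
    """Fraction of normal throughput after one node failure in a clique.

    With a hot standby node the failed node's work is redirected to the
    spare, so capacity is unchanged. Otherwise the failed node's AMPs are
    spread over the clique_size - 1 surviving nodes.
    """
    if hot_standby:
        return 1.0
    return (clique_size - 1) / clique_size

# Traditional small clique: losing 1 of 3 nodes costs ~33% throughput.
print(round(1 - surviving_capacity(3), 2))       # 0.33

# Large clique of 8: survivors absorb the work, ~87.5% capacity remains
# (in the same ballpark as the ~86% continuity the slides quote).
print(surviving_capacity(8))                     # 0.875

# Hot standby node: 100% system performance continuity.
print(surviving_capacity(4, hot_standby=True))   # 1.0
```

The model makes the design trade-off visible: larger cliques shrink the per-node share of migrated work, while a hot standby removes it altogether at the cost of an idle node.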
(As with a plain hot standby configuration, the failed node's work moves to the standby node, preserving 100% system performance continuity.)

High Availability
Failure case → hardware/software protection:
• Power failure → redundant UPS, dual AC
• Node failure → Teradata vproc migration (VAMP, PE)
• BYNET failure → redundant BYNET
• Disk failure → RAID-1/-5/-S in the disk subsystem
• More than one disk failure → Fallback option
• Clique failure → Fallback option

Coexistence Considerations
(Diagram: three node generations with performance factors 1x, 1.5x and 2.0x, each running proportionally more AMPs per node.)
• VAMPs manage the same amount of data.
• Coexistence enables the faster nodes' performance to be realized by running more VAMPs per node.

System Expansion with Teradata Coexistence
• Coexistence is the utilization of multiple generations of hardware within a single Teradata MPP system, on dual BYNET interconnects under one server management console.
• Example: 5380 nodes run 9 AMPs per node alongside 5400 nodes running 12 AMPs per node.

Customer Example (72 Nodes, 4 Generations)
• 11/2000: original footprint – 8-node 5250 (generation "A")
• 1/2001: expansion 1 – 12-node 5250 (generation "A")
• 11/2001: expansion 2 – 8-node 5255 (generation "B")
• 6/2002: expansion 3 – 12-node 5300 (generation "C")
• 6/2003: expansion 4 – 16-node 5350 (generation "D")
• 6/2004: expansion 5 – 16-node 5380 (generation "E")
• 2005: future 5400 expansion

Database and Operating System
• 485x/525x, 4900/5300, 4950/5350 and 4980/5380 servers: V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02; V2R5.0.3, V2R5.1.X and V2R6 on W2K (2Q 2005)
• 5400 servers: V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.03; V2R6 on WS03 (2Q 2005)
• Database
> Teradata V2R6; support one release back, V2R5.1.X (current exception in place for V2R5.0.3)
• Unix
> MP-RAS 3.03 required for the Teradata Database on the 5400 
> MP-RAS 3.02 still supported on previous server generations
• Microsoft Windows
> Microsoft Windows Server 2003 recommended for new and expanding 5400 systems
> Microsoft Windows 2000 supported in 2Q 2005

V2R5.0 + V2R5.1 Features
• Strategic decision making: analytic extensions such as extended window functions and multiple aggregate DISTINCTs; random stratified sampling; join elimination; extended transitive closure; partial GROUP BY; early GROUP BY; derived-table rewrite; very large SQL
• Single version of the truth: extended grouping; inner join optimization; elimination of unnecessary outer joins; hash joins; UDFs for complex analysis and unstructured data
• Tactical and event-driven decision making: partial-covering join index; global index; sparse index; join index extensions; ODS workload optimization; stored procedure enhancements
• Trusted, integrated environment: Index Wizard; Statistics Wizard and COLLECT STATISTICS improvements; query log; extreme workload management and administration; roles and profiles; SQL Assistant/Web Edition; availability; performance dashboard and reporting; cylinder read; partitioned tables (PPI); value-list compression; 2,000 columns and 64 columns per index; identity column and identity-column enhancements; UTF16 support; PPI dynamic partition elimination; large objects (LOBs); enhancements to triggers; extra FK-PK joins in a join index; UDFs for XML processing; security enhancements (encryption); DBQL enhancements; database-object-level use counts; ROLES enhancements; Priority Scheduler enhancements; TDQM enhancements; no auto restart after disk-array power failure; cancel rollback; incompatible-package warning; disk I/O integrity check; etc.
• Data freshness: continuous update performance and manageability; faster join index update; join update performance; bulk update performance; Teradata Warehouse Builder full functionality and platform support; UDFs for data transformation and scoring

V2R6.0 Feature List
• Strategic decision making: remove the 1 MB limit on plan cache size; increase the response buffer to 1 MB; table header expansion; improved random AMP sampling; top-N row operation; recursive queries
• Tactical and event-driven decision making: improved primary index operations; improved IN-list processing; external stored procedures; triggers calling a stored procedure; stored procedure internals enhancements; queue tables
• Trusted, integrated environment – single view of your business: Teradata Dynamic Workload Management; extensible user authentication; directory integration; global deadlock logging; faster rollbacks; stored procedure LOB support; external table functions; partition-level BAR; elimination of indexed row IDs (PPI); PPI join performance improvement; DBS information consolidation
• Data freshness: Replication Services; array support; Priority Scheduler enhancements; reduced restart time

Teradata Tools & Utilities (1)
• Load/unload: Teradata Warehouse Builder; FastLoad, MultiLoad & FastExport; Teradata TPump; access modules
• Database management: Teradata Utility Pak; Teradata Administrator; Teradata SQL Assistant and SQL Assistant/Web Edition; BTEQ; ODBC, JDBC, CLI and OLE DB Provider; Teradata Manager; Teradata Dynamic Query Manager; Teradata System Emulation Tool; Teradata Visual Explain; Teradata Index Wizard; Teradata Statistics Wizard
• Metadata: Teradata Meta Data Services
• Mainframe connectivity: mainframe channel connect; TS/API, CICS, HUTCNS & IMS/DC
Any query, any time.

Technical Differentiator: Database Utilities
Teradata utilities are robust and mature.
• Teradata utilities are fully parallel – parallel into and parallel out of the Teradata Warehouse.
• Teradata utilities have checkpoint-restart capability.
• Data loads directly from the source into the database:
> No manual data partitioning
> No file splitting
> No intermediary file transfers
> No separate data conversion step

Teradata Tools & Utilities (2)
• Applications: Teradata Data Profiler; Teradata CRM; Teradata Warehouse Miner; Teradata Demand Chain Management; Teradata Supply Chain Management; …
• Logical Data Models (LDM): Financial Solution LDM; Retail LDM; Communication LDM; Insurance/Healthcare LDM; Manufacturing LDM; Government LDM; Media and Entertainment LDM; Travel/Transportation LDM

Two Basic Software Architecture Models: Task Centric and Data Centric
• Task-centric: requests are served by tasks over shared memory; uniform, shared access to all platform resources (disk, etc.) is REQUIRED.
• Data-centric: a parallel optimizer routes requests to parallel units, each with exclusive access to a subset of the resources and its own data.
• Teradata is data-centric software.

Virtual AMP
• An AMP is a balanced collection of three abstracted platform resources: processor, memory and disk.
(Diagram: rows 1–28 of tables A, B and C spread across four AMPs – AMP 1 holds rows 1, 5, 9, …, 25; AMP 2 holds rows 2, 6, 10, …, 26; and so on.)
• Each virtual AMP has rows from every table.
• Each virtual AMP works independently on its rows.
• Goal: database rows are distributed equally across all AMPs.

Data Distribution by Primary Index
The Primary Index value for a row is passed through the hashing algorithm, which produces the Destination 
Selection Word (DSW) – the first 16 bits of the 32-bit row hash. The DSW selects an entry in the hash map, which routes the row to an AMP on one of the nodes across the dual BYNET interconnects; separate hash maps exist for the current configuration (primary and fallback) and for reconfiguration (primary and fallback).

Teradata Hashing Example
• An ORDER table has Order Number (the UPI), Customer Number, Order Date and Order Status, with sample rows for order numbers 7325, 7324, 7415, 7103, 7225, 7384, 7402, 7188 and 7202.
• SELECT * FROM ORDER WHERE order_number = 7225;
• The hashing algorithm turns the value 7225 into a 32-bit row hash. The DSW – the first 16 bits, e.g. 0000 0000 0001 1010 (hexadecimal 001A) – is the bucket number that indexes the hash map, and the remaining 16 bits (e.g. 1100 0111 0101 1011) distinguish rows within the bucket. The hash map entry sends the row (7225, 2, 4/13, O) – and this query – to a single AMP.

Primary Index Choice Criteria
• ACCESS – maximize one-AMP operations: choose the column most frequently used for access.
• DISTRIBUTION – optimize parallel processing: choose a column that provides good distribution.
• VOLATILITY – reduce maintenance resource overhead (I/O): choose a column with stable data values.

Data Distribution by Primary Index – 2
• For an SQL request, the parser's hashing algorithm produces the 48-bit table ID, the 32-bit row hash value and the index value, which together yield the logical block identifier on a node across the dual BYNET interconnects.

SQL Parser Overview
• A request parcel arrives; the parser first checks whether a plan for it is already cached. 
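The distribution mechanics just described can be sketched in a few lines: hash the primary-index value to a 32-bit row hash, take the first 16 bits as the hash-bucket number (DSW), and look that bucket up in a hash map assigning buckets to AMPs. The hash function (Python's `zlib.crc32`) and the round-robin bucket map below are stand-ins for illustration, not Teradata's actual algorithm or hash map:

```python
import zlib

N_AMPS = 4

# Stand-in hash map: 65,536 hash buckets assigned round-robin to AMPs.
# (Teradata maintains real primary/fallback hash maps; this is only a sketch.)
HASH_MAP = [bucket % N_AMPS for bucket in range(65536)]

def amp_for_pi_value(pi_value) -> int:
    """Route a row to an AMP from its primary-index value."""
    row_hash = zlib.crc32(repr(pi_value).encode()) & 0xFFFFFFFF  # 32-bit row hash
    bucket = row_hash >> 16  # Destination Selection Word: first 16 bits
    return HASH_MAP[bucket]

# Equal PI values always hash to the same AMP, so a UPI lookup such as
# SELECT * FROM ORDER WHERE order_number = 7225 is a one-AMP operation.
assert amp_for_pi_value(7225) == amp_for_pi_value(7225)

# Distinct values spread across the AMPs -- the basis of parallel distribution.
rows_per_amp = [0] * N_AMPS
for order_number in range(7000, 8000):
    rows_per_amp[amp_for_pi_value(order_number)] += 1
print(rows_per_amp)  # roughly even counts across the 4 AMPs
```

This also makes the primary index choice criteria concrete: a column with many distinct, stable values fills the buckets evenly, while a skewed column would pile rows onto a few AMPs.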
• If not cached, the request passes through the Syntaxer, the Resolver (using the Data Dictionary: DBase, AccRights, TVM, TVFields, Indexes), the Security check, the Optimizer (using statistics and cost estimates), the Generator, and GNCApply, which combines the plan with the data parcel to produce AMP steps.
• AMP steps may be serial steps, parallel steps, or individual and common steps (multi-statement requests, MSR).
• Additional inputs: triggers, check constraints, references, foreign keys, join indexes, and collected statistics or dynamic sampling.

Statistics Summary
Collect statistics on:
• all non-unique indexes
• the UPI of any table with fewer than x rows per AMP (dependent on the available number of AMPs)
• all indexes of a join index
• any non-indexed column used in join constraints
• indexes of global temporary tables
Collected statistics are not automatically updated by the system. Refresh statistics when 5–10% of the table rows have changed.

Database Workload Continuum
• Transactional (OLTP)
> User profiles: customers; clerks
> Services: transactions; bookkeeping
> Access profile: frequent updates; occasional lookups
> Data: current "state" data; limited history; narrow scope
• Tactical (ODS)
> User profiles: front-line services; customers (indirectly)
> Services: lookups; tactical decisions; analytics (e.g. scoring)
> Access profile: continuous updates; frequent lookups
> Data model: current "state" data; recent history; integrated business areas
• Strategic (EDW)
> User profiles: back-office services; management; trading partners
> Services: strategic decisions; analytics (e.g. scoring)
> Access profile: bulk inserts, some updates; frequent complex analytics
> Data model: periodic "state" data; deep history; enterprise-integrated view

Workload Continuum
• Transactional repositories (OLTP1 … OLTPi … OLTPn) feed tactical decision repositories (ODS1, ODS2), which feed the enterprise data warehouse as the strategic decision repository.
• In the active data warehouse, the tactical and strategic decision repositories are combined.

Data Warehouse Needs Will Evolve
As workload complexity (increasing depth and breadth of users and queries) and data sophistication (increasing depth and breadth of data) grow, usage moves through five stages:
• REPORTING – WHAT happened? Primarily batch and some ad hoc reports.
• ANALYZING – WHY did it happen? Increase in ad hoc analysis.
• PREDICTING – WHAT WILL happen? 
Analytical modeling grows.
• OPERATIONALIZING – WHAT IS happening? Continuous update and time-sensitive queries become important.
• ACTIVATING – MAKE it happen! Event-based triggering takes hold.
Along the data-sophistication axis this means moving from batch, to ad hoc, to analytics, to continuous update/short queries, to event-based triggering – crossing the chasm from static to dynamic decision-making (understand, measure, optimize, automate, execute). Multi-dimensional growth accompanies it: query complexity, workload mixture, data volume, schema complexity, depth of history, number of users and expectations all grow, with simultaneous strategic, tactical and loading workloads. Database requirement: the data warehouse foundation must handle multi-dimensional growth! Single view of the business – better, faster decisions – drive business growth.

The Multi-Temperature Warehouse
• Customers desire deep historical data in the warehouse.
> The access frequency, or average temperature, of data varies: HOT, WARM, COOL, dormant.
> Seamless management is required.
• Teradata systems can address this need through a combination of technologies, such as:
> Partitioned primary index (PPI)
> Multi-value compression
> Priority Scheduler

Three Tiers of Workload Management
• Teradata Dynamic Query Manager (pre-execution): control what, and how much, is allowed to begin execution.
• Priority Scheduler (ADW prioritized queues): manage the level of resources allocated to different priorities of executing work.
• Database Query Log (post-execution): analyze query performance and behavior after completion.

Indexes
• PI (UPI and NUPI)
• SI (USI and NUSI)
• Join index: single-table index; multi-table index; aggregate index; sparse index (WHERE clause used); partial covering; global
• Materialized views (join index)

An Integrated, Centralized Data Warehouse Solution
The database must scale in every dimension – this multi-dimensional scalability is the Teradata difference:
• Data volume (raw, user data)
• Mixed workload
• Query concurrency
• Data freshness
• Query complexity
• Schema sophistication
• Query freedom
• Query data volume

A good example? 
The TPC-H benchmark? Customers need to evaluate "real-life" workloads.

Teradata Experience in the Communications Industry
• Companies generating more than 80% of the industry's revenue utilize Teradata data warehousing.

Some of Teradata's Retail Customers Worldwide

Teradata Is Well-Positioned in the Top Global 3000 Industries
• 80% of top global telco firms; 70% of top global airlines; 65% of top global retailers; 60% of the most admired global companies; 50% of the top transportation logistics firms (FORTUNE Global rankings, April 2005)
• Leading industries: banking; government; insurance & healthcare; manufacturing; retail; telecommunications; transportation logistics; travel
• World-class customer list: more than 750 customers; over 1,200 installations
• Global presence: over 100 countries

Industry Leaders Use Teradata – Teradata Global 400 Customers
• 54% of retailers; 50% of the telco industry; 50% of the transportation industry; 32% of the financial services industry; 19% of manufacturers

www.teradata.com