Teradata HU

Transcription

Teradata
Architecture, Technology, Scalability, Performance
and Vision for Active Enterprise Data Warehousing
Dr. Barbara Schulmeister
Teradata – a Division of NCR
Barbara.Schulmeister@ncr.com
28 June 2005
Agenda
• History
• Definitions
• Hardware
• Architecture
• Fault Tolerance and High Availability
• Coexistence
• Operational System
• Tools and Utilities
• Data Distribution
• SQL Parser
• Active Data Warehouse
• Scalability
Teradata Timeline Overview
Born to be parallel!
• 1979: Teradata Corp. founded
• 1984: DBC Model 1 – first MPP system! First Beta system shipped at Christmas to Wells Fargo Bank
• 1985: "Product of the Year" – Forbes
• 1986: First 100GB system!
• 1987: "Fastest Growing Small Company" – INC Magazine; DBC Model 3
• 1988: First 500GB system!
• 1989: Initial public offering on Wall Street
• 1990: First 700GB system!
• 1991: First terabyte system! "Fastest Growing Electronic Company" – Electronic Business
• 1992: DBC Model 4
• 1993: Joint venture with NCR for next-generation systems; "Leader in Commercial Parallel Processing" – Gartner Group
• 1994: 3+ TB system!
Teradata Timeline (II)
• 1995: "#1 in MPP" – IDC survey in Computerworld; only vendor to publish multi-user TPC-Ds! DB Expo Realware Award with Union Pacific: "Data Warehouse Innovations"; first vendor to publish a 1TB TPC-D benchmark!
• 1996: Teradata Version 2 on NCR 3555 SMP; over 500 production data warehouses worldwide! Teradata V2 on WorldMark 4300
• 1997: Teradata V2 on WorldMark 5100 SMP & MPP; "...only NCR's Teradata V2 RDBMS has proven it can scale…" – Gartner Group; demonstrated world's largest data warehouse database at 11TB! DWI VLDB Best Practice Award with AT&T BMD: "Data Warehouse and the Web"; 100GB TPC-D benchmark leader! 24TB data warehouse in production!
Teradata Timeline (III)
• 1998: Teradata V2 ported to Microsoft Windows NT; industry-leading TPC-D benchmarks for all volumes
• 1999: Industry-leading TPC-H at 1TB and 3TB; largest data warehouse system (176 nodes, 130 TB disk); Database Programming and Design Award
• 2000: Teradata attains 99.98% availability
• 2001: IT Award of Excellence
• 2002: TDWI Solution Provider Best Practices in Data Warehousing; TDWI Leadership in Data Warehousing Award; DM Review World-Class Solution Award for Business Intelligence; IT Times Award; DM Review 100 Award; DM Review Readership Award; Intelligent Enterprise RealWare Award
• 2003: Teradata V2R5
• 2004: Teradata 64-bit
• 2005: Teradata V2R6; Linux
The commitment continues…
Alternative Approaches to Enterprise Analytics
(Diagram: four architectures – Data Mart Centric; Virtual/Distributed/Federated; Hub-and-Spoke Data Warehouse; Enterprise Data Warehouse – each connecting sources to users, via middleware, marts, or a central DW.)

Independent Data Marts
• Pros: easy to build organizationally; limited scope
• Cons: business enterprise view unavailable; redundant data costs; high ETL costs; high application costs; high DBA and operational costs

Leave Data Where it Lies (virtual, distributed, federated)
• Pros: easy to build technically; no need for ETL; no need for a separate platform
• Cons: only viable for low-volume access; metadata issues; network bandwidth and join complexity issues; workload typically placed on the workstation

Hub-and-Spoke Data Warehouse
• Pros: allows easier customization of user interfaces & reports
• Cons: business enterprise view challenging; redundant data costs; high DBA and operational costs; data latency

Centralized, Integrated Data With Direct Access (enterprise data warehouse)
• Pros: single enterprise "business" view; data reusability; consistency; low cost of ownership
• Cons: requires corporate leadership and vision
A Spectrum of Data Warehouse Architectures
(Diagram: the same four architectures – Data Mart Centric; Hub-and-Spoke Data Warehouse; Virtual/Distributed/Federated; Enterprise Data Warehouse.)
The goal: any question, on any data, at any time.
The enterprise data warehouse is Teradata's advocated data warehouse approach – for 20 years, since 1984!
Differentiating OLTP – DSS
The most time-consuming steps in DSS are:
• Full scans of big tables
• Complex joins
• Aggregation
• Sorting
The frequency of these steps is what distinguishes an OLTP workload from a DSS workload.
NCR Server Generations
• Provide customers with growth opportunities and investment protection
> Coexistence is enabled across five generations, connected over BYNET V2 / V3:
– NCR 5400E & 5400H Servers
– NCR 4980 & 5380 Servers
– NCR 4950 & 5350 Servers
– NCR 4900 & 5300 Servers
– NCR 485X & 525X Servers
NCR 5400 Server SMP
• 5400E
> 1 - 4 nodes
> BYNET V2
> ESCON & FICON for 3- and 4-node configurations
> Field upgradeable to 5400H
(Cabinet diagram: up to 4 nodes within each cabinet, plus Ethernet switches, server management, three UPS modules, and internal BYNET switches.)
NCR 5400 Server MPP
• Continued rapid adoption of the latest Intel® technology
> Dual Intel Xeon EM64T 3.6 GHz processors with Hyper-Threading (32-bit and 64-bit capability)
> 800 MHz front-side bus
• Industry-standard form factor
> Up to 10 nodes per cabinet
> Integrated BYNET V3 (provides the capability to physically separate systems by 300-600 meters)
> Integrated server management
> N+1 UPS
> Dual AC
• Multi-generation coexistence
> Investment protection
(Cabinet diagram: up to 10 nodes within each cabinet, plus Ethernet switches, BYNET V3 switches, server management, and five UPS modules.)
Relative CPU Performance per Core
(Chart: industry CPU performance per core, 2004-2007, based on the SPECint2000 and SPECint_rate2000 benchmarks from www.spec.org. Roadmap lines for Intel Xeon – 3.0 GHz 1M/130nm, 3.6 GHz/90nm, 2M L2 3.6 GHz and >3.6 GHz/90nm, next-generation dual-core 65nm; Intel Itanium – Itanium 2 1.6 GHz/130nm, Itanium 2 9M/130nm, dual-core Montecito 90nm, Tukwila common platform 65nm; IBM Power – Power 4+ 1.45 GHz/130nm, Power 5 ~1.9 GHz/130nm, Power 5+ ~2.5 GHz/90nm, Power 6 ~3 GHz/65nm; Sun Sparc – UltraSPARC III 1.6 GHz/130nm, Rock 90nm, multi-core 45nm. Relative CPU performance is based on multi-threading – symmetric multi-threading / Hyper-Threading – and multi-core roadmap capabilities.)
Gartner Product Ranking 2004 ASEM
(Chart: product-category scores for SUN Sunfire, NCR Teradata, IBM pSeries, HP ProLiant, HP Integrity, HP9000, and Fujitsu PrimePower.)
The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of the total possible points and leads the IBM pSeries (74.35%) by 44 points, or 19%.
Source: Gartner 2004 ASEM Report
NCR Enterprise Storage 6842
• NCR Enterprise Storage 6842 features
> Two array modules per cabinet
> 56 × 73GB 15K RPM drives
– greater than 8 terabytes of spinning disk per cabinet
> Dual quad Fibre Channel controllers per array for performance and availability
> Typical configuration is 4 NCR 5400 Server nodes per three 6842 arrays
– 1.2 terabytes of database space per node (RAID 1)
> Supports RAID 1 and RAID 5
> Support for MP-RAS and Microsoft Windows Server 2003 environments
EMC Symmetrix DMX
• Enterprise fit
• Storage standardization
• Extended storage life through redeployment

EMC Model              | DMX 1000 M2                            | DMX 2000 M2
Disks                  | 73GB – 15K RPM                         | 73GB – 15K RPM
Teradata Use           | MPP: supports 1 or 2 nodes per cabinet | MPP: supports 2, 3, or 4 nodes per cabinet
RAID Options           | RAID-1 only                            | RAID-1 only
Operating Environment  | MP-RAS and Windows                     | MP-RAS and Windows
Maximum Teradata disks | 96                                     | 192
Assumption: Compute and Storage Balance
• A balanced configuration is one where the storage
I/O subsystem for each compute node is configured
with enough disk spindles, disk controllers, and
connectivity so that the disk subsystem can satisfy the
CPU demand from that node.
• A supersaturated configuration can also satisfy the
CPU demand from that node, although the extra I/O
may be underutilized.
> This is useful for investment protection on certain
upgrade paths.
• All system configurations discussed in this presentation
are based on balanced or supersaturated compute
nodes.
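The balance condition above can be sketched numerically. A minimal illustration – the bandwidth figures below are invented for the example, not NCR sizing data:

```python
def node_balance(cpu_demand_mbps, spindles, mbps_per_spindle):
    """Classify a node's storage subsystem relative to its CPU demand.

    'balanced': the disk subsystem can just satisfy the CPU's appetite
    for data; 'supersaturated': it can deliver more than the CPU can
    consume (the extra I/O sits underutilized, which protects the
    investment on later CPU upgrades)."""
    io_capacity = spindles * mbps_per_spindle
    if io_capacity < cpu_demand_mbps:
        return "undersaturated"      # CPUs starve waiting on disk
    if io_capacity == cpu_demand_mbps:
        return "balanced"
    return "supersaturated"

# Illustrative node that can consume 800 MB/s of table data:
print(node_balance(800, 40, 20))   # balanced
print(node_balance(800, 56, 20))   # supersaturated
```

An undersaturated node wastes CPU cycles waiting on I/O, which is why the configurations in this presentation are always balanced or supersaturated.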
Node CPU and Storage I/O Balance
(Chart: query response time and I/O bandwidth in MB/sec plotted against the number of disk drives/storage capacity. Effective node utilization plateaus at about 95%; the best query response time is reached at the optimum node/storage balance.)
Industry-wide, disk drive capacity is increasing at a faster rate than disk drive performance.
Common Upgrades Applied
GROW RAW DATA VOLUME – performance is more than adequate: add more data to all nodes.
(Chart: query response time vs. raw data volume for the system with current nodes.)
Query response time increases, because you didn't add more compute power to support the additional raw data volume.
Typical System Expansion
LINEAR GROWTH – maintain query performance with more nodes.
Scale out with Teradata by adding compute nodes, interconnect, storage arrays, and disks – aka "horizontal scalability".
(Chart: query response time vs. raw data volume for the system with current nodes and the system with more or faster nodes.)
Query response time remains constant, because raw data volume and compute power grow in proportion.
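The flat response-time curve falls out of a one-line model of shared-nothing scanning. A minimal sketch, assuming a uniform (purely illustrative) per-node scan rate:

```python
def response_time(raw_data_tb, nodes, tb_per_node_hour=1.0):
    """Full-scan response time in a shared-nothing system: each node
    scans only its own slice, so time = data-per-node / scan-rate."""
    return (raw_data_tb / nodes) / tb_per_node_hour

t_base   = response_time(10, nodes=10)   # starting point
t_grown  = response_time(20, nodes=10)   # data doubled, nodes unchanged
t_scaled = response_time(20, nodes=20)   # data AND nodes doubled

assert t_grown == 2 * t_base    # grow data only: response time doubles
assert t_scaled == t_base       # "horizontal scalability": stays constant
```

This is the "slope of 1" behaviour: doubling both data and nodes leaves each node's share, and hence the response time, unchanged.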
Common Upgrades Applied
GROW QUERY PERFORMANCE – raw data volume is adequate: upgrade to faster CPUs.
"Scale vertically" with Teradata by increasing compute power.
(Chart: query response time vs. raw data volume for the system with current nodes and the system with more or faster nodes.)
Query response time decreases, because you didn't add more raw data volume to offset the increase in compute power.
Combo: Upgrade Nodes and Increase Storage Per Node
Adjust query performance and data volume to match the service level agreement.
Scale to the target query performance and data volume by increasing compute power and adding storage.
(Chart: query response time vs. raw data volume for the system with current nodes and the system with more or faster nodes.)
Scaling by Reconfiguration and Expansion
Improve query performance and adjust data volume to match the service level agreement.
Improve query performance and adjust data volume by reducing storage per node and adding more nodes.
(Chart: query response time vs. raw data volume for the system with current nodes and the system with more nodes.)
Architecture Determines Scalability
(Diagram: Teradata shared-nothing MPP – multiple nodes, each with its own CPUs, cache, memory, and disk storage, connected by the BYNET fabrics.)
• Each CPU uses an independent, direct I/O path to disk
• All memory accesses are local
• The interconnect is used only for database messages – no I/O or memory traffic
Teradata Shared-Nothing MPP
• Designed for slope-of-1 linear scaling
• Optimized for very high data rates to/from disk
• Excellent performance and efficiency for data warehousing
Teradata MPP Architecture
(Diagram: SMP nodes 1-4, each with CPUs and memory, connected by dual BYNET interconnects, with a server-management console.)
• Nodes
> Incrementally scalable to 1024 nodes
> Windows or Unix
• Storage
> Independent I/O
> Scales per node
• BYNET interconnect
> Fully scalable bandwidth
• Connectivity
> Fully scalable
> Channel – ESCON/FICON
> LAN, WAN
• Server management
> One console to view the entire system
Node Software Architecture
Perfectly tuned nodes working in parallel for scalability and availability.
(Diagram: a 4-node MPP clique, and the Teradata node SW architecture on a single SMP: communication interfaces – LAN gateway, channel gateway, BYNET – above the virtual processors (VPROCs), which run on the Parallel Database Extensions (PDE) over UNIX or Windows 2000.)
• Parsing Engines (PEs) receive the queries and figure out the query plan. A PE vproc contains session control, parser, optimizer, and dispatcher.
• Access Module Processors (AMPs) interact with the disk arrays and process the data. An AMP vproc (VAMP) contains the relational database management and file system / data management layers.
The Scalable BYNET Interconnect
Specifically designed for data warehousing workloads.
(Diagram: multiple simultaneous point-to-point messages between nodes, and broadcast messaging from one node to all others.)
• Bandwidth scales linearly to 1,024 nodes
• Redundant, fault-tolerant network
• Guaranteed message delivery
The Teradata Optimizer chooses between point-to-point and broadcast messaging to select the most effective communication.
Built-In Integrated Fail Over
• Teradata provides built-in node failover.
> Cost effective
> Easy to deploy
• Work migrates to the remaining nodes in the clique.
• System performance degradation of up to 33%.
(Diagram: traditional configuration with one failed node.)
Large Cliques
• Double the number of nodes in a clique, up to 8.
• Work is distributed across a greater number of nodes.
• Minimizes system performance impacts – may not be noticeable to end-users.
(Diagram: an 8-node clique with one failed node, connected through Fibre Channel switches to six disk arrays – 86% system performance continuity.)
Hot Standby Nodes
• Work is re-directed to a hot standby node.
• No system performance impact.
• The Teradata restart can be postponed to a maintenance window.
(Diagram: a clique with one failed node and a hot standby node taking over – 100% system performance continuity.)
Large Clique + Hot Standby Node
• Same performance benefits as a hot standby node.
• Reduced costs for larger system implementations.
(Diagram: an 8-node large clique with one failed node and a hot standby, connected through Fibre Channel switches to six disk arrays – 100% system performance continuity.)
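The continuity percentages on these failover slides follow from simple arithmetic: after one node fails, its share of the work is spread over the surviving clique members. A sketch (the slide's 86% sits slightly below the ideal 7/8 ≈ 87.5%, presumably allowing for migration overhead):

```python
def continuity_after_failure(clique_size, hot_standby=False):
    """Fraction of system throughput left after one node in a clique
    fails. With a hot standby node the failed node's vprocs migrate
    to the idle spare, so full performance is retained."""
    if hot_standby:
        return 1.0
    return (clique_size - 1) / clique_size

print(f"3-node clique: {continuity_after_failure(3):.0%}")    # 67% -> up to 33% degradation
print(f"8-node large clique: {continuity_after_failure(8):.0%}")   # ~88%
print(f"with hot standby: {continuity_after_failure(8, True):.0%}")  # 100%
```

The same arithmetic explains why large cliques soften failures: the bigger the clique, the smaller the slice of work each survivor inherits.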
High Availability
Failure case → remedy:
• Power failure → UPS (redundant), dual AC
• Node failure → VPROC migration (VAMP, PE)
• BYNET failure → redundant BYNET
• Disk failure → RAID-1/-5/-S in the disk subsystem
• More than one disk failure → Fallback option
• Clique failure → Fallback option
(Diagram: BYNET, server nodes, disk array subsystem, clique.)
Coexistence Considerations
(Diagram: three node generations with performance factors 1x, 1.5x, and 2.0x, each running a proportionally larger number of VAMPs.)
• VAMPs manage the same amount of data.
• Coexistence enables the faster nodes to be realized by running more VAMPs per node.
System Expansion with Teradata Coexistence
• The utilization of multiple generations of hardware within a single Teradata MPP system.
(Diagram: dual BYNET interconnects joining four 5380 SMP nodes at 9 AMPs per node with four 5400 SMP nodes at 12 AMPs per node, plus server management.)
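The 9-versus-12 AMP split above follows from the generations' relative speed: because every AMP manages the same amount of data, a node's AMP count scales with its performance factor. A sketch (the 5400/5380 speed ratio is inferred from the slide, not an official figure):

```python
def amps_per_node(base_amps, perf_factor):
    """AMPs a node should run so that data per AMP stays constant:
    a node that is perf_factor times faster than the baseline hosts
    proportionally more AMPs, and hence more data slices."""
    return round(base_amps * perf_factor)

# Baseline: a 5380 node runs 9 AMPs; a 5400 node at ~1.33x runs 12.
assert amps_per_node(9, 12 / 9) == 12
print(amps_per_node(9, 1.5))   # a hypothetical 1.5x node would host 14
```

This is why mixed-generation systems stay balanced: each generation carries work in proportion to its capability, while no AMP is overloaded relative to any other.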
Customer Example (72 Nodes, 4 Generations)
• 11/2000 – original footprint: 8-node 5250
• 1/2001 – expansion 1: 12-node 5250
• 11/2001 – expansion 2: 8-node 5255
• 6/2002 – expansion 3: 12-node 5300
• 6/2003 – expansion 4: 16-node 5350
• 6/2004 – expansion 5: 16-node 5380
• 2005 – future 5400 expansion
(The successive server families – 525x, 5300, 5350, 5380, and the future 5400 – are labelled generations "A" through "E" on the slide.)
Database and Operating System
Supported combinations across the 485x/525x, 4900/5300, 4950/5350, 4980/5380, and 5400 servers:
• V2R5.0.3, V2R5.1.X, V2R6 on MP-RAS 3.03
• V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02
• V2R6 on Windows Server 2003 (2Q 2005)
• V2R5.0.3, V2R5.1.X, V2R6 on Windows 2000 (2Q 2005)
• Database
> Teradata V2R6
> Support one release back, V2R5.1.X (current exception in place for V2R5.0.3)
• Unix
> MP-RAS 3.03 required for the Teradata Database on the 5400
> MP-RAS 3.02 still supported on previous server generations
• Microsoft Windows
> Microsoft Windows Server 2003 recommended for new and expanding 5400 systems
> Microsoft Windows 2000 supported in 2Q 2005
V2R5.0 Features + V2R5.1 Features

Strategic Decision Making
• Analytic extensions such as extended window functions & multiple aggregate DISTINCTs
• Random stratified sampling
• Join elimination
• Extended transitive closure
• Partial GROUP BY
• Early GROUP BY
• Derived table rewrite
• Very large SQL

Single Version of the Truth
• Extended grouping
• Inner join optimization
• Eliminate unnecessary outer joins
• Hash joins
• UDFs for complex analysis and unstructured data

Tactical & Event-Driven Decision Making
• Partial covering join index
• Global index
• Sparse index
• Join index extensions
• ODS workload optimization
• Stored procedures enhancements

Trusted, Integrated Environment
• Index Wizard
• Statistics Wizard & COLLECT STATISTICS improvements
• Query log
• Extreme workload management & administration
• Roles and profiles
• SQL Assistant / Web Edition
• Availability
• Performance dashboard & reporting
• Security enhancements (encryption)
• DBQL enhancements
• Database object level use count
• ROLES enhancements
• Priority Scheduler enhancements
• TDQM enhancements
• No auto restart after disk-array power failure
• Cancel rollback
• Incompatible package warning
• Disk I/O integrity check

Further features
• Cylinder read
• Partitioned tables (PPI) and PPI dynamic partition elimination
• Value list compression
• 2000 columns, 64 columns per index
• Identity column, and enhancement to identity column
• UTF16 support
• Large objects (LOBs)
• Enhancements to triggers
• Extra FK-PK joins in join index
• UDFs for XML processing etc.

Data Freshness
• Continuous update performance & manageability
• Faster join index update
• Join update performance
• Bulk update performance
• Teradata Warehouse Builder full functionality & platform support
• UDFs for data transformation and scoring
V2R6.0 Feature List

Strategic Decision Making
• Remove 1MB limit on plan cache size
• Increase response buffer to 1MB
• Table header expansion
• Improve random AMP sampling
• Top N row operation
• Recursive queries

Tactical & Event-Driven Decision Making
• Improve primary index operations
• Improved IN-list processing
• External stored procedures
• Trigger calling a stored procedure
• Stored procedure internals enhancements
• Queue tables

Trusted, Integrated Environment / Single View of Your Business
• Teradata Dynamic Workload Management
• Extensible user authentication
• Directory integration
• Global deadlock logging
• Faster rollbacks
• Stored procedure LOB support
• External table function
• Partition-level BAR
• Eliminate indexed row IDs (PPI)
• PPI join performance improvement
• DBS information consolidation

Data Freshness
• Replication services
• Array support
• Priority Scheduler enhancements
• Reduce restart time
Teradata Tools & Utilities (1)

Load/Unload
• Teradata Warehouse Builder
• FastLoad, MultiLoad & FastExport
• Teradata TPump
• Access Modules

Database Management
• Teradata Utility Pak
• Teradata Administrator
• Teradata SQL Assistant (and Web Edition)
• BTEQ
• ODBC, JDBC, CLI, OLE DB Provider
• Teradata Manager
• Teradata Dynamic Query Manager
• Teradata System Emulation Tool
• Teradata Visual Explain
• Teradata Index Wizard
• Teradata Statistics Wizard

Metadata
• Teradata Meta Data Services

Mainframe Connectivity
• Mainframe Channel Connect
• TS/API, CICS, HUTCNS & IMS/DC

Any query, any time.
Technical Differentiator: Database Utilities
Teradata utilities are robust and mature:
• Teradata utilities are fully parallel.
• Teradata utilities have checkpoint restart capability.
• Data loads directly from the source into the database – parallel in, parallel out.
> No manual data partitioning.
> No file splitting.
> No intermediary file transfers.
> No separate data conversion step.
Teradata Tools & Utilities (2)
• Teradata Data Profiler
• Teradata CRM
• Teradata Warehouse Miner
• Teradata Demand Chain Management
• Teradata Supply Chain Management
• …

Logical Data Models (LDM)
• Financial Solution LDM
• Retail LDM
• Communication LDM
• Insurance/Healthcare LDM
• Manufacturing LDM
• Government LDM
• Media and Entertainment LDM
• Travel/Transportation LDM
Two Basic Software Architecture Models
Task Centric and Data Centric
• Task centric: requests are handed to tasks operating on shared memory; uniform and shared access to all platform resources (disk, etc.) is REQUIRED.
• Data centric: a parallel optimizer routes each request to parallel units, each of which has exclusive access to its own subset of resources and the data stored there.
Data Centric Software: Teradata Virtual AMP
An AMP is a balanced collection of three abstracted platform resources: processor, memory, and disk.
(Diagram: rows 1-28 of tables A, B, and C spread across AMPs 1-4; e.g. AMP 1 holds rows 1, 5, 9, 13, 17, 21, 25, AMP 2 holds rows 2, 6, 10, …, and so on.)
• Each virtual AMP has rows from every table.
• Each virtual AMP works independently on its rows.
• Goal: database rows are equally distributed across the AMPs, for every table.
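The even spread pictured above is what a uniform hash delivers automatically. A minimal simulation, using MD5 as a stand-in (Teradata's actual hashing algorithm is proprietary, so every detail here is illustrative):

```python
import hashlib
from collections import Counter

NUM_AMPS = 4

def amp_for_row(table, primary_index_value):
    """Hash a (table, primary index value) pair to a 32-bit row hash
    and map it onto one of the AMPs."""
    digest = hashlib.md5(f"{table}:{primary_index_value}".encode()).digest()
    row_hash = int.from_bytes(digest[:4], "big")   # 32-bit row hash
    return row_hash % NUM_AMPS

# Distribute 10,000 rows from each of three tables:
cells = Counter()
for table in ("A", "B", "C"):
    for key in range(10_000):
        cells[(table, amp_for_row(table, key))] += 1

assert len(cells) == 3 * NUM_AMPS          # every AMP has rows of every table
per_amp = Counter()
for (_, amp), n in cells.items():
    per_amp[amp] += n
assert max(per_amp.values()) < 1.1 * min(per_amp.values())  # near-even load
```

Because the load is near-even, a full-table scan finishes in roughly the same time on every AMP, which is exactly the property the shared-nothing design needs.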
Data Distribution by Primary Index
The primary index value for a row is fed through the hashing algorithm, producing a 32-bit row hash. The first 16 bits form the Destination Selection Word (DSW), which is looked up in a hash map to select the target AMP. Four hash maps are maintained: current configuration primary, current configuration fallback, reconfiguration primary, and reconfiguration fallback.
(Diagram: nodes 1-4 on dual BYNET interconnects.)
Teradata Hashing
Example: table ORDER with columns Order Number (the unique primary index, UPI), Customer Number, Order Date, and Order Status, holding orders 7103-7415.

SELECT * FROM ORDER
WHERE order_number = 7225;

The hashing algorithm turns the primary index value 7225 into a 32-bit row hash. The first 16 bits are the DSW, or bucket number – here 0000 0000 0001 1010, i.e. hexadecimal 001A – and the remaining 16 bits are 1100 0111 0101 1011. The bucket number indexes the hash map, which names the AMP holding the row, so the row for order 7225 is retrieved with a single-AMP operation.
(Diagram: the hash map as a hexadecimal grid assigning each of the buckets to one of the AMPs.)
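The 7225 walk-through condenses to a few lines of code. A sketch of the DSW lookup path, with CRC32 standing in for Teradata's proprietary hash and a simple round-robin map standing in for the real hash map:

```python
import zlib

NUM_AMPS = 4
HASH_MAP = [bucket % NUM_AMPS for bucket in range(65_536)]  # bucket -> AMP

def locate_row(primary_index_value):
    """One-AMP access path: compute the 32-bit row hash, take the
    first 16 bits as the Destination Selection Word (the hash-map
    bucket number), and let the hash map name the owning AMP."""
    row_hash = zlib.crc32(str(primary_index_value).encode())  # 32 bits
    dsw = row_hash >> 16                                      # first 16 bits
    return dsw, HASH_MAP[dsw]

# SELECT * FROM ORDER WHERE order_number = 7225;
# hashes to exactly one bucket, so only one AMP does any work.
dsw, amp = locate_row(7225)
assert 0 <= dsw < 65_536 and 0 <= amp < NUM_AMPS
```

The hash maps also explain reconfiguration: adding nodes only requires reassigning buckets to AMPs in a new map, not re-hashing every row.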
Primary Index Choice Criteria
• Access – maximize one-AMP operations: choose the column most frequently used for access.
• Distribution – optimize parallel processing: choose a column that provides good distribution.
• Volatility – reduce maintenance resource overhead (I/O): choose a column with stable data values.
Data Distribution by Primary Index – 2
An SQL request runs through the parser algorithm, which produces a 48-bit table ID and the 32-bit row hash of the index value; together with the row's logical block identifier, these locate the row on its AMP.
(Diagram: nodes 1-4 on dual BYNET interconnects.)
SQL Parser Overview
The request parcel first hits the cache check ("Cached?"). On a miss it passes through:
• Syntaxer
• Resolver – consults the data dictionary (DBase, AccRights, TVM, TVFields, Indexes)
• Security
• Optimizer – uses statistics and costs; additionally draws on triggers, check constraints, references, foreign keys, join indexes, and collected statistics or dynamic sampling
• Generator
• GNCApply – applies the data parcel to the generated steps
The result is the AMP steps: serial steps, parallel steps, and individual and common steps (MSR).
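The "Cached?" branch at the top of the pipeline is a plan cache: a repeated request skips the expensive stages entirely. A toy illustration – the stage names follow the slide, everything else is invented for the sketch:

```python
PLAN_CACHE = {}
stages_run = []

def parse_request(request_text):
    """Return AMP steps for a request, reusing a cached plan when the
    identical request parcel has been seen before."""
    if request_text in PLAN_CACHE:                       # "Cached?" -> yes
        return PLAN_CACHE[request_text]
    for stage in ("syntaxer", "resolver", "security",    # cache miss:
                  "optimizer", "generator", "gncapply"): # run the pipeline
        stages_run.append(stage)
    steps = ("AMP steps", request_text)                  # placeholder plan
    PLAN_CACHE[request_text] = steps
    return steps

parse_request("SELECT * FROM ORDER WHERE order_number = 7225")
cost_after_first = len(stages_run)
parse_request("SELECT * FROM ORDER WHERE order_number = 7225")
assert len(stages_run) == cost_after_first   # second call: pure cache hit
```

In the real parser the cached plan is generic and GNCApply binds the data parcel's values into it, so even parameterized requests can hit the cache; the sketch keys on the full text only for simplicity.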
Statistics Summary
Collect statistics on:
• all non-unique indexes
• the UPI of any table with fewer than x rows per AMP (dependent on the available number of AMPs)
• all indexes of a join index
• any non-indexed column used for join constraints
• indexes of global temporary tables
Collected statistics are not automatically updated by the system. Refresh statistics when 5-10% of the table rows have changed.
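The refresh rule of thumb is easy to automate on top of a row-change counter. A sketch – the 5% default mirrors the low end of the slide's 5-10% guidance:

```python
def stats_need_refresh(rows_at_last_collect, rows_changed_since, threshold=0.05):
    """True once the changed-row fraction crosses the threshold.
    Teradata does not refresh collected statistics automatically, so
    the DBA (or a script) must re-issue COLLECT STATISTICS."""
    return rows_changed_since / rows_at_last_collect >= threshold

assert not stats_need_refresh(1_000_000, 20_000)   # 2% changed: still usable
assert stats_need_refresh(1_000_000, 80_000)       # 8% changed: refresh now
```

Stale statistics matter because the optimizer's cost estimates, and therefore its join plans, are only as good as the row counts and value distributions it was given.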
Database Workload Continuum

Transactional (OLTP)
• User profiles: customers; clerks
• Services: transactions; bookkeeping
• Access profile: frequent updates; occasional lookups
• Data: current "state" data; limited history; narrow scope

Tactical (ODS)
• User profiles: front-line services; customers – indirectly
• Services: lookups; tactical decisions; analytics (e.g. scoring)
• Access profile: continuous updates; frequent lookups
• Data model: current "state" data; recent history; integrated business areas

Strategic (EDW)
• User profiles: back-office services; management; trading partners
• Services: strategic decisions; analytics (e.g. scoring)
• Access profile: bulk inserts – some updates; frequent complex analytics
• Data model: periodic "state" data; deep history; enterprise integrated view

Workload Continuum
• Transactional repositories: OLTP1 … OLTPi … OLTPn
• Tactical decision repositories: ODS1 … ODS2
• Strategic decision repositories: the enterprise data warehouse
Workload Continuum
• Transactional repositories: OLTP1 … OLTPi … OLTPn
• Tactical and strategic decision repositories: the Active Data Warehouse
Data Warehouse Needs Will Evolve
(Chart: workload complexity vs. data sophistication, climbing through five stages.)
• REPORTING – WHAT happened? Primarily batch and some ad hoc reports.
• ANALYZING – WHY did it happen? Increase in ad hoc analysis.
• PREDICTING – WHAT WILL happen? Analytical modeling grows.
• OPERATIONALIZING – WHAT IS happening? Continuous update and time-sensitive queries become important.
• ACTIVATING – MAKE it happen! Event-based triggering takes hold.
Along the way:
• Query complexity grows
• Workload mixture grows
• Data volume grows
• Schema complexity grows
• Depth of history grows
• Number of users grows
• Expectations grow
Simultaneous workloads – strategic, tactical, loading – with increasing depth and breadth of users and queries, and increasing depth and breadth of data. Data sophistication moves from batch to ad hoc, analytics, continuous update/short queries, and event-based triggering.
Single view of the business – better, faster decisions – drive business growth.
Data Warehouse Needs Will Evolve
(The same chart, annotated with the loop Measure → Understand → Execute → Automate → Optimize across the five stages, and with the chasm from static to dynamic decision-making that separates batch, ad hoc, and analytics from continuous update/short queries and event-based triggering.)
Data Warehouse Needs Will Evolve
(The same chart once more, with the key point:)
Database requirement: the data warehouse foundation must handle multi-dimensional growth!
The Multi-Temperature Warehouse
• Customers desire deep historical data in the warehouse.
> The access frequency or average temperature of data
varies.
– HOT, WARM, COOL, dormant
> Seamless management required.
• Teradata systems can address this need through a
combination of technologies, such as:
> Partitioned primary index (PPI).
> Multi-value compression.
> Priority scheduler.
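The temperature tiers can be made concrete with an access-recency rule. An illustrative sketch – the thresholds are assumptions for the example, not Teradata defaults:

```python
from datetime import date, timedelta

def temperature(last_access, today):
    """Classify data (e.g. a PPI partition) by how recently it was
    touched; hotter data would get a higher Priority Scheduler
    weight, while cool or dormant history can be compressed harder."""
    age_days = (today - last_access).days
    if age_days <= 7:
        return "HOT"
    if age_days <= 30:
        return "WARM"
    if age_days <= 365:
        return "COOL"
    return "dormant"

today = date(2005, 6, 28)
assert temperature(today - timedelta(days=3), today) == "HOT"
assert temperature(today - timedelta(days=90), today) == "COOL"
assert temperature(today - timedelta(days=500), today) == "dormant"
```

Date-based PPI partitioning fits this scheme naturally, since a partition's age is usually a good proxy for how often it is queried.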
Three Tiers of Workload Management
• Pre-execution – Teradata Dynamic Query Manager: control what, and how much, is allowed to begin execution.
• Execution – Priority Scheduler: manage the level of resources allocated to different priorities of executing work; the ADW serves prioritized queues.
• Post-execution – Database Query Log: analyse query performance and behavior after completion.
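The three tiers compose naturally: admit, run by priority, log. A toy model (the names and limits are illustrative; TDQM, the Priority Scheduler, and DBQL are far richer than this):

```python
import heapq

MAX_CONCURRENT = 2     # pre-execution admission limit (TDQM's role)
query_log = []         # post-execution record (DBQL's role)

def run_workload(queries):
    """queries: (priority, name) pairs, lower number = higher priority.
    Returns the completion order under priority scheduling."""
    pending = list(queries)
    heapq.heapify(pending)             # the prioritized queue
    running, finished = [], []
    while pending or running:
        while pending and len(running) < MAX_CONCURRENT:  # admission control
            running.append(heapq.heappop(pending))
        prio, name = running.pop(0)    # simplification: finish in admit order
        finished.append(name)
        query_log.append({"query": name, "priority": prio})
    return finished

order = run_workload([(2, "report"), (1, "tactical"), (3, "batch")])
assert order == ["tactical", "report", "batch"]   # highest priority first
assert len(query_log) == 3                        # every query was logged
```

Separating admission, scheduling, and logging is what lets short tactical queries coexist with long strategic scans on the same system, the central claim of the active data warehouse.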
Indexes
• PI (UPI and NUPI)
• SI (USI and NUSI)
• Join index
> single-table index
> multi-table index
> aggregate index
> sparse index (WHERE clause used)
> partial-covering global index
• Materialized views (join index)
An Integrated, Centralized Data Warehouse Solution
The database must scale in every dimension:
• Data volume (raw, user data)
• Mixed workload
• Query concurrency
• Data freshness
• Query complexity
• Schema sophistication
• Query freedom
• Query data volume
An Integrated, Centralized Data Warehouse Solution
Scaling in all of these dimensions at once – data volume, mixed workload, query concurrency, data freshness, query complexity, schema sophistication, query freedom, and query data volume – is the Teradata Difference.
The Teradata Difference: "Multi-Dimensional Scalability"
Is the TPC-H benchmark a good example? Customers need to evaluate "real life" workloads across all the dimensions: data volume, mixed workload, query concurrency, data freshness, query complexity, schema sophistication, query freedom, and query data volume.
Teradata Experience in the Communications Industry
Companies generating >80% of the industry revenue utilize Teradata data warehousing.
Some of Teradata's Retail Customers Worldwide
(Logo slide.)
Teradata is Well-Positioned in the Top Global 3000 Industries
• 80% of top global telco firms
• 70% of top global airlines
• 65% of top global retailers
• 60% of the most admired global companies
• 50% of the top transportation logistics firms
• Leading industries: banking; government; insurance & healthcare; manufacturing; retail; telecommunications; transportation logistics; travel
• World-class customer list
> More than 750 customers
> Over 1200 installations
• Global presence
> Over 100 countries
Source: FORTUNE Global Rankings, April 2005
Industry Leaders Use Teradata
Teradata Global 400 customers:
• 54% of retailers
• 50% of the telco industry
• 50% of the transportation industry
• 32% of the financial services industry
• 19% of manufacturers
• Leading industries: banking; government; insurance & healthcare; manufacturing; retail; telecommunications; transportation logistics; travel
• World-class customer list
> More than 750 customers
> Over 1200 installations
• Global presence
> Over 100 countries
www.teradata.com