2010

How to Build an Effective Data Model

© Daniel Linstedt, 2008-2010
Dan Linstedt, LLC
1/1/2010

How to Build an Effective Data Vault Model Page 2 of 152

How to Build an Effective Data Vault Model
Data Vault Modeling How To Guide

Copyright © Dan Linstedt, 2008-2010
President, Empowered Holdings, Inc
http://EmpoweredHoldings.com – Company Home
http://DanLinstedt.com – Data Vault Home

All rights reserved. All images are the property of Dan Linstedt, unless an image source is otherwise noted. No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the author. The only exception is by a reviewer, who may quote short excerpts in a review.

Printed in the United States of America
First Printing: September, 2010
Co-Editors: Kent Graziano

Abstract: The purpose of this book is to present and discuss the technical components of the Data Vault Data Model. The examples in this book provide a strong foundation for how to build and design structures using the Data Vault modeling technique. This book is the second in the series of books surrounding the Data Vault model and methodology (approach). The target audience is anyone wishing to implement a Data Vault model for integration purposes, whether it be an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store.

Front Cover Image References:
http://www.becomehealthynow.com/article/bodynervousadvanced/817
http://brain0.com/erobot.html

© Dan Linstedt 2010, all rights reserved http://danLinstedt.com

Table of Contents

1.0 Introduction and Terminology ... 11
1.1 Do I need to be a Data Modeler to Read this Book? ... 11
1.2 Review of Basic Terminology ... 11
1.3 Notations used in this text ... 15
1.4 Data Models as Ontology’s ... 15
1.5 Data Model Naming Conventions and Abbreviations ... 17
1.6 Introduction to Hubs, Links, and Satellites ... 19
1.7 Flexibility of the Data Vault Model ... 21
1.8 Data Vault Basis of Commutative Properties and Set Based Math ... 23
1.9 Data Vault and Parallel Processing Mathematics ... 26
1.10 Loading Processes: Batch Versus Real Time ... 31
2.0 Architectural Definitions ... 32
2.1 Staging Area ... 32
2.2 EDW – Data Vault ... 33
2.3 Metrics Vault ... 34
2.4 Meta Vault ... 34
2.5 Report Collections ... 35
2.6 Data Marts ... 35
2.7 Business Data Vault ... 35
2.8 Operational Data Vault ... 36
2.9 Dynamic Data Vault ... 37
3.0 Common Attributes ... 38
3.1 Sequence Numbers ... 40
3.2 Sub Sequence Numbers (Item Numbering) ... 41
3.3 Load Dates ... 41
3.4 Load End Dates ... 43
3.5 Last Seen Dates ... 44
3.6 Extract Dates ... 48
3.7 Record Creation Dates ... 49
3.8 Record Sources ... 49
3.9 Process ID’s ... 50
4.0 Hub Entities ... 51
4.1 Hub Definition and Purpose ... 53
4.2 What is a Business Key? ... 54
4.3 Where do we find Business Keys? ... 55
4.4 Why are Business Keys Important? ... 56
4.5 Why not Surrogate Keys as “Master Keys”? ... 58
4.6 Hub Smart Keys, Intelligent Keys ... 58
4.7 Hub Composite Business Keys ... 59
4.8 Hub Entity Structure ... 60
4.9 Hub Examples ... 61
4.10 Dependent and Non-dependent Child Keys ... 63
4.11 Mining patterns in the Hub Entity ... 65
4.12 Process of Building a Hub Table ... 67
4.13 Modeling Rules and Standards for Hub Tables ... 69
4.14 What Happens when the Hub Standards Are Broken ... 70
5.0 Link Entities ... 72
5.1 Link Definition and Purpose ... 72
5.2 Reasons for Many To Many Relationships ... 72
5.3 Flexibility ... 76
5.4 Granularity ... 79
5.5 Dynamic Adaptability ... 82
5.6 Scalability ... 83
5.7 Link Entity Structure ... 86
5.8 Link Driving Key ... 87
5.9 Link Examples ... 89
5.10 Degenerate Fields In Links ... 90
5.11 Multi-Temporal Date Structures ... 91
5.12 Link-To-Link (Parent/Child Relationships) ... 92
5.13 Link Applications ... 95
5.14 Hierarchical Links ... 96
5.15 Same-As Links ... 98
5.16 Begin and End Dating Links ... 99
5.17 Low Value Links ... 102
5.18 Transactional Links ... 103
5.19 Computed Aggregate Links ... 105
5.20 Strength and Confidence Ratings in Links ... 106
5.21 Vector Links (Directional) ... 107
5.22 Exploration Links ... 108
5.23 Capturing Changes to Source Systems Over Time ... 108
6.0 Satellite Entities ... 110
6.1 Satellite Definition and Purpose ... 110
6.2 Satellite Entity Structure ... 111
6.3 Satellite Examples ... 112
6.4 Importance of Keeping History ... 113
6.5 Classification or Type of Data Examples ... 114
6.6 Rate of Change Examples ... 116
6.7 Satellites Arranged by Source System ... 118
6.8 Overloaded Satellites (The Flip-Flop Effect) ... 120
6.9 Satellite Applications: ... 122
6.10 Effectivity Satellites ... 122
6.11 Record Tracking Satellites ... 123
6.12 Status Tracking Satellites ... 126
6.13 Computed Satellites (Quality Generated) ... 127
6.14 Multiple Active Satellite Rows ... 128
6.15 Splitting Satellites ... 130
6.16 Consolidating Satellites ... 134
7.0 Query Assistant Tables ... 138
7.1 Point in Time Tables ... 138
7.2 Bridge Tables ... 141
8.0 Reference Tables ... 144
8.1 No-History Reference Tables ... 145
8.2 History Based Reference Tables ... 145
8.3 Code and Descriptions ... 145
8.4 National Drug Codes ... 145
8.5 ICD9 Diagnosis Codes ... 145
8.6 Calendars (Financial and Gregorian) ... 145
9.0 Ontologies, Metadata, and Enterprise Data Warehousing ... 146
9.1 Introduction to an Ontology ... 146
9.2 Ontological Importance in an EDW ... 147
9.3 Maximizing Unstructured Data with a Data Vault ... 148
9.4 Building Dynamic BI Releases ... 148
9.5 Maintaining, Managing and Governing Ontological Metadata ... 148
9.6 Data Vault Hierarchies, Modeling and Managing ... 148
10.0 Additional Data Vault Thoughts ... 149
10.1 Introduction to a Business Based Data Vault ... 149
10.2 Metadata and the Data Vault Model ... 151
10.3 Master Data and the Data Vault Model ... 152
10.4 Introduction to Load Metrics and the Data Vault Model ... 152
10.5 Growth Patterns and the Architecture ... 152
10.6 Future Look: Ontology and Dynamic BI Solutions ... 152
10.7 Dynamic Data Warehousing, an Introduction ... 152

Table of Figures

Figure 1-1: Example E-R Diagram (Elmasri/Navathe) ... 13
Figure 1-2: Crows Foot and Arrow Notation Example ... 15
Figure 1-3: Small Example: Ontology for Vehicle ... 16
Figure 1-4: Example Abbreviations and Naming Conventions ... 18
Figure 1-5: Example Data Vault ... 20
Figure 1-6: Flexibility of Adapting to Change ... 22
Figure 1-7: 3rd Normal Form Product and Supplier Example ... 23
Figure 1-8: Applied Set Theory for the Data Vault ... 25
Figure 1-9: Parallel Computing Simplified ... 27
Figure 1-10: Logical Data Vault Hyper Cube ... 28
Figure 1-11: Physical Data Vault Layout (Starting point) ... 29
Figure 1-12: Physical Data Vault Layout (Partitioned) ... 29
Figure 2-1: Enterprise BI Architectural Components ... 32
Figure 3-1: Time Series Batch Loaded Data ... 38
Figure 3-2: Real-Time Arrival, Data Geology ... 39
Figure 3-3: Load Date Time Stamp and Record Source ... 42
Figure 3-4: Example Load Date Time Stamp Data ... 42
Figure 3-5: Load End Date Computations, Descriptive Data Life Cycle ... 44
Figure 3-6: Structures containing Last Seen Dates ... 45
Figure 3-7: Scan all data in EDW ... 46
Figure 3-8: Reduced Scan Set after Applying Last Seen Date ... 48
Figure 4-1: Business Key Changing Across Line of Business ... 52
Figure 4-2: Hub Example Images ... 53
Figure 4-3: Hub Example Data ... 54
Figure 4-4: Smart Key Example ... 58
Figure 4-5: Composite Business Key Hub Example ... 60
Figure 4-6: Example Hub Entity Structure ... 61
Figure 4-7: Example Hubs from Adventure Works 2008 ... 62
Figure 4-8: Example of National Drug Code Data Vault ... 63
Figure 4-9: Dependent Child Relationship Modeling ... 64
Figure 4-10: Typical Hub Row Sizing ... 70
Figure 5-1: Relationship Changes Over Time ... 74
Figure 5-2: Link Table Structure Housing Multiple Relationships ... 75
Figure 5-3: Starting Model Before Changes ... 76
Figure 5-4: Data Vault After Modification ... 77
Figure 5-5: Additional Data Vault Model - More Changes ... 78
Figure 5-6: Global Data Vault Linking ... 79
Figure 5-7: Uncovering Fact Table Grain ... 80
Figure 5-8: Data Vault Grain, Representing Star Schema ... 80
Figure 5-9: Traditional Data Vault Storage Layout ... 83
Figure 5-10: Performance Physical Split Version 1 ... 84
Figure 5-11: Performance Physical Split Version 2 ... 85
Figure 5-12: Performance Physical Split Version 3 ... 85
Figure 5-13: Sample Link Structure ... 86
Figure 5-14: Example Driving Key for Link ... 87
Figure 5-15: Example of Link Satellite with Driving Key ... 87
Figure 5-16: Insert to Link/Sat Based on Driving Key ... 88
Figure 5-17: Link Driving Key/Satellite End Dated ... 88
Figure 5-18: Example of Link Tables From Adventure Works 2008 Data Vault ... 89
Figure 5-19: Example of Link To Link Relationships ... 92
Figure 5-20: Step 1, Flattening Link-To-Link Hierarchy ... 94
Figure 5-21: Step 2, Flattening Link-To-Link Hierarchy ... 94
Figure 5-22: Example Organization Structure ... 96
Figure 5-23: Hierarchical Link for Offices ... 97
Figure 5-24: Example Hierarchical Link of Employees ... 97
Figure 5-25: Same-As Link Example, Business Data ... 98
Figure 5-26: Same-As Link Data Vault Model ... 99
Figure 5-27: Incorrect Link with Begin/End Date ... 99
Figure 5-28: Begin & End Dates in Links ... 101
Figure 5-29: Example of Poorly Constructed Link ... 101
Figure 5-30: Satellite Effectivity on a Link ... 102
Figure 5-31: Transactional Link Example ... 103
Figure 5-32: Transactional Link, No Satellite ... 104
Figure 5-33: Example of Computed Aggregate Link ... 105
Figure 6-1: Example Satellite Entity ... 112
Figure 6-2: Example Satellite Entities ... 113
Figure 6-3: Satellites Split by Type Of Data Option 1 ... 115
Figure 6-4: Satellite Data Rate of Change Example ... 116
Figure 6-5: Satellite Split by Rate Of Change ... 117
Figure 6-6: Customer Satellites Split by Source System ... 119
Figure 6-7: Satellite Overload from Many Sources ... 121
Figure 6-8: Satellite Effectivity ... 122
Figure 6-9: Denormalized Record Source Tracking Satellite ... 124
Figure 6-10: Normalized Record Source Tracking Satellite ... 125
Figure 6-11: Status Tracking Satellite ... 127
Figure 6-12: Multi-Active Satellite Rows ... 129
Figure 6-13: Multi-Active Satellite Row Data ... 130
Figure 6-14: Multi-Active Satellite with Business Sub-Sequence ... 130
Figure 6-15: Step 1: Identify Satellite Split Columns ... 131
Figure 6-16: Step 2: Split Satellite Columns, Design New tables ... 132
Figure 6-17: Step 3: Copy Data From Original to New Satellites ... 132
Figure 6-18: Step 4: Eliminate Duplicates ... 133
Figure 6-19: Step 4: Alternate Elimination of Duplicates ... 133
Figure 6-20: Step 5: End Dates Adjusted After Satellite Split ... 134
Figure 6-21: Consolidating Satellite Data ... 135
Figure 6-22: Load End Dates Calculated in Consolidated Satellite ... 137
Figure 7-1: Structure of PIT Table ... 138
Figure 7-2: PIT Table Architecture Overview ... 139
Figure 7-3: Example PIT Table with Snapshot Dates ... 140
Figure 7-4: Bridge Table Structure ... 141
Figure 7-5: Bridge Table Architectural Overview ... 142
Figure 7-6: Bridge Table Example Data ... 143