DWCMM: The Data Warehouse Capability Maturity Model
Transcription
Thesis for the Degree of Master of Science

Author: Catalina Sacu
Student Number: 3305260
Thesis Number: INF/SCR-09-86
Institute of Information and Computer Sciences, Utrecht University, The Netherlands

Supervisors:
Utrecht University: dr. Marco Spruit (1st), dr. ir. Johan Versendaal (2nd)
Inergy: Frank Habers

Abstract

Data Warehouses (DWs) and Business Intelligence (BI) have formed a very dynamic and popular field of research in recent years, as they help organizations make better decisions and increase their profitability. Unfortunately, many DW/BI solutions fail to deliver the desired results, so it is important to have an overview of the critical success factors. This is usually difficult, however, because a DW/BI project is a very complex endeavour. This research addresses the problem by creating a Data Warehouse Capability Maturity Model (DWCMM) focused on the technical and organizational aspects involved in developing a DW environment. Based on an extensive literature study, the DWCMM consists of a maturity matrix and a maturity assessment questionnaire that analyze the main categories and sub-categories to be considered when implementing a DW/BI solution. The model and its associated questionnaire can help organizations assess their current DW solution and provide them with guidelines for future improvements. To validate and enrich the theory, the DWCMM was evaluated empirically through five expert interviews and four case studies. Based on the evaluation results, some minor changes were made to improve the model. The main conclusion of this research is that the DWCMM can be successfully applied in practice and that organizations can use it as a starting point for improving their DW/BI solution.

Acknowledgements

Utrecht, August 2010

I would like to use this opportunity to thank the people who made a significant contribution to this research.
First, I would like to express my gratitude to my supervisors at Utrecht University, dr. Marco Spruit and dr. ir. Johan Versendaal, for their professional guidance and constructive feedback during the project. Second, I would like to thank my external supervisor Frank Habers for offering me the opportunity to perform my research during an internship at Inergy. He has been extremely helpful in providing prompt access to all the resources I needed for the research and in offering me guidance and advice every time I needed it. He has also helped me arrange several expert interviews for validating my model. I would also like to thank other colleagues from Inergy, especially Rick Tijsen, for their support and enthusiasm during my research there. Furthermore, I would like to express my gratitude to all the experts and respondents who made room for me in their busy schedules and agreed to review my model and to fill in the assessment questionnaire, respectively. Last, but not least, I would like to thank my parents, my boyfriend and my friends for their constant love, support and welcome distractions when needed.

Catalina Sacu

Table of Contents

1 Introduction
  1.1 Problem Definition & Research Motivation
  1.2 Research Questions
  1.3 Research Approach
2 What are Data Warehousing and Business Intelligence?
  2.1 The Intelligent Organization
  2.2 Data-Information-Knowledge
  2.3 The Origins of DW/BI
  2.4 DW/BI Definition
  2.5 DW/BI Business Value
  2.6 Inmon vs. Kimball
  2.7 Maturity Modelling
    2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners
  2.8 Summary
3 A Data Warehouse Capability Maturity Model
  3.1 From Nolan's Stages of Growth to the Data Warehouse Capability Maturity Model
  3.2 DWCMM
4 DW Technical Solution Maturity
  4.1 General Architecture and Infrastructure
    4.1.1 What is Architecture?
    4.1.2 Conceptual Architecture and Its Layers
    4.1.3 Infrastructure
    4.1.4 Metadata
    4.1.5 Security
    4.1.6 Business Rules for DW
    4.1.7 DW Performance Tuning
    4.1.8 DW Update Frequency
  4.2 Data Modelling
    4.2.1 Data Modelling Definition and Characteristics
    4.2.2 Data Models Classifications (Data Models Levels and Techniques)
    4.2.3 Dimensional Modelling
    4.2.4 Data Modelling Tool
    4.2.5 Data Modelling Standards
    4.2.6 Data Modelling Metadata Management
  4.3 Extract – Transform – Load (ETL)
    4.3.1 What is ETL?
    4.3.2 Extract
    4.3.3 Transform
    4.3.4 Load
    4.3.5 Manage
    4.3.6 ETL Tools
    4.3.7 ETL Metadata Management
    4.3.8 ETL Standards
  4.4 BI Applications
    4.4.1 What are BI Applications?
    4.4.2 Types of BI Applications
    4.4.3 BI Applications Delivery Method
    4.4.4 BI Applications Tools
    4.4.5 BI Applications Metadata Management
    4.4.6 BI Applications Standards
  4.5 Summary
5 DW Organization and Processes
  5.1 DW Development Processes
    5.1.1 DW Development Phases
    5.1.2 The DW/BI Sponsor
    5.1.3 The DW Project Team and Roles
    5.1.4 DW Quality Management
    5.1.5 Knowledge Management
  5.2 DW Service Processes
    5.2.1 From Maintenance and Monitoring to Providing a Service
    5.2.2 IT Service Frameworks
    5.2.3 DW Service Components
  5.3 Summary
6 Evaluation of the DWCMM
  6.1 Expert Validation
    6.1.1 Expert Review Results and Changes
  6.2 Multiple Case Studies
    6.2.1 Case Study Approach
    6.2.2 Case Overview
    6.2.3 Case Studies Results and Conclusions
  6.3 Summary
7 Conclusions and Further Research
  7.1 Conclusions
  7.2 Limitations and Further Research
8 References
Appendix A: DW Detailed Maturity Matrix
Appendix B: The DW Maturity Assessment Questionnaire (Final Version)
Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)
Appendix D: Expert Interview Protocol
Appendix E: Case Study Interview Protocol
Appendix F: Case Study Feedback Template
Appendix G: Paper

List of Figures

Figure 1: IS Research Framework (adapted from Hevner et al., 2004).
Figure 2: Information Gap (adapted from Tijsen et al., 2009).
Figure 3: The BI Cycle (adapted from Thomas, 2001).
Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from Hey, 2004).
Figure 5: Data Warehouse Capability Maturity Model (DWCMM).
Figure 6: DWCMM Condensed Maturity Matrix.
Figure 7: A Typical DW Architecture (adapted from Chaudhuri & Dayal, 1997).
Figure 8: DW Design Process Levels (adapted from Husemann et al., 2000).
Figure 9: Star Schema vs. Cube (adapted from Chaudhuri & Dayal, 1997).
Figure 10: Case Study Method (adapted from Yin, 2009).
Figure 11: Alignment Between Organization A's Maturity Scores.
Figure 12: Alignment Between Organization B's Maturity Scores.
Figure 13: Alignment Between Organization C's Maturity Scores.
Figure 14: Alignment Between Organization D's Maturity Scores.
Figure 15: Benchmarking for Organization A.

List of Tables

Table 1: Differences between operational databases and DWs (adapted from Breitner, 1997).
Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).
Table 3: Overview of Maturity Models.
Table 4: DW General Questions.
Table 5: DW Architecture Maturity Assessment Questions.
Table 6: Infrastructure Maturity Assessment Questions.
Table 7: Business Metadata vs. Technical Metadata (adapted from Moss & Atre, 2003).
Table 8: Metadata Management Maturity Assessment Question.
Table 9: Security Maturity Assessment Question.
Table 10: Business Rules Maturity Assessment Questions.
Table 11: Performance Tuning Maturity Assessment Question.
Table 12: Update Frequency Maturity Assessment Question.
Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.
Table 14: Dimensional Modelling Maturity Assessment Questions.
Table 15: Data Modelling Tool Maturity Assessment Questions.
Table 16: Data Modelling Standards Maturity Assessment Questions.
Table 17: Data Modelling Metadata Management Maturity Assessment Questions.
Table 18: Data Quality Maturity Assessment Questions.
Table 19: ETL Complexity Maturity Assessment Question.
Table 20: ETL Management and Monitoring Maturity Assessment Question.
Table 21: ETL Tools Maturity Assessment Question.
Table 22: ETL Metadata Management Maturity Assessment Question.
Table 23: ETL Standards Maturity Assessment Questions.
Table 24: BI Applications Maturity Assessment Question.
Table 25: BI Applications Delivery Method Maturity Assessment Question.
Table 26: BI Tools Maturity Assessment Question.
Table 27: BI Applications Metadata Management Maturity Assessment Question.
Table 28: BI Applications Standards Maturity Assessment Questions.
Table 29: DW Development Processes General Maturity Assessment Question.
Table 30: Project Management Maturity Assessment Question.
Table 31: Requirements Definition Maturity Assessment Question.
Table 32: Testing and Acceptance Maturity Assessment Question.
Table 33: Development/Testing/Acceptance/Production Maturity Assessment Questions.
Table 34: DW/BI Sponsorship Maturity Assessment Question.
Table 35: DW Project Team and Roles Maturity Assessment Question.
Table 36: DW Quality Management Maturity Assessment Question.
Table 37: Knowledge Management Maturity Assessment Question.
Table 38: Overview of IT Service Frameworks.
Table 39: ITIL's Core Components (adapted from Cater-Steel, 2006).
Table 40: IT Service CMM's Key Process Areas (adapted from Paulk et al., 1995).
Table 41: Maintenance and Monitoring Maturity Assessment Question.
Table 42: Service Quality Management Maturity Assessment Question.
Table 43: Service Level Management Maturity Assessment Question.
Table 44: Incident Management Maturity Assessment Question.
Table 45: Change Management Maturity Assessment Question.
Table 46: Incident Management Maturity Assessment Question.
Table 47: Availability Management Maturity Assessment Question.
Table 48: Release Management Maturity Assessment Question.
Table 49: Expert Overview.
Table 50: Rephrased or Changed Questions and Answers.
Table 51: Case and Respondent Overview.
Table 52: Technologies Usage Overview.
Table 53: Organizations' Maturity Scores.
Table 54: Maturity Scores Analysis.
1 Introduction

In today's economy, organizations operate in a very dynamic environment of continuously changing conditions and relationships. At the same time, the external environment is an important source of information (Aldrich & Mindlin, 1978) that organizations have to gather and process very rapidly in order to maintain their competitive advantage (Choo, 1995). Moreover, as (Kaye, 1996) notes, "organizations must collect, process, use, and communicate information, both external and internal, in order to plan, operate and take decisions." The ongoing pressure for profits, increasing competition and demanding customers all require organizations to make the best decisions as fast as possible (Vitt et al., 2002). Hence, in order to survive, companies have to adapt to this new information environment by shortening the time between acquiring information and obtaining the right results. One of the solutions that can narrow this time gap and improve the decision-making process is the implementation of Data Warehouses and Business Intelligence (BI) applications.

1.1 Problem Definition & Research Motivation

The most fundamental aspect of an organization in today's highly globalized market is the critical decision-making capacity of its management, which influences the successful running of business operations. Hence, it is very important for organizations to manage both transaction- and non-transaction-oriented information in order to make timely decisions and react to changing business circumstances (AbuSaleem, 2005). Moreover, in the last couple of years, enterprises have shifted their business focus towards customer orientation in order to remain competitive.
Accordingly, maintaining relationships with clients and managing their data have emerged as top issues for global companies. Moreover, many researchers have reported that the amount of data in a given organization doubles every five years (AbuSaleem, 2005). In order to process this large amount of data and make the best decisions as fast as possible, information must be reliable, accurate, real-time and easy to access. For such information, all enterprise-related data should be integrated and appropriately analyzed from a multi-dimensional point of view. The solution for this is a data warehouse (DW). Over the years, DWs have become one of the foundations of the information systems used to support decision-making initiatives. The new era of enterprise-wide systems integration and the growing need for BI both accelerate the development of DW solutions (AbuAli & Abu-Addose, 2010). Most large companies have already established DW systems as a component of their information systems landscape. According to (Gartner, 2007), BI and DWs are at the forefront of the use of IT to support management decision-making. DWs can be thought of as the large-scale data infrastructure for decision support, while BI can be viewed as the data analysis and presentation layer that sits between the DW and the executive decision-makers (Arnott & Pervan, 2005). In this way, DW/BI solutions can transform raw data into information and then into knowledge. However, a DW is not only a software package. The adoption of DW technology requires massive capital expenditure and considerable implementation time. DW projects are hence very expensive, time-consuming and risky undertakings compared with other information technology initiatives, as cited by prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Mukherjee & D'Souza, 2003; Solomon, 2005).
The typical project costs over $1 million in the first year alone (AbuAli & Abu-Addose, 2010), and it is estimated that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007). Moreover, (Gartner, 2007) estimates that more than fifty percent of DW projects have limited acceptance or fail. Therefore, it is crucial to have a thorough understanding of the critical success factors and variables that determine the efficient implementation of a DW solution. These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI. In this master thesis, we focus on the former, as we consider it the foundation of a solid DW solution that can achieve a high rate of usage and adoption. First, it is critical to properly design and implement the databases that lie at the heart of the DW: the right architecture and design can ensure performance today and scalability tomorrow. Second, all components of the data warehouse solution (e.g., data repository, infrastructure, user interface) must be designed to work together in a flexible, easy-to-use way. A third task is to develop a consistent data model and establish what source data will be extracted and how. In addition to these factors, the DW needs to be created and developed quickly and efficiently so that the organization can gain the business benefits as soon as possible (AbuAli & Abu-Addose, 2010). A DW project can thus unquestionably be complex and challenging. This is why it is important to gain insight into the technical and organizational variables that determine the successful development of a DW solution, and to assess these variables. The main goal of this master thesis is therefore to do so by creating a Data Warehouse Capability Maturity Model (DWCMM) and answering the following main research question: How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?
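Such an assessment is operationalized later in the thesis through a maturity assessment questionnaire whose answers correspond to maturity levels. As a purely illustrative sketch, one could aggregate questionnaire answers into per-category scores as follows (the category names, the 1-5 scale, and the averaging and minimum rules are assumptions for illustration, not the DWCMM's actual scoring method):

```python
# Illustrative sketch only: roll question-level questionnaire answers
# (assumed here to be maturity levels on a 1-5 scale) up into category
# scores. Category names and scoring rules are hypothetical.

def category_scores(answers):
    """Average the question-level answers within each category."""
    return {cat: sum(levels) / len(levels) for cat, levels in answers.items()}

def overall_maturity(scores):
    """Take the lowest category score: a DW is only as mature as its weakest area."""
    return min(scores.values())

answers = {
    "Architecture & Infrastructure": [3, 2, 4],
    "Data Modelling": [2, 3],
    "ETL": [4, 3, 3],
    "BI Applications": [3, 3],
}
scores = category_scores(answers)
print(scores)                    # per-category averages
print(overall_maturity(scores))  # prints 2.5 (Data Modelling is the weakest area)
```

In the thesis itself, scores are derived from the assessment questionnaire in Appendix B; the sketch only illustrates the general idea of turning question-level answers into category-level maturity indications that can be benchmarked and acted upon.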
1.2 Research Questions

As stated before, the main goal of this research is to develop a DWCMM that will help organizations assess their current DW solution from both a technical and an organizational/process point of view. In order to do this and address the main research question, several sub-questions have been formulated. First, we give an overview of the field of BI and DW in order to better understand its context, answering the first sub-question: What are BI and DWs? Then, we elaborate on the second important element of our model, the maturity part. We identify the main characteristics of maturity models and the maturity models most representative for our research by answering the following sub-question: What do maturity models represent and which are the most representative ones for our research? Once we have a general overview of BI/DW and maturity modelling, we can continue by presenting the stages of the DWCMM and the main characteristics of each stage. We will in this way answer the next two sub-questions: What are the most important variables and characteristics to be considered when building a data warehouse? and How can we design a capability maturity model for a data warehouse assessment? Having created and presented the model, we can then apply it as an assessment method at different organizations and see whether it is a viable source of information and what changes need to be made. This answers the last sub-question: To what extent does the data warehouse capability maturity model result in a successful assessment and guideline for the analyzed organizations? To summarize, in order to deliver a valid DWCMM, our research aims to answer the following research questions: Main question: How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?
Sub-questions:
1) What are business intelligence and data warehouses?
2) What do maturity models represent and which are the most representative ones for our research?
3) What are the most important variables and characteristics to be considered when building a data warehouse?
4) How can we design a capability maturity model for a data warehouse assessment?
5) To what extent does the data warehouse capability maturity model result in a successful assessment and guideline for the analyzed organizations?

1.3 Research Approach

Information systems (IS) are implemented within an organization to improve that organization's efficiency and effectiveness. Hence, the main goal of research in this field is to create "knowledge that enables the application of information technology for managerial and organizational purposes" (Hevner et al., 2004). According to (Hevner et al., 2004), two paradigms mainly characterize research in the IS discipline: behavioural science and design science. Behavioural science aims to develop and verify theories that explain or predict human or organizational behaviour. The design science paradigm, on the other hand, seeks to extend the boundaries of human and organizational capabilities by creating new and innovative artifacts. As discussed above, the main goal of our research is to develop a DWCMM that depicts the maturity stages of a DW project and can be used to assist organizations in identifying their current maturity stage and evolving to a higher level. For this purpose, a design research approach is used, as its main philosophy is to generate scientific knowledge by building and validating a previously designed artifact (Hevner et al., 2004).
In this research, the artifact is the DWCMM, which is developed according to the seven design science guidelines stated by Hevner et al. (2004) and to the five steps in developing design research artifacts described by Vaishnavi & Kuechler (2008):

Awareness of problem – this can come from multiple research sources. In our case, awareness of the problem area was raised in discussions with DW/BI practitioners and through a literature study on data warehousing and maturity modelling.

Suggestion – this is essentially a creative step wherein new functionality is envisioned based on a novel configuration of either existing or new elements. Before the actual development of the DWCMM, we conducted a thorough literature study, proposed ideas and received suggestions from experts regarding the components of the model and the relationships between them. We also designed an outline framework of the model.

Development – this involves the actual implementation of the model using various techniques depending on the artifact to be constructed. This stage is closely related to the previous one. In our research, it involves the actual creation and presentation of the DWCMM with all its analyzed categories and maturity stages.

Evaluation – this consists of evaluating the constructed artifact according to criteria that are always implicit and frequently made explicit in the awareness step. According to Hevner et al. (2004), case studies have proved to be an appropriate evaluation method in design research. Therefore, the validation phase in our case consisted of five expert interviews and four case studies. We received a great deal of feedback and many suggestions from the expert interviews for improving the model. Then, once we had refined the model, we continued its validation within four organizations following the case study approach of Yin (2009).
Conclusion – this phase is the finale of a specific research effort, when results are summarized, conclusions are drawn and suggestions for further research are discussed.

The way in which our research fits within the IS Research Framework designed by Hevner et al. (2004) is depicted in the figure below.

Figure 1: IS Research Framework (adapted from (Hevner et al., 2004)).

As we adopted a design science approach for our study, the following structure was chosen for this thesis document. We will first provide some background information on the main concepts of the study in chapter 2. We will then present an overview of the design artifacts of this research in chapter 3. Chapters 4 and 5 offer a detailed analysis of the main components of the model we developed. In chapter 6, results are presented for the two evaluation activities of the model – expert interviews and case studies. Finally, in chapter 7, conclusions are drawn and limitations of this research are discussed along with some points on further research.

2 What are Data Warehousing and Business Intelligence?

In this section, the key background concepts of the study – data warehousing, business intelligence and maturity modelling – will be summarized, and related work will be explored.

2.1 The Intelligent Organization

The ubiquitous complexity, speed of change and uncertainty of the present economic environment confront organizations with enormous challenges (Schwaninger, 2001). However, for a long time organizations worked in closed settings and saw themselves as fortresses with walls and boundaries that limited their activities and influence (Choo, 1995). Nowadays, this static representation of organizations has become a relic. Today's organizations are complex, open systems that cannot function in isolation from the surrounding dynamic environment.
As already discussed in the introduction, the external environment is an important source of information (Aldrich & Mindlin, 1978) that organizations have to gather and process very rapidly in order to maintain their competitive advantage (Choo, 1995). However, nowadays, information is being generated at an ever-increasing rate, which makes it very difficult for companies to manage it. Decision makers often find themselves facing an information overload problem, making it very hard for them to identify the right information for decision purposes in the available time (O'Reilly, 1980). This causes a so-called "information gap" between the need for fast decision making on the one hand, and the longer time needed to acquire the right information on the other hand (Tijsen et al., 2009). This requires decision makers to utilize information management systems and analysis for supporting their business decisions (Turban et al., 2007). This is where BI/DW can help. As depicted in figure 2, BI helps narrow the information gap by shortening the time required to obtain relevant data and by efficient utilization of the available time to apply information (Tijsen et al., 2009).

Figure 2: Information Gap (adapted from (Tijsen et al., 2009)).

In this way, organizations are not only information consumers, but also creators of information and knowledge (Choo, 1995). This can help them understand and adapt very fast to the changes in their business environment and maintain their competitive advantage. According to Porter (1985), an effective competitive strategy requires a deep understanding of the relationship between the firm and its environment. And this can be obtained by applying DW/BI as a competitive differentiator. This refers to using DW/BI not only to get to know your own organization and customers, but also your competitors, and to gather competitive intelligence. As J. D.
Rockefeller once said, "Next to knowing all about your own business, the best thing to know about is the other fellow's business."

As can be seen in figure 3, there is a whole BI cycle which starts with planning based on corporate needs; then ethically collecting reliable information from valid sources; then analyzing the data to form intelligence in conjunction with strategic planning and market research. Finally, in order for the intelligence to have value, it must be disseminated in a form that is clear and understandable (Thomas, 2001). BI is a rigorous process in which sources of information, including published information as well as human sources, play a vital role. The BI process was at work long before the development of computers and knowledge database software, but those tools have allowed BI to have much greater value in the decision-making process and in the way organizations sustain their competitive advantage.

Figure 3: The BI Cycle (adapted from (Thomas, 2001)).

In this way, organizations can become "intelligent" and stay ahead of change, which according to Drucker (1999) is the only way of coping with change effectively. The main characteristics that distinguish intelligent organizations are (Schwaninger, 2001): to adapt to change as a function of external stimuli; to influence and shape their environment; to find a new milieu, if necessary, or to reconfigure themselves virtuously with their environment; and to make a positive net contribution to the viability and development of the larger environments into which they are embedded.

2.2 Data-Information-Knowledge

As the terms data, information and knowledge have been used in the previous paragraphs and will be used further in this thesis, we would like to give a short overview of each of the terms and the differences between them. In everyday writing the distinction between data and information is not clearly made, and they are often used interchangeably; the same applies to information and knowledge.
However, many scientists claim that data, information and knowledge are part of a sequential order (Zins, 2007). Data are the raw material for information, and information is the raw material for knowledge. A well-known representation of the relationships between the three concepts is the "DIKW (Data, Information, Knowledge, Wisdom) Hierarchy". One version of the hierarchy depicts it as a linear chain (Hey, 2004), as can be seen in figure 4. Not all versions of the DIKW model reference all four components (earlier versions did not include data, later versions omit or downplay wisdom), but the main idea is the same. We will only elaborate on the first three concepts here as they are the most used and acknowledged ones.

Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from (Hey, 2004)).

The distinctions and relationships between data, information and knowledge are further elaborated in the remainder of this paragraph.

Data

Data has been given a variety of definitions, largely depending on the context of its use. For example, Information Science defines data as unprocessed information, and other domains treat data as a representation of objective facts (Hey, 2004). According to Ackoff (1989), data is raw. It simply exists and has no significance beyond its existence. Data is acquired from the external world through our senses in the form of signals and signs. Much neural processing has to take place between the reception of a stimulus and its sensing as data by an agent (Kuhn, 1974). In an organizational context, data is usually described as structured records of transactions which are stored in a technology system for different departments such as finance, accounting, sales, etc. (Davenport & Prusak, 2000). Data says nothing about its own importance or irrelevance; nor does it provide judgement, interpretation or a sustainable basis for action.
However, it is important to organizations because it is the essential raw material for the creation of information.

Information

Information is data that has been processed, interpreted and given meaning (useful or not) by way of relational connection. It provides answers to "who", "what", "where" and "when" questions. In computer parlance, a relational database makes information from the data stored within it (Ackoff, 1989). According to Boisot & Canals (2004), information constitutes those significant regularities residing in the data that agents attempt to extract from it. In order to interpret data into information, a system needs knowledge. The meaning of terms may be different for different people, and it is our knowledge about particular domains - and the world in general - that enables us to get meaning out of these data strings. Hence, for data to become information, interpretation and elaboration processes are required (Aamodt & Nygård, 1995). Computers can assist in transforming data into information, but cannot replace humans. Today's managers believe that having more information technology will not necessarily improve the state of information. To sum up, we could say that "information is data that makes a difference" (Davenport & Prusak, 2000).

Knowledge

Knowledge is broader, deeper and richer than data and information. It is the appropriate collection of information such that it is placed in a certain context and its intent is to be useful. Knowledge is a deterministic process and it provides answers to "how" questions (Ackoff, 1989). According to Davenport & Prusak (2000), knowledge is "a fluid mix of framed experience, values, contextual information and expert insight that provides a framework for evaluating and incorporating new experiences and information". As can be seen, knowledge is not neat or simple to obtain.
It can be considered both a process and a stock, and its creation takes place within and between people. Knowledge allows us to act more effectively than information and data, as it gives us the opportunity to predict future outcomes. We could say that knowledge in a practical sense is "value added information" (Jashapara, 2004) which helps us make better and faster decisions.

2.3 The Origins of DW/BI

While the term BI is relatively new (dating from the early 1990s), computer-based BI systems go back, in one form or another, more than forty years (Gray & Negash, 2003). Approaches to BI have thus evolved over decades of technological innovation and management experience with IT. The history of BI systems begins in the mid-1960s, when researchers began systematically studying the use of computerized quantitative models to assist in decision making and planning (Power, 2003). Ferguson & Jones (1969) reported the first experimental study using a computer-aided decision system by investigating a production scheduling application. At the same time, organizations were beginning to computerize many of the operational aspects of their business. Information systems were developed to perform operational applications such as order processing, billing, inventory control and payroll (Arnott & Pervan, 2005). Once the importance of the data from the operational processes was acknowledged for the decision-making process, the first Management Information System (MIS) was developed. Another turning point in this field was the work of Morton who, together with Gorry, defined the concept of "Decision Support Systems" (DSS) (Gorry & Morton, 1971). They constructed a framework for improving MIS and conceived DSS as systems that support any managerial activity in decisions that are semistructured or unstructured (Arnott & Pervan, 2005).
The aim of DSS was to create an environment in which the human decision maker and the IT-based system work together in an interactive way to solve problems, the human dealing with the complex unstructured parts of the problem, and the information system providing assistance by automating the structured elements of the decision situation. The oldest form of DSS was the personal DSS: small-scale systems that were normally developed for one manager, or a small number of independent managers, for one decision task. They effectively replaced MIS as the management support approach of choice and for around a decade they were the only form of DSS in practice. Starting in the 1980s, many activities associated with building and studying DSS took place in universities and organizations, which resulted in expanding the scope of DSS applications. This led to a broad historical progression and development of DSS into the following main categories (Arnott & Pervan, 2005):

Group DSS – a group DSS consists of a set of technology and language components and procedures for communication and information processing that support a group of people engaged in a decision-related process (Huber, 1984; Kraemer & King, 1988).

Negotiation DSS – a negotiation DSS also operates in a group context, but as the name suggests, it involves the application of computer technologies to facilitate negotiations (Rangaswamy & Shell, 1997). As group support systems (GSS) were developed, the need to provide electronic support for groups involved in negotiation problems and processes evolved as a focused sub-branch of GSS with different conceptual foundations to support those needs.

Executive information systems (EIS) – an EIS is a data-oriented DSS that provides reporting about the nature of an organization to management (Fitzgerald, 1992). Despite the 'executive' title, they are used by all levels of management.
EIS were enabled by technology improvements in the mid to late 1980s and, by the mid-1990s, EIS had become mainstream and were an integral component of the IT portfolio of any reasonably sized organization. As the 1990s unfolded, we saw the emergence of Data Warehousing (DW) and Business Intelligence (BI), which replaced the EIS. We will focus on these two terms in the next paragraph in order to get a better overview of the field that inspired us in our research.

2.4 DW/BI Definition

The term BI was first introduced by Luhn (1958) in his article called "A Business Intelligence System". In his view, BI was defined as "the ability to apprehend the interrelationships of presented facts in such a way as to guide actions towards a desired goal". However, the term BI was coined and popularized in the early 1990s by Howard Dresner (Gartner Group analyst). He described BI as a set of concepts and methods to improve business decision making by using fact-based support systems (Power, 2003). In the last couple of years, much attention has been paid to BI and therefore many definitions can be found in the literature. Some of the most representative ones are given here. According to Golfarelli et al. (2004), BI can be defined as the process of turning data into information and then into knowledge. A similar view on BI is that of Eckerson (2007), who believes that BI represents "the tools, technologies and processes required to turn data into information and information into knowledge and plans that optimize business actions." Furthermore, Gray & Negash (2003) consider that BI systems "combine data gathering, data storage and knowledge management with analytical tools to present complex and competitive information to planners and decision makers". We can see from these definitions that BI helps the decision-making process by efficiently and effectively transforming data into knowledge through the use of different analytical tools.
Moreover, the concept of DW dates back to the late 1980s, when IBM researchers Devlin & Murphy (1988) published their article "An Architecture for a Business and Information Systems" and introduced the term "business data warehouse". However, DW technology and development became popular in the 1990s after Inmon (1992) published his seminal book "Building the Data Warehouse". Furthermore, the bull market of the 1990s led to a plethora of mergers and acquisitions and an increasing globalization of the world economy. Large organizations were therefore faced with significant challenges in maintaining an integrated view of their business. This was the environment that drove the increase in development and usage of DWs (Arnott & Pervan, 2005). Similar to BI, many definitions can be found for DW, but all of them start from the ways in which Inmon and Kimball (the creators of the two main schools of thought and practice within data warehousing) defined it. Inmon (1992) defines the DW as a "subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making". Kimball (1996) offers a much simpler definition of a DW, which provides less insight and depth than Inmon's, but is no less accurate. In his opinion, a DW is "a copy of transaction data specifically structured for query and analysis" (Kimball, 1996). DWs are therefore targeted at decision support, as they collect information about one or more business processes involved in the whole organization. The DW can be seen as a repository that stores data gathered from many operational databases, and from which the information and knowledge needed to effectively manage the organization emerge.
Typically, the DW is maintained separately from the organization's operational databases, as it supports online analytical processing (OLAP) through a variety of front-end tools such as query tools, report writers, data mining and analysis tools. OLAP functional and performance requirements are quite different from those of the online transaction processing (OLTP) applications traditionally supported by the operational databases. The main differences between operational databases and DWs can be seen in the table below.

Characteristics | Operational Databases | DWs
Source of data | Operational data; OLTP are the original source of data | Consolidated data; data comes from various OLTP databases
Number of sources | Few | Many
Size of sources | Gigabyte | Gigabyte-Terabyte
Data content | Current values | Archived, derived, summarized
Purpose of data | Control and run fundamental business tasks | Help planning, problem solving, prediction and decision support
Complexity of transactions | Simple | Complex
Kind of transactions | Static, predefined | Dynamic, flexible
Actuality | Current-valued | Current-valued & historical
Numbers of users/Frequency | High | Medium/Low

Table 1: Differences between operational databases and DWs (adapted from (Breitner, 1997)).

Now that we have defined both the BI and DW terms, we can see that there is some overlap, but also some differences between them. In the literature there has also been a debate regarding these two concepts. Some authors believe that BI is the overarching term, with the DW being the central data store foundation, whereas others refer to data warehousing as the overall concept, with the DW databases and BI layers as subset deliverables (Kimball et al., 2008). As Kimball et al. (2008) and Inmon (2005), two of the most renowned figures in this field, believe that the DW is the foundation of BI, we will proceed with this approach in the remainder of this thesis.
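The contrast between OLTP and OLAP access patterns described above can be sketched with a small example using Python's built-in sqlite3 module. The schema, data and values are hypothetical and purely illustrative; in practice the operational database and the DW would of course be separate systems with very different volumes.

```python
import sqlite3

# Hypothetical schema: one tiny "sales" table stands in for both an
# operational store and a (trivially small) warehouse copy.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        order_id INTEGER PRIMARY KEY,
        region   TEXT,
        year     INTEGER,
        amount   REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "North", 2009, 100.0),
     (2, "North", 2010, 150.0),
     (3, "South", 2009, 200.0),
     (4, "South", 2010, 250.0)],
)

# OLTP-style access: a simple, predefined lookup of one current record,
# the kind of query an order-processing application issues constantly.
row = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (3,)
).fetchone()
print(row[0])  # 200.0

# OLAP-style access: an ad hoc aggregation over many (historical) rows,
# the kind of query a decision maker issues against a DW.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('North', 250.0), ('South', 450.0)]
```

The point of the sketch is the shape of the two queries, not the data: the first touches a single current-valued row through its key, while the second scans and summarizes history, which is why the two workloads are kept on separate systems.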
2.5 DW/BI Business Value

Since the development of a DW/BI environment is usually a very expensive endeavour, an organization considering such an initiative needs a BI strategy and a business justification to show the balance between the costs involved and the benefits gained. A DW/BI initiative provides numerous benefits – not only tangible benefits such as increasing the sales volume and profits, but also intangible benefits such as enhancing the organization's reputation. Many of these benefits, especially the intangible ones, are difficult to quantify in terms of monetary value. The real benefit of a DW/BI solution occurs when the created knowledge is actionable. That means that an organization cannot just provide the information factory; it must also have methods for extracting value from that knowledge. This is not a technical issue – it is an organizational one. To have identified actionable knowledge is one thing, but to take the proper action requires a nimble organization with individuals empowered to take that action. Hence, before embarking on building a DW/BI environment, every included DW/BI activity should be accompanied by a strategy to gain business value (Loshin, 2003). Moreover, although the general benefits of DW/BI initiatives are widely documented, they cannot justify the DW/BI project unless these benefits can be tied to the organization's specific business problems and strategic business goals (Moss & Atre, 2003). Justification for a DW/BI initiative must always be business-driven and not technology-driven. It is very important for such an initiative to have support from top-level management in order to be successful. Therefore, the DW/BI initiative as a whole, and the proposed BI application specifically, should support the strategic business goals.
Each proposed BI application must address measurable business problems (i.e., problems affecting the profitability or efficiency of an organization) in order to justify building the application. Furthermore, the business representative should be primarily responsible for determining the business value of the proposed DW/BI application. The information technology (IT) department can become a solution partner with the business representative and can help explore the business problems and define the potential benefits and costs of the DW/BI solution. IT can also help clarify and coordinate the different needs of the varied groups of business users in order to develop a solution that will have a higher rate of adoption. With the business representative leading the business case assessment effort, IT staff can assist with the four business justification components (Moss & Atre, 2003):

Business drivers – Identify the business drivers, strategic business goals, and DW/BI application objectives. Ensure that the DW/BI solution objectives support the strategic business goals.

Business analysis issues – Define the business analysis issues and the information needed to meet the strategic business goals by stating the high-level information requirements for the business.

Cost-benefit analysis – Estimate the benefits and costs of building and maintaining a successful BI decision-support environment. Determine the return on investment (ROI) by assigning monetary value to the tangible benefits and highlighting the positive impact the intangible benefits will have on the organization.

Risk assessment – Assess the risks in terms of technology, complexity, integration, organization, project team, user adoption and financial investment.

As can be seen from this paragraph, it is very important to have synchronization between the business goals of an organization and the technical DW/BI solution.
The business part has to be the driver for building the technical application. However, due to time constraints and the fact that there are previous DW/BI maturity models that also focus on the business side of the problem (Watson et al., 2001; Eckerson, 2004; Hostmann, 2007), in this thesis we will focus on the technical aspects and the organizational processes and roles involved in developing a DW/BI solution. Once the business goals and strategy are clearly defined, it all comes down to being able to develop and maintain a solid technical solution.

2.6 Inmon vs. Kimball

As mentioned in the paragraphs above, there are two fundamentally different approaches to data warehousing: enterprise-level data warehouses (Inmon, 1992) and division- or department-level data marts (Kimball, 1996). Understanding the basics of the architecture and methodology of both models provides a good foundational knowledge of data warehousing. Based on this and an organization's specific needs, architects can then choose between Inmon's, Kimball's or a hybrid architectural model. Inmon sees the DW as a part of a much larger information environment, which he calls the Corporate Information Factory (CIF). To ensure that the DW fits well in this larger environment, he advocates the construction of both an atomic DW and departmental databases. Inmon's approach stresses top-down development using adaptations of proven database methods and tools. He proposes a three-level data model (Breslin, 2004). The first level is represented by entity relationship diagrams (ERDs); the second level establishes the data item set (DIS) for each department; and the third level is the physical model, created "by merely extending the mid-level data model to include keys and physical characteristics" (Inmon et al., 2005). As can be seen, Inmon's approach is evolutionary rather than revolutionary.
His tools and methods can effectively be used only by IT professionals, whereas end users have a more passive role in the DW development, mostly receiving the results generated by the IT professionals. On the other hand, Kimball proposes a bottom-up approach, first building one data mart per business process and then creating the organization's DW as the sum of all data marts. The interoperability between the various data marts is ensured by the data bus, which requires that all data marts are modelled within consistent standards called conformed dimensions. Kimball proposes a unique four-step dimensional design process that consists of: selecting the business process; declaring the grain (i.e., the level of detail) of the DW; choosing the dimensions; and identifying the facts. Fact tables contain metric data, while dimension tables provide the context of the facts. Dimensional modelling has a series of advantages such as understandability, query performance and extensibility to accommodate new data (Kimball et al., 2008). Dimensional modelling tools can be used by end users with some special training, which ensures the active involvement of end users in the development of the DW (Breslin, 2004). Inmon's and Kimball's models are similar in some ways, such as the treatment of the time attribute or the extract-transform-load (ETL) process, but they are also very different regarding other aspects such as the development methodologies and architecture, data modelling and philosophy. A summary of these differences is given in table 2, adapted from (Breslin, 2004).
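Kimball's four-step dimensional design process can be illustrated with a minimal star schema sketch, again using Python's sqlite3 module. The retail example, table names and values are our own illustrative assumptions, not taken from Kimball's books: the business process is sales, the grain is one fact row per product per sale, date and product are the dimensions, and the sold amount is the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: select the business process = retail sales.
# Step 2: declare the grain = one fact row per product sold per day.
# Step 3: choose the dimensions = date and product. (In a full data bus
#         architecture these would be conformed dimensions, reused
#         unchanged by the other data marts.)
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")

# Step 4: identify the facts = numeric, additive measures at the grain.
conn.execute("""
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        amount      REAL
    )
""")
conn.execute("INSERT INTO dim_date VALUES (20100801, 2010, 8)")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20100801, 1, 1200.0), (20100801, 2, 300.0), (20100801, 1, 800.0)])

# A typical dimensional query: constrain and group by dimension
# attributes, then aggregate the facts.
result = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(result)  # [('Electronics', 2000.0), ('Furniture', 300.0)]
```

The star shape is what gives dimensional models their understandability: end users phrase questions in terms of dimension attributes (category, year) and the facts are simply summed at the declared grain.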
Methodology and Architecture | Inmon | Kimball
Overall approach | Top-down | Bottom-up
Architectural structure | Enterprise-wide (atomic) data warehouse is the foundation for data marts | Data marts model a single business process; enterprise consistency is achieved through the data bus and conformed dimensions
Complexity of the method | Quite complex | Fairly simple
Comparison with established development methodologies | Derived from the spiral methodology | Four-step process, inspired by relational database methods
Consideration of the physical design | Fairly thorough | Fairly light

Data Modelling | Inmon | Kimball
Data orientation | Subject- or data-driven | Process oriented
Tools | Traditional (ERDs, DISs) | Dimensional modeling
End-user acceptability | Low | High

Philosophy | Inmon | Kimball
Primary audience | IT professionals | End-users
Place in the organization | Integral part of the Corporate Information Factory (CIF) | Transformer and retainer of operational data
Objective | Deliver a sound technical solution based on proven database methods and technologies | Deliver a solution that makes it easy for end users to directly query the data and get good response times

Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).

The model that we are developing in this thesis can be applied to both Inmon's and Kimball's conceptual views on DW development, but there are some specific aspects in the data modelling assessment that are limited to dimensional modelling. The reasons for this include both time constraints and the fact that most of the DWs developed in practice make use of this technique, especially for data marts and for models presented to users. For more information on the two data modelling techniques, see sections 4.2.2 and 4.2.3.
2.7 Maturity Modelling

As the main goal of our research is to develop a Data Warehouse Capability Maturity Model, we will now give an overview of the subject of maturity modelling and take a look at the maturity models that served as a source of inspiration for our endeavour. In this highly competitive environment, it is very important for organizations to be aware of their current situation and to know the steps they need to take for continuous improvement. This requires positioning the company with regard to its IT capabilities and the quality of its products and processes. This positioning usually involves a comparison with the company's goals, external requirements (e.g., customer demands, laws or guidelines), or benchmarks. However, an objective assessment of a company's position often proves to be a difficult task. Maturity models can be helpful in this situation. They essentially describe the development of an entity over time, where the entity can be anything of interest: a human being, an organizational function, an organization, etc. (Klimko, 2001). Maturity models can be used as an evaluative and comparative basis for organizational improvement (de Bruin et al., 2005), and to derive an informed approach for increasing the capability of a specific area within an organization (Hakes, 1996). They usually have a number of sequentially ordered levels, where the bottom stage stands for an initial state that can be characterized, for example, by an organization having few capabilities in the domain under consideration. In contrast, the highest stage represents a concept of total maturity. Advancing on the evolution path between the two extremes involves a continuous progression regarding the organization's capabilities or process performance.
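The idea of placing an organization on such an evolution path can be sketched as a simple scoring scheme. The category names, scores, level names and thresholds below are illustrative assumptions of our own, not the actual DWCMM scoring rules developed later in this thesis.

```python
# Hypothetical maturity appraisal: questionnaire answers are averaged
# per category on a 1-5 scale, and the overall score is mapped to one
# of five sequentially ordered levels (CMM-style level names assumed).
LEVELS = ["Initial", "Repeatable", "Defined", "Managed", "Optimized"]

def maturity_level(category_scores):
    overall = sum(category_scores.values()) / len(category_scores)
    # Round down: an organization sits at the highest level it has
    # fully reached, not one it has only partially reached.
    return LEVELS[min(int(overall) - 1, len(LEVELS) - 1)]

# Example snapshot of an organization (hypothetical category scores).
scores = {"architecture": 3.0, "data modelling": 2.5,
          "ETL": 2.0, "BI applications": 3.5}
print(maturity_level(scores))  # Repeatable (overall score 2.75)
```

Such a snapshot is the descriptive part of an appraisal; the prescriptive part then consists of targeting the lowest-scoring categories (here, ETL) with prioritized improvement measures.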
The maturity model serves as an assessment of the position on the evolution path, as it offers a set of criteria and characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009). During a maturity appraisal which can be done by predetermined procedures such as questionnaires, a snapshot of the organization regarding the given criteria is made (i.e.: a descriptive model). Based on the results of the as-is analysis, recommendations for improvement measures can be derived and prioritized in order to reach higher maturity levels (i.e.: a prescriptive model). Then, once the model is applied in a wide range of organizations, similar practices across organizations can be compared in order to benchmark maturity within more disparate industries (i.e.: a comparative model). 2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners Studies have shown that more than one hundred and fifty maturity models have been developed (de Bruin et al., 2005), but only some of them managed to gain global acceptance. Also, there are several information technology and/or information system maturity models dealing with different aspects of maturity: technological, organizational and process maturity. The most important maturity models that served as a source of inspiration for our research can be seen in table 3 and are briefly presented in the following paragraphs. 
Authors | Model | Focus
Nolan (1973) | Stages of Growth | IT growth inside an organization
Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software development processes
Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data warehousing
Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business intelligence
The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business intelligence
Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business intelligence and performance management
Table 3: Overview of Maturity Models.
Nolan's Stages of Growth
First, one of the most widely used concepts in organizational and IS research is the "stages of growth". The fundamental belief is that many things change over time in sequential, predictable ways. The stages of growth are commonly depicted graphically using an S-shaped curve, where the turnings of the curve mark important transitions. The number of stages varies with the phenomena under investigation, but most models have between three and six stages (Watson et al., 2001). One of the most famous "stages of growth" maturity models is Richard Nolan's, published in (Nolan, 1973). The model has been widely recognized and used by practitioners and researchers alike. It is based on companies' spending on electronic data processing, but it can be extended to the general approach to IT in an organization. Nolan's initial model describes four distinct stages: initiation, expansion, formalization and maturity. In 1979, Nolan transformed the original four-stage model into a six-stage model by adding two new stages, integration and data administration, which were placed between the stages of formalization and maturity. For a more detailed and also critical analysis of the Nolan curve, see (Galliers & Sutherland, 1991).
Capability Maturity Model (CMM)
The second classical maturity model is the Capability Maturity Model (CMM), developed at the end of the eighties by Watts Humphrey and his team from the Software Engineering Institute (SEI) at Carnegie Mellon University. The CMM is a framework that describes the key elements of an effective software process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. It covers practices for planning, engineering and managing software development and maintenance. The components of the CMM include (Paulk et al., 1995):
- five maturity levels – initial, repeatable, defined, managed and optimizing;
- process capabilities – describe the range of expected results that can be achieved by following a software process;
- key process areas – components of the maturity levels that identify a cluster of related activities that, when performed collectively, achieve a set of goals considered important for establishing process capability at that maturity level;
- goals – summarize the key practices of a key process area;
- common features – indicate whether the implementation and institutionalization of a key process area is effective, repeatable and lasting;
- key practices – each key process area is described in terms of key practices that, when implemented, help to satisfy the goals of that key process area.
Furthermore, a number of maturity models have been developed for assessing the maturity of BI and DW solutions.
Data Warehousing Stages of Growth
The "Data Warehousing Stages of Growth" model was adapted from Nolan's growth curve. It includes three stages that describe the current evolution of DWs:
- initiation – the initial version of the warehouse;
- growth – the expansion of the warehouse;
- maturity – the warehouse becomes fully integrated into the company's operations;
and nine variables that describe the different stages: data, architecture, stability of the production environment, warehouse staff, users, impact on users' skills and jobs, applications, costs and benefits, and organizational impact (Watson et al., 2001). However, this model has its limitations: as a generalization, it does not perfectly describe every company's experiences. Also, the model is a few years old and new developments have since occurred that point to additional stages.
BI Maturity Model (biMM)
Another interesting model is the "Business Intelligence Maturity Model" developed by (Chamoni & Gluchowski, 2004); however, both the model and the paper are in German, which makes it rather difficult for non-German speakers to understand its content. It comprises five levels of evolutionary BI development analyzed from three perspectives: business content, technology and organizational impact. Different aspects of these perspectives are recorded and evaluated for each of the five stages. The model is applied in different organizations in order to do some BI benchmarking in specific industrial sectors and offer general strategic recommendations.
TDWI's BI Maturity Model
Another famous BI maturity model is the one developed by The Data Warehousing Institute (TDWI) (Eckerson, 2004). It is a six-stage model that shows the trajectory that most organizations follow when evolving their BI infrastructure. The maturity stages are: prenatal, infant, child, teenager, adult and sage. They are defined by a number of characteristics including scope, analytic structure, executive perceptions, types of analytics, stewardship, funding, technology platform, change management and administration.
In 2009, TDWI published a poster with a more complex BI maturity model, which can be considered a generalization of multiple BI projects and implementations, indicating certain patterns of behaviour based on five different aspects: BI adoption, organization control and processes, usage, insight, and return on investment (ROI). In order to give more value to the model, TDWI also created an assessment questionnaire with questions on funding, value, architecture, data, development and delivery that can be filled in by different organizations in order to have some BI benchmarking done.
Gartner's BI and Performance Management (PM) Maturity Model
The last model that we would like to present is the Gartner Group's BI and Performance Management (PM) Maturity Model (Hostmann, 2007). This model helps an organization understand its current position with regard to BI and what it needs to do to move to the next level. Gartner bases its maturity curve on the real-world phenomenon that organizational change is usually incremental over time and proposes five maturity stages: unaware, tactical, focused, strategic and pervasive. An important finding in their analysis is that one characteristic was more likely than any other to indicate whether an organization is capable of operating at the higher levels of BI/PM maturity: its implementation of a BI Competency Center (BICC), or its lack thereof. A BICC is a group of business, IT and information analysts who work together to define BI strategies and requirements for the entire organization.
2.8 Summary
This chapter has presented information on the key background concepts for our thesis – data warehousing, business intelligence and maturity modelling. We first talked about the "intelligent organization" and how DW/BI solutions can help companies improve their performance. Then, we gave a short overview of DW/BI evolution and defined the two concepts.
Emphasis was then put on the fact that a DW/BI initiative must always be business-driven rather than technology-driven in order to be successful. We continued by presenting the two main conceptual approaches to data warehousing (i.e.: Inmon vs. Kimball). Finally, we provided some information on maturity modelling and the main maturity models that served as a foundation for the artifact we designed. We will continue with an overview of the model we developed in chapter 3.
3 DWCMM: The Data Warehouse Capability Maturity Model
This chapter describes in detail the deliverables proposed as a solution to the research problem.
3.1 From Nolan's Stages of Growth to the Data Warehouse Capability Maturity Model
As presented in the previous paragraphs, many maturity models have been developed for different fields, and several have been proposed for the field of DW/BI. Each of them has a different way of assessing maturity, but there are some elements common to all the models. First of all, Nolan's "stages of growth" was a breakthrough in organizational and IS research (Nolan, 1973). It shows the growth and evolution of information technology (IT) in a business or similar organization, from stage 1, called "initiation", to the last stage, called "maturity". The second maturity model, which was actually the starting point for this thesis, is the CMM (Paulk et al., 1995). It has become a recognized standard for rating software development organizations. The CMM is a framework that describes the key elements of an effective software process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development, the CMM has become a universal model for assessing software process maturity. Therefore, we decided to use it as the main foundation for our model.
However, the CMM has often been criticized for its complexity and difficulty of implementation. That is why we simplified it, keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process capabilities and the key process areas, which in our model translate to the benchmark variables/categories chosen for the DW maturity assessment. As DW/BI is widely applied in practice, several maturity models were developed especially for this field, as already presented. One of the most recent and famous is the model developed by TDWI (Eckerson, 2004). Another interesting model is Gartner's BI and PM maturity model (Hostmann, 2007). Both show the trajectory that most organizations follow when evolving their BI or PM infrastructure. However, even if both models are interesting, they are not grounded in scientific literature and they focus more on the business side of BI implementation than on the technical aspects of a DW project. Furthermore, even though the other two models, the DW stages of growth (Watson et al., 2001) and the BI maturity model (Chamoni & Gluchowski, 2004), have more scientific roots, they have their deficiencies. As mentioned before, the latter is in German, whereas the former is a few years old and new developments have since occurred that point to additional stages. Although both models cover more of the variables involved in DW/BI development, they do not analyze the technical aspects of a DW solution in depth. Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many maturity models have been created, none actually focuses on the technical aspects of the DW/BI solution and the organizational processes that sustain them. Hence, this is the research gap we would like to fill by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the DW technical solution and the DW organization and processes.
A short overview of the model will be given in the next paragraph and more details on each component will be given in the upcoming chapters.
3.2 DWCMM
Using the CMM as the main foundation, together with the other maturity models described above and a thorough and extensive literature study, we developed the DWCMM, which is depicted in figure 5.
Figure 5: Data Warehouse Capability Maturity Model (DWCMM).
When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the current moment in time. Therefore, in order to do a valuable assessment, it is important to include in the maturity analysis the most representative dimensions involved in the development of a DW solution. Several authors describe the main phases usually involved in a DW project lifecycle as (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements definition, design, development, testing and acceptance, deployment, growth and maintenance. All of these phases and processes refer to the implementation and maintenance of the actual DW technical solution, which includes: general architecture and infrastructure, data modelling, ETL, and BI applications. These categories can be analyzed from many points of view, which will be depicted in our model and the maturity assessment we developed. Therefore, as the DWCMM is restricted to assessing the technical aspects, without taking into consideration DW/BI usage and adoption or DW/BI business value, it considers two main benchmark variables/categories for analysis, each of them having several sub-categories:
DW Technical Solution
- General Architecture and Infrastructure
- Data Modelling
- Extract-Transform-Load (ETL)
- BI Applications
DW Organization & Processes
- Development Processes
- Service Processes.
In order to be able to do the assessment for each of the chosen categories and sub-categories, we also developed a DW maturity assessment questionnaire. It is important to emphasize that the questionnaire we have developed performs a high-level assessment of an organization's DW solution and is limited strictly to the DW technical aspects. Emphasis should also be put on the fact that the model assesses "what" and "whether" certain characteristics and processes are implemented, not "how" they are implemented. It is a practical solution, as it takes less than an hour to fill in the questions that will be scored, and it is addressed to someone from the DW team who has knowledge and experience in all the categories included in the DWCMM (e.g.: DW technical architect, BI project manager, BI manager, BI consultant, etc.). However, although it may be tempting to use the scores from the assessment questionnaire as a definitive statement of the organization's DW maturity, this should be avoided. The maturity score is just a rough gauge that merely scratches the surface of most DW projects. That is why the maturity assessment we developed should serve as a starting point. To truly assess the technical maturity and discover the areas of strength and weakness, organizations should perform a more thorough analysis for each benchmark category. The DW maturity assessment questionnaire has 60 questions divided into the following three categories:
DW General Questions (9 questions) – comprises several questions about the DW/BI solution that are not scored. Their purpose is to offer a better image of the drivers for implementing the DW environment, the budget allocated for data warehousing and BI, the DW business value, end-user adoption, etc. This will be useful in creating a complete picture of the current DW solution and its maturity.
Also, once the questionnaire is filled in by more organizations, this data will serve as input for statistical analysis and comparisons between organizations from the same industry or across industries. The questions from this category can be seen in the table below.
1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource – Tools to assist decision making
c) Mission-critical resource – A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) What percentage of the IT department is taking care of BI (i.e.: how many people out of the total number of IT employees)?
7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the invoice)?
8) Which technologies do you use for developing the BI/DW solution in your organization (i.e.: for data modelling, ETL, BI applications, database)?
9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized modelling, data vault, etc.)?
Table 4: DW General Questions.
DW Technical Solution (32 questions) – comprises several scored questions for each of the following sub-categories:
- General Architecture and Infrastructure (9 questions)
- Data Modelling (9 questions)
- ETL (7 questions)
- BI Applications (7 questions).
More details on this part will be given in chapter 4.
DW Organization & Processes (19 questions) – comprises several scored questions for each of the following sub-categories:
- Development Processes (11 questions)
- Service Processes (8 questions).
More details on this part will be given in chapter 5. The whole DW maturity assessment questionnaire is shown in appendix B. Each question from the questionnaire has five possible answers, which are scored from 1 to 5, 1 being characteristic of the lowest maturity stage and 5 of the highest one. When an organization takes the survey, it will receive:
- a maturity score for each sub-category, computed as the average value of the weightings (i.e.: sum of the weightings / number of questions);
- an overall score for each of the two main categories, computed as the average value of the scores obtained for its sub-categories;
- an overall maturity score, computed following the same principle applied to the two main category scores.
We believe that the maturity scores for the sub-categories can give a good overview of the current DW solution implemented by the organization. This is the reason why, after computing the maturity scores for each sub-category, a radar graph, like the one depicted in figure 5, will be drawn to show the alignment between these scores. In this way, the organization will have a clearer image of their current DW project and will know which sub-category is the strongest and which one is left behind. An important point here is that the answers are usually mingled in order to get a more unbiased result.
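The scoring procedure above can be sketched in code as follows. This is a minimal illustration only: the sub-category names and answer weightings in the example are hypothetical, and the actual question weightings are those of the questionnaire in appendix B.

```python
# Sketch of the DWCMM scoring procedure: each scored answer carries a
# weighting from 1 to 5; a sub-category score is the average of its answer
# weightings, a category score is the average of its sub-category scores,
# and the overall maturity score averages the two category scores.

def average(values):
    return sum(values) / len(values)

def dwcmm_scores(answers):
    """answers: {category: {sub_category: [answer weightings 1..5]}}"""
    sub_scores = {}
    category_scores = {}
    for category, subs in answers.items():
        for sub, weightings in subs.items():
            sub_scores[sub] = average(weightings)
        category_scores[category] = average(
            [sub_scores[sub] for sub in subs]
        )
    overall = average(list(category_scores.values()))
    return sub_scores, category_scores, overall

# Hypothetical example answers (not real questionnaire data):
answers = {
    "DW Technical Solution": {
        "Architecture": [3, 4, 3, 2],
        "ETL": [2, 3, 3],
    },
    "DW Organization & Processes": {
        "Development Processes": [4, 4, 3],
    },
}

sub_scores, category_scores, overall = dwcmm_scores(answers)
print(sub_scores["Architecture"])  # 3.0
print(overall)
```

The per-sub-category scores returned here are exactly the values one would plot on the radar graph mentioned above.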
Some questions have their answers given in hierarchical order because, in order to reach a higher maturity level, the organization should already have implemented the requirements found in the previous stages. As will be seen in the validation chapters, the model was tested in several organizations with all the answers mingled. However, this created confusion for some of the questions (especially the ones from the service processes part), and therefore, we decided to keep some answers in hierarchical order, assuming that every respondent wants to get a fair result and will not offer biased answers. After reviewing the maturity scores and the answers given by a specific organization, some general feedback and advice for future improvements will be provided. Each organization that takes the assessment will receive a document with a short explanation of the scoring method, a table with their maturity scores and the radar graph, followed by some general feedback that will consist of:
- a general overview of the maturity scores;
- an analysis of the positive aspects already implemented in the DW solution;
- several steps that the organization should take in order to improve their current DW application.
A template of this document can be seen in appendix F. Moreover, as our model measures the maturity of a DW solution, we also created two maturity matrices – a condensed maturity matrix and a detailed one – each of them having five maturity stages, as inspired by the CMM:
- Initial (1)
- Repeatable (2)
- Defined (3)
- Managed (4)
- Optimized (5),
where the initial stage describes an incipient DW development and the optimized level shows a very mature solution, obtained by an organization with a lot of experience in the field, where everything is standardized and monitored.
An organization will usually be situated exactly on one of these stages if its score is a perfect match with the number of the stage (i.e.: 1, 2, 3, 4 or 5), or somewhere in between otherwise. However, this mapping is not a perfect match. The condensed DW maturity matrix gives a short overview of the most important characteristics of each sub-category for each maturity level. This will offer a better image of the main goal of the DWCMM and of what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 6 and is summarized below.

DW Technical Solution:
Architecture – Initial (1): desktop data marts; Repeatable (2): independent data marts; Defined (3): independent data warehouses; Managed (4): central DW with/without data marts; Optimized (5): DW/BI service that federates a central DW and other sources via a standard interface.
Data Modelling – Initial (1): no data models synchronization or standards; Repeatable (2): manually synchronized data models; Defined (3): manually or automatically synchronized data models; Managed (4): automatic synchronization of most data models; Optimized (5): enterprise-wide standards and automatic synchronization of all the data models.
ETL – Initial (1): simple ETL with no standards that just extracts and loads data into the DW; Repeatable (2): basic ETL with simple transformations; Defined (3): advanced ETL (e.g. slowly changing dimensions manager, data quality system, reusability, etc.); Managed (4): more advanced ETL (e.g. hierarchy manager, special dimensions manager, etc.); Optimized (5): optimized ETL for a real-time DW with all the standards defined.
BI Applications – Initial (1): static and parameter-driven reports; Repeatable (2): ad-hoc reporting and OLAP; Defined (3): dashboards & scorecards; Managed (4): predictive analytics, data & text mining; Optimized (5): closed-loop & real-time BI applications.

DW Organization & Processes:
Development Processes – Initial (1): ad-hoc, non-standardized development processes; Repeatable (2): some development processes defined and some phases separated; Defined (3): standardized development processes with policies and procedures established, all the phases separated and all the roles formalized; Managed (4): quantitative development processes management; Optimized (5): continuous development processes improvement.
Service Processes – Initial (1): ad-hoc, non-standardized service processes; Repeatable (2): some service processes policies and procedures established; Defined (3): standardized service processes with all the roles formalized; Managed (4): quantitative service processes management; Optimized (5): continuous service processes improvement.

Figure 6: DWCMM Condensed Maturity Matrix.

However, as already mentioned, more important is the detailed DW maturity matrix, which can be seen in appendix A. We will give a short overview of the detailed DW maturity matrix in this paragraph. First, the characteristics for each maturity stage are usually obtained by mapping the corresponding answers of each question from the maturity assessment questionnaire (except for several characteristics, such as project management and testing and acceptance, whose answers are formulated in a different way). In this way, an organization will be able to see its maturity stage by category (e.g.: architecture) and by main category characteristics (e.g.: metadata, standards, infrastructure, etc.). The matrix has two dimensions:
columns – show each benchmark sub-category (i.e.: Architecture, Data Modelling, ETL, BI Applications; Development Processes, Service Processes) with their maturity stages from Initial (1) to Optimized (5).
rows – show the main analyzed characteristics (e.g.: for Architecture – conceptual architecture, business rules, metadata, security, data sources, performance, infrastructure, update frequency) for each sub-category, divided by maturity stage.
The matrix can be interpreted in two ways:
1) Take each stage and see what the specific characteristics are for each sub-category at that particular stage.
2) Take each sub-category and see what its specific characteristics are for each stage or for a particular stage.
As the developed questionnaire does an assessment for each benchmark sub-category, a specific organization will most likely follow the second interpretation. They would probably like to know what steps to take in order to improve each sub-category and, hence, the overall maturity score, which will lead to a higher maturity stage. It is also very unlikely that an organization will have, at the same moment in time, all the characteristics for all the sub-categories on the same maturity stage. Moreover, the first interpretation does not have to be followed too strictly. After all, this is only a model and the mapping between theory and reality is not perfect. Therefore, if a company gets a maturity score of 3, this does not mean that all the characteristics for all the sub-categories are on stage three. Depending also on the standard deviation and the answers themselves, we can find out more information about the actual situation. This is why we believe that the second interpretation is more useful, and we will exemplify it here for general architecture and infrastructure. The main characteristics for general architecture and infrastructure evaluated in our model are: conceptual architecture, business rules, metadata management, security, data sources, infrastructure, performance, and update frequency.
The maturity stages for conceptual architecture have the following structure:
- Initial (1) – desktop data marts (e.g.: Excel sheets)
- Repeatable (2) – multiple independent data marts
- Defined (3) – multiple independent data warehouses
- Managed (4) – a single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
- Optimized (5) – a DW/BI service that federates a central DW and other data sources via a standard interface.
Therefore, if an organization scores 3 for this specific characteristic, we would advise it to reconsider its architecture and perhaps go one step further and implement a single, central DW. In this way, it could reach maturity stage four for this specific characteristic, which would be the first step towards a higher overall maturity score. The same interpretation applies when analyzing any characteristic for architecture or for any other benchmark category. At the same time, one could say that in order to be on maturity stage 3, an organization should have (more or less) the following characteristics implemented for architecture:
- Conceptual architecture – multiple independent data warehouses.
- Business rules – some business rules defined or implemented.
- Metadata management – central metadata repository separated by tools.
- Security – independent authorization for each tool, etc.
Now that we have offered an overview of the DWCMM, we will present the DWCMM and the DW maturity assessment questionnaire and matrix more thoroughly in the next chapters. For this, we will elaborate on each category and sub-category as shown in the DWCMM and we will present the characteristics and questions we chose in order to assess the maturity of each benchmark variable from the DWCMM. In chapter 4 we will focus on the DW Technical Solution maturity and we will continue in chapter 5 with the DW Organization & Processes part.
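The interpretation step above – placing a (possibly fractional) characteristic score on a stage and suggesting the next stage to aim for – could be sketched as follows. The stage descriptions are taken from the conceptual architecture list; the function and its rounding to the nearest stage are our own illustration, not part of the model's formal definition.

```python
# Sketch: map a conceptual-architecture maturity score (1-5) to its stage
# description and suggest the next stage to aim for. Rounding a fractional
# score to the nearest stage is an illustrative assumption.

STAGES = {
    1: "Initial: desktop data marts (e.g. Excel sheets)",
    2: "Repeatable: multiple independent data marts",
    3: "Defined: multiple independent data warehouses",
    4: "Managed: a single, central DW with multiple or conformed data marts",
    5: "Optimized: a DW/BI service federating a central DW and other sources",
}

def interpret(score):
    """Round a score to the nearest stage (clamped to 1..5) and return
    the current stage number, its description, and the next target."""
    stage = max(1, min(5, round(score)))
    current = STAGES[stage]
    advice = STAGES.get(stage + 1)  # None once the top stage is reached
    return stage, current, advice

stage, current, advice = interpret(3)
print(current)
print("Next step:", advice)
```

For a score of 3, this reproduces the advice given above: the organization sits at the Defined stage and should work towards a single, central DW.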
4 DW Technical Solution Maturity
The main elements of the DWCMM having been identified, it is now time to elaborate on each part of the maturity assessment questionnaire and to present our arguments regarding the choice of questions for each sub-category of the DWCMM. We will start with the components of the DW technical solution – general architecture and infrastructure, data modelling, ETL, BI applications – in this chapter and continue with the DW organization and processes in the next one.
4.1 General Architecture and Infrastructure
We already talked about what a DW is and the most common approaches to developing one in the previous chapters. In this section, we would like to analyze the most important elements that need to be considered when assessing the maturity of the DW general architecture and infrastructure (this benchmark variable was initially called "architecture"; see 6.1.1 for more details on changing the name). Based on this, we will also define the most representative questions for architecture included in the maturity assessment questionnaire.
4.1.1 What is Architecture?
Architecture as a general term refers to a blueprint that allows communication, planning, maintenance, learning, and reuse (Sen & Sinha, 2005). According to (Kimball et al., 2008), the architecture of a DW consists of three major pieces: data architecture – organizes the data and defines the quality and management standards for data and metadata; application architecture – the software framework that controls the movement of data from source to user; and technical architecture – the underlying computing infrastructure that enables the data and application architectures. The whole architecture is divided into two parts (Kimball et al., 2008): the back room, where the data modelling and the ETL process take place, and the front room, which refers to the BI applications and services.
Besides these three main components (i.e.: data modelling, ETL, BI applications), architecture also includes underlying elements such as infrastructure, metadata and security that support the flow of data from the source systems to the end users (Kimball et al., 2008; Chaudhuri & Dayal, 1997). At the same time, architecture refers to the major data storage components – source systems, data staging area, data warehouse database, operational data store, data marts – and the way they are assembled together (Ponniah, 2001). This is connected to the conceptual approach to designing and building the DW (e.g.: conformed data marts – Kimball, or enterprise-wide DW – Inmon, etc.). Therefore, in this thesis we consider architecture as a separate category for assessing maturity, in which we include questions regarding: conceptual architecture and its layers, infrastructure, metadata management, security management, update frequency, business rules, and performance optimization. We will elaborate on each of these elements and, at the same time, present the questions related to these elements that we included in our maturity questionnaire.
4.1.2 Conceptual Architecture and Its Layers
In this section we will present a typical DW architecture, which usually contains several data storage layers such as source systems, data staging area, data warehouse database, operational data store, and data marts. It is not mandatory for all these elements to be part of the architecture. A typical DW architecture can be seen in figure 7.
Figure 7: A Typical DW Architecture (adapted from (Chaudhuri & Dayal, 1997)).
Source Systems
The first component of a DW is represented by the source systems, without which there would be no data. They provide the input into the solution and require detailed analysis at the beginning of the project.
In most cases, data must come from multiple systems built with multiple data stores hosted on multiple platforms. The source systems usually include Excel files, text files, XML files, relational databases, enterprise resource planning (ERP) and customer relationship management (CRM) systems, etc. For a broader view on these types of data sources, see (Kimball et al., 2008). Lately, organizations have begun implementing capabilities to include in their DW various types of unstructured data sources (e.g.: text documents, e-mail files, images or videos) and Web data sources. However, this implies new technologies such as content intelligence (i.e.: search, classification and discovery techniques) which are not yet very mature (Blumberg & Atre, 2003). Therefore, one could say that an organization able to extract data from these kinds of sources is a more mature one.

Data Staging Area

A data staging area is a temporary location in the back room of a DW where data from the source systems is copied. Occasionally, the implementation of the DW encounters environmental problems as it pulls data from many operational source systems. Therefore, a separate staging area is needed to prepare data for the DW, but it is not universally built. The copy of the data can be a one-to-one mapping of the source systems' content, but in a more convenient environment. The data staging area is not accessible to the end users and it does not support query or presentation services. It acts as a surrogate for the source systems and offers several benefits (Walker, 2006):
- It is a good place to perform data quality profiling.
- It can be used as a point close to the source to perform data quality cleaning.
- It serves as a workbench for ETL, etc.

Data Marts

Data marts can be considered subsets of the data volume of the whole organization, specific to a group of users or a department.
Therefore, they are limited to specific subject areas. For example, a data mart for the marketing department would have subjects limited to customers, products, sales, etc. (Chaudhuri & Dayal, 1997). The data in a data mart are usually aggregated to a certain level, which can sometimes provide rapid responses to end-user requests. Data marts require less cost and effort to develop and provide access to functional or private information for specific organizational units. They are suited for businesses demanding a fast time to market, quick impact on the bottom line, and minimal infrastructure changes (Murtaza, 1998). However, even if from a short-term perspective a data mart seems a better investment than a DW, from a long-term perspective the former is never a substitute for the latter. The main reason is that many organizations misunderstand the concept of data marts and develop independent solutions that propagate freely throughout the organization and become a problem when attempting to integrate them (Kimball et al., 2008). Therefore, when developed, data marts should be conformed and integrated, or derived from an enterprise-wide DW.

Data Warehouse Database

As already presented, there are two main conceptual DW architectures: a central DW with multiple data marts (Inmon) or conformed data marts (Kimball). Of course, there are also hybrid approaches that combine the enterprise-wide DW and the conformed data marts technique. Here, we just refer to the DW database, the separate repository that does the actual storage of data. The DW is not a special technology in itself, as it is a relational or multidimensional data structure that is optimized for analysis and querying. As the data structure and operations are different from the ones in the transactional systems, it is important to have the DW environment separated from the operational ones (Chaudhuri & Dayal, 1997).
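The idea of conformed data marts can be made concrete with a small sketch: two marts resolve customer attributes through one shared dimension instead of keeping private, diverging copies. This is an illustration only; all table contents and field names below are hypothetical.

```python
# Illustration only: two data marts sharing one conformed customer
# dimension instead of maintaining private, diverging copies.
customer_dim = {  # the shared, conformed dimension
    1: {"name": "Alice", "segment": "Retail"},
    2: {"name": "Bob", "segment": "Wholesale"},
}

sales_mart = [  # fact rows in a sales data mart
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 340.0},
]
support_mart = [  # fact rows in a customer-support data mart
    {"customer_id": 1, "tickets": 3},
]

def segment(fact_row):
    # Both marts resolve customer attributes through the same dimension,
    # so cross-mart analyses stay consistent.
    return customer_dim[fact_row["customer_id"]]["segment"]

print(sorted({segment(r) for r in sales_mart}))    # segments seen in sales
print(sorted({segment(r) for r in support_mart}))  # segments seen in support
```

Because both marts join to the same dimension rows, a drill-across report by customer segment gives consistent answers, which is exactly what independent, unconformed marts fail to guarantee.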
Operational Data Store

The Operational Data Store (ODS) is a database that provides a consolidated view of volatile transactional data from multiple operational systems. According to Bill Inmon, the originator of the concept, an ODS is "a subject-oriented, integrated, volatile, current-valued, detailed-only collection of data in support of an organization's need for up-to-the-second, operational, integrated, collective information" (Inmon, 1992). As can be seen, an ODS differs from a DW in that the ODS's contents are updated in the course of business, whereas a data warehouse contains static data. Therefore, this architecture is suitable for real-time or near real-time reporting and analysis that can be done without impacting the performance of the production systems. Unfortunately, operational data is not designed for decision support applications, and complex queries may result in long response times and a heavy impact on the transactional systems.

Maturity Assessment Question(s)

All this being said, we can now show the maturity assessment questions related to these elements:

1) What is the predominant architecture of your DW?
a) Level 1 – Desktop data marts (e.g.: Excel sheets)
b) Level 2 – Multiple independent data marts
c) Level 3 – Multiple independent data warehouses
d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Level 5 – A DW/BI service that federates a central enterprise DW and other data sources via a standard interface.

2) What types of data sources does your DW extract data from at the highest level?
a) Level 1 – CSV files
b) Level 2 – Operational databases
c) Level 3 – ERP and CRM systems; XML files
d) Level 4 – Unstructured data sources (e.g.: text documents, e-mail files)
e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources.
Table 5: DW Architecture Maturity Assessment Questions.

As can be seen, we focused our attention on the conceptual architecture and on the types of data sources that the DW supports at the highest level, as we considered them the most important high-level elements characterizing the maturity of the conceptual architecture. The hierarchical order in which the answers are organized was deduced from the information given beforehand on these elements and from the literature study we had done.

4.1.3 Infrastructure

Infrastructure is a very important component of a DW as it provides the underlying foundation that enables the DW architecture to be implemented. It is sometimes called technical architecture and it includes several elements such as: hardware platforms and components (i.e.: disks, memory, CPUs, DW/ETL/BI application servers), operating systems (e.g.: UNIX), database platforms (e.g.: relational engines or multidimensional/OLAP engines), connectivity and networking. Several factors influence the implemented infrastructure: the business requirements, the technical and systems issues, the specific skills and experience of the DW team, policy and other organizational issues, expected growth rates, etc. (Kimball et al., 2008). An important aspect here is the parallel processing hardware architecture used: symmetric multiprocessing (SMP), massively parallel processing (MPP) and non-uniform memory architecture (NUMA). These architectures differ in the way the processors work with the disk, memory and each other. It is important to gain sufficient insight into each option's features, benefits and limitations in order to select the proper server hardware. Therefore, one cannot say that one is more mature than another. For more information on parallel processing hardware architectures, see (Kimball et al., 2008; Ponniah, 2001).
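Returning briefly to the questionnaire itself: because every answer is ordered on a 1-5 level scale, answered questions can be aggregated into a simple per-category score. The sketch below is an illustration only; the DWCMM does not prescribe this particular scoring scheme, and the answer values shown are hypothetical.

```python
# Illustration only: the DWCMM does not prescribe this scoring, but the
# level-based answers (1-5) of the questionnaire naturally support a
# simple per-category aggregation.
architecture_answers = {
    "predominant DW architecture": 4,  # e.g. central DW with data marts
    "data source types supported": 3,  # e.g. ERP/CRM systems, XML files
}

# Average the answered levels to summarize one category.
score = sum(architecture_answers.values()) / len(architecture_answers)
print(f"Architecture maturity: {score:.1f} / 5")
```

An averaging scheme like this makes categories comparable at a glance, though a real assessment would also want to flag the weakest individual answers rather than hide them in the mean.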
As DWs contain large volumes of data with a different structure than the operational databases, a specialized infrastructure for the DW can be critical for performance and better results. The most important aspect is to have different servers for the OLTP and DW systems. However, many organizations ignore this and use the same servers for both systems, which leads to low performance. Once this is done, higher performance can be achieved by having separate servers for the DW, ETL and BI applications. Lately, a new hardware solution has been developed for increasing the performance of the DW system: the specialized DW appliance. It consists of a small amount of proprietary hardware with an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing. Though such appliances are expensive relative to regular hardware, the custom hardware they contain allows them to claim a 10-50 times improvement over existing database solutions (Madden, 2006). Another reason for buying such an appliance is simplicity. The appliance is delivered complete ("no assembly required") and installs rapidly. Finally, if there are any problems, resolving them requires only a single call to the appliance vendor (Feinberg & Beyer, 2010).

Maturity Assessment Question(s)

From the information presented above, we decided that a representative question to assess the maturity of the infrastructure refers to the specialization of the infrastructure for a DW solution:

3) To what degree is your infrastructure specialized for a DW?
a) Very low – Desktop platform
b) Low – Shared OLTP systems and DW environment
c) Moderate – Separate OLTP systems and DW environment
d) High – Separate servers for OLTP systems, DW, ETL and BI applications
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata).
Table 6: Infrastructure Maturity Assessment Questions.

4.1.4 Metadata

Metadata is usually defined as "data about data" (Shankaranarayanan & Even, 2004). However, this definition does not give a clear image of what metadata actually is. Metadata can be seen as all the information that defines and describes the structures, operations and contents of the DW system in order to support the administration and effective exploitation of the DW. The DW/BI industry often refers to two main categories of metadata (Moss & Atre, 2003):
- Business metadata – provides business users with a roadmap for accessing the business data in the DW/BI decision-support environment. It describes the contents of the DW in more user-accessible terms. It shows what data the user can find, where it comes from, what it means and what its relationship is to the other data in the DW.
- Technical metadata – supports the technicians and "power users" by providing them with technical information about the objects and processes that make up the DW/BI system.

Some differences between business metadata and technical metadata are highlighted in table 7.

Business metadata: provided by business people; documented in business terms on data models and in data dictionaries; used by business people; names fully spelled out in business language.
Technical metadata: provided by technicians or tools; documented in technical terms in databases, files, programs, and tools; used by technicians, "power users", databases, programs, and tools (e.g.: ETL, OLAP); abbreviated names with special characters, used in databases, files, and programs.

Table 7: Business Metadata vs. Technical Metadata (adapted from (Moss & Atre, 2003)).

(Kimball et al., 2008) propose a third category of metadata:
- Process metadata – describes the results of various operations in the DW and applies especially to the ETL or query processes.
For example, in the ETL process, each task logs key data about its execution, such as start and end time, CPU seconds used, rows processed, etc. Similar process metadata is generated when users query the DW. This data is very important for the performance monitoring and improvement process. Metadata can be considered the DNA of the DW as it defines its elements and how they work together. It drives the DW and provides flexibility by buffering the various components of the system from each other (Ponniah, 2001). A very important aspect related to metadata is integration. Metadata is usually stored and maintained in repositories. These are structured storage and retrieval systems, typically built on top of a conventional DBMS. A repository is not simply a storage component; it also embodies the functionality necessary to handle the stored metadata. However, the reality is that most tools create and manage their own metadata repository and, therefore, several metadata repositories end up scattered around the DW system. These repositories often use different storage types and thus may have overlapping content. It is this combination of multiple repositories that causes problems and hence, the best solution is a single integrated metadata repository (Kimball et al., 2008). Implementing an integrated metadata repository can be very challenging, but if it succeeds, it is valuable in several ways: it can help identify the impact of making a change to the DW system; it can serve as a source for auditing and documentation; it ensures metadata quality and synchronization, etc. As an organization usually supports tools from multiple vendors, it is rather difficult to create an integrated metadata repository due to lack of standardization.
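The process metadata described above (start time, rows processed, and so on) is straightforward to capture in code. A minimal sketch, assuming hypothetical task and field names, with wall-clock time standing in for CPU seconds:

```python
import time
from datetime import datetime

# Illustration only: each ETL task logs its own execution statistics
# (process metadata) to a shared, in-memory "repository".
process_metadata = []

def run_etl_task(name, task, rows):
    started = datetime.now()
    t0 = time.perf_counter()
    result = task(rows)
    process_metadata.append({
        "task": name,
        "started": started.isoformat(timespec="seconds"),
        "elapsed_seconds": time.perf_counter() - t0,  # stand-in for CPU seconds
        "rows_processed": len(result),
    })
    return result

# A trivial cleaning step standing in for a real ETL transformation.
cleaned = run_etl_task("clean_customer_names",
                       lambda rows: [r.strip().title() for r in rows],
                       ["  alice ", "BOB"])
print(cleaned, process_metadata[0]["rows_processed"])
```

Collected this way, the log supports exactly the performance monitoring use mentioned above: slow or shrinking tasks stand out when runs are compared over time.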
But, despite all these challenges, a metadata repository is a mandatory component of every DW environment, and metadata should be gathered for all the components of the DW (i.e.: data modelling, ETL, BI applications, etc.) (Ponniah, 2001; Moss & Atre, 2003). Another important aspect related to metadata is accessibility (Moss & Atre, 2003). In order to reach its goal, BI application metadata should always be available and easily accessible to end users for a better understanding and usage of the DW solution. Of course, the best solution would be a complete integration of metadata with the BI applications (i.e.: metadata can be accessed through one button push on the attributes, metrics, etc.). However, this is also the hardest one to implement. In case the organization has a metadata repository implemented, another efficient way of accessing metadata is through a metadata management tool. But there are still many organizations that do not pay much attention to business metadata and its accessibility; in those organizations, metadata is very often not available at all, or available only by sending documents to users on request.

Maturity Assessment Question(s)

As metadata is an underlying element in a DW and it has specific characteristics for each of the major components – data modelling, ETL, BI applications – we will have one maturity question regarding metadata in each of the mentioned categories. For architecture, we decided that the metadata maturity question should refer to general metadata management:

4) To what degree is your metadata management implemented?
a) Very low – No metadata management
b) Low – Non-integrated metadata by solution
c) Moderate – Central metadata repository separated by tools
d) High – Central up-to-date metadata repository
e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata.
Table 8: Metadata Management Maturity Assessment Question.

4.1.5 Security

A DW is a veritable gold mine of information, as all of the organization's critical information is readily available in a format easy to retrieve and use. The DW system must publish data to those who need to see it, while simultaneously protecting it. On the one hand, the DW team is judged by how easily the business user can access the data; on the other hand, the team is blamed if sensitive data gets into the wrong hands or if data is lost. Therefore, security is very important for the success of the DW, even if some organizations seem to ignore this fact. User access security is usually implemented through several methods (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):
- Authentication – the process of identifying a person, usually based on a logon ID and password. This process is meant to ensure that the person is who he or she claims to be. There are several levels of authentication depending on how sensitive the data is. The first level consists of a simple, static password, followed by a system-enforced password pattern and periodically required changes. An organization with a DW solution should at least have this security method implemented.
- Role-based security – databases usually offer role-based security. A role is simply a grouping of users with some common requirements for accessing the database. Once the roles are created, users can be set up in the appropriate roles and access privileges may be granted at the level of a role. A privilege is an authorization to perform a particular operation; without explicitly granted privileges, a user cannot access any information in the database. While privileges let you restrict the types of operations a user can perform, managing these privileges may be complex. To address this complexity, database roles encapsulate one or more privileges that can be granted to and revoked from users.
- Tool-based security – tool-based security is usually not as flexible as role-based security at the database level. Nevertheless, tool-based security can form some part of the security solution. However, if the DW team is planning to use the DBMS itself for security protection, then tool-based security may be considered redundant.
- Authorization – the process of determining what specific content a user is allowed to access. Once users are authenticated, the authorization process defines the access policy. Authorization is a more complex problem in the DW system than authentication, because limiting access can have significant maintenance and computational overhead.

Regardless of the chosen security strategy, a very important and hard to achieve goal is to establish a security policy for the DW compliant with the organizational security policy, and to implement and integrate this security at a companywide level (Ponniah, 2001).

Maturity Assessment Question(s)

From the most important aspects related to security presented above – concerning the way security is implemented for the DW – we came up with the following maturity question for this DW component:

5) To what degree is security implemented in your DW architecture?
a) Very low – No security implemented
b) Low – Authentication security
c) Moderate – Independent tool-based security
d) High – Role-based security at database level
e) Very high – Integrated companywide authorization security

Table 9: Security Maturity Assessment Question.

4.1.6 Business Rules for DW

Business rules are abstractions of the policies and practices of a business organization. They reflect the decisions needed to accomplish the business policy and objectives of an organization (Kaula, 2009). Business rules are used to capture and implement precise business logic in processes, procedures, and systems (manual or automated).
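Business rules of the kind discussed in this chapter (for instance, "the sale price cannot be less than the reserve price") can be made directly executable as data quality checks. A minimal sketch; the rule names and record layout are hypothetical examples:

```python
# Illustration only: business rules expressed as executable data
# quality checks over individual records.
rules = {
    "sale price >= reserve price":
        lambda r: r["sale_price"] >= r["reserve_price"],
    "days accounted for <= 366":
        lambda r: r["total_days"] <= 366,
}

def violations(record):
    # Return the names of all rules the record breaks.
    return [name for name, check in rules.items() if not check(record)]

valid = {"sale_price": 150, "reserve_price": 100, "total_days": 230}
invalid = {"sale_price": 80, "reserve_price": 100, "total_days": 400}

print(violations(valid))    # no rules broken
print(violations(invalid))  # both rules broken
```

Keeping the rules in one named collection like this also serves the documentation purpose discussed below: the same definitions can feed business metadata and the ETL validation logic.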
Therefore, business rules are an important aspect when implementing a DW. Examples of business rules used in a DW concern attributes, ranges, domains, operational records, etc. Business rules can serve different purposes in the development of a DW (Ponniah, 2001):
- They are very important for data quality and integrity. In order to have the right data in the DW, it is important that the values of each data item adhere to the prescribed business rules. For example, in an auction system, the sale price cannot be less than the reserve price. Many data quality problems stem from violations of such business rules. An example would be an employee record in which the number of days (i.e.: days worked in a year plus vacation days, holidays and sick days) exceeds 365 or 366.
- They are a source for business metadata.
- They should be taken into consideration when requirements are defined.
- They should be used for data modelling and applied in the extraction and transformation of data.

Maturity Assessment Question(s)

To sum up, an enterprise that properly documents and actually follows its business rules will have a better DW and will also manage change better than one that ignores its rules. As it is hard to assess at a high level which business rules are defined and implemented, we decided to include two more general assessment questions, which can be seen here.

6) To what degree have you defined and documented definitions and business rules for the necessary transformations, key terms and metrics?
a) Very low – No business rules defined
b) Low – Few business rules defined and documented
c) Moderate – Some business rules defined and documented
d) High – Most of the business rules defined and documented
e) Very high – All business rules defined and documented.

7) To what degree have you implemented definitions and business rules for the necessary transformations, key terms and metrics?
a) Very low – No business rules implemented
b) Low – Few business rules implemented
c) Moderate – Some business rules implemented
d) High – Most of the business rules implemented
e) Very high – All business rules implemented.

Table 10: Business Rules Maturity Assessment Questions.

4.1.7 DW Performance Tuning

DWs usually contain large volumes of data. At the same time, they are query-centric systems and hence the need to process queries faster dominates. That is the reason why various methods are needed to improve performance (Ponniah, 2001):
- Software performance improvement – the most often used techniques are (Chaudhuri & Dayal, 1997):
  - index management – indexes are database objects associated with database tables and created to speed up access to data within the tables. Indexing techniques have existed for decades for transactional systems, but in order to handle the large volumes of data and complex queries common in DWs, new or modified techniques have to be implemented for indexing DWs (Vanichayobon & Gruenwald, 2004). The most used indexing techniques for data warehousing are: B-tree indexes, bitmap indexes, projection indexes.
  - data partitioning – typically, the DW holds some very large database tables. Loading these tables can take excessive time; building indexes for large tables can also create problems. Therefore, another solution for performance tuning is data partitioning, which means the deliberate splitting of a table and its index into manageable parts.
  - parallel processing – major performance improvements can be achieved if the processing is split into components that are executed in parallel. The simultaneous concurrent executions produce the results faster. Parallel processing techniques work in conjunction with data partitioning schemes. They are usually features of the DBMS used, and some physical options are also critical for effective parallel processing.
  - view materialization – many queries over DWs require summary data, and therefore use aggregates. Hence, besides the detailed data, the DW needs to contain summary data. Materializing summary data on different parameters can accelerate many common queries by significantly speeding up query processing.
- Hardware performance improvement – scale the DW server to match the query requirements, and tune the DW computing platform (i.e.: the set of hardware components and the whole network).
- Specialized DW appliances or DW Cloud Computing – an overview of the former was given in 4.1.3. Cloud Computing is the latest trend in data warehousing/BI and it is not very mature yet. Some of the advantages of Cloud Computing are: performance – better query and data load performance; simplicity – rapid time to value and simple tools for agile provisioning and simplified management; elasticity – scale on demand; low acquisition and maintenance costs – pricing based on utilization.

Maturity Assessment Question(s)

An organization that has a DW in place usually starts its performance tuning with the first category (i.e.: software tuning), and if this does not pay off, it continues with the second option (i.e.: hardware tuning). However, organizations with a lot of experience in data warehousing understand that the best solution to improve performance is to buy a specialized DW appliance or to resort to the latest trend, DW cloud computing. Therefore, the maturity question for performance tuning is depicted in the table below.

8) To what degree do you use methods to increase the performance of your DW?
a) Very low – No methods to increase performance
b) Low – Software performance tuning (e.g.: index management, parallelizing and partitioning, view materialization)
c) Moderate – Hardware performance tuning (e.g.: DW server)
d) High – Software and hardware tuning
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing.

Table 11: Performance Tuning Maturity Assessment Question.

4.1.8 DW Update Frequency

The classical DW solutions were built for strategic and tactical BI that would help executives or line-of-business managers develop and assess progress in achieving long-term enterprise goals. This uses historical data that is one day to a few months or even years old. However, this tradition has been changing lately. With ever-increasing competition and rapidly changing customer needs and technologies, enterprise decision makers are no longer satisfied with scheduled analytics reports, pre-configured KPIs or fixed dashboards. They demand ad hoc queries to be answered quickly, they demand actionable information from analytic applications using real-time business performance data, and they demand these insights be accessible to the right people exactly when and where they need them (Azvine, 2005). Therefore, real-time processing is an increasingly common requirement in data warehousing, as more and more business users expect the DW to be continuously updated throughout the day and grow impatient with stale data. However, building a real-time DW/BI system requires gathering a very precise understanding of the true business requirements for real-time data and identifying an appropriate ETL architecture that incorporates a variety of technologies integrated on a solid platform.

Maturity Assessment Question(s)

In conclusion, one could say that an organization that does real-time data warehousing is a very mature one, as it probably has optimized processes and ETL.
Real-time data warehousing is, however, a very complex activity, and it is hard to judge from a high-level point of view whether it is done successfully and with high data quality. We will tackle this problem by including here a maturity question regarding the update frequency of the DW, and another question in the ETL part that will assess its complexity and performance.

9) Which answer best describes the update frequency for your DW?
a) Level 1 – Monthly update or less often
b) Level 2 – Weekly update
c) Level 3 – Daily update
d) Level 4 – Inter-daily update
e) Level 5 – Real-time update.

Table 12: Update Frequency Maturity Assessment Question.

4.2 Data Modelling

4.2.1 Data Modelling Definition and Characteristics

A data model is "a set of concepts that can be used to describe the structure of and operations on a database" (Navathe, 1992). By structure of a database, (Navathe, 1992) refers to the data types, relationships and constraints that define the "template" of that database. Hence, data modelling is the process of creating a data model. Furthermore, data modelling is very important for creating a successful information system, as it defines not only the data elements, but also their structures and the relationships between them. Data modelling techniques are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. Some authors, like (Simsion & Witt, 2005), consider the data model to be "the single most important component of an information system's design" for several reasons:
- leverage – a small change to the data model may have a major impact on the system as a whole. Problems with data organization arise not only from failing to meet initial business requirements, but also from expensive changes to the business after the database has been built.
- conciseness – a data model is a very powerful tool for expressing information system requirements and capabilities, whose value lies partly in conciseness.
- data quality – a data model plays a key role in achieving good data quality by establishing a common understanding of what is to be held in each table and column.

4.2.2 Data Models Classifications (Data Models Levels and Techniques)

Over time, many data models have been developed; they can be classified mainly along two dimensions (Navathe, 1992):

a) The first dimension deals with the steps of the overall database design activity to which the model applies. The classic database design process consists of mapping requirements of data and applications successively through the following steps (Navathe, 1992): conceptual design, logical design and physical design. (Golfarelli & Rizzi, 1998) and (Husemann et al., 2000) propose a DW design approach similar to traditional database design. Hence, we will consider the three sequential phases/levels in figure 8 to serve as a reference for a complete DW design process model: conceptual design, logical design, physical design.

Figure 8: DW Design Process Levels (adapted from (Husemann et al., 2000)).

Conceptual Design

Conceptual design translates user requirements into an abstract representation understandable to the user that is independent of implementation issues, but is formal and complete, so that it can be transformed into the next, logical schema without ambiguities (Tryfona et al., 1999). The conceptual data model is usually represented as a diagram with supporting documentation (Simsion & Witt, 2005) (e.g.: the high-level model diagram described by (Kimball et al., 2008) for dimensional modelling).
Logical Design

Logical design models data using constructs that are easy for users to follow and avoid physical details of implementation, but typically depend on the kind of DBMS used in the implementation (e.g.: relational data model, dimensional data model, etc.) (Navathe, 1992). It is usually the level most often implemented explicitly, and it makes the connection between the conceptual design and the physical one. Logical design is still easily understood by users, and it does not yet deal with the physical implementation details. It only deals with defining the types of information that are needed.

Physical Design

Physical design incorporates any changes necessary to achieve adequate performance and consists of a variety of choices for the storage of data in terms of clustering, partitioning, indexing, directory structure, access mechanisms, etc. (Navathe, 1992; Simsion & Witt, 2005). Some guidelines on developing concepts for describing physical implementations along the lines of a data model can be found in (Batory, 1988).

b) The second dimension deals with the flexibility (i.e.: the ease with which a model can deal with complex application situations) and expressiveness of the data model (i.e.: the ease with which a model can bring out the different abstractions and relationships in an involved application), and it mainly includes the following types of models: record-based data models, semantic data models and object-based models. For an overview of these models, see (Navathe, 1992).

In this section we will briefly describe two of the most often used data modelling techniques in data warehousing: entity-relationship data models and relational data models.
We will devote a separate paragraph to dimensional modelling because, as mentioned before, we focus on this data modelling technique in our research, and several questions in the data modelling maturity assessment questionnaire are dedicated to it.
Entity-Relationship (ER) Data Models
The entity-relationship (ER) model, proposed by (Chen, 1975), is one of the most famous semantic data models and has been a precursor for many subsequent variations. It is used mainly for conceptual design, and its basic constructs are (Chen, 1975):
entities – An entity is recognized as being capable of an independent existence and can be uniquely identified. It is an abstraction from the complexities of a certain domain and can be a physical object, an event or a concept. Entities can be viewed as nouns.
relationships – A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns.
attributes – An attribute expresses information about an entity or a relationship that is obtained by observation or measurement.
Moreover, entities, relationships and attributes are classified in sets, and this is what ER models usually show. However, the distinction between entities and relationships, or between entities and attributes, can sometimes be fuzzy and should be clarified for each particular environment. In conclusion, the ER model is fairly simple to use, has been formalized and has a reasonably unique interpretation with an easy diagrammatic notation. It has remained a favourite means for conceptual design and an easy way of communication in the early stages of database design. It is also used for conceptual DW design (in both Inmon's and Kimball's views), but especially for enterprise-wide DWs when applying Inmon's view on developing DWs.
Relational Data Models
The relational data model is a record-based data model proposed by (Codd, 1970).
It became a landmark development in this area because it provided a mathematical basis to the discipline of data modelling. The fundamental assumption is that all data are represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains given as sets (i.e.: S1, S2, …, Sn, not necessarily distinct). A relation on n sets can also be defined as a set of n-tuples, each of which has its first element from S1, its second element from S2, and so on (Codd, 1970). These relations are organized in the form of tables, which consist of tuples (rows) of information defined over a set of attributes (columns). The attributes, in turn, are defined over a set of atomic domains of values. The data in the model are operated upon by means of a relational algebra, which includes the operations of selection, projection and join as well as the set operations of union, intersection, Cartesian product, etc. Moreover, two types of constraints apply to this model:
the entity integrity constraint – guarantees the uniqueness of a table's key;
the referential integrity constraint – guarantees that whenever a column in one table derives values from a key of another table, those values must be consistent.
Due to its simplicity of modelling, the relational data model gained wide popularity among business application developers. It is usually used to capture the microscopic relationships among data elements and eliminate data redundancies. It is extremely beneficial for transaction processing because it makes transaction loading and updating simple and fast. However, it is also used for DW design, as the logical model when following Inmon's view on developing DWs.
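The relational algebra operations mentioned above can be illustrated with a minimal sketch in Python, treating relations as sets of tuples. The customer/order relations and column names below are invented for illustration and do not come from the thesis:

```python
# Minimal sketch of relational algebra over relations modelled as sets of
# tuples; column names are carried separately as a tuple of attribute names.

def select(rows, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return {r for r in rows if predicate(r)}

def project(rows, cols, keep):
    """Projection: keep only the named columns."""
    idx = [cols.index(c) for c in keep]
    return {tuple(r[i] for i in idx) for r in rows}

def join(left, lcols, right, rcols, on):
    """Join on a single shared column name (the join column is kept once)."""
    li, ri = lcols.index(on), rcols.index(on)
    return {l + tuple(v for j, v in enumerate(r) if j != ri)
            for l in left for r in right if l[li] == r[ri]}

# Illustrative relations: customers and orders.
customers = {(1, "Alice"), (2, "Bob")}
orders = {(10, 1, 250.0), (11, 2, 99.0), (12, 1, 40.0)}

joined = join(orders, ("order_id", "cust_id", "amount"),
              customers, ("cust_id", "name"), on="cust_id")
big = select(joined, lambda r: r[2] > 100)   # amount > 100
names = project(big, ("order_id", "cust_id", "amount", "name"), ("name",))
```

Because relations are plain sets, duplicate elimination and the set operations (union, intersection) come for free from Python's set type.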
Maturity Assessment Question(s)
As we are not going to judge which data modelling technique is better for data warehousing, we considered two significant characteristics that can determine the maturity of this category for a DW: the synchronization (i.e.: establishing consistency among data from a source to a target data storage and vice versa, and the continuous harmonization of the data over time) between all the data models found in the DW (i.e.: ETL source and target models, DW and data mart models, BI models); and the differentiation between data model levels (i.e.: physical, logical and conceptual). Companies usually ignore the conceptual level as, at first, they do not see any benefit from it. However, in time, some of them realize that it is very important for solid and consistent data modelling and start designing it.
1) Which answer best describes the degree of synchronization between the following data models that your organization maintains and the mapping between them: ETL source and target models; DW and data marts models; BI semantic or query object models?
a) Automatic synchronization of all of the data models
b) Manual synchronization of some of the data models
c) No synchronization between data models
d) Manual or automatic synchronization depending on the data models
e) Automatic synchronization of most of the data models.
2) To what degree do you differentiate between data model levels: physical, logical and conceptual?
a) No differentiation between data model levels
b) All data models have conceptual, logical and physical levels designed
c) Logical and physical levels designed for some data models
d) Conceptual level also designed for some data models
e) Logical and physical levels designed for all the data models.
Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.
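The synchronization criterion above can be made concrete with a small sketch that compares the column sets of two model definitions, for instance an ETL source model and its DW target model. The model dictionaries and table names are invented for illustration:

```python
# Hypothetical sketch: detect drift between two data model definitions,
# each represented as {table_name: [column, ...]}.

def model_drift(source, target):
    """Return columns present in one model but missing in the other."""
    missing_in_target = {t: set(cols) - set(target.get(t, []))
                         for t, cols in source.items()}
    missing_in_source = {t: set(cols) - set(source.get(t, []))
                         for t, cols in target.items()}
    # Keep only tables that actually have drift.
    return ({t: c for t, c in missing_in_target.items() if c},
            {t: c for t, c in missing_in_source.items() if c})

source_model = {"customer": ["id", "name", "email"]}
target_model = {"customer": ["id", "name"]}

only_in_source, only_in_target = model_drift(source_model, target_model)
```

A check like this run on every load is one way "automatic synchronization" (answer a of question 1) could be approximated; real modelling tools compare much richer metadata than column names.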
4.2.3 Dimensional Modelling
ER diagrams and relational modelling are popularly used for database design in OLTP environments, but also in DWs. However, the database designs recommended by ER diagrams are considered by some authors to be inappropriate for decision support systems, where efficiency in querying and in loading data is very important (Chauduri & Dayal, 1997). Relational (i.e.: normalized) modelling has some characteristics that are appropriate for OLTP systems, but not for DWs:
its structure is not easy for end-users to understand and use. In OLTP systems this is not a problem because, usually, end-users interact with the database through a layer of software.
data redundancy is minimized. This maximizes the efficiency of updates, but tends to penalize retrievals. Data redundancy is not a problem in DWs because data is not updated on-line.
Dimensional modelling came as a solution to these problems. It was proposed by (Kimball, 1996) and has been adopted as the predominant approach to designing DWs and data marts in practice (Moody & Kortink, 2000). Dimensional modelling is a logical design technique for structuring data so that it is intuitive to business users and delivers fast query performance (Kimball, 1996). The main advantages of dimensional modelling are (Kimball et al., 2008; Ponniah, 2001): understandability, query performance and flexibility. Dimensional modelling divides the world into:
measurements – Measurements are captured by the organization's business processes and their supporting operational source systems. They are usually numeric values and are called facts.
context – Facts are surrounded by largely textual context that is true at the moment the fact is recorded. This context is intuitively divided into independent logical parts called dimensions.
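This split between numeric measurements and textual context can be sketched with a toy example. The product and store dimensions and the sales fact below are invented, not taken from the thesis:

```python
# Hypothetical retail example: a fact table holds foreign keys and numeric
# measures; dimension tables hold the surrounding textual context.

# Dimension tables, keyed by surrogate keys.
product_dim = {1: {"name": "Espresso", "category": "Beverage"}}
store_dim = {1: {"city": "Utrecht", "region": "Midden-Nederland"}}

# Fact table: foreign keys into the dimensions plus numeric measurements.
sales_fact = [
    {"product_key": 1, "store_key": 1, "quantity": 2, "amount": 5.40},
]

# Resolving the context of a measurement is a join from the fact row
# out to each dimension (the "star join" discussed below).
row = sales_fact[0]
context = {**product_dim[row["product_key"]], **store_dim[row["store_key"]]}
```

The fact row stays lean (keys and numbers); everything a business user would filter or group by lives in the dimensions.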
Each of the organization's business processes can be represented by a dimensional model that consists of a fact table containing the numeric measurements surrounded by several dimension tables containing the textual context. This star-like structure is often called a star join (Kimball et al., 2008). Dimensional models can be stored in:
a relational database platform (i.e.: a ROLAP server) – they are typically referred to as star schemas;
multidimensional online analytical structures (i.e.: MOLAP servers) – they are typically called cubes.
An example of a star schema and a cube can be seen in the figure below.
Figure 9: Star Schema vs. Cube (adapted from (Chauduri & Dayal, 1997)).
However, star schemas do not explicitly provide support for attribute hierarchies, and sometimes snowflake schemas are used instead. They are a refinement of star schemas in which the dimensional hierarchy is explicitly represented by normalizing the dimension tables. This brings advantages in maintaining the dimension tables. However, the denormalized structure of the dimension tables in star schemas may be more appropriate for browsing the dimensions. There are also other structures used for dimensional modelling (e.g.: fact constellations), but the ones we presented are the most often implemented. For more information on dimensional modelling, see (Kimball, 1996) and (Kimball et al., 2008).
Fact Tables
Fact tables store the performance measurements generated by the organization's business activities or events. The value of a fact is usually not known in advance because it is variable, and its valuation occurs at the time of the measurement event. Two aspects are important when analyzing fact tables (Kimball et al., 2008):
Fact table keys – fact tables are characterized by a multipart key made up of foreign keys coming from the intersection of the dimension tables involved in the business process.
This shows that a fact table always expresses a many-to-many relationship.
Fact table granularity – it refers to the level of detail of the data stored in a fact table. High granularity refers to data that is at or near the transaction level, referred to as atomic level data. Low granularity refers to data that is summarized or aggregated, usually from the atomic level data.
Dimension Tables
In contrast to the rigid qualities of fact tables, which consist of only keys and numeric measurements, dimension tables are filled with many descriptive fields. In many ways, the power of the DW is proportional to the quality and depth of the dimension attributes, as robust dimensions translate into robust querying and analysis capabilities. The most important aspects when analyzing dimension tables are (Kimball et al., 2008):
Dimension table keys – whereas fact tables have a multipart key, dimension rows are uniquely identified by a single key field. It is recommended to use surrogate keys rather than the keys used in the source systems. These surrogate keys are meaningless and merely serve as join fields between the fact and dimension tables. For practical reasons, they are usually represented as simple integers assigned in sequence.
Conformed dimensions – dimensions that adhere to the same structure and are shared across the enterprise's DW environment, joining to multiple fact tables representing various business processes. Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension. Dimension tables are not conformed if the attributes are labeled differently or contain different values.
Hierarchies – a hierarchy is a set of parent-child relationships between attributes within a dimension. These hierarchy attributes, called levels, roll up from child to parent; for example, Customer totals can roll up to Sub-region totals, which can further roll up to Region totals.
Another example: daily sales roll up to weekly sales, which roll up to monthly, quarterly and yearly sales.
Slowly changing dimensions – the dimensional model needs to track time-variant dimension attributes as required by the business requirements. There are mainly three techniques for handling slowly changing dimensions (SCDs):
type 1 – overwrite one or more attributes in an existing dimension row;
type 2 – copy the previous version of the dimension row and create a new row with a new surrogate key;
type 3 – add and populate a new column of the dimension table with the previous values and populate the original column with the new values.
Of course, these techniques are sometimes used in a hybrid approach for better management.
Special dimensions – dimensions that are only sometimes needed, but require knowledge and experience to be built successfully: mini dimensions (i.e.: dimensions created from the possible combinations of the frequently analyzed or frequently changed attributes of rapidly changing large dimensions); large dimensions (i.e.: dimensions with a very large number of rows or a large number of attributes); junk dimensions (i.e.: structures that provide a convenient place to store junk attributes such as transactional codes, flags and/or text attributes that are unrelated to any particular dimension), etc.
Maturity Assessment Question(s)
The maturity assessment part for dimensional modelling includes three questions on the most important characteristics of fact and dimension tables. The best approach for designing fact tables is to have a very high percentage of data at a low level of granularity in order to be able to do analysis at whichever level of aggregation.
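Why atomic grain matters can be shown with a small sketch: facts kept at the lowest level can be rolled up to any level on demand, whereas pre-aggregated facts cannot be broken back down. The sales rows below are invented:

```python
from collections import defaultdict

# Atomic-grain facts: one row per transaction (invented data).
sales = [
    {"day": "2010-03-01", "region": "North", "amount": 10.0},
    {"day": "2010-03-01", "region": "South", "amount": 7.5},
    {"day": "2010-03-02", "region": "North", "amount": 4.0},
]

def roll_up(facts, key):
    """Aggregate atomic facts to any level by a chosen grouping attribute."""
    totals = defaultdict(float)
    for f in facts:
        totals[f[key]] += f["amount"]
    return dict(totals)

by_day = roll_up(sales, "day")        # daily totals
by_region = roll_up(sales, "region")  # regional totals
```

Had the facts been stored only as daily totals, the regional roll-up (and any other grouping) would no longer be derivable.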
Regarding the dimension tables, the implementation of slowly changing dimensions and special dimensions implies advanced knowledge and experience, and is therefore specific to organizations at a higher maturity stage.
3) What percentage of all your fact tables have their granularity at the lowest level possible?
a) Very few fact tables have their granularity at the lowest level possible
b) Few fact tables have their granularity at the lowest level possible
c) Some fact tables have their granularity at the lowest level possible
d) Most fact tables have their granularity at the lowest level possible
e) All fact tables have their granularity at the lowest level possible.
4) To what degree do you design conformed dimensions in your data models?
a) No conformed dimensions
b) Conformed dimensions for few business processes
c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high-level design technique such as an enterprise bus matrix
d) Conformed dimensions for some business processes
e) Enterprise-wide standardized conformed dimensions for all business processes.
5) Which answer best describes the current state of your dimension tables modelling?
a) Few dimensions designed; no hierarchies or surrogate keys designed
b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Besides regular dimensions and slowly changing dimensions techniques, special dimensions are also designed (e.g.: mini, monster, junk dimensions).
Table 14: Dimensional Modelling Maturity Assessment Questions.
4.2.4 Data Modelling Tool
Data models can be created by simply drawing the models in different spreadsheets and documents.
However, the better solution is to use a data modelling tool. The main advantages of using a data modelling tool are (Kimball et al., 2008):
It makes the connection and transition between all the data model levels easier.
It integrates the DW model with other corporate data models.
It helps assure consistency in naming and definition.
It creates good documentation in a variety of useful formats.
It makes metadata management for data modelling easier.
The most important benefits of using a data modelling tool, however, are that it makes the design itself and metadata management easier and more efficient.
Maturity Assessment Question(s)
As the usage of a data modelling tool can be a differentiator for an organization developing a DW solution, we included in our assessment a maturity question derived from the information provided above:
6) Which answer best describes the usage of a data modelling tool in your organization?
a) Level 1 – No data modelling tool
b) Level 2 – Scattered data modelling tools used only for design
c) Level 3 – Scattered data modelling tools used also for maintenance
d) Level 4 – Standardized data modelling tool used only for design
e) Level 5 – Standardized data modelling tool used for design and maintaining metadata.
Table 15: Data Modelling Tool Maturity Assessment Questions.
4.2.5 Data Modelling Standards
DW Standards Overview
Standards in a DW environment are necessary and cover a wide range of objects, processes and procedures. Standards range from how to name the fields in the database to how to conduct interviews with the user departments for requirements definition. Standards do not only need to be defined and documented; it is just as important to actually implement them and use them consistently. The definition of standards would also benefit if a person or a group in the DW team were designated to revise the standards and keep them up-to-date.
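One way to make standards enforced rather than merely documented is to check them automatically. A minimal sketch of a naming-convention check follows; the convention itself (lowercase names with "dim_"/"fact_" prefixes) is an assumed example, not a convention prescribed by the thesis:

```python
import re

# Assumed convention for illustration: dimension tables start with "dim_",
# fact tables with "fact_", all lowercase with underscores.
NAME_RULE = re.compile(r"^(dim|fact)_[a-z][a-z0-9_]*$")

def check_names(tables):
    """Return the table names that violate the naming convention."""
    return [t for t in tables if not NAME_RULE.match(t)]

violations = check_names(["dim_customer", "fact_sales", "SalesTemp"])
```

Running such a check as part of the deployment process turns a documented standard into an implemented one, which is exactly the distinction the maturity questions below draw.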
By consistently applying standards, it becomes much easier for business users and developers to navigate the complex DW system. Standards also provide a consistent means for communication. Effective communication must take place among the members of the project and the users. Standards ensure consistency across the various areas, leaving less room for ambiguity. Therefore, one could say that the importance of standards cannot be overemphasized (Ponniah, 2001). This is why many companies invest a lot of time and money in prescribing standards for their information systems and, implicitly, for their DW.
As can be seen, standards can be defined and implemented for every part of the DW architecture and processes, and this is why we include questions regarding the definition and implementation of standards in the maturity assessment of each of the major components – data modelling, ETL and BI applications.
Data Modelling Standards
With regard to data modelling, standards are many and diverse. They can be applied at all the data model levels (i.e.: conceptual, logical and physical) and, most often, standards such as naming conventions for the objects and attributes in the data models take on special significance. Other standards here refer to the way one data model is derived from another, the way metadata is documented or how data quality is taken care of in this phase.
Maturity Assessment Question(s)
All the maturity assessment questions related to standards address general aspects such as the definition and documentation of standards and their actual implementation. The same principle applies to data modelling. There is an important distinction between having some standards defined and written down somewhere and actually following those standards.
7) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) Very low – No standards defined for data models
b) Low – Solution-dependent standards defined for some of the data models
c) Moderate – Enterprise-wide standards defined for some of the data models
d) High – Enterprise-wide standards defined for most of the data models
e) Very high – Enterprise-wide standards defined for all the data models.
8) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) Very low – No standards implemented for data models
b) Low – Solution-dependent standards implemented for some of the data models
c) Moderate – Enterprise-wide standards implemented for some of the data models
d) High – Enterprise-wide standards implemented for most of the data models
e) Very high – Enterprise-wide standards implemented for all the data models.
Table 16: Data Modelling Standards Maturity Assessment Questions.
4.2.6 Data Modelling Metadata Management
Data models usually need a lot of metadata (business and technical) to be documented in order to create consistency and understandability for both developers and users. A common subset of business metadata components as they apply to data includes (Moss & Atre, 2003): data names, definitions, relationships, identifiers, types, lengths, policies, ownership, etc. The standardization of the metadata documentation is also critical for integration among data models. Hence the maturity question depicted below.
Maturity Assessment Question(s)
9) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality, etc.) in your data models?
a) Very low – No documentation for any data models
b) Low – Non-standardized documentation for some of the data models
c) Moderate – Standardized documentation for some of the data models
d) High – Standardized documentation for most of the data models
e) Very high – Standardized documentation for all the data models.
Table 17: Data Modelling Metadata Management Maturity Assessment Questions.
4.3 Extract – Transform – Load (ETL)
4.3.1 What is ETL?
The Extract-Transform-Load (ETL) process is part of the DW back room component. As the name shows, the ETL process involves the following activities:
extracting data from outside sources;
transforming data to fit the target's requirements;
loading data into the target database.
According to (Kimball et al., 2008), there is also a fourth component of the ETL system, called managing the ETL environment. This component is very important because, in order for the ETL processes to run consistently to completion and be available when needed, they have to be managed and maintained. These activities are also part of the DW maintenance and monitoring processes, but some important technical components need to be implemented, which is why we also elaborate on it in this paragraph. Moreover, (Kimball et al., 2008) propose 34 subsystems that form the ETL architecture and divide them over the main ETL activities (i.e.: extract, transform, load and manage). However, even if the name seems to be understood by everyone, it is hard to convey why the ETL system is so complex and resource demanding (Kimball et al., 2008). Easily 60 to 80 percent of the time and effort of developing a DW project is devoted to the ETL system (Nagabhushana, 2006). Building an ETL system is very challenging because many outside constraints put pressure on the ETL design: business requirements, source data systems, budget, processing windows and available staff skills. Hence, designing ETL processes is extremely complex, often prone to failure and time consuming (Simitsis et al., 2005). However, since it is extensively recognized that the design and maintenance of the ETL processes are a key factor in the success of a DW project (March & Hevner, 2007; Solomon, 2005), organizations put a lot of effort into implementing a powerful ETL system.
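The three core activities can be sketched as a toy pipeline. Everything here (the source rows, the cleaning rules, the in-memory dictionary standing in for the target database) is invented for illustration; real ETL systems are vastly more elaborate, as the subsystems discussed below show:

```python
# Toy ETL sketch: extract from a source list, transform the records to the
# target's requirements, load them into a dict acting as the warehouse.

source_system = [
    {"id": 1, "name": " alice ", "amount": "10.5"},
    {"id": 2, "name": "Bob", "amount": "3.0"},
]
warehouse = {}

def extract(source):
    """Extract: pull raw records from the outside source."""
    return list(source)

def transform(records):
    """Transform: clean and conform the records (trim, case, cast)."""
    return [{"id": r["id"],
             "name": r["name"].strip().title(),
             "amount": float(r["amount"])} for r in records]

def load(records, target):
    """Load: write the conformed records into the target database."""
    for r in records:
        target[r["id"]] = r

load(transform(extract(source_system)), warehouse)
```

Even this trivial version shows why transformation dominates the effort: extraction and loading are mechanical, while the transform step encodes all the business rules.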
In order to formulate the chosen maturity questions for this category, we first give a short overview of each ETL component.
4.3.2 Extract
The extraction system is the first component of the ETL architecture. It addresses the issues of understanding the source data, extracting the data and transferring it to the DW environment, where the ETL system can operate on it independently of the operational systems (Kimball et al., 2008). Depending on the DW architecture, the extracted data may go directly into the DW or into a data staging area. Extraction essentially comes down to two questions (Loshin, 2003): What data should be extracted? How should that data be extracted? The first question essentially relies on which results clients expect to see in their BI applications. However, the answer is not that simple, as it depends on what source data is available and also on the data model that the architects have previously developed. The answer to the second question may depend on the scale of the project, the number and disparity of the data sources, and how far into the implementation the developers are. Extraction can be as simple as a collection of simple SQL queries or complex enough to require ad hoc, specially designed programs written in a proprietary programming language (Loshin, 2003). The other alternative is to use tools to help automate the process and obtain better results. Depending on the organization and the data warehouse project, data can be extracted from various source systems. Moreover, according to (Kimball et al., 2008), there are three subsystems that support the extraction process:
Data profiling system – it performs the technical analysis of data to describe its content, consistency and structure. It focuses on the instance analysis of individual attributes, providing information such as data type, length, value range, uniqueness, occurrence of null values, typical string patterns, etc.
(Rahm & Hai Do, 2000). The profiling step protects the ETL team from dealing with dirty data and guides them in setting expectations regarding realistic development schedules, limitations in the source data and the need to invest in better source data capture practices.
Change data capture (CDC) system – it offers the capability to transfer only the source data that has changed since the last load. This is not important for the first historic load, but it will prove very useful from that point forward. Implementing the CDC system is not an easy task. For more information on how to capture source data changes, see (Kimball et al., 2008).
Extract system – this is a fundamental component of the ETL architecture and it refers to the data extraction itself, whether it is done by writing scripts or by using a tool. Sometimes data has to be extracted from only one system, but most of the time each source might be in a different system. There are two primary methods for getting data from a source system: as a file or as a stream, the latter constructing the extract system as a single process.
Two other important aspects that need to be taken into consideration in the extraction phase are: data compression – important when large amounts of data have to be transferred through a public network; and data encryption – important for security reasons.
4.3.3 Transform
The transformation step is where the ETL system adds value to the data through the changes it makes. Usually, this phase includes cleaning and transforming the data according to the business rules and standards that have been established for the DW.
Data Cleaning
Data cleaning, also called data cleansing or scrubbing, is part of the complex and important data quality processes. It deals with detecting and removing errors and inconsistencies from data in order to improve their quality (Rahm & Hai Do, 2000).
As DWs are used for decision making, the correctness of their data is very important to avoid wrong results. "Dirty data" (e.g.: duplicates, missing data) will produce incorrect statistics, proving the concept of "garbage in, garbage out". Hence, due to the wide range of possible data inconsistencies and the large data volumes, data cleaning is considered to be one of the biggest problems in data warehousing. However, many organizations do not cleanse their data and believe that this is the responsibility of the source systems. Qualitative or accurate data means that data are (Kimball & Caserta, 2004):
correct – the values and descriptions in data describe their associated objects truthfully and faithfully;
unambiguous – the values and descriptions in data can be taken to have only one meaning;
consistent – the values and descriptions in data use one constant notational convention to convey their meaning;
complete – the individual values and descriptions in data are defined (not null) for each instance, and the aggregate number of records is complete.
Even though data cleansing is most often done manually or by low-level programs that are difficult to write and maintain, data quality tools are available to enhance the quality of the data at several stages in the process of developing a data warehouse. Cleansing tools can be useful in automating many of the activities involved in cleansing the data: parsing, standardizing, correction, matching and transformation. Part of the data quality process is represented by quality screens or tests that act as diagnostic filters in the data flow pipelines (Kimball et al., 2008). What is important here is the action taken when an error is thrown: 1) halting the process; 2) sending the offending record(s) to a suspense file for later processing; 3) merely tagging the data and passing it through to the next step in the pipeline.
The last choice is of course the best one, as it offers the possibility of taking care of data quality without aborting the job. Two other deliverables that can help in the data cleaning activities and are usually hard to implement are (Kimball & Caserta, 2004):
the error-event schema – captures all error events that are vital inputs to data quality improvement;
the audit dimension assembler – attaches metadata to each fact table as a dimension. This metadata is available to BI applications for visibility into data quality.
Maturity Assessment Question(s)
Data quality is very important for data warehousing because if users do not trust the data, they will not use the DW environment, which will then be considered a failure. At the same time, it is also one of the biggest DW challenges (Ponniah, 2001), as high data quality is very hard to achieve. Of course, when taking a first look at a DW, it is difficult to assess the actual data quality. This is why we included a question that checks whether a specific organization addresses data quality by identifying and solving data quality issues. The usage of data quality tools is of course a strong point, and an organization that uses them will definitely get better results.
1) Which answer best describes the data quality system implemented for your ETL?
a) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: no
b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
c) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: yes
d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes.
Table 18: Data Quality Maturity Assessment Questions.
Data Transformation
Besides data cleaning, the transformation system literally transforms the data in accordance with the business rules and standards that have been established for the DW. Typical transformations implemented in a DW are (Nagabhushana, 2006):
format changes – change data from different sources to a standard set of formats for the DW;
de-duplication – compare records from multiple sources to identify duplicates and merge them into a unified one;
splitting-up fields/integrating fields – split up a data item from the source systems into one or more fields in the DW / integrate two or more fields from the operational systems into a DW field;
derived values – compute derived values using agreed formulas (e.g.: averages, totals, etc.);
aggregation – create aggregate records based on the atomic DW data;
other transformations such as filtering, sorting, joining data from multiple sources, transposing or pivoting, etc.
4.3.4 Load
The DW load system takes the load images created by the extraction and transformation subsystems and loads these images directly into the DW.
A good load system should be able to perform the following activities (Kimball et al., 2008; Nagabhushana, 2006):
- generate surrogate keys – create standard keys for the DW separate from the source systems' keys;
- manage slowly changing dimensions (SCDs);
- handle late arriving data – apply special modifications to the standard processing procedures to deal with late-arriving fact and dimension data;
- drop indexes on the DW when new records are inserted;
- load dimension records;
- load fact records;
- compute aggregate records using base fact and dimension records;
- rebuild or regenerate indexes once all loads are complete;
- log all referential integrity violations during the load process.

Maturity Assessment Question(s)

The maturity assessment question for this category aims to give an overview of the general complexity and performance of ETL. Once again, we are not trying to judge how certain activities are done, but only whether they exist. As mentioned before, the latest trend in this field is real-time data warehousing, which puts a lot of pressure on ETL. Hence, the highest level of maturity for ETL involves real-time capabilities.

2) Which answer best describes the complexity of your ETL?
a) Simple ETL that just extracts and loads data into the data warehouse
b) Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
e) Optimized ETL for a real-time DW (real-time ETL capabilities).
Table 19: ETL Complexity Maturity Assessment Question.
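As an illustration of a few of the load activities listed earlier (surrogate key generation, de-duplication on a dimension's natural key, and logging of referential integrity violations), the following minimal Python sketch shows one possible shape. All table names, fields and data are invented for the example; real ETL tools implement these subsystems far more robustly:

```python
# Minimal sketch of three load activities: surrogate key generation,
# de-duplication of dimension records, and logging referential
# integrity violations during the fact load. Names are illustrative.

from itertools import count

def load_dimension(source_rows, key_field):
    """Assign surrogate keys to de-duplicated dimension rows."""
    next_key = count(1)
    dimension = {}  # natural key -> dimension row with surrogate key
    for row in source_rows:
        natural = row[key_field]
        if natural not in dimension:  # de-duplication on the natural key
            dimension[natural] = {**row, "surrogate_key": next(next_key)}
    return dimension

def load_facts(fact_rows, dimension, key_field):
    """Replace natural keys with surrogate keys; log violations."""
    loaded, violations = [], []
    for row in fact_rows:
        dim_row = dimension.get(row[key_field])
        if dim_row is None:  # referential integrity violation: unknown key
            violations.append(row)
        else:
            loaded.append({**row, key_field: dim_row["surrogate_key"]})
    return loaded, violations

customers = [
    {"cust_id": "C1", "name": "ACME"},
    {"cust_id": "C1", "name": "ACME"},   # duplicate source record
    {"cust_id": "C2", "name": "Globex"},
]
sales = [
    {"cust_id": "C1", "amount": 100},
    {"cust_id": "C9", "amount": 50},     # refers to an unknown customer
]

dim = load_dimension(customers, "cust_id")
facts, errors = load_facts(sales, dim, "cust_id")
print(len(dim), len(facts), len(errors))  # 2 1 1
```

The violation log corresponds to the "log all referential integrity violations" activity above; in a real system those rows would feed the error-event schema rather than a plain list.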
4.3.5 Manage

In order for the DW project to be a success, the ETL processes need to be reliable, available and manageable. This is the reason why (Kimball et al., 2008) consider the management subsystem to be the fourth component of the ETL system. They propose 13 subsystems to be included in this ETL component. Some of them can also be found in (Nagabhushana, 2006; Chauduri & Dayal, 1997), but they are not grouped into a separate subsystem of the ETL process. The most important capabilities for a successful management of the ETL system are:
- an ETL job scheduler;
- a backup system;
- a recovery and restart system – it can be manual or automatic;
- a workflow monitor – ensures that the ETL processes are operating efficiently and gathers statistics regarding ETL execution or infrastructure performance;
- a version control and migration system – helps archive and recover all the logic and metadata of the ETL process and then migrate this information to another environment (for example, from development to test and on to production);
- a data lineage and dependency system – identifies the source of a data element and all intermediate locations and transformations for that data element;
- a security system – security is an important consideration for the ETL system and the recommended method is role-based security on all data and metadata in the ETL system;
- a metadata repository management system.

Maturity Assessment Question(s)

In order to assess the maturity of the management and monitoring of ETL, we separated the necessary activities into two categories: simple monitoring, which is usually done first, and advanced monitoring, which is usually implemented by an organization that already has some experience in this field. A critical aspect of ETL that can really make the difference is the restart and recovery system.
An organization usually evolves from not having a restart and recovery system at all to a completely automatic restart and recovery system. However, the latter is very complex and prone to error and, therefore, very hard to achieve.

3) Which answer best describes the management and monitoring of your ETL?
(Definitions: Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.); Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time- or event-based ETL execution, event notification; data lineage and dependency analyzer system))
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes.
Table 20: ETL Management and Monitoring Maturity Assessment Question.

4.3.6 ETL Tools

There is a constant debate whether an organization should deploy custom-coded ETL solutions or buy an ETL tool suite (Kimball & Caserta, 2004). Using hand-coded ETL proves helpful sometimes because it offers:
- object-oriented techniques that can make all the transformations consistent for error reporting, validation and metadata updates;
- more directly managed metadata;
- the availability of in-house programmers;
- unlimited flexibility.
However, even if programmers can set up ETL processes using hand-coded ETL, building such processes from scratch can become complex. That is the reason why companies buy ETL tools for this purpose more and more often. There are some advantages to buying an ETL tool, such as:
- simpler, faster, cheaper development;
- users without professional programming skills can use them effectively;
- an integrated metadata repository;
- automatically generated metadata at every step of the ETL process;
- in-line encryption and compression capabilities;
- good performance for very large data sets;
- the possibility of augmenting the ETL tool with selected processing modules hand-coded in an underlying programming language.

Maturity Assessment Question(s)

As already mentioned, ETL can be built by using a programming language or by using an ETL tool, the latter usually being the better solution. A company that uses hand-coded ETL usually does not have a very complex ETL process, which shows a low level of maturity regarding ETL capabilities. However, in both cases, some standard scripts are sometimes needed, which can increase the performance of ETL. From the expert interviews we had and from the exploratory case study we did, we came up with another possibility of generating ETL: complete ETL generated from metadata. This is rarely applied in practice nowadays, but it is the desired solution for the future.

4) Which answer best describes the usage of an ETL tool in your organization?
a) Level 1 – Only hand-coded ETL
b) Level 2 – Hand-coded ETL and some standard scripts
c) Level 3 – ETL tool(s) for all the ETL design and generation
d) Level 4 – Standardized ETL tool and some standard scripts
e) Level 5 – Complete ETL generated from metadata.
Table 21: ETL Tools Maturity Assessment Question.

4.3.7 ETL Metadata Management

ETL is responsible for the creation and use of much of the metadata describing the DW environment.
Therefore, it is important to capture and manage all possible types of metadata for ETL: business, technical and process metadata. Nevertheless, not many organizations manage to do this and thus, we decided to include the following maturity question regarding ETL metadata in our assessment.

Maturity Assessment Question(s)

5) To what degree is metadata management implemented for your ETL?
a) Very low – No metadata management
b) Low – Business and technical metadata for some ETL
c) Moderate – Business and technical metadata for all ETL
d) High – Process metadata is also managed for some ETL
e) Very high – All types of metadata are managed for all ETL.
Table 22: ETL Metadata Management Maturity Assessment Question.

4.3.8 ETL Standards

A general overview of standards used in data warehousing was given in 4.2.5. Standards specific to ETL are related to: naming conventions, set-up standards, the recovery and restart system, etc. The maturity questions and stages below are straightforward.

Maturity Assessment Question(s)

6) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) Very low – No standards defined
b) Low – Few standards defined for ETL
c) Moderate – Some standards defined for ETL
d) High – Most standards defined for ETL
e) Very high – All the standards defined for ETL.

7) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) Very low – No standards implemented
b) Low – Few standards implemented for ETL
c) Moderate – Some standards implemented for ETL
d) High – Most standards implemented for ETL
e) Very high – All the standards implemented for ETL.
Table 23: ETL Standards Maturity Assessment Questions.

4.4 BI Applications

4.4.1 What are BI Applications?
BI applications are part of the front-room component of the DW architecture (Kimball et al., 2008) and are sometimes referred to as "front-end" tools (Chauduri & Dayal, 1997). They are what the end users see and, hence, are very important for a DW to be considered successful. According to (March & Hevner, 2007), a crucial point for achieving DW implementation success is the selection and implementation of appropriate end-user analysis tools, because the business benefits of BI are only gained when the system is adopted by its intended end users. This is why BI applications must meet several design requirements, such as (Kimball et al., 2008):
- be correct – BI applications must provide accurate results;
- perform well – queries should have a satisfactory response time;
- be easy to use – BI applications should be customized for each category of users;
- have a nice interface – BI applications should be clear and have an attractive design;
- be a long-term investment – BI applications must be properly documented, maintained, enhanced and extended.

4.4.2 Types of BI Applications

Throughout time, BI applications have evolved from (simple) predefined reporting to (advanced) data mining tools to fulfill users' analytical needs (Breitner, 1997). Also, according to (Azvine et al., 2006), traditional BI applications fall into the following categories, sorted by ascending complexity:
- report what has happened – standard reporting and query applications (i.e.: static/preformatted reports; interactive/parameter-driven reports);
- analyze and understand why it has happened – ad-hoc reporting and online analytical processing (OLAP); visualization applications (i.e.: dashboards, scorecards);
- predict what will happen – predictive analytics (i.e.: data and text mining).
However, in the last couple of years, due to the development of real-time data warehousing, a new category of BI applications has developed, called operational BI and closed-loop applications (Kimball et al., 2008).
As the complexity of the BI applications contributes to the maturity of a DW environment, we will include a maturity question regarding this aspect and, therefore, we will give a short overview of each type of BI application in the remainder of this paragraph.

Standard Reporting and Query Applications

This category of BI applications is usually considered to be the entry-level BI tooling, providing end users with a core set of information about what is happening in a particular area of the business (Kimball et al., 2008). Standard reports are the reports the majority of non-technical business users look at every day. They represent an easy-to-use means to get the needed information with a very short learning curve. As presented above, two types of standard reporting can be distinguished, based on the level of data interactivity:
- static/preformatted reporting – the most basic form of reporting, which can be seen as a repeatable, pre-calculated and non-interactive request for information. It is characterized by rigid evaluations of business facts presented in a standard format on a routine basis to a targeted audience (usually represented by casual users) (Eckerson, 2009);
- interactive/parameter-driven reporting – this kind of reporting offers the possibility of creating reports with dynamic content. End users now have some flexibility, as they can choose from a predefined set of parameters to filter report content to their individual preferences and needs (Turban et al., 2007). Once users get the view of the data they want, they can save the view as a report and schedule it to run on a regular basis. This allows report designers to create reports that can serve multiple categories of end users.

Analytic Applications

Analytic applications are more complex than standard reports.
Although the latter offer the possibility of creating reports of all shapes and detail levels, in many cases additional information is required (Varga & Vukovic, 2008). This places higher requirements on the DW architecture and also on the end-users' IT and analytical skills. Analytic applications offer the possibility of ad-hoc (or online) data access and complex analysis through a user-friendly interface. In this way, users can formulate their own queries directly on the data without the need for in-depth knowledge of SQL or other database query languages. Probably the best-known analytic technique is Online Analytical Processing (OLAP), a term coined by E.F. Codd in 1993. OLAP interfaces provide a fairly simple, yet extremely flexible navigation and presentation environment that enables end users to gain insight into data through fast, dynamic, consistent, interactive access to a wide variety of possible views of information. This is possible due to the fact that the data is characterized by multidimensionality, being structured as a cube designed with dimensions and facts. OLAP users can navigate through the data cube using several operations, such as (Breitner, 1997): roll-up (increasing the level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one or more dimension hierarchies; slice and dice (selection and projection to a certain layer or sub-cube); or pivot (re-orienting the multidimensional view of data). It can be seen that although the data cube is a simple structure, the large number of alternatives, including many numeric facts and dimensions and many hierarchies or abstraction levels, combine to form an immense universe of queries that can be explored via an OLAP interface (Tremblay et al., 2007).
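The cube operations just listed can be illustrated with a small, self-contained Python sketch. The dimensions, measure and data below are invented purely for illustration; real OLAP engines operate on indexed multidimensional structures rather than plain record lists:

```python
# Illustrative OLAP-style operations (roll-up, drill-down, slice) on a
# tiny fact table represented as a list of records. Invented example data.

from collections import defaultdict

FACTS = [
    {"year": 2009, "region": "EU", "product": "A", "sales": 10},
    {"year": 2009, "region": "EU", "product": "B", "sales": 5},
    {"year": 2009, "region": "US", "product": "A", "sales": 7},
    {"year": 2010, "region": "EU", "product": "A", "sales": 12},
]

def roll_up(facts, dims):
    """Aggregate the sales measure up to the given dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["sales"]
    return dict(totals)

def slice_(facts, dim, value):
    """Select one layer of the cube (slice) along a dimension."""
    return [f for f in facts if f[dim] == value]

# Roll up from (year, region, product) detail to year totals:
print(roll_up(FACTS, ["year"]))  # {(2009,): 22, (2010,): 12}

# Slice on region = "EU", then drill down by product:
print(roll_up(slice_(FACTS, "region", "EU"), ["product"]))  # {('A',): 22, ('B',): 5}
```

Drill-down is simply a roll-up to a finer dimension list, which mirrors the definition in the text: moving down the aggregation hierarchy increases detail.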
Through OLAP, users can generate fast reports regardless of database size and complexity, and they are allowed to define new ad-hoc calculations in any desired way without having advanced knowledge of SQL.

Visualization Applications

Due to the flood of data available from information systems, standard reporting and analytic applications are often not enough for business analysts and decision-makers to make sense out of the knowledge they contain. This is the reason why, especially when dealing with large amounts of data, visualization techniques can be very useful to facilitate data analysis. Information visualization is defined by (Chung et al., 2005) as "a process of constructing a visual presentation of abstract quantitative data. The characteristics of visual perception enable humans to recognize patterns, trends and anomalies inherent in the data with little effort in a visual display." The main visualization applications used in BI are dashboards and scorecards. According to (Eckerson, 2006), there are three types of performance dashboards:
- operational dashboards – used to track core operational processes;
- tactical dashboards – used by managers and analysts to track and analyze departmental activities, processes and projects;
- strategic dashboards – used by executives and staff to chart their progress toward achieving strategic objectives.
(Eckerson, 2006) states that dashboards are part of the first two categories, whereas scorecards are used at the strategic level. Furthermore, it can be said that dashboards present various key performance indicators (KPIs) (i.e.: key measures crucial to business strategy that must link to the organization's performance) in one screen view with intuitive displays of information (e.g.: tables, graphs, charts, dials, gauges, etc.), similar to an automobile control panel.
Dashboards support status reporting and alert generation across multiple data sources at a high level, but also allow drilling down to more specific data (Kimball et al., 2008). As said above, scorecards are actually dashboards developed at a strategic level. They help executives monitor their progress toward achieving strategic objectives. A scorecard can track an organization's performance by measuring business activity at a summarized level and comparing these values to predefined targets. In this way, executives can determine what actions should be taken in order to improve performance. There are several types of scorecards, but the most implemented one is the balanced scorecard defined by (Kaplan & Norton, 1992).

Data and Text Mining Applications (Predictive Analytics)

Data and text mining applications are sophisticated BI applications that involve advanced methods for data analysis. Data mining is a process that requires a lot of data, which needs to be in a reliable state before it can be subjected to the mining process. A newer technique is text mining, which refers to the process of deriving high-quality information from text. Data and text mining can also be found under the name of knowledge discovery or the newer term, predictive analytics. Data mining is defined by (Holsheimer & Siebes, 1994) as "the search for relationships and global patterns that exist in large databases, but are 'hidden' among the vast amount of data"; these relationships can then offer valuable knowledge about the database and the objects in the database.
However, other researchers such as (Fayyad et al., 1996) consider that knowledge discovery actually refers to the overall process of discovering useful knowledge from data, whereas data mining refers to a particular step in this process that consists of "applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data" (Fayyad et al., 1996). Data mining relies on known techniques from fields like machine learning, pattern recognition, and statistics. It also uses a variety of methods, such as (Fayyad et al., 1996): classification, regression, clustering, summarization, dependency modelling, and change and deviation detection.

Operational BI and Closed-loop Applications

This category of applications is part of the real-time data warehousing requirement. It includes the use of applications that are more sophisticated than typical operational reports, but leverage the rich historical context across multiple business processes available in the DW to guide operational decision making. These applications also frequently include transactional interfaces back to the source systems. The goal of operational BI applications is to reduce the analysis latency – the time it takes to inform the person in charge of data analysis that new data has to be analyzed, the time needed to choose appropriate analysis models and the time to process the data and present the results (Seufert & Schiefer, 2005). Sometimes, these applications may be produced by accessing live operational data. In other cases, when a certain degree of data latency can be tolerated, the reports are produced using the information collected by the (near) real-time DW. Hence, in order to get accurate operational results, the activities and processes involved in a DW project have to be optimized.
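To make one of the data mining methods named earlier (clustering) concrete, the sketch below implements a deliberately minimal one-dimensional k-means. The data and starting centers are invented; production data mining relies on dedicated libraries with far more careful initialization and convergence handling:

```python
# Minimal one-dimensional k-means: assign each point to the nearest
# center, re-center each cluster on its mean, and repeat. Invented data.

def kmeans_1d(points, centers, iterations=10):
    """Cluster numeric points around a fixed number of centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the nearest current center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # re-center; keep the old center if a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 20, 21, 22], [0.0, 10.0])
print(centers)   # [2.0, 21.0]
print(clusters)  # [[1, 2, 3], [20, 21, 22]]
```

The "hidden" pattern here, two groups of values, is exactly the kind of structure the data mining definitions above refer to, only at toy scale.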
Maturity Assessment Question(s)

As can be seen from the short overview of BI applications, the types of BI applications supported by the DW environment are an important indicator of its maturity. For example, an organization that develops predictive analytics certainly has experience in developing less complex applications such as ad-hoc reports or visualization applications. The highest level of maturity refers to the development of closed-loop and operational (real-time) BI applications, as this is the latest trend in this field and not many organizations have the necessary skills and experience to develop them. As user requirements can change very often and the time to deliver the updated BI applications is rather short, a characteristic that can act as a differentiator is the usage of standardized objects (e.g.: KPIs, metrics, attributes, templates, etc.). This being said, the maturity questions and stages can be seen below.

1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Level 1 – Static and parameter-driven reports and query applications
b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)
c) Level 3 – Visualization techniques: dashboards and scorecards
d) Level 4 – Predictive analytics: data and text mining; alerts
e) Level 5 – Closed-loop BI applications; real-time BI applications.

2) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) used in your BI applications?
a) Very low – Objects defined separately for every BI application
b) Low – Some reusable objects for similar BI applications
c) Moderate – Some standard objects and templates for similar BI applications
d) High – Most similar BI applications use standard objects and templates
e) Very high – All similar BI applications use standard objects and templates.
Table 24: BI Applications Maturity Assessment Questions.
4.4.3 BI Applications Delivery Method

As end users are interested only in the results they get from the BI applications, the ease of accessing and delivering these results is critical for the success of the DW solution. The main BI applications delivery methods are:
- Physically (e.g.: on paper) or electronically (e.g.: by e-mail) delivered reports. Even if this method is easy to implement, it is the least mature and efficient way of delivering BI applications. Reports can be delivered manually or automatically.
- Direct tool-based interface. This is a more evolved delivery method, as it offers a better interface for users to access their reports. It involves developing a set of reports and providing them to the users directly using the standard data access tool interface (Kimball et al., 2008). However, there might be some integration or accessibility problems if an organization uses multiple BI tools.
- A BI portal. Lately, the Web has become a popular environment for BI applications. The result of this is the development of a new delivery method, the BI portal, which is also the most evolved and the most difficult to implement and maintain. A BI portal will give the users a well organized, useful, easily understood place to find the tools and information they need (Kimball et al., 2008; Ponniah, 2001). Besides the structured BI applications, the BI portal should also offer functions such as an information center and help, a discussion forum, alerting, a metadata browser, etc. A successful BI portal also needs to be highly interactive and always up-to-date.

Maturity Assessment Question(s)

From the information presented on BI applications delivery methods, the maturity question we created for assessing this characteristic is straightforward.

3) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Level 1 – Reports are delivered manually, physically (e.g.: on paper) or electronically (e.g.: by e-mail)
b) Level 2 – Reports are delivered automatically by e-mail
c) Level 3 – Direct tool-based interface
d) Level 4 – A BI portal with basic functions: subscriptions, discussion forum, alerting
e) Level 5 – Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and BI portals).
Table 25: BI Applications Delivery Method Maturity Assessment Question.

4.4.4 BI Applications Tools

As we saw for data modelling and ETL, the usage of tool(s) can really make the difference between organizations. This is the reason why we decided to also include a question regarding this aspect for BI applications. After the expert interviews, we decided that a very low maturity level is represented by the usage of a different BI tool for each data mart, whereas the highest maturity stage is reached when there is one standardized tool for mainstream BI applications (i.e.: reporting and visualization applications, which are most often developed) and one for specific BI applications (i.e.: data mining and financial analysis, which are harder to implement and usually specific to each department).

Maturity Assessment Question(s)

4) Which answer best describes your current BI tool usage?
a) Level 1 – A different BI tool for each data mart
b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)
c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool
d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific BI applications (i.e.: data mining, financial analysis, etc.)
e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications.
Table 26: BI Tools Maturity Assessment Question.
4.4.5 BI Applications Metadata Management

As BI applications are what the end user sees, an important aspect is the accessibility of metadata. An overview of how this can be achieved was offered in 4.1.4. An organization can thus evolve from showing no metadata to users to completely integrating metadata with the BI applications (e.g.: metadata can be accessed through one button push on the attributes).

Maturity Assessment Question(s)

5) Which answer best describes the metadata accessibility to users?
a) Very low – No metadata available
b) Low – Some incomplete metadata documents that users ask for periodically
c) Moderate – Complete up-to-date metadata documents sent to users periodically or available on the intranet
d) High – Metadata is always available through a metadata management tool, different from the BI tool
e) Very high – Complete integration of metadata with the BI applications (e.g.: metadata can be accessed through one button push on the attributes, etc.).
Table 27: BI Applications Metadata Management Maturity Assessment Question.

4.4.6 BI Applications Standards

A general overview of standards used in data warehousing was given in 4.2.5. Standards specific to BI applications include: naming conventions, generic transformations, the logical structure of attributes and measures, etc. Once again, we will not assess which standards are defined or implemented, but whether this is done. The maturity questions and stages below are straightforward.

Maturity Assessment Question(s)

6) To what degree have you defined and documented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications.
7) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications.
Table 28: BI Applications Standards Maturity Assessment Questions.

4.5 Summary

In this chapter we took a closer look at the DW technical solution category and its main sub-categories: general architecture and infrastructure, data modelling, ETL and BI applications. For each of them we identified the most important characteristics that might influence the maturity of the DW solution and we introduced the maturity assessment questions. We will continue with the second category in our model – the DW organization and processes – in the next chapter.

5 DW Organization and Processes

When assessing the maturity of a DW technical solution, the processes and roles involved in the project also need to be analyzed. A good technical solution cannot be developed without the processes surrounding it, as there is a strong interconnection between the two parts. It is more probable that an organization with standardized processes and formalized development roles will develop a better DW solution. At the same time, an organization cannot improve its processes without having some experience with previous DW projects. Therefore, in this chapter we will take a closer look at the second part of the DW maturity assessment questionnaire, the one regarding the organizational roles and processes necessary to develop and maintain a DW solution.
5.1 DW Development Processes

A DW solution can be considered a software engineering project with some specific characteristics. Therefore, like any software engineering project, it will go through several development stages (Moss & Atre, 2003). Several models or paradigms of software development have been defined in the literature and applied in practice. Some of the best known are: the waterfall model, spiral development, iterative and incremental development, agile development, etc. For an overview of these models, see (Sommerville, 2007). Since DW/BI is an enterprise-wide evolving environment that is continually improved and enhanced based on feedback from the business community, the best approach for its development is iterative and incremental development (Kimball et al., 2008; Ponniah, 2001). Due to its complexity, the approach for a DW project has to include iterative tasks going through cycles of refinement. (Kimball et al., 2008) also suggest that agile techniques fit best with the development of BI applications. Designing and developing the analytic reports and analyses involves unpredictable, rapidly changing requirements. The BI team members need to work in close proximity to the business, so that they can be readily available and responsive in order to release incremental BI functionality in a matter of weeks. However, one size seldom fits all, and therefore, it is important for organizations to be able to choose the right methodology for each DW layer.

Maturity Assessment Question(s)

As it is hard to judge which software development paradigm is better and more mature, the first maturity question on development processes is a more general one, referring to how the DW development processes map to the CMM levels: whether they are done ad-hoc or are standardized. And if they are standardized, it is important to know whether they are measured against defined goals and continuously improving.
1) Which answer best describes the DW development processes in your organization?
a) Level 1 – Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
b) Level 2 – Repeatable development processes based on experience with similar projects; some development phases clearly separated
c) Level 3 – Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
d) Level 4 – Development processes continuously measured against well-defined and consistent goals
e) Level 5 – Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects.
Table 29: DW Development Processes General Maturity Assessment Question.

5.1.1 DW Development Phases
Regardless of the chosen DW development model, a lifecycle approach is needed in order to accomplish all the major objectives in the system development process (Ponniah, 2001). A DW system consists of numerous tasks, technologies, and team member roles. It is not enough to have the perfect data model or best-of-breed technology. The many facets of a DW project need to be coordinated, and the lifecycle approach can do that by breaking down the project complexity and enforcing orderliness and a systematic approach to building the DW (Kimball et al., 2008). However, a one-size-fits-all lifecycle approach will not work for a DW project. The lifecycle approach has to be adapted to the special needs of the organization's DW project.
However, regardless of the situational factors, the main high-level phases and tasks required for an effective DW implementation are (Kimball et al., 2008; Moss & Atre, 2003):
- Project planning and management
- Requirements definition
- Design
- Development
- Testing and acceptance
- Deployment/production
- Growth, maintenance and monitoring.
As the DW environment is continuously changing and improving, the first six phases are usually considered to be project-based, whereas the maintenance and monitoring should be done on an ongoing basis. However, many authors report that even today, software organizations do not have any defined processes for their software maintenance activities (April et al., 2004). (van Bon, 2000) confirms the lack of process management in software maintenance and notes that it is a mostly neglected area. Traditionally, maintenance has been depicted as the final activity of the software development process (Schneidewind, 1987). (Bennett, 2000) takes a historical view of this problem, tracing it back to the beginning of the software industry, when there was no difference between software development and software maintenance. However, starting with the 1980s, software maintenance began to be treated as a sequence of activities and not as the final stage of a software development project. Several standards were developed especially for software maintenance and nowadays, many organizations make this distinction between the development phases and the maintenance and monitoring processes. This is also the reason why we make this distinction, especially since in a DW project, maintenance and monitoring activities take a lot of time and effort. Therefore, we will elaborate on the first six phases in this section and will continue with the maintenance and monitoring activities in the DW service processes part.
5.1.1.1 Project Planning and Management
One of the reasons why so many DW projects fail is improper project planning and inadequate project management (Ponniah, 2001). DW project planning is not a one-time activity. Since project plans are usually based on estimates, they must be adjusted constantly. A solid and, at the same time, flexible DW project plan could be the foundation for a successful DW initiative. Project planning usually consists of several important activities (Lewis, 2001):
- create a work breakdown structure listing activities, tasks, and subtasks;
- estimate time, cost and resource requirements;
- determine the critical path based on the task and resource dependencies;
- create the detailed project plan.
As a DW project is very complex and many risks can affect its development, a very important step here is project risk management. It involves three main activities: identify possible risks and threats; quantify threats and risks by assigning a risk priority number; and develop contingency plans to deal with the risks that cannot be ignored. However, just planning the project is not enough for a successful DW implementation. The project also needs to be managed during its development. First, the DW project officially begins with the project kickoff meeting, whose purpose is to get the entire project team on the same page in terms of where the project stands and where it plans to go (Kimball et al., 2008). Once the project has started, the project status must be regularly monitored (Lewis, 2001). The DW project lifecycle requires the integration of numerous resources and tasks that must be brought together at the right time to achieve success. Monitoring project status is key to achieving this coordination. Another important problem in project management is the management of scope changes. This is usually done by adopting issue tracking or change management methodologies.
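The risk management activities described above – identify risks, assign each a risk priority number, and develop contingency plans for the risks that cannot be ignored – can be sketched in a few lines of code. The text does not prescribe how the risk priority number is computed; the formula below (likelihood times impact, each on a 1–5 scale) and the threshold value are common conventions assumed here for illustration only.

```python
# Hypothetical sketch of the project-risk-management step: score each
# identified risk, rank by risk priority number (RPN), and flag the risks
# that need a contingency plan. The RPN formula (likelihood x impact) and
# the threshold of 12 are assumptions, not taken from the thesis.

from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (negligible) .. 5 (critical)

    @property
    def rpn(self) -> int:
        # Risk priority number: higher means more urgent.
        return self.likelihood * self.impact

def plan_contingencies(risks, threshold=12):
    """Rank risks by RPN; those at or above the threshold need a contingency plan."""
    ranked = sorted(risks, key=lambda r: r.rpn, reverse=True)
    return [(r.name, r.rpn, r.rpn >= threshold) for r in ranked]

# Illustrative risks for a DW project (hypothetical examples):
risks = [
    Risk("Source data quality worse than expected", likelihood=4, impact=4),
    Risk("Key ETL developer leaves the project", likelihood=2, impact=5),
    Risk("Hardware delivery delayed", likelihood=2, impact=2),
]
for name, rpn, needs_plan in plan_contingencies(risks):
    print(f"{name}: RPN={rpn}, contingency plan needed: {needs_plan}")
```

Ranking the quantified risks in this way makes it explicit which threats can be accepted and which ones require a documented contingency plan, which is exactly the distinction the planning activity asks for.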
Of course, throughout the project, a lot of changes might happen, and this is why it is a good idea to maintain the project plan by updating and evaluating it periodically. Moreover, consolidated project documentation will help ease the burden of keeping pace with the unending nature of the DW project. Documenting project assumptions and decision points is also helpful in the event that the deliverables do not meet expectations. However, many organizations ignore the importance of documentation, and if time pressures mount, it will be the first item to be eliminated. Finally, in order to learn from previous mistakes, projects and project management should always be reviewed and evaluated. This will offer lessons learned that help avoid the same mistakes in the future (Lewis, 2001).

Maturity Assessment Question(s)
As explained in this section, project planning and management is crucial for DW project success. This is why we created a maturity question regarding this part of the development processes. We included in the answers the most important aspects: project planning and scheduling; project risk management; project tracking and control; documentation; and evaluation and assessment. Therefore, an organization which does not have any of these activities implemented is on the first level of maturity, whereas one that takes care of all these activities is on the highest level of maturity regarding project planning and management.

2) Which answer best describes your DW project management?
a) Level 1 – Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Level 2 – Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Level 3 – Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Level 4 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Level 5 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes.
Table 30: Project Management Maturity Assessment Question.

5.1.1.2 Requirements Definition
In a DW, users' business requirements represent the most powerful driving force (Ponniah, 2001) as they impact virtually every aspect of the project. Also, as end users alone are able to define the business goals of the DW system correctly, they should be enabled to specify information requirements by themselves (Hansen, 1997). The DW environment is an information delivery system where the users themselves will access the DW repository and create their own outputs. It is therefore extremely important that the DW contain the right elements of information in the most optimal formats in order for the users to get the results they want. Every task that is performed in every phase of the development of the DW is determined by the requirements. Every decision made during the design phase is totally influenced by the requirements.
Because requirements form the primary driving force for every phase of the development process, special attention needs to be paid to the requirements definition phase in order to make sure that it contains adequate details to support each phase. Requirements are usually gathered from the user community using two basic interactive techniques (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):
- interviews – conducted with individuals or small groups (i.e.: two or three persons at a time); they represent a good approach when details are intricate;
- facilitated sessions – larger group sessions of ten to twenty people led by a facilitator; they are more appropriate after getting a baseline understanding of the requirements.
Useful information can also be extracted from the review of existing documentation from the user and IT departments. Another important aspect which is often neglected in the requirements definition phase is formal documentation (Kimball et al., 2008; Ponniah, 2001), which is essential for several reasons. First, the requirements definition document is the foundation for the next phases and it becomes the encyclopedia of reference material as resources are added to the DW team. If project team members have to leave the project for any reason at all, the project will not suffer from people walking away with the knowledge they have gathered. Second, documentation helps the team to crystallize and better understand the interview content. Finally, formal documentation will also validate the findings when reviewed with the users.

Maturity Assessment Question(s)
As shown in this section, the requirements definition phase is very important for the DW environment and special attention should be paid to it. A solid requirements definition follows a standard methodology and has a formal requirements document.
Also, even if not usually done, causal analysis meetings to identify common bottleneck causes in this step, followed by the elimination of these causes, could be very beneficial for the DW development process.

3) Which answer best describes the requirements definition phase for your DW project?
a) Level 1 – Ad-hoc requirements definition; no methodology used
b) Level 2 – Methodologies differ from project to project; interviews with business users for collecting the requirements
c) Level 3 – Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
d) Level 4 – Level 3) + qualitative assessment and measurement of the phase; requirements document also published
e) Level 5 – Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes.
Table 31: Requirements Definition Maturity Assessment Question.

5.1.1.3 Design/ Development/ Testing and Acceptance/ Deployment
Once the business requirements are gathered and defined, the DW team can continue with designing the data model and the physical database, and designing and developing the ETL and the BI applications. Then, the developed DW with all its components needs to be tested, accepted by both the technical and business parties, and finally, the system can be deployed or put into production. We will give a short overview of each of these phases further in this section and then present the maturity questions regarding this part of the DW development processes.

Design
The design phase refers to designing the data models on all three levels (i.e.: conceptual, logical, physical), the ETL and the BI applications. Most of the aspects related to the design phase were already mentioned in the DW technical solution part, where we elaborated on each technical component. However, several things regarding the processes can be added here.
First, the data modelling process itself starts during the business requirements activity, when the preliminary requirements definition document is created. Based on this, the design team will first develop a high-level conceptual model, and then continue with the logical and physical data models. What is important in this process is to remember that data modelling is an iterative process and to have a preparation period beforehand which includes activities such as: identifying the roles and participants required, reviewing the business requirements document, setting up the modelling environment, developing standards, and obtaining appropriate facilities and supplies. The design of ETL and BI applications also involves several activities in order to be successful: create a plan and documentation, do some resource planning, and develop default strategies and standards (Kimball et al., 2008).

Development
The development phase includes the building of the physical databases and the actual implementation of ETL and BI applications (Kimball et al., 2008). The physical databases are built when the data definition language (DDL) is run against the database management system (DBMS). ETL programs must be developed for the two sets of load processes: the one-time historic load and the incremental load. If a DBMS load utility is used to populate the BI target databases, then only the extract and transformation programs need to be written, including the programs that create the final load files. If an ETL tool is used, the instructions (i.e.: technical metadata) for the ETL tool must be created. BI applications development involves using the front-end tool building environment and writing the programs and scripts for the reports, queries, front-end interface, and online help function (Moss & Atre, 2003).

Testing and Acceptance
The DW system is a complex software project that needs to be tested extensively before being put into production.
However, even if testing is critical for DW success, many organizations underestimate the importance of and the time needed for these tasks. The most important activities during this step are (Golfarelli & Rizzi, 2009; Kimball et al., 2008; Moss & Atre, 2003):
- unit testing – all ETL modules and BI applications must be unit tested to prove that they compile without errors, but also to verify that they perform their functions correctly, trap all potential errors, and produce the right results. It is also recommended that someone in a different role than the developer perform this unit testing;
- integration and regression testing – once all the individual ETL modules and BI applications have been unit tested, the entire system needs to be tested. This is done with integration testing on the first release and with regression testing on subsequent releases. In this way, it can be verified whether the completely integrated system meets its requirements or not. Regression testing focuses on finding defects after a major change has occurred and it uncovers all the test results that deviate from the correct answers;
- performance testing – a performance test will indicate whether the system performs well both for loads and for queries and reports;
- acceptance testing – acceptance tests are done by the users of the DW in order to verify that the system meets the mutually agreed-upon requirements. The acceptance tests include the validation of the ETL process, but more importantly for the end users, they should determine the overall usability of the BI applications and whether the returned results are the desired ones. In order for these tests to be effective, user training is usually done beforehand.
Besides doing these activities, it is also important to formalize and follow a standard procedure for the testing and acceptance phase.
In this way, it would be much easier to keep track of the tests and their results and, at the same time, evaluate the testing and acceptance phase (Kimball et al., 2008).

Deployment (Production)
The last step needed to finish the DW implementation is to deploy it by transferring the DW from testing to production. The first deployment is the easiest one. After this, things get a little bit more complicated, as any modifications to the system should be accomplished with minimal disruption to the business user community. For more details on DW deployment techniques, see (Kimball et al., 2008).

Maturity Assessment Question(s)
As a lot of aspects regarding the design phase were analyzed in the DW technical part, and it is difficult to do a high-level assessment of the development and deployment phases, we decided to assess the testing and acceptance phase, which is a critical one for DW success. The question lists the main activities involved in this phase and offers the organization the possibility to choose the ones it has implemented. The question will be scored through normalization, as further explained in the expert evaluation chapter.

4) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance.
Table 32: Testing and Acceptance Maturity Assessment Question.

Development/ Testing/ Acceptance/ Production Environments
To support all the phases presented in this section, organizations usually set up different environments for different purposes (Moss & Atre, 2003):
- The development environment, where the programs and scripts are written and tested by the developers.
- The testing environment, where the DW system with all its components is tested.
- The acceptance environment, where the users do acceptance tests.
- The production environment, where the DW actually runs after being rolled out.
The implemented environments can influence the quality and performance of the DW. While smaller organizations may have only two environments (i.e.: development and production), others usually have at least three different environments. Another important aspect is the way the migration between environments is done: manually or automatically, the latter of course being the optimal one.

Maturity Assessment Question(s)
The maturity question chosen for this aspect is straightforward and self-explanatory after reviewing the arguments mentioned above. As standards are crucial for data warehousing, we also did an assessment of the definition and implementation of standards for developing, testing and deploying DW functionalities.

5) To what degree is there a separation between the development/test/acceptance/deployment environments in your organization?
a) Very low – no separation between environments
b) Low – two separate environments (i.e.: usually development and production) with manual transfer between them
c) Moderate – some separation between environments (i.e.: at least three environments) with manual transfer between them
d) High – some separation between environments (i.e.: at least two environments) with automatic transfer between them
e) Very high – all the environments are distinct with automatic transfer between them.

6) To what degree has your organization defined and documented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards defined
b) Low – few standards defined
c) Moderate – some standards defined
d) High – a lot of standards defined
e) Very high – a comprehensive set of standards defined.

7) To what degree has your organization implemented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards implemented
b) Low – few standards implemented
c) Moderate – some standards implemented
d) High – a lot of standards implemented
e) Very high – a comprehensive set of standards implemented.
Table 33: Development/ Testing/ Acceptance/ Production Maturity Assessment Questions.

5.1.2 The DW/BI Sponsor
As already mentioned, strong support and sponsorship from senior business management is critical for a successful DW initiative. However, many organizations seem to overlook this aspect and ignore its importance. No other venture unifies the information view of the entire corporation as the corporation's DW does. The entire organization is involved and positioned for strategic advantage. Therefore, it is important to have sponsorship from the highest levels of management to keep focus and satisfy conflicting requirements (Ponniah, 2001). The DW sponsor needs to be more than an IT project manager or IT director. Effective business sponsors share several characteristics (Kimball et al., 2008). First, they have a vision for the potential impact of a DW/BI solution and can visualize how improved access to information will result in incremental value to the business. Second, strong business sponsors are influential leaders within the organization; they are demanding, but at the same time, realistic and supportive. It is important that they have a basic understanding of DW/BI concepts, including the iterative development cycle, in order to avoid unrealistic expectations. Effective sponsors are able to deal with short-term problems and project setbacks and they are willing to compromise.
Maturity Assessment Question(s)
All this being said, some conclusions can be drawn:
- The DW project sponsor needs to be from the business department.
- It is better to have multiple strong sponsors within the organization.
- The best sponsorship involves business-driven, cross-departmental sponsorship including top-level management. In this case, the DW/BI initiative is integrated in the company's strategy and processes with continuous support and budget.
Therefore, the maturity question derived from these conclusions is:

8) Which answer best describes the sponsor for your DW project?
a) Level 1 – No project sponsor
b) Level 2 – Chief information officer (CIO) or an IT director
c) Level 3 – Single sponsor from a business unit or department
d) Level 4 – Multiple individual sponsors from multiple business units or departments
e) Level 5 – Multiple levels of business-driven, cross-departmental sponsorship including top-level management sponsorship (BI/DW is integrated in the company process with continuous budget).
Table 34: DW/BI Sponsorship Maturity Assessment Question.

5.1.3 The DW Project Team and Roles
As in any type of project, the success of a DW project also depends on the project team. A DW project is similar to other software projects in that it is human-intensive. It takes several trained and especially skilled persons to form the project team. Two of the factors that can break a project are complexity overload and responsibility ambiguity. However, the negative influence of these factors can be overcome by putting the right person in the right job (Ponniah, 2001). Therefore, organizing the project team for a DW project has to do with matching diverse roles and responsibilities with the proper skills and levels of experience. A DW project requires a number of different roles and skills from both the business and IT communities during its lifecycle.
The main roles refer to (Kimball et al., 2008; Ponniah, 2001):
- Sponsorship and management (e.g.: business sponsor, project manager, etc.)
- Development roles (e.g.: business analyst, data steward, data quality analyst, data modeler, metadata manager, ETL architect, ETL developer, BI architect, BI developer, technical architect, security manager, DW tester, etc.)
- Monitoring and maintenance roles (e.g.: help desk, operations manager, etc.)
However, there is seldom a one-to-one relationship between roles and individuals. It does not really matter whether a person fills multiple roles on the DW project. What really matters is to have these roles and responsibilities formalized and actually implemented. It is also important to do periodic evaluations and assessments of the performance of roles in order to check for training requirements and solve skill-role mismatches (Humphries et al., 1999; Nagabhushana, 2006).

Maturity Assessment Question(s)
As it is difficult to say whether a team with more roles is more mature than one with fewer roles, we will assess here whether the role definition and implementation has been done. Besides this, a company on a higher level of maturity would also do periodic assessment and evaluation of roles.

9) Which answer best describes the role division for the DW development process?
a) Level 1 – No formal roles defined
b) Level 2 – Defined roles, but not technically implemented
c) Level 3 – Formalized and implemented roles and responsibilities
d) Level 4 – Level 3) + periodic peer reviews (i.e.: review of each other's work)
e) Level 5 – Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks).
Table 35: DW Project Team and Roles Maturity Assessment Question.
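The questionnaire built up so far mixes two question types: level-based questions that map a single answer a–e directly onto maturity levels 1–5, and multi-select questions such as question 4, which, as noted earlier, are scored through normalization. The actual scoring scheme is defined in the expert evaluation chapter; the sketch below only illustrates one plausible interpretation, in which a multi-select score is the fraction of checked activities rescaled onto the same 1–5 range. Both function names and the rescaling formula are assumptions for illustration.

```python
# Illustrative (not the thesis's actual algorithm) scoring of the two
# DWCMM question types: single-choice answers a-e map to levels 1-5;
# multi-select questions are normalized by the fraction of activities
# checked and rescaled to the 1-5 range. The rescaling is an assumption.

def score_single_choice(answer: str) -> float:
    """Map an answer letter 'a'..'e' to a maturity level 1..5."""
    levels = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
    return float(levels[answer])

def score_multi_select(checked: int, total: int) -> float:
    """Normalize a multi-select question onto the 1..5 scale:
    0 checked -> 1.0, all checked -> 5.0, linear in between."""
    return 1.0 + 4.0 * checked / total

# Example: answer 'c' on a level question, and five of the seven
# testing-and-acceptance activities (a-g of question 4) checked.
print(score_single_choice("c"))                 # 3.0
print(round(score_multi_select(5, 7), 2))       # 3.86
```

Rescaling the normalized fraction onto the same 1–5 range keeps all question scores comparable, so sub-category scores can later be aggregated on a single maturity scale.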
5.1.4 DW Quality Management
The purpose of DW quality management is to provide management with appropriate visibility into the development process being used by the DW project and into the products being built. Organizations usually start by doing DW development quality assurance. This involves reviewing and auditing the data warehousing products and activities to verify that they comply with the applicable procedures and standards, and providing the project managers and other appropriate managers with the results of these reviews and audits. In time, organizations learn how to manage this and implement DW quality management. This involves defining quality goals for the DW products and processes, establishing plans to achieve these goals, and monitoring and adjusting the plans, products, activities, and quality goals to satisfy the needs and desires of the customer and end user (Paulk et al., 1995).

Maturity Assessment Question(s)
The maturity assessment question and the characteristics specific to each stage are depicted in the table below.

10) Which answer best describes the DW quality management?
a) Level 1 – No quality assurance activities
b) Level 2 – Ad-hoc quality assurance activities
c) Level 3 – Standardized and documented quality assurance activities done for all the development phases
d) Level 4 – Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification.
Table 36: DW Quality Management Maturity Assessment Question.

5.1.5 Knowledge Management
Knowledge management (KM) is an emerging discipline that promises to capitalize on an organization's intellectual capital.
KM implementation and use have rapidly increased since the 1990s as more and more companies have understood the importance of the knowledge each individual possesses and can systematically share within an organization (Rus & Lindvall, 2002). KM is "the practice of adding actionable value to information by capturing tacit knowledge and converting it to explicit knowledge; by filtering, storing, retrieving and disseminating explicit knowledge; and by creating and testing new knowledge" (Nemati et al., 2002). Explicit knowledge, also known as codified knowledge, is expressed knowledge. It corresponds to the information and skills that employees can easily communicate and document, such as processes, templates and data. Tacit knowledge is personal knowledge that employees gain through experience; this can be hard to express and is largely influenced by their beliefs, perspectives and values (Nonaka, 1991). DW development is a quickly changing, knowledge-intensive process involving people working in different phases and activities. Therefore, knowledge in data warehousing is diverse and an improved use of this knowledge is the basic motivation for KM in this field. KM is equally important for both the DW development processes and the service processes. The general knowledge evolution cycle, which defines the phases of organizational knowledge, can also be applied to the specific field of DW (Agresti, 2000):
- originate / create knowledge – members of the DW team develop knowledge through learning, problem solving, innovation, creativity, and importation from outside sources;
- create / acquire knowledge – members acquire and capture information about knowledge in explicit forms;
- transform / organize knowledge – knowledge is organized, transformed or included in written material and knowledge bases;
- deploy / access knowledge – knowledge is distributed through education, training and mentoring programmes, automated knowledge-based systems or expert networks;
- apply knowledge – the organization's ultimate goal is applying the knowledge; this is the most important part of the life cycle.
KM aims to make knowledge available whenever it is needed. In order to implement these phases systematically and successfully, it is very important for organizations to have a centralized KM strategy in place and not do everything ad-hoc (Rus & Lindvall, 2002).

Maturity Assessment Question(s)
(Klimko, 2001) proposed a KM maturity model based on CMM. By summarizing the characteristics he provides for each maturity stage together with the information from the knowledge evolution cycle, we came up with the following maturity assessment question. The same maturity assessment is also done for the implementation of KM for the service processes.

11) Which answer best describes the knowledge management in your organization for the DW development processes?
a) Level 1 – Ad-hoc knowledge gathering and sharing
b) Level 2 – Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
c) Level 3 – Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
d) Level 4 – Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Level 5 – Continuously improving inter-organizational knowledge management.
Table 37: Knowledge Management Maturity Assessment Question.

5.2 DW Service Processes
As already mentioned in the previous section, in the last two decades software maintenance began to be treated as a sequence of activities and not as the final stage of a software development project (April et al., 2004).
Several standards and models have been developed especially for software maintenance and nowadays, more and more organizations make this distinction between the development phases and the maintenance and monitoring processes. These processes are very important after a DW has been deployed in order to keep the system up and running and to manage all the necessary changes. Software maintenance is defined as (IEEE, 1990): "The process of modifying a software system or component after delivery to correct faults, improve performance or other attributes, or adapt to a changed environment".

5.2.1 From Maintenance and Monitoring to Providing a Service
In the last couple of years, IT organizations have made a transition from being pure technology providers to being service providers. This requires taking a different perspective on IT management, called IT Service Management (ITSM). ITSM puts the services delivered by IT at the center of IT management and is commonly defined as (Young, 2004): "a set of processes that cooperate to ensure the quality of live IT services, according to the levels of service agreed to by the customer." This service-oriented perspective on IT organizations can be best applied to the software maintenance field, as maintenance is an ongoing activity as opposed to software development, which is more project-based. Therefore, software maintenance can be seen as providing a service, whereas software development is concerned with the delivery of products (Niessink & van Vliet, 2000). Consequently, customers will judge the quality of software maintenance differently from that of software development. In particular, service quality is assessed on two dimensions: the technical quality – what the result of the service is – and the functional quality – how the service is delivered.
This means that in order to provide high-quality software maintenance, different and additional processes are needed beyond those provided by a high-quality software development organization (Niessink & van Vliet, 2000). To get a clearer picture of what a "service" means, we can look at the service marketing literature, where a wide range of definitions exists of what a service entails. Usually, a service is defined as an essentially intangible set of benefits or activities that are sold by one party to another (Grönroos, 1990). The main differences between products and services are (Zeithaml, 1996; van Bon, 2007): intangibility, heterogeneity, simultaneous production and consumption, and perishability. However, the difference between products and services is not clear-cut and they can sometimes be intertwined. If we turn to the software engineering domain, we see that a major difference between software development and software maintenance is the fact that software development results in a product, whereas software maintenance results in a service being delivered to the customer. All types of maintenance are concerned with activities aimed at keeping the system usable and valuable for the organization. Hence, software maintenance has more service-like aspects than software development, because the value of software maintenance lies in the activities that result in benefits for the customers, such as corrected faults and new features. This is in contrast with software development, where the development activities do not provide benefits for the customer; instead, it is the resulting software system that provides the benefits (Niessink & van Vliet, 2000). As a DW can also be considered a software engineering project, the same concepts can be applied here as well. Also, as said above, the difference between products and services is not clear-cut and, consequently, this also goes for software development and software maintenance.
5.2.2 IT Service Frameworks
Over the years, various IT service frameworks have been proposed: Information Technology Infrastructure Library (ITIL), BS 15000, HP ITSM Reference Model, Microsoft Operations Framework (MOF), and IBM's Systems Management Solution Lifecycle (SMSL). However, in the ITSM landscape, ITIL acts as the de facto standard for the definition of best practices and processes that pertain to the disciplines of service support and service delivery (Salle, 2004). BS 15000 extends ITIL, but at the same time it is tightly integrated with ITIL. The other frameworks extend and refine ITIL, sometimes with guidelines specific to the referenced technologies. Therefore, we will consider the service components from ITIL as a starting point for our analysis of the DW Service Processes part. Moreover, two maturity models related to IT maintenance and service also served as a foundation for developing this part of our DW maturity model: the Software Maintenance Maturity Model and the IT Service CMM. Inspired by other maturity models, they include several maturity stages and key process areas. An overview is depicted in table 38. A more detailed description of the three models is provided further in this paragraph.

Authors | Model | Main Idea
Central Computer and Telecommunications Agency (CCTA) (1989) | Information Technology Infrastructure Library (ITIL) | service delivery processes and service support processes and functions
Niessink, Clerc & van Vliet (2002) | IT Service CMM | key practices intended to cover the activities needed to reach a certain level of service maturity while preserving a structure similar to CMM
April, Hayes, Abran & Dumke (2004) | Software Maintenance Maturity Model (SMmm) | unique activities of software maintenance while preserving a structure similar to that of the CMMi
Table 38: Overview of IT Service Frameworks.
ITIL
ITIL was established in 1989 by the United Kingdom's former Central Computer and Telecommunications Agency (CCTA) to improve its IT organization. ITIL consists of an inter-related set of best practices for lowering the cost, while improving the quality, of IT services delivered to users. It is organized around seven key areas: service support, service delivery, business perspective, application management, infrastructure management, security management and planning to implement service management. However, the core of ITIL comprises five service delivery processes, five service support processes and one service support function (the service desk). Service support processes apply to the operational level of the organization (i.e.: all aspects associated with the daily activities of IT service and maintaining the related processes), whereas the service delivery processes are tactical in nature (i.e.: the processes required for planning and delivery of quality services over the long term, with a goal of continual improvement of those services). An overview of ITIL's core components can be seen in the table below. For more information about ITIL, see (Colin, 2004).

Service Support: Service Desk; Incident Management; Problem Management; Change Management; Release Management; Configuration Management
Service Delivery: Service Level Management; Financial Management; Capacity Management; IT Service Continuity Management; Availability Management
Table 39: ITIL's Core Components (adapted from (Cater-Steel, 2006)).

Software Maintenance Maturity Model (SMmm)
The SMmm was designed as a customer-focused reference model for either:
- auditing the software maintenance capability of a software maintenance service supplier or outsourcer, or
- improving internal software maintenance organizations.
The model has been developed from a customer perspective, as experienced in a competitive, commercial environment.
A higher capability in the SMmm context means better services delivered to customer organizations and increased service performance for the maintenance organizations. The SMmm is based on the Capability Maturity Model Integration (CMMi), version 1.1 [sei02], and the Camélia model [Cam94]. The model must be viewed as a complement to the CMMi, especially for the processes that are common to developers and maintainers. The architecture of the SMmm differs slightly from that of the CMMi version. The most significant difference is the inclusion of:
- a roadmap category to further define the key process areas (KPAs);
- detailed references to papers and examples on how to implement the practices.
The SMmm includes four process domains (i.e.: software maintenance process management, software maintenance request management, software evolution engineering, support to software evolution engineering), several KPAs, roadmaps and practices. While some KPAs are unique to maintenance, others were derived from the CMMi and other models, and have been modified slightly to map more closely to daily maintenance characteristics. For more details on the SMmm, see (April et al., 2004).

IT Service CMM
The IT Service CMM is based on the CMM, but it is adapted to service processes. The model consists of five maturity levels which contain KPAs. For an organization to reside at a certain maturity level, it needs to implement all of the KPAs for that level and those of the lower levels. An overview of the KPAs assigned to each maturity level can be seen in table 40.
Level | Key Process Areas
Initial | Ad hoc processes
Repeatable | Service Commitment Management; Service Tracking and Oversight; Subcontract Management; Service Delivery Planning; Event Management; Configuration Management; Service Quality Assurance
Defined | Organization Service Definition; Organization Process Definition; Organization Process Focus; Integrated Service Management; Service Delivery; Resource Management; Training Programme; Intergroup Coordination; Problem Management
Managed | Quantitative Process Management; Service Quality Management
Optimizing | Process Change Management; Technology Change Management; Problem Prevention
Table 40: IT Service CMM's Key Process Areas (adapted from (Paulk et al., 1995)).

The objective of the IT Service CMM is twofold: to enable IT service providers to assess their capabilities with respect to the delivery of IT services, and to provide IT service providers with directions and steps for further improvement of their service capability. There are a number of characteristics of the IT Service CMM that are important for understanding its nature: the main focus of the model is the complete service organization; the scope of the model encompasses all service delivery activities (i.e.: those activities which are key to improving the service delivery capability of service organizations); the model is strictly ordered (i.e.: the key process areas are assigned to different maturity levels in such a way that lower-level processes provide a foundation for the higher-level processes); and the model is a minimal model in different senses (i.e.: the model only prescribes the key processes and activities that are needed to reach a certain maturity level and it does not show how to implement them, what organization structure to use, etc.). For a broader picture of the IT Service CMM, see (Niessink & van Vliet, 1999).
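The strict ordering of the IT Service CMM – an organization resides at the highest maturity level for which it has implemented all the KPAs of that level and of every lower level – can be illustrated with a small sketch. The level-to-KPA mapping below follows table 40, but the data structure and function are our own illustration, not part of the model itself.

```python
# Illustrative sketch: derive an organization's IT Service CMM level from the
# set of key process areas (KPAs) it has implemented. The KPA names follow
# table 40; the code itself is an assumption for illustration only.

KPAS_PER_LEVEL = {
    2: {"Service Commitment Management", "Service Tracking and Oversight",
        "Subcontract Management", "Service Delivery Planning",
        "Event Management", "Configuration Management",
        "Service Quality Assurance"},
    3: {"Organization Service Definition", "Organization Process Definition",
        "Organization Process Focus", "Integrated Service Management",
        "Service Delivery", "Resource Management", "Training Programme",
        "Intergroup Coordination", "Problem Management"},
    4: {"Quantitative Process Management", "Service Quality Management"},
    5: {"Process Change Management", "Technology Change Management",
        "Problem Prevention"},
}

def maturity_level(implemented):
    """Highest level whose KPAs, and those of all lower levels, are implemented."""
    level = 1  # level 1 (Initial) has no required KPAs
    for lvl in sorted(KPAS_PER_LEVEL):
        if KPAS_PER_LEVEL[lvl] <= implemented:
            level = lvl
        else:
            break  # strict ordering: a gap at this level blocks all higher levels
    return level
```

For example, an organization that has implemented all level 2 and level 4 KPAs but not those of level 3 still resides at level 2, reflecting the foundational role of the lower levels.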
5.2.3 DW Service Components
Now that we have given a short overview of the most important IT service frameworks, we can present the main elements we chose for our DW service processes maturity assessment. Once the DW project has been deployed, ongoing maintenance and monitoring work is required to keep the DW system operating in good shape. As the scope of the maintenance and monitoring activities in the DW extends over many features and functions, it is important to have a plan and do these activities in a formalized manner. The results of this phase offer the data needed to plan for growth and to improve performance. The most important activities involved in DW maintenance and monitoring are (Kimball et al., 2008; Ponniah, 2001):
- collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory management, physical disk storage space utilization, processor usage, report usage, number of completed queries by time slots during the day, time each user stays online with the data warehouse, total number of distinct users per day, etc.)
- user support
- BI applications maintenance and monitoring
- security administration
- performance monitoring and tuning
- data reconciliation and data growth management
- ETL monitoring and management
- resource monitoring and management
- infrastructure management
- backup and recovery management, etc.

Maturity Assessment Question(s)
As can be seen, DW software maintenance and monitoring involves many activities, but it is critical to include at least the most important ones. This is the reason why we developed a high-level maturity question regarding DW software maintenance and monitoring processes. The question is a multiple-choice one where the answers are the main activities included in this part of the DW solution.
It will be scored through normalization, similar to the question for the testing and acceptance phase. However, this question was not included in the questionnaire for the first case study and thus will not be taken into consideration when doing the scoring for the case studies.

1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?
a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory management, physical disk storage space utilization, processor usage, BI applications usage, number of completed queries by time slots during the day, time each user stays online with the data warehouse, total number of distinct users per day, etc.)
b) BI applications maintenance and monitoring
c) User support
d) ETL monitoring and management
e) Data reconciliation and data growth management
f) Security administration
g) Resource monitoring and management
h) Infrastructure management
i) Backup and recovery management
j) Performance monitoring and tuning.
Table 41: Maintenance and Monitoring Maturity Assessment Question.

As already presented in the previous paragraphs, DW maintenance and monitoring are more and more often considered to be service processes, as they are offered on an ongoing basis to the customers. From the presented IT service frameworks, it can be seen that some elements appear in more than one model, or that some elements from one model can be mapped to elements from another one. Taking into consideration the changing nature of a DW and all the aspects that DW maintenance and monitoring processes entail, we decided to consider the following components when assessing the maturity of DW service processes: service quality management, service level management, incident management, change management, technical resource management, availability management, release management and knowledge management.
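The normalized scoring of the multiple-choice maintenance and monitoring question (question 1 above) can be sketched as follows: the score is driven by the fraction of the listed activities the organization performs. The mapping of that fraction onto a 1–5 maturity scale is our own illustrative assumption; the thesis only states that the question is scored through normalization.

```python
# Illustrative sketch of scoring a multiple-select question through
# normalization. The rescaling of the activity fraction onto a 1-5 maturity
# scale is an assumption for illustration, not the thesis's exact formula.

ACTIVITIES = [
    "statistics collection",
    "BI applications maintenance and monitoring",
    "user support",
    "ETL monitoring and management",
    "data reconciliation and data growth management",
    "security administration",
    "resource monitoring and management",
    "infrastructure management",
    "backup and recovery management",
    "performance monitoring and tuning",
]

def normalized_score(selected):
    """Map the fraction of performed activities onto a 1-5 scale."""
    fraction = len(set(selected) & set(ACTIVITIES)) / len(ACTIVITIES)
    return round(1 + 4 * fraction, 2)
```

Under this assumed formula, an organization performing five of the ten activities would score 3.0, the midpoint of the scale.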
Each of these elements and the corresponding maturity assessment question(s) will be further elaborated on in this paragraph.

5.2.3.1 Service Quality Management
The purpose of Service Quality Management is to provide management with appropriate visibility into the processes being used and the services being delivered. This process entails service quality assurance activities, which involve reviewing and auditing working procedures, DW service delivery activities and work products to see that they comply with applicable standards and procedures. Management and relevant groups are provided with the results of the reviews and audits. An organization with experience in service processes also develops a quantitative understanding of the quality of the services delivered in order to achieve specific quality goals (Niessink & van Vliet, 1999). If these goals are not reached, causal analysis meetings should be held to identify the defect causes and subsequently eliminate them. In order to get better results, many organizations with a high maturity in DW service delivery also try to obtain external service quality certification (e.g.: ISO certification, etc.).

Maturity Assessment Question(s)
Therefore, an organization at the first maturity stage will not have any service quality management activities, whereas one at the highest maturity level will have not only a standard procedure, but also quantitative service quality evaluation and causal analysis meetings. Their purpose is to identify common defect causes and try to eliminate them in the future; in this way continuous service quality management improvement is achieved.

2) Which answer best describes the DW service quality management in your organization?
a) Level 1 – No service quality management activities
b) Level 2 – Ad-hoc service quality management
c) Level 3 – Proactive service quality management including a standard procedure
d) Level 4 – Level 3) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification.
Table 42: Service Quality Management Maturity Assessment Question.

5.2.3.2 Service Level Management
Service Level Management ensures continual identification, monitoring and reviewing of the optimally agreed levels of IT services as required by the business. It negotiates service level agreements (SLAs) with the suppliers and customers and ensures that they are met (Cater-Steel, 2006). It is responsible for ensuring that all DW service management processes, operational level agreements and underpinning contracts are appropriate for the agreed service level targets. This is done in close cooperation between the DW service providers and the customers. Some examples of SLA performance criteria for a DW are:
- 50 concurrent queries processed with an average query time of no more than five minutes;
- less than four hours of planned downtime per week;
- less than six hours of unplanned downtime per month;
- data refreshed weekly.
The high-level activities for Service Level Management are: document customer service needs, implement SLAs, review SLAs with the customer/supplier on a periodic or event-driven basis, and continuously monitor and evaluate the actual service delivery with the customer/supplier (SLAs with penalties).
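The example SLA criteria above can be expressed as a simple compliance check. The thresholds below come from the text; the data structure, names and function are purely illustrative, and for simplicity the sketch treats every criterion as "measured value must not exceed the limit".

```python
# Illustrative sketch: checking measured DW service figures against the
# example SLA criteria quoted in the text. Names and structure are our own.

SLA = {
    "avg_query_minutes": 5.0,            # at 50 concurrent queries
    "planned_downtime_h_per_week": 4.0,
    "unplanned_downtime_h_per_month": 6.0,
    "days_since_refresh": 7,             # "data refreshed weekly"
}

def sla_violations(measured):
    """Return the names of the SLA criteria that the measurements exceed."""
    return [name for name, limit in SLA.items()
            if measured.get(name, 0) > limit]
```

A monitoring process could run such a check after each load cycle and feed the resulting violations into Incident Management or into SLA review meetings with the customer.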
Maturity Assessment Question(s)
From the high-level activities, one could say that an organization usually evolves from documenting all customer/supplier service needs in an ad-hoc manner to using a standard procedure and continuously monitoring, evaluating and improving the actual service delivery.

3) Which answer best describes the DW service level management in your organization?
a) Level 1 – Customer/supplier service needs documented in an ad-hoc manner; no service catalogue compiled
b) Level 2 – Some customer/supplier service needs documented and formalized based on previous experience
c) Level 3 – All the customer/supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
d) Level 4 – SLAs reviewed with the customer/supplier on both a periodic and event-driven basis
e) Level 5 – Actual service delivery continuously monitored and evaluated with the customer/supplier on both a periodic and event-driven basis for continuous improvement (SLAs including penalties).
Table 43: Service Level Management Maturity Assessment Question.

5.2.3.3 Incident Management
ITIL defines an incident as a deviation from the (expected) standard operation of a system or a service. The objective of Incident Management is to provide continuity by restoring the service in the quickest way possible, by whatever means necessary (Salle, 2004). Also, a problem is defined in ITIL as a condition identified from one large incident or from many incidents exhibiting common symptoms, for which the cause is unknown (Salle, 2004). As a DW is a very complex system, many incidents and problems can occur, and therefore this process is very important. Incidents and problems can arise on the side of the users or on that of the system. Given the frequency of changes in a DW, many complex problems are likely to occur very often.
The objective of Incident and Problem Management is to provide continuity by restoring the service as quickly as possible and to prevent and minimize the impact of incidents. This is why it is critical to have a solid Incident and Problem Management process, which also needs to be in a close relationship with Change Management. The high-level activities for Incident Management are: detection, recording, classification, investigation, diagnosis, resolution and recovery.

Maturity Assessment Question(s)
4) Which answer best describes the DW incident management in your organization?
a) Level 1 – Incident management is done ad-hoc with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
b) Level 2 – A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
c) Level 3 – A service desk is the recognized point of contact for all customer queries; incident assessment and classification is done following a standard procedure
d) Level 4 – Level 3) + standard reports concerning the incident status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
e) Level 5 – Level 4) + trend analysis of incident occurrence and also of customer satisfaction and value perception of the services provided to them.
Table 44: Incident Management Maturity Assessment Question.

5.2.3.4 Change Management
Change Management is described as a regular task for the immediate and efficient handling of changes that might occur in a DW environment. The main input to the Change Management process is a request for change (RFC) (Salle, 2004).
An RFC can be the outcome of a process relating to Incident and Problem Management, or of extending the service through Service Level Management. The objective of Change Management is to ensure that standardized methods and techniques are used for the efficient and immediate handling of all changes to the DW system while minimizing change-related incidents. The changes that can frequently occur in a DW environment concern:
- changes in the contents of the DW;
- changes in the functionality of BI applications;
- changes in a source system with direct implications for ETL, etc.
The high-level activities for Change Management are: acceptance and classification, assessment and planning, authorization of changes, control and coordination, and evaluation.

Maturity Assessment Question(s)
As in the case of Incident Management, at first an organization takes care of change requests in an ad-hoc manner. Then, an electronic change management system is usually introduced for storing and solving the requests for change, and some policies and procedures for change management begin to be established. Once a standard procedure for approving, verifying, prioritizing and scheduling changes is put in place, organizations start moving towards the high end of the maturity development, and some of them manage to reach the last maturity stage of continuous improvement of Change Management.

5) Which answer best describes the DW change management in your organization?
a) Level 1 – Change requests are made and solved in an ad-hoc manner
b) Level 2 – A change management system is used for storing and solving the requests for change; some policies and procedures for change management established, but nothing is standardized
c) Level 3 – A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) Level 4 – Standard reports concerning the change status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; standards established for documenting changes
e) Level 5 – Trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them.
Table 45: Change Management Maturity Assessment Question.

5.2.3.5 Technical Resource Management
The purpose of Resource Management is to maintain control of the hardware and software resources needed to deliver the agreed DW service level targets (Niessink & van Vliet, 1999). Before commitments are made to customers, resources are checked. If not enough resources are available, either the commitments are adapted or extra resources are installed. It also involves monitoring the ETL and BI applications in order to see whether the current resources are enough for the desired DW performance.

Maturity Assessment Question(s)
Similar to the other service processes, DW Technical Resource Management also evolves from ad-hoc activities to resource trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services. The intermediate phases can be seen in the answers to the maturity assessment question below.

6) Which answer best describes the DW technical resource management in your organization?
a) Level 1 – Ad-hoc resource management activities (only when there is a problem)
b) Level 2 – Resource management is done following some procedures, but nothing is standardized or documented
c) Level 3 – Resource management is done constantly following a standardized, documented procedure
d) Level 4 – Level 3) + standard reports concerning performance and resource management, including measurements and goals, are produced on a regular basis
e) Level 5 – Level 4) + resource management trend analysis and monitoring to make sure that there is sufficient capacity to support planned services.
Table 46: Technical Resource Management Maturity Assessment Question.

5.2.3.6 Availability Management
Availability Management allows organizations to ensure that all DW infrastructure, processes, tools and roles conform to the SLAs by using appropriate means and techniques. It should also manage risks that could seriously impact DW services by reducing the risks to an acceptable level and planning for the recovery of DW services. Availability Management also tries to proactively manage continual improvement efforts by measuring and tracking metrics for availability, reliability, maintainability, serviceability and security (Colin, 2004). In order to achieve better results, Availability Management also needs to be in close collaboration with Resource Management.

Maturity Assessment Question(s)
The maturity assessment question for Availability Management follows the same structure as the one for Resource Management, as these activities are closely related, but there is a very important characteristic of the former which can really make a difference: risk assessment. An organization that follows a standardized procedure for availability management and that also pays serious attention to risk assessment has a very high chance of delivering the agreed service level targets. The maturity question for this aspect can be seen below.
7) Which answer best describes the availability management in your organization?
a) Level 1 – Ad-hoc availability management
b) Level 2 – Availability management is done following some procedures, but nothing is standardized or documented
c) Level 3 – Availability management documented and done using a standardized procedure (all elements are monitored)
d) Level 4 – Level 3) + risk assessment to determine the critical elements and possible problems
e) Level 5 – Level 4) + availability management trend analysis and planning to make sure that all the elements are available for the agreed service level targets.
Table 47: Availability Management Maturity Assessment Question.

5.2.3.7 Release Management
As a DW is continuously changing and evolving over time, organizations need to embrace the release concept. Any incomplete functionality or necessary change will be bundled in future releases. Therefore, the objective of Release Management is to ensure that only authorized and correct versions of the DW are made available for operation (Salle, 2004). A release can be seen as a collection of hardware, software, documentation, processes or other components required to implement approved changes to a DW (Cater-Steel, 2006). In order to have successful Release Management, it is very important to have solid release planning, and to document and follow a standardized procedure for this process. Solid Release Management also implies standardized release naming and numbering conventions; assigned roles and responsibilities; and a release database with master copies of previous DW versions.

Maturity Assessment Question(s)
Therefore, the maturity assessment question for this component of the DW service processes is straightforward and can be seen below.

8) Which answer best describes the release management in your organization?
a) Level 1 – Ad-hoc solving and implementation of changes; no release naming and numbering conventions
b) Level 2 – Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions
c) Level 3 – Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
d) Level 4 – Level 3) + standard reports concerning release management, including measurements and goals, are produced on a regular basis; master copies of all software in a release secured in a release database
e) Level 5 – Level 4) + release management trend analysis, statistics and planning.
Table 48: Release Management Maturity Assessment Question.

5.3 Summary
This chapter has offered a detailed image of the DW organization and processes benchmark variable and its main sub-categories: DW development processes and DW service processes. Just like in the previous chapter, we identified the main characteristics of each sub-category for each maturity stage and we presented the underlying maturity assessment questions. Now that the DWCMM with its main components and maturity assessment questionnaire has been presented, we will continue with presenting the results of the evaluation phase of our research process in the next chapter.

6 Evaluation of the DWCMM
This section presents the results of two activities aimed at evaluating the model presented in the previous chapters. Section 6.1 reports on the review of the model by five DW/BI experts from practice and emphasizes the validity of the model. Section 6.2 describes an assessment of the case study results obtained by testing the model in four organizations.

6.1 Expert Validation
To evaluate the utility of and further revise the DWCMM, expert validation was applied.
An "expert" is defined by (Hoffman et al., 1995) as a person "highly regarded by peers, whose judgements are uncommonly accurate and reliable, whose performance shows consummate skill and economy of effort, and who can deal effectively with rare or tough cases. Also, an expert is one who has special skills or knowledge derived from extensive experience with subdomains." Therefore, eliciting knowledge from experts is very important and useful, and can be done using several methods, one of them being structured and unstructured interviews (Hoffman et al., 1995). More information on interview techniques is given in 6.2.1. Accordingly, five experts in data warehousing and BI were interviewed and asked to give their opinions about the content of the model we have developed. The interviews were structured, but consisted of open questions, in order to capture the knowledge of the respondents. This enabled the experts to freely state their opinions and ideas for improvement. The expert panel consists of five experts from practice, each of them having at least 10 years of experience in the DW/BI field. All of them are DW/BI consultants at different organizations in The Netherlands (local or multinational). An overview of the experts and their affiliations (figures are taken from 2009 annual reports) is depicted in table 49. The expert interview protocol and questions can be seen in appendix D.

ID | Job Position | Respondent Affiliation Industry | Market | Employees
1 | CI/BI consultant | DW/BI Consulting | B2B | ≈ 45
2 | Principal consultant / Thought leader BI/CRM | IT Services | B2B | ≈ 49000
3 | BI consultant | BI Consulting | B2B | ≈ 35
4 | Principal consultant BI | IT Services | B2B | ≈ 38000
5 | BI consultant | DW Consulting | B2B | ≈ 1
Table 49: Expert Overview.

6.1.1 Expert Review Results and Changes
DWCMM
First, the experts were asked to propose some categories that they would find important when assessing the maturity of a DW solution.
Among the proposed categories we can mention: data structure, data architecture, metadata, masterdata, hardware, infrastructure, report architecture, security, management and maintenance, and traceability within the DW. One expert said that other important aspects to be analyzed refer to whether the organization is doing ETL or ELT, and whether it is using real-time data warehousing. Another critical point for success was considered to be the alignment between business and IT. As can be seen, some of the categories proposed by the experts can be found in or are included in the categories from our model. The others (i.e.: data architecture, data governance, masterdata, traceability) can be considered for further research in the future. Furthermore, all reviewers gave positive feedback on their first impression of the DWCMM, said it made sense and that it could be applied for assessing an organization's current DW solution. One of the experts noticed that the main sub-categories from the DW technical solution part were not on the same level, because "architecture" is usually a superset that includes data modelling, ETL and BI applications. Some experts said that in general the model seemed to be complete, but that, of course, small changes could probably be made or new categories/sub-categories should be added. Three of the five reviewers stated that "infrastructure" should also be added as a sub-category for the DW technical solution or should replace "architecture". However, as already explained in the previous chapters, in the literature, infrastructure usually refers to the hardware and software supporting the architecture. Also, architecture usually refers to the logical architecture (i.e.: the data storage layers), application architecture (i.e.: ETL, BI applications), data architecture and technical architecture (i.e.: infrastructure).
Our sub-category refers more to the logical architecture and some other elements such as metadata, security, infrastructure, etc. Therefore, we agree that the name "architecture" could be somewhat confusing and we decided to change it to "General Architecture and Infrastructure" for the final model. One expert suggested that "data modelling" should be changed to "data management", which is a broader category that includes data modelling, data quality and data governance. However, due to time constraints and the fact that we do assess data modelling, and to some extent data quality, in our current model, we leave the data governance part to future research. The last comments regarding the structure of the DWCMM were related to ETL. One of the experts suggested that this sub-category should be called "data logistics" as it could involve either ETL or ELT. But, as we believe that ETL is the more common name and easier to understand for the respondents who would take the assessment, we decided to leave it unchanged. Another expert proposed that a new sub-category called "ETL Workflow" should be added. It would include the way fault tolerance is addressed, the ETL technical administration and, generally, how ETL processes are managed. We consider that this sub-category would not be on the same level as the other ones, and some of its elements are already addressed in the ETL sub-category; therefore, we decided not to include it in our model for the time being.

The DWCMM Condensed Maturity Matrix
All reviewers said they got a good first impression of the DWCMM Condensed Maturity Matrix and that it gives a good overview of the main goal of the model. Some experts pointed out that the characterization of "architecture" for the highest maturity stage was not on the same level as the previous ones.
After also conducting the test case studies, we decided to change the fifth stage of maturity to "DW/BI service that federates a central enterprise DW and other data sources via a standard interface". More comments on this change are given in the case study results paragraph. Moreover, several suggestions were made regarding the ETL characterization for each stage. One expert suggested that more information should be given for each stage of ETL. Another one proposed that the characterization of ETL for the last level of maturity should be changed, as "real-time ETL" seems not to be on the same level as the ETL characterizations for the previous stages (i.e.: basic ETL, advanced ETL, etc.). The redefined matrix after the expert interviews is depicted in figure 6.

DW Maturity Assessment Questionnaire
As with the previous two deliverables, all reviewers gave positive feedback on their first impression of the DW maturity assessment questionnaire. Some of the experts pointed out that, even if the chosen characteristics and questions are representative of the problem we wanted to address, they might not be enough to do an in-depth assessment of a specific DW environment. Therefore, most of the experts suggested that, when testing the model in practice, it would be very important to clearly state that the main goal of the model is to do a high-level DW maturity assessment and that the focus of the questionnaire is represented by the technical aspects of a DW solution. Furthermore, each expert had his own view on data warehousing and BI, and hence, it was difficult to sum up all their comments and integrate them into useful changes for our maturity questionnaire.
Finally, we decided to split their feedback into two categories: proposed changes that, due to time constraints and scope limitations, were not implemented in the final version of the model, but should be considered for future research; and implemented improvement suggestions that involved some rephrasing or complete changing of questions and answers. We will give a short overview of the former further in this paragraph. The actually implemented changes can be seen in the redefined questionnaire in appendix C. The questions and answers that were changed are written in red so that the differences from the first version of the questionnaire can be better depicted. Also, the main questions and answers that were redefined can be seen in the table below.

Category | Rephrased Questions | Questions Whose Answers Were Rephrased or Changed
Data Modelling | 2, 4 – split in two questions | 1, 8
ETL | 2, 5 – split in two questions | 1, 2, 4
Architecture | 3 – split in two questions | 2, 5, 8
BI Applications | 3 – split in two questions | 3, 4, 5, 6, 7, 8
Development Processes | 1 | –
Service Processes | – | –

Table 50: Rephrased or Changed Questions and Answers.

Moreover, here are the main changes that were proposed by the experts, but were not implemented. All the experts suggested a version for the DW architecture found on the highest maturity level. We combined all the opinions and came up with the answer shown later in this chapter. Three of five experts suggested that more attention should be paid to data quality, as it is a very important issue nowadays, and that more questions should be added for tackling this problem. We have a question regarding data quality in the ETL section, but apparently, data quality should also be taken into consideration in the data modelling part. Therefore, due to time constraints and to the high-level nature of our assessment, we leave this topic open for future research.
One expert suggested that we should dive a little bit into cloud computing for data warehousing. We find this topic very interesting and important for the future of data warehousing and BI. However, due to time constraints, we could not find an efficient way to include this in our current model and we will leave it to future research. One expert said that we should analyze in more detail how the actual monitoring and management of ETL is done, not just ask whether it is done. But, as already mentioned, our assessment is a high-level one and it tries to capture what is done, not how it is done. Hence, we decided not to implement this suggested change. Other proposals that we found interesting for assessing the maturity of a DW, but very hard to include in a model and questionnaire like ours, refer to: judging how mature the organization is in adapting to new situations; addressing the right DW development methodology (e.g.: waterfall, iterative and incremental development, agile development, etc.) for the right category; or having a good strategy for tool management (e.g.: aspects related to pricing, licensing, etc.) and understanding that there are multiple tools, but that an integrated view is needed to be successful. The last important suggestion was that a question on problem management should be added to the DW service processes category. We must admit that we also thought about it when developing the model, but we finally decided not to include this in our questionnaire. A problem is usually defined in ITIL as "a condition that has been defined, identified from one large incident or many incidents exhibiting common symptoms for which the cause is unknown". Therefore, we believe that problem management is positively correlated with incident and change management, and it can be included in these processes. That is why, for the time being, we leave this question out.
Besides the questions regarding the DWCMM, we also asked the experts whether weighting coefficients should be considered for computing the maturity scores. Two of them said that no general weights should be used, as this would make the scoring rather difficult, and that such weighting coefficients should be situational, depending on each organization. One expert suggested that it would be interesting to have weights for both individual questions and sub-categories/categories. One expert believed that weighting factors should be used for the main sub-categories/categories. The last expert was not really sure about this aspect, saying that it would be interesting to have weights for each question, but that the scoring would become very complicated. Therefore, due to the lack of unanimity regarding the addition of weighting coefficients to the questionnaire, and for the clarity of scoring, we decided to leave weights out for the time being.

6.2 Multiple Case Studies
Depending on the nature of a research topic and the goal of a researcher, different research methods are appropriate (Benbasat et al., 1987; Yin, 2009). One of the most commonly used ways to classify research methods is the distinction between qualitative and quantitative research methods. The research method applied here is case study research, a qualitative one. It is the most widely used qualitative research method in information systems (IS) research and is well suited to understanding the interactions between information technology (IT)-related innovations and organizational contexts (Darke et al., 1998). (Runeson & Host, 2009) suggest that case study research is also a suitable research methodology for software engineering, since it studies contemporary phenomena in their natural context. Therefore, as research in data warehousing can be considered to be at the border between IS and software engineering, case study research is also appropriate here.
According to (Yin, 2009), "the essence of a case study is that it tries to illuminate a decision or set of decisions: why they were taken, how they were implemented, and with what result." Hence, the case study method allows investigators to retain the holistic and meaningful characteristics of real-life events, such as organizational and managerial processes. (Benbasat et al., 1987) consider that a case study examines a phenomenon in its natural setting, employing multiple methods of data collection to gather information from one or a few entities (i.e.: people, groups or organizations). As our research develops a DWCMM, it is part of the IS/software engineering field and involves both technical and organizational issues. Therefore, case study research seems an appropriate choice that will help us capture knowledge from practitioners, and test and validate the created models and theories. In order to enrich and validate our model in practice, four organizations were chosen to take our DW maturity assessment and the results will be presented further in this chapter.

6.2.1 Case Study Approach
Case study research can be used to achieve various research aims: to provide descriptions of phenomena, to develop theory and to test theory (Darke et al., 1998). But, regardless of its final goal, preliminary theory development as part of the case study design phase is essential (Yin, 2009). In our research, we will use it in order to test theory, which in this case is the DWCMM we developed. The use of case study research to test theory requires the specification of theoretical propositions derived from an existing theory. The results of case study data collection and analysis are used to compare the case study findings with the expected outcomes predicted by the propositions (Cavaye, 1996).
The theory is either validated or found to be inadequate in some way, and may then be further refined on the basis of the case study findings. Case study research may adopt single or multiple case designs. A single case study is appropriate where it represents a critical case (i.e.: it meets all the necessary conditions for testing a theory), where it is an extreme or unique case, or where it is a revelatory case (Yin, 2009). Multiple case designs allow cross-case analysis and comparison, and the investigation of a particular phenomenon in diverse settings. Multiple case studies may also be selected to predict similar results (i.e.: literal replication) or to produce contrasting results for predictable reasons (i.e.: theoretical replication) (Yin, 2009). As, according to (Benbasat et al., 1987) and (Yin, 2009), multiple case designs are preferred over single case designs to get better results and analytic conclusions, we decided to conduct a multiple case study research following the case study approach of (Yin, 2009), as depicted in figure 10.

Figure 10: Case Study Method (adapted from (Yin, 2009)).

Therefore, the main steps in case study research are (Runeson & Host, 2009; Yin, 2009):
Case study design – research objectives are defined and the case study is planned. This is also where theoretical development is done, as described in chapters 3-5.
Preparation for data collection – procedures and protocols for data collection are defined. This is also where cases are found and selected to evaluate and test the theory. The main criterion used in the search for suitable organizations was that all approached organizations had a professional DW/BI system in place whose maturity could be assessed by applying the DWCMM.
Furthermore, an important criterion for the selection of respondents per case was that the interviewed respondents had an overall view of the technical and organizational aspects of the DW/BI solution implemented in their organization. As (Yin, 2009) suggests that at least three case studies should be used, four test organizations were found that agreed to cooperate in our research and take the maturity assessment (an overview is provided in paragraph 6.2.2). While selecting the case studies, the case study and data collection protocols were also defined. The protocol contains the instrument, but also the procedures and general rules to be followed. It is essential when doing a multiple-case study to increase the reliability of the case study research and to guide the investigator in carrying out the data collection.
Collecting evidence – execution with data collection on the studied case. Typically, multiple data collection methods are employed in case study research to increase the validity of the results. Also, multiple sources of evidence are used, such as (Yin, 2009): documentation – various written material; archival records – organizational charts, service, personnel and financial records; interviews – open, structured or semi-structured interviews; direct observation – observing and absorbing details, actions or subtleties of the environment; physical artifacts – devices, tools, instruments. In this research, the data collected in the cases consists of four interviews and documentation. Each interview lasted around 1.5 hours and consisted mainly of the maturity assessment questionnaire itself, but it also had a few open questions. This allowed the researcher to guide and control the interview, while leaving some room for discussions in order to capture the suggestions and questions that the respondents might have.
The purpose was not only for the organizations to take the maturity assessment, but also to use the knowledge of the respondents in order to improve the questionnaire. For analysis purposes, the interviews have been digitally recorded, transcribed and validated by the respondents. For purposes of consistency, the interview protocol is enclosed in appendix E. The interview consisted mainly of three parts:
General questions about the organization and the respondent's role in the DW/BI project.
The DW maturity assessment questionnaire (i.e.: DW General Questions; DW Technical Solution; DW Organization and Processes).
Final questions and suggestions.
Analysis of collected data – data analysis can be quantitative or qualitative. In this research, we will do a qualitative data analysis to derive conclusions from the interviews and improve our model. The remainder of this chapter discusses the overall findings of the case studies, including a short description per case and an analysis of the results. Despite the fact that all individual cases are interesting, we will focus on the overall results.
Reporting – the report communicates the findings of the study, but it is also the main source of information for judging the quality of the study. Therefore, the master thesis document itself will serve as the case study report.

6.2.2 Case Overview
The case studies have been conducted at four organizations of different sizes, operating in several types of industries and offering a wide variety of products and services. An overview of the case study organizations (figures are taken from 2009 annual reports) and respondents is depicted in table 51. As the technologies used for developing each component of the DW can help us shape a better image of the DW solution and its maturity, an overview of these technologies is also offered in table 52.
For each case, a short description is provided in the following subparagraphs. A short analysis of the maturity scores each organization obtained after taking the assessment is also given further in this chapter. However, due to confidentiality reasons, the individual answers for each question and the feedback given to each organization are not published in the official version.

Organization | Industry | Market | Revenue | Employees | Respondent Function
A | Retail | B2C | 19.94 billion € | ≈ 138000 | BI consultant
B | Insurance | B2B & B2C | 4.87 billion € | ≈ 4500 | DW/BI technical architect
C | Retail | B2C | 780 million € | ≈ 3660 | BI manager
D | Maintenance & Servicing | B2B | NA | ≈ 3500 | BI consultant & DW lead architect

Table 51: Case and Respondent Overview.

Category | Organization A | Organization B | Organization C | Organization D
Data Modelling | NA | Power Designer | SAP | Visio, Word, PowerPoint
Extract/Transform/Load (ETL) | IBM InfoSphere DataStage | IBM InfoSphere DataStage | SAP | Oracle Warehouse Builder
BI Applications | Microstrategy & Business Objects | Cognos & SAS; QlikView in-house | Business Objects | Oracle BI Enterprise Edition
Database | IBM DB2 | Oracle DB | IBM DB2 | Oracle DB

Table 52: Technologies Usage Overview.

6.2.2.1 Organization A
Organization A is an international food retailer headquartered in Western Europe. It has leading positions in food retailing in key markets. These positions are built through strong regional companies going to market in a variety of food store formats. The operating companies benefit from the Group's global strength and best practices. Their strategy remains organized around the three pillars of profitable top-line growth, the pursuit of excellence in execution and corporate citizenship. Organization A considers that in a high-volume industry characterized by low margins, such as food retail, excellent execution offers a true competitive advantage.
This is the reason why sophisticated tools, state-of-the-art systems and streamlined processes are implemented to serve as the foundation for profitable growth and good returns. Connecting and converging tools, systems, processes and people helps the operating companies to address both current and future challenges with cost-effective and integrated solutions.

DW General Information
The main drivers for developing a DW at organization A were to improve managerial decisions and increase profit. For a supermarket, it is very important to store data at a high granularity in the DW. In this way, operations can be closely monitored and different types of BI applications can be developed. The main activities performed using the DW/BI project are reporting and dashboarding on the main KPIs regarding profit margins, store usage, store losses, etc. Some data mining is also done to determine which products are most often sold together, so that better product placement and promotion decisions can be made. Organization A has been using DW/BI for almost 10 years and executives perceive the DW/BI environment as a tactical resource (i.e.: a tool to assist decision making); their goal is for the DW/BI to become a competitive differentiator (i.e.: key to gaining or keeping customers and/or market share) in the future. In general, the returns obtained from the DW are higher than its costs and data quality is high, which can also be seen in the relatively high end-user adoption. As the DW environment is considered an important factor for success, the budget owner is the business side and a relatively high percentage of the annual IT budget is allocated to the DW/BI department.

6.2.2.2 Organization B
Organization B, situated in Western Europe, is a major player in the insurance market. Established in the eighteenth century, it has a centuries-long tradition and experience in this field.
It offers both private individuals and companies a wide array of life, non-life, medical and disability insurances, as well as mortgage, savings and investment products. The distribution is made via several channels: brokers, consultants working on commission, banks, independent intermediaries and direct contact.

DW General Information
The main driver for developing a DW at organization B was that business analysts and controllers needed integrated data in order to make their own reports, analyses, etc. Another requirement came from the consumer intelligence department, which wanted a broader view of the whole company portfolio and of each customer portfolio. Organization B has been using DW/BI for almost 10 years and the DW solution is perceived as an operational cost center (i.e.: an IT system needed to run the business) and, in certain situations, as a tactical resource. Data quality is considered to be a business responsibility and therefore the "garbage in, garbage out" principle is applied. However, a lot of attention is paid to the quality of the software and development processes in the technical department. The DW solution is not a big success at organization B, as end-user adoption is not high due to distrust in the data and in the DW solution itself. DW/BI has decentralized budget owners, as business lines have their own budgets and each one of them decides what to spend on BI, but generally around 5% of the IT budget is allocated to this department.

6.2.2.3 Organization C
Organization C is a supermarket chain with local activities in a Western European country. It is a unique and ambitious organization due to its cooperative nature: an intense cooperation between the organization and its members. It is a flat organization with short communication lines, a complete and modern logistics system and a low cost structure.
This organization offers a wide diversity of food-related products and its goal is an optimal service to customers. It tries to optimize its business outcomes by stacking purchasing volumes and through ongoing cost control.

DW General Information
The main driver for developing a DW at organization C was enabling management to make better decisions based on the right data. As organization C is a supermarket, some of the drivers are the same as those for organization A. A DW/BI solution is very important for food retailers, as they have many and diverse products and a lot of transactions take place every day. At organization C, the DW is considered the only viable solution for reporting and data analysis. Organization C has been using DW/BI for almost 15 years and the solution has evolved a lot since it was first implemented. Executives perceive the DW/BI environment as a tactical resource and that is why data quality is considered very important. It could be said that the returns are higher than the costs as, due to high data quality and "one version of the truth", end-user adoption is good and executives are able to make better and faster decisions. The budget owner of the DW is the Chief Financial Officer (CFO) and usually 10% of the annual IT budget is allocated to the DW/BI environment for continuous improvement.

6.2.2.4 Organization D
Organization D is one of the leading providers of rolling stock maintenance in Western Europe. It provides rolling stock availability and reliability for numerous passenger and freight carriers from across Europe. In addition to short-term maintenance, organization D also offers routine servicing. This covers minor repairs as well as the cleaning of interiors and exteriors, including the removal of graffiti. Customer and performance focus play an essential part in the business partnership between organization D and its customers.
That is why they are closely involved in all stages of a customer's project in order to avoid unnecessary work being carried out. Another important aspect for organization D is innovation. They invest in high-technology workshops and state-of-the-art equipment. Only by keeping up with the latest technology can they offer specialized services to customers at the forefront of the rapidly changing rail transport market.

DW General Information
The main driver for developing a DW at organization D was the need for high data quality and consistency in order for the business (especially middle and higher management) to make the right decisions. In the beginning, it was focused on the operational side, but now the main focus is on the financial one; the main goal for this year is to integrate the two solutions into a single DW. Organization D has been using a DW for 3.5 years and it started out as a tactical resource. Nowadays, executives perceive it as a mission-critical resource and the goal for the near future is for the DW to become a strategic resource. Therefore, the DW solution and the way it is perceived in the organization have developed a lot since it was first implemented. In general, there is a positive net result when comparing the returns and costs of the DW. The main benefits include high data quality and end-user adoption. From this point of view, the DW/BI solution has achieved its goal. The budget owner of the DW is the Chief Financial Officer (CFO) and usually less than 5% of the annual IT budget is allocated to the DW/BI environment.

6.2.2.5 Case Study Analysis
In this section, a short analysis of the results obtained by all the organizations after filling in the assessment questionnaire will be given. The maturity scores regarding the implemented DW solution obtained by the organizations can be seen in the table below.
Category | Organization A | Organization B | Organization C | Organization D
Architecture | 2.67 | 2.71 | 3.89 | 3.55
Data Modelling | 2.56 | 2.71 | 3.00 | 4.11
ETL | 2.17 | 2.90 | 3.71 | 2.86
BI Applications | 3.44 | 3.19 | 3.43 | 3.57
Development Processes | 3.14 | 2.63 | 3.66 | 3.02
Service Processes | 3.29 | 3.00 | 2.87 | 3.12

Table 53: Organizations' Maturity Scores.

As shown in the picture depicting our model, a better way to see the alignment between the maturity scores for the six categories is by drawing a radar graph. The radar graphs for all the organizations, each plotting the organization's scores against the ideal situation, can be seen in the figures below.

Figure 11: Alignment Between Organization A's Maturity Scores.
Figure 12: Alignment Between Organization B's Maturity Scores.
Figure 13: Alignment Between Organization C's Maturity Scores.
Figure 14: Alignment Between Organization D's Maturity Scores.

Some more information regarding the maturity scores for all four case studies can be seen in the table below.
Maturity Score | A | B | C | D
Total Maturity Score for DW Technical Solution | 2.67 | 3.00 | 3.51 | 3.52
Total Maturity Score for DW Organization & Processes | 2.77 | 3.10 | 3.26 | 3.07
Overall Maturity Score | 2.72 | 3.05 | 3.38 | 3.29
Highest Score | ETL - 3.14 | Data Modelling - 3.44 | Architecture - 3.89 | Data Modelling - 4.11
Lowest Score | Data Modelling - 2.17 | Architecture - 2.56 | Service Processes - 2.87 | ETL - 2.86

Table 54: Maturity Scores Analysis.

As can be seen from table 53, maturity scores for each sub-category usually lie between 2 and 4, with one exception: organization D scored 4.11 for Data Modelling. Thus, the overall maturity scores and the total scores per category also ranged between 2 and 4, which shows that most organizations are probably somewhere between the second and fourth stage of maturity. The highest maturity score was obtained by organization C, and the lowest one by organization A. Apparently, an overall score close to 4 or 5 is quite difficult to achieve. This is usually normal in maturity assessments, as in practice nobody is very close to the ideal situation. It will be interesting to see the range of scores after the questionnaire has been filled in by a large number of organizations. From table 54 it can be seen that the categories with the highest and lowest scores differ per organization. For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most mature variable for organization D. Interesting conclusions can also be drawn when comparing the scores for organizations A and C, as they are part of the same industry. The former is an international food retailer and has more experience in this industry, whereas the latter is a local one with less experience. However, organization A got a quite low DW maturity score. Thus, experience in the industry does not necessarily mean maturity in data warehousing.
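As an illustration of the scoring arithmetic behind tables 53 and 54, the sketch below recomputes organization C's aggregate scores from its six sub-category scores. It assumes, as the tables suggest, that the technical total is the plain average of the four technical sub-categories, the organizational total the average of the two process sub-categories, and the overall score the average of those two totals; the function name and structure are ours, not part of the DWCMM questionnaire.

```python
# Illustrative sketch (not the thesis' official tooling): recomputing
# organization C's Table 54 figures from its Table 53 sub-category scores.

TECHNICAL = ("Architecture", "Data Modelling", "ETL", "BI Applications")
ORGANIZATIONAL = ("Development Processes", "Service Processes")

org_c = {
    "Architecture": 3.89, "Data Modelling": 3.00, "ETL": 3.71,
    "BI Applications": 3.43, "Development Processes": 3.66,
    "Service Processes": 2.87,
}

def summarize(scores):
    """Aggregate the six sub-category scores the way tables 53/54 suggest."""
    technical = sum(scores[c] for c in TECHNICAL) / len(TECHNICAL)
    organizational = sum(scores[c] for c in ORGANIZATIONAL) / len(ORGANIZATIONAL)
    overall = (technical + organizational) / 2
    highest = max(scores, key=scores.get)  # strongest sub-category
    lowest = min(scores, key=scores.get)   # best starting point for improvement
    return technical, organizational, overall, highest, lowest

tech, org, overall, hi, lo = summarize(org_c)
print(f"technical {tech:.2f}, organizational {org:.2f}, overall {overall:.2f}")
print(f"strongest: {hi}, weakest: {lo}")
```

Under this averaging rule the computed totals match the 3.51, 3.26 and 3.38 reported for organization C in table 54 up to rounding, and the strongest and weakest sub-categories (Architecture, Service Processes) agree with the table as well.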
Of course, more factors can influence this difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the percentage of the IT budget allocated to BI, etc. As presented in the previous chapters, the goal of our model is not only to give a maturity score to a specific organization, but also to provide it with feedback and the necessary steps for reaching a higher maturity stage. For example, the overall maturity score for organization A is 2.72, which leaves a lot of room for improvement. Moreover, as its lowest score is for Data Modelling, this category would be a good starting point. Due to confidentiality reasons, more details regarding the maturity scores and feedback cannot be offered here. The template used for giving feedback to the case studies can be seen in appendix F.

6.2.2.6 Benchmarking
As already mentioned in the previous chapters, the DWCMM can serve as a benchmarking tool for organizations. The DW maturity assessment questionnaire provides a quick way for organizations to assess their DW maturity and, at the same time, compare themselves in an objective way against others in the same industry or across industries. Of course, better benchmarking results will be achieved after more organizations take the maturity assessment. However, in order to give a better image of how a benchmarking graph will look, we will give an example here using the data from the case studies we performed. A bar chart comparing organization A's scores with the best practice and with the average maturity score is shown below.

Figure 15: Benchmarking for Organization A.
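The comparison drawn in figure 15 can also be expressed in code. The helper below is a hypothetical sketch in the spirit of that figure (its name and the sample scores are ours, not the confidential case study results): for each sub-category it reports an organization's own score, the average score of a peer group, and the gap to the best-practice ceiling of 5.

```python
# Illustrative benchmarking helper in the spirit of figure 15.
# The organization and peer scores below are made-up sample data.

BEST_PRACTICE = 5.0  # the maximum score on the DWCMM's 1-5 scale

def benchmark(org_scores, peer_scores):
    """Per sub-category: (own score, peer average, gap to best practice)."""
    report = {}
    for category, score in org_scores.items():
        peer_avg = sum(p[category] for p in peer_scores) / len(peer_scores)
        report[category] = (score, round(peer_avg, 2),
                            round(BEST_PRACTICE - score, 2))
    return report

org = {"Architecture": 2.5, "ETL": 3.0}
peers = [{"Architecture": 3.5, "ETL": 2.5}, {"Architecture": 4.0, "ETL": 3.5}]
for category, (own, avg, gap) in benchmark(org, peers).items():
    print(f"{category}: own {own}, peer average {avg}, gap {gap}")
```

The same output per category (own score, peer average, best-practice ceiling) is what the grouped bars in figure 15 visualize; with more participating organizations, the peer group can be restricted to the same industry.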
6.2.3 Case Studies Results and Conclusions
From the results obtained from the case studies, it can be said that the DWCMM can be successfully applied in practice. However, this part of the validation process had multiple goals:
First, we wanted to see whether organizations can understand the questions and match the answers to their specific situation.
Second, we wanted to see whether the scoring method works and whether we can offer specific feedback for each organization to achieve a higher maturity level.
Last, but not least, we wanted to receive feedback from them regarding the questions and their answers.
Therefore, depending on the suggestions of each interviewee, we made the following changes and drew some conclusions. An overview of these changes and conclusions is given further in this paragraph. The final version of the questionnaire is shown in appendix B. The main changes were made after the first case study. These changes proved to be successful, as the same problems did not recur in the following case studies. The first interviewee suggested that, in order to judge the maturity of DW/BI in an organization, it is also critical to see how strongly it is embedded in the organizational culture and how important it is considered for the organization. As this is very hard to assess, a first step was to add the following question to the DW General Questions: What percentage of the IT department is taking care of BI? (i.e.: how many people from the total number of IT employees?). Moreover, the answers to questions 3 and 4 regarding ETL underwent some minor changes, as it was hard for the respondent to choose the most appropriate answer for his organization. A little confusion was also created by the answers to questions 1 (i.e.: Which types of BI applications best describe the highest level purpose of your DW environment?)
and 6 (i.e.: Which BI applications delivery method best describes the highest level purpose of your DW environment?) from the BI applications part. This is the reason why we decided to arrange the answers in a hierarchical order, so that it would be clear that even if several answers match the company's current situation, only the one with the highest complexity will be scored. Several questions from the DW Organization and Processes part also underwent minor changes. For example, answer d) from question 2 regarding the DW development processes changed from "some separation between environments (i.e.: at least three environments) with automatic transfer between them" to "some separation between environments (i.e.: at least two environments) with automatic transfer between them". An important change was made to the last question from DW development processes, regarding the testing and acceptance phase. As it proved to be difficult for the interviewee to match an answer to the current organizational situation, we decided to change the layout of the question to a multiple-choice one. We therefore consider that there are seven main elements that determine the maturity of the testing and acceptance phase: unit testing by another person, system integration testing, regression testing, user training, acceptance testing, standard procedure and documentation, and external assessments and reviews. Respondents can now choose all the elements characteristic of their organization, and we give a normalized score between 1 and 5 in order to match the overall scoring method. The last change we made after the first case study was to arrange the answers to the DW service processes questions in a hierarchical order, as organizations usually need to fulfill the requirements of an inferior level in order to reach a higher one. Apparently, the mixed answers created some confusion for the respondent.
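The normalization described above (a multiple-choice question over seven elements, scored between 1 and 5) could be implemented along these lines. The thesis only states that the score is normalized to the 1-5 scale; the linear mapping used here is an assumption.

```python
# One plausible normalization for the element-based maturity questions:
# map "k of n elements present" linearly onto the 1-5 maturity scale.
# (The linear formula is an assumption; the thesis only fixes the 1-5 range.)

TESTING_ELEMENTS = [
    "unit testing by another person",
    "system integration testing",
    "regression testing",
    "user training",
    "acceptance testing",
    "standard procedure and documentation",
    "external assessments and reviews",
]

def normalized_score(selected, elements=TESTING_ELEMENTS):
    """Linear 1-5 score: 1 when no element is present, 5 when all are."""
    unknown = set(selected) - set(elements)
    if unknown:
        raise ValueError(f"unknown elements: {unknown}")
    return 1 + 4 * len(set(selected)) / len(elements)

print(normalized_score([]))                          # no element present -> 1.0
print(normalized_score(TESTING_ELEMENTS))            # all seven present  -> 5.0
```

The same scheme carries over to the other element-based questions (e.g. the five project management elements), with `elements` replaced accordingly.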
The problem with the hierarchical order was that respondents might give a biased response, but when tested in the other three case studies this did not seem to happen, as we got diverse scores depending on the question and the organization. Of course, more feedback regarding this aspect will be received after testing the questionnaire in more organizations.

As already mentioned, most of the changes were made after the first case study. However, after receiving the feedback from all four respondents, we decided that further changes were needed to improve the DW maturity assessment questionnaire. First, the answer we proposed for the highest level of maturity for the first question, regarding the predominant architecture of the DW, "a virtual integrated DW", will be changed to "a DW/BI service that federates a central enterprise DW and other data sources via a standard interface". To further accelerate development and adapt quickly to changing business needs, mature organizations can redistribute some development tasks to the business units and departments. However, a central DW is needed as a repository for information shared across business units. Distributed groups are only allowed to build their own applications within a framework of established standards, often maintained by a center of excellence. In order to be successful, DW/BI solutions have to first be centralized and later federated (Eckerson, 2004). Another way to accelerate the development of BI-enabled solutions is for organizations to use a service-oriented architecture (SOA). By wrapping BI functionality and query object models with Web services interfaces, developers can make DW/BI capabilities available to any application, regardless of the platform it runs on or the programming language it uses.
As the previous answer was not very clear to the interviewees, we believe that the latter will convey the right meaning to future respondents. Furthermore, question 2 from Data Modelling is rather complex, as it involves the synchronization between a wide range of data models, and in the future it may be better to split it into several questions. However, so far we have not come to an agreement on what a better question and answers would look like. Minor changes were made to the answers for the second stage of maturity for questions 4, 5 and 6, regarding the standards and metadata for data modelling. For questions 4 and 5, we decided to change the characteristic for the second level of maturity from "solution dependent standards implemented for some of the data models" to "solution dependent standards" to make the distinction between this maturity stage and the next one even stronger, as the next stage of maturity already involves enterprise-wide (or team-wide) standards for some data models. A similar argument stands for changing the characteristic of the second maturity stage for question 6 (regarding metadata management for data models) from "non standardized documentation for some of the data models" to "non standardized documentation". Another question whose answers created some confusion was the one related to metadata management for ETL. From the literature study we performed, we drew the conclusion that organizations usually manage the business and technical metadata for some or all ETL, and the ones with broad experience in this field usually also manage the process metadata. However, one of the respondents answered that they manage process and technical metadata for all ETL and business metadata only for some ETL.
Therefore, we consider that the answers to the question "To what degree is your metadata management implemented for your ETL?" may be something like the ones proposed here: a) no metadata management; b) only one type of metadata managed (i.e.: business, technical or process); c) two types of metadata managed (i.e.: any combination of business, technical and process); d) all three types of metadata (i.e.: business, technical and process) managed for some ETL; e) all three types of metadata managed for all ETL. However, these new characteristics need to be further tested in practice to be validated.

Moreover, another change to be considered is for question 7 from the DW development processes, regarding DW project management. There are usually five main elements that determine the maturity of DW project management: project planning and scheduling, project tracking and control, project risk management, standard procedure and documentation, and evaluation and assessment. Therefore, we believe that a better layout and scoring for this question would be one similar to the one proposed for the testing and acceptance phase. While doing the case studies, we came up with a general question regarding the DW service processes that includes the main activities from this phase. The concept of this question is the same as the one proposed for the testing and acceptance phase from the DW development processes. We had the opportunity to test it and score it for the last three case studies, but we did not include its score in the final result in order to be able to compare all four case studies on the same level. The question seems to work in practice, although, as with the other questions with a similar layout (i.e.: the ones for testing and acceptance, project management, etc.), it is quite difficult to judge which characteristics should be on which maturity stage.
A last remark is related to the questions on the defined, documented and implemented standards. As one of the experts suggested, we divided the questions related to standards into two separate ones: the first regarding the definition and documentation of standards, and the second regarding the actual implementation and following of standards. After testing the model, we saw that some organizations consider these two aspects synonymous and sometimes fill in the same answers. However, as we believe that this distinction should be made and we cannot generalize how other organizations would see this problem, we will leave the questions separated for the time being.

To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We generally received positive feedback regarding the questions and their answers from the case study interviewees. In this way, we could test whether the questions and their answers are representative for assessing the current DW solution of a specific organization and whether they can be mapped to any organization depending on the situational factors. We also had the chance to apply the scoring method and give appropriate feedback for each case study. Finally, we combined all the feedback received from the case studies and made some small but valuable changes to some questions and answers, which improved our DWCMM as a whole.

6.3 Summary

This chapter presented the results of the two main activities performed to evaluate our model: five expert interviews and four case studies. We started by giving an overview of the experts and their affiliations, and then we showed the main changes that resulted from the five expert interviews. We continued by presenting the four case studies and the underlying respondents. Finally, we analyzed the maturity scores obtained by the cases and illustrated the changes made to the questionnaire following the case studies.
In the next chapter, we conclude our research and present some discussion points and possible future work.

7 Conclusions and Further Research

In this section, the main conclusions of this study are presented. Subsequently, a critical analysis of the results is given and, finally, recommendations for future research are made.

7.1 Conclusions

This research was triggered by the estimates made by Gartner (2007) and other researchers that more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data Warehouse Capability Maturity Model (DWCMM) that would help organizations assess their current DW solution and provide guidelines for future improvements. The main elements that usually influence the success of a DW environment are: technical components, end user adoption and usage, and business value. However, we limited our research to the technical components, due to time constraints and the fact that a solid technical solution is usually the foundation for the other two elements to be successful. In this way, we attempted to answer the main research question of our study: How can the maturity of a company's data warehouse technical aspects be assessed and acted upon? The main conclusion of our study is that, even if our maturity model can help organizations improve their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The DWCMM provides a quick way for organizations to assess their DW/BI maturity and compare themselves in an objective way against others in the same industry or across industries. It received positive feedback from the five experts who reviewed and validated it, and it also resonated well with the respondents of our four case studies. However, it is critical to emphasize that the model only performs a high-level assessment.
In order to truly assess the maturity of their DW/BI solutions and discover the strong and weak variables, organizations should use our assessment as a starting point for a more thorough analysis. Our research also showed that the model can be applied to a wide diversity of organizations from different industries, but the results and guidelines for future improvement depend on situational factors specific to each organization. According to the experts who validated our model, some important situational factors are: whether data warehousing and BI can act as a differentiator in their specific industry, the size of the organization, their budget (especially the one for DW/BI), the organizational culture regarding DW/BI, etc.

Moreover, our main research question is split into several more specific sub-questions:

1) What are data warehouses and business intelligence?
2) What do maturity models represent and which are the most representative ones for our research?

The answers to these two questions are the foundation for our research and the main "artifact" we have delivered: the DWCMM. In this way, we presented some theoretical background on the key concepts of the study (data warehousing, business intelligence, maturity modelling) and discovered the research gap that our model could fill: the lack of a maturity model that would help organizations take a snapshot of their current DW/BI technical solution and provide them with systematic steps for achieving a higher maturity stage and, thus, a DW/BI environment that delivers better results.

3) What are the most important variables and characteristics to be considered when building a data warehouse?

This question is addressed by the DWCMM and its components as presented in chapter 3.
As already mentioned, the model we developed is limited to the technical aspects and considers two main benchmark variables/categories for analysis, each of them having several sub-categories (i.e.: DW Technical Solution: Architecture, Data Modelling, ETL, BI Applications; and DW Organization and Processes: Development Processes, Service Processes).

4) How can we design a capability maturity model for a data warehouse assessment?

The answer to this question is offered by the DW Maturity Assessment Questionnaire and the underlying Maturity Scoring and Maturity Matrices. The questionnaire includes several questions for each benchmark category and its sub-categories. In the end, a maturity score is given for each sub-category and category and, of course, the end result is an overall maturity score. Based on this, an informative maturity stage can be pointed out for a specific organization and some general feedback regarding the maturity scores and future steps for improvement is outlined.

5) To what extent does the data warehouse capability maturity model result in a successful assessment and guideline for the analyzed organizations?

Once we had developed the model, an evaluation phase was necessary to test its validity and, depending on the results, make the necessary adjustments. The DWCMM with all its components was initially reviewed by five notable experts in this field, and then tested in four organizations to see whether it can achieve its goal or not. The model generally received positive feedback from the experts, and several minor changes were made, as can be seen in appendix C. Furthermore, the experts pointed out that even if the model succeeds in emphasizing the most important aspects involved in the development of a DW/BI project, it might not be complete; other benchmark categories and sub-categories should be added in the future.
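The roll-up from question scores to sub-category, category and overall maturity scores could be sketched as below. Equal weighting at every level is an assumption (the thesis does not prescribe weights here), and the question scores are hypothetical.

```python
# Sketch of the DWCMM score roll-up. Equal weighting is an assumption;
# the per-question scores (1-5 maturity scale) are invented for illustration.
model = {
    "DW Technical Solution": {
        "Architecture":    [3, 2, 4],
        "Data Modelling":  [2, 2, 3],
        "ETL":             [3, 3, 2],
        "BI Applications": [3, 4, 3],
    },
    "DW Organization and Processes": {
        "Development Processes": [2, 3, 3],
        "Service Processes":     [3, 2, 2],
    },
}

def avg(xs):
    return sum(xs) / len(xs)

# Questions -> sub-category scores -> category scores -> overall score.
subcategory_scores = {sub: avg(qs)
                      for subs in model.values() for sub, qs in subs.items()}
category_scores = {cat: avg([avg(qs) for qs in subs.values()])
                   for cat, subs in model.items()}
overall = avg(list(category_scores.values()))

print({k: round(v, 2) for k, v in category_scores.items()}, round(overall, 2))
```

The overall score then maps to an informative maturity stage and drives the feedback, as described above.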
Also, the DWCMM can serve as a high-level technical assessment, but more questions and a more thorough analysis are needed to dig deeper into the strengths and weaknesses of the DW/BI environment. The four case studies offered us the possibility to test the model in practice. Generally speaking, the model seemed to deliver the desired results. The respondents recognized the categories and sub-categories of the model, and the questions and answers were usually well understood. Based on their comments, we made several readjustments, so that the assessment would be clearer and better understood by future respondents. Moreover, the scoring method seemed to work well, and we were also able to offer feedback to our respondents. Of course, we believe that more valuable feedback could be given in the future by someone with more experience in this field. An observation at this point is that we cannot track what the organizations are going to do with the results of this assessment and whether they are actually going to take action to improve their DW/BI solution.

7.2 Limitations and Further Research

For every scientific research project, it is important to elaborate on its objectivity and limitations. First of all, a limitation of this study is that it is based on design science research, which answers research questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity may arise; hence, a certain influence of the experiences, opinions and feelings of the researcher on the analysis is possible. In our study, the main deliverables were developed through a thorough literature study, but also in collaboration with a Dutch organization, as described in the acknowledgements. Therefore, some lack of impartiality may have slipped into the initial structure of the model.
However, this weakness was minimized by validating the model with several experts in the field. Another limitation is the fact that our model was evaluated by conducting case study research. The DWCMM was tested in four organizations, where the position of the respondents in the organization and their viewpoints might have biased the validation. For future reference, it would probably be advisable for at least two respondents from one organization to take the assessment, as there is a higher chance that the results would be more objective. Also, because the model was tested in only four cases, it is not possible to generalize the findings to any given similar situation. Therefore, for further research, it would be interesting to validate the model using quantitative research methods. An example would be to have the assessment questionnaire filled in by a large number of organizations in order to be able to perform statistical analysis on the data, more valuable benchmarking and improvements to the whole structure of the model. Another interesting approach would be to interview more experts from different organizations in order to come up with a different structure for the model, new benchmark categories and sub-categories and, of course, new maturity questions and answers. Moreover, as suggested by the experts, new elements that could be analyzed further in the future are data quality, which is currently one of the most important reasons for DW/BI failure, and data governance. These two elements could both be part of a bigger category, called data management. An important aspect to mention here is the fact that our research is limited to the technical aspects of a DW/BI project. Therefore, a point for future research would be to extend the model to the analysis of DW/BI end user adoption and business value. New benchmark categories and maturity assessment questions could be added regarding these two aspects.
Another future extension that would increase the value of the model could include questions and analysis for other types of data modelling (e.g.: normalized modelling, data vault, etc.) because, as stated earlier in this thesis, we limited our maturity assessment to dimensional modelling only. Last, but not least, as already mentioned, our model is a high-level one. In the future, several questions could be added for a more detailed analysis of the current DW/BI environment, and more valuable feedback could be offered to organizations.

To sum up, this study can be seen as a contribution to understanding the main categories and elements that determine the maturity of a DW/BI project. The developed model serves as an assessment of the current DW/BI solution of a specific organization and offers guidelines for future improvements. As shown, the model received positive feedback when validated, but there is always room for improvement. Moreover, due to the current economic situation, data warehousing and BI could really make a difference. According to Gartner (2009), in the near future, organizations will have high expectations of their BI and performance management initiatives to help transform and significantly improve their business. As can be seen, data warehousing, BI and performance management are becoming more and more valuable to organizations, and many developments could and will take place in this field and, as a consequence, in our model.

8 References

Aamodt, A., & Nygård, M. (1995). Different Roles and Mutual Dependencies of Data, Information and Knowledge. Data and Knowledge Engineering, 16, 191-222.
AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific Research, 42, (2), 326-335.
AbuSaleem, M. (2005). The Critical Success Factors of Data Warehousing.
Retrieved June 24, 2010, from Master's Degree Programme in Advanced Financial Information Systems: http://www.pafis.shh.fi/graduates/majabu03.pdf
Ackoff, R. (1989). From Data to Wisdom. Journal of Applied Systems Analysis, 16, 3-9.
Agresti, W. (2000). Knowledge Management. In M. Zelkowitz, Advances in Computers (pp. 171-283). London: Academic Press.
Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik, Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.
April, A., Hayes, J., Abran, A., & Dumke, R. (2004). Software Maintenance Maturity Model: the Software Maintenance Process Model. Journal of Software Maintenance and Evolution: Research and Practice, 17, (3), 197-223.
Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information Technology, 20, (2), 67-87.
Azvine, B. C. (2005). Towards Real-Time Business Intelligence. BT Technology Journal, 23, (3), 214-225.
Batory, D. (1988). Concepts for a Database System Synthesizer. Proceedings of the International Conference on Principles of Database Systems. Paris.
Becker, J., Knackstedt, R., & Poppelbus, J. (2009). Developing Maturity Models for IT Management: A Procedure Model and its Application. Business & Information Systems Engineering, 1, (3), 213-222.
Benbasat, I., Goldstein, D., & Mead, M. (1987). The Case Research Strategy in Studies of Information Systems. MIS Quarterly, 11, (3), 369-386.
Bennett, K. (2000). Software Maintenance: a Tutorial. In M. Dorfman, & R. Thayer, Software Engineering (pp. 289-303). Los Alamitos: IEEE Computer Society Press.
Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information Management: http://www.information-management.com/issues/20030201/6287-1.html
Boehm, B. (1988). A Spiral Model for Software Development and Enhancement. IEEE, 21, (5), 61-72.
Boisot, M., & Canals, A.
(2004). Data, Information and Knowledge: Have We Got It Right? Journal of Evolutionary Economics, 14, (1), 43-67.
Breitner, C. (1997). Data Warehousing and OLAP: Delivering Just-in-Time Information for Decision Support. Proceedings of the 6th International Workshop for Oeconometrics. Karlsruhe, Germany.
Breslin, M. (2004). Data Warehousing - Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models. Business Intelligence Journal, 9, (1), 6-20.
Bruckner, R., List, B., & Schiefer, J. (2002). Striving Towards Near Real-Time Data Integration for Data Warehouses. In Lecture Notes in Computer Science (pp. 173-182). Berlin: Springer.
Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th Australasian Conference on Information Systems. Adelaide, Australia.
Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information Systems Journal, 6, 227-242.
Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen, Empirische Untersuchung auf Basis des Business Intelligence Maturity Model. Wirtschaftsinformatik, 46, (2), 119-128.
Chaudhuri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM Sigmod Record, 26, (1), 65-74.
Chen, P. (1975). The Entity-Relationship Model — Toward a Unified View of Data. Proceedings of the International Conference on Very Large Data Bases, (pp. 9-36). Framingham, Massachusetts, USA.
Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.
Chung, W., Chen, H., & Nunamaker Jr., J. (2005). A Visual Framework for Knowledge Discovery on the Web: An Empirical Study of Business Intelligence Exploration. Journal of Management, 21, (4), 57-84.
Codd, E. (1970). A Relational Model for Large Shared Data Banks. Communications of the ACM, 13, 377-387.
Colin, R.
(2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.
Darke, P., Shanks, G., & Broadbent, M. (1998). Successfully Completing Case Study Research: Combining Rigour, Relevance and Pragmatism. Information Systems Journal, 8, (4), 273-289.
Davenport, T., & Prusak, L. (2000). Working Knowledge: How Organizations Manage What They Know. Harvard: Harvard Business Press.
Dayal, U., Castellanos, M., Simitsis, A., & Wilkinson, K. (2009). Data Integration Flows for Business Intelligence. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (pp. 1-11). Saint Petersburg, Russia: ACM.
de Bruin, T., Freeze, R., Kulkarni, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney, Australia.
Devlin, B., & Murphy, P. (1988). An Architecture for a Business and Information Systems. IBM Systems Journal, 27, (1).
Drucker, P. (1999). Management Challenges for the 21st Century. Oxford: Butterworth-Heinemann.
Eckerson, W. (2009). Delivering Insights with Next Generation Analytics. Retrieved April 23, 2010, from The Data Warehousing Institute: http://tdwi.org/research/2009/07/beyond-reporting-delivering-insights-with-nextgenerationanalytics.aspx?tc=page0
Eckerson, W. (2004). Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2
Eckerson, W. (2006). Performance Dashboards. New Jersey: John Wiley & Sons, Inc.
Eckerson, W. (2007). Predictive Analytics: Extending the Values of Your Data Warehousing Investment. Retrieved June 30, 2010, from SAS: http://www.sas.com/feature/analytics/102892_0107.pdf
Fayyad, U., Gregory, P., & Padhraic, S. (1996).
From Data Mining to Knowledge Discovery in Databases. The AI Magazine, 17, (3), 37-54.
Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-MagicQuadrant-for-Datawarehouse-Systems-2010.pdf
Ferguson, R., & Jones, C. (1969). A Computer Aided Decision System. Management Science, 15, (10), B550-B561.
Fitzgerald, G. (1992). Executive Information Systems and Their Development in the U.K.: A Research Study. International Information Systems, 1, (2), 1-35.
Galliers, R., & Sutherland, A. (1991). Information Systems Management and Strategy Formulation: the "Stages of Growth". Information Systems Journal, 1, (2), 89-114.
Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda. Retrieved June 24, 2010, from Gartner: http://www.gartner.com/DisplayDocument?id=500835
Gartner. (2009). Gartner Reveals Five Business Intelligence Predictions for 2009 and Beyond. Retrieved August 6, 2010, from Gartner: http://www.gartner.com/it/page.jsp?id=856714
Golfarelli, M., & Rizzi, S. (2009). A Comprehensive Approach to Data Warehouse Testing. Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP, (pp. 17-24). Hong Kong.
Golfarelli, M., & Rizzi, S. (1998). A Methodological Framework for DW Design. Proceedings of the ACM First International Workshop on Data Warehousing and OLAP (DOLAP). Washington, D.C., USA.
Golfarelli, M., & Rizzi, S. (1999). Designing the Data Warehouse: Key Steps and Crucial Issues. Journal of Computer Science and Information Management, 2, (3).
Golfarelli, M., Rizzi, S., & Cella, I. (2004). Beyond Data Warehousing - What's Next in Business Intelligence? Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, (pp. 1-6). Washington, D.C., USA.
Gorry, A., & Morton, S. (1971). A Framework for Information Systems. Sloan Management Review, 13, 56-79.
Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information Systems, (pp. 3190-3199). Tampa, Florida, USA.
Grönroos, C. (1990). Service Management and Marketing - Managing the Moments of Truth in Service Competition. Lexington: Lexington Books.
Hakes, C. (1996). The Corporate Self Assessment Handbook, 3rd edition. London: Chapman & Hall.
Hansen, W. (1997). Vorgehensmodell zur Entwicklung einer Data Warehouse Lösung. In H. Mucksch, & W. Behme, Das Data Warehouse Konzept (pp. 311-328). Wiesbaden: Gabler.
Hayen, R., Rutashobya, C., & Vetter, D. (2007). An Investigation of the Factors Affecting Data Warehousing Success. Issues in Information Systems, 8, (2), 547-553.
Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management Information Systems Quarterly, 28, (1), 75-106.
Hey, J. (2004). The Data, Information, Knowledge, Wisdom Chain: the Metaphorical Link. Retrieved June 27, 2010, from http://best.berkeley.edu/~jhey03/files/reports/IS290_Finalpaper_HEY.pdf
Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological Analysis. Organizational Behaviour and Human Decision Processes, 62, (2), 129-158.
Holsheimer, M., & Siebes, A. (1994). Data Mining: the Search for Knowledge in Databases (9406). Amsterdam: Centrum voor Wiskunde en Informatica.
Hostmann, B. (2007). BI Competency Centers: Bringing Intelligence to the Business. Retrieved July 3, 2010, from Business Performance Management: http://bpmmag.net/mag/bi_competency_centers_intelligence_1107/index2.html
Huber, G. (1984). Issues in the Design of Group Decision Support Systems. MIS Quarterly, 8, (3), 195-204.
Humphries, M., Hawkins, M., & Dy, M. (1999). Data Warehousing: Architecture and Implementation. New Jersey: Prentice Hall PTR.
Husemann, B., Lechtenborger, J., & Vossen, G. (2000).
Conceptual Data Warehouse Design. Proceedings of the International Workshop on Design and Management of Data Warehouses. Stockholm, Sweden.
Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37, 1-21.
IEEE. (1990). Standard Glossary of Software Engineering Terminology (IEEE STD 610.12). New York: Institute of Electrical and Electronics Engineers, Inc.
Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.
Inmon, W. (2005). Building the Data Warehouse, 4th edition. Indianapolis: Wiley Publishing, Inc.
Jashapara, A. (2004). Knowledge Management: An Integrated Approach. Harlow: Financial Times Prentice Hall.
Kaplan, R., & Norton, D. (1992). The Balanced Scorecard - Measures that Drive Performance. Harvard Business Review, 70, (1), 71-79.
Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5, 58-66.
Kaye, D. (1996). An Information Model of Organization. Managing Information, 3, (6), 19-21.
Kimball, R. (1996). The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. New York: John Wiley & Sons, Inc.
Kimball, R., & Caserta, J. (2004). The Data Warehouse ETL Toolkit. Indianapolis: Wiley Publishing, Inc.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit, 2nd Edition. Indianapolis: Wiley Publishing, Inc.
Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding. Proceedings of the 2nd European Conference on Knowledge Management, (pp. 269-278). Bled, Slovenia.
Kraemer, K., & King, J. (1988).
Computer-Based Systems for Cooperative Work and Group Decision Making. ACM Computing Surveys , (2), 115-146. Kuhn, T. (1974). Second Thoughts on Paradigms. In F. Suppe, The Structure of Scientific Theories. Urbana: The University of Illinois Press. Lewis, J. (2001). Project Planning, Scheduling and Control, 3rd Edition. New York: McGraw-Hill. Loshin, D. (2003). Business Intelligence: the Savvy Manager's Guide. San Francisco: Morgan Kaufmann Publishers. Loshin, D. (2003). Business Intelligence: The Savvy Manager's Guide. San Francisco: Morgan Kaufmann Publishers. Luhn, H. (1958). A Business Intelligence System. IBM Journal of Research and Development, 2 , (4), 314-319. Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management: http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1 March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision Support Systems, 43 , (3), 1031-1043. Moody, D., & Kortink, M. (2000). From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design. Proceedings of the International Workshop on Design and Management of Data Warehouses, (pp. 1-12). Stockholm. Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Boston: Addison Wesley. Mukherjee, D., & D'Souza, D. (2003). Think Phased Implementation for Successful Data. Information Systems Management, 20 , (2), 82-90. Munoz, L., Mazon, J., Pardillo, J., & Trujillo, J. (2008). Modelling ETL Processes of Data Warehouses with UML Activity Diagrams. Proceedings of the OTM Workshops (pp. 44-53). Monterrey, Mexico: Springer. Murtaza, A. (1998). A Framework for Developing Enterprise Data Warehouses. Information Systems Management, 15 , (4), 21-26. Nagabhushana, S. (2006). Data Warehousing. OLAP and Data Mining. New Delhi: New Age International Limited. Navathe, S. B. (1992). 
Evolution of Data Modelling for Databases. Communications of the ACM, 35 , (9), 112-123. Negash, S., & Gray, P. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information Systems, (pp. 3190-3199). Tampa, Florida, USA. Niessink, F., & van Vliet, H. (2000). Software Maintenance from a Service Perspective. Journal of Software Maintenance: Research and Practice, 12 , (2), 103-120. - 101 - DWCMM: The Data Warehouse Capability Maturity Model Catalina Sacu Niessink, F., & van Vliet, H. (1999). The IT Service Capability Maturity Model (IR-463). Amsterdam: Division of Mathematics and Computing Science, Vrije Universiteit. Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16 , (7), 399-405. Nonaka, I. (1991). The Knowledge-Creating Company. Harvard Business Review, 79 , (6), 96-104. O'Reilly, C. (1980). Individuals and Information Overload in Organizations: Is More Necessarily Better? Academy of Management Journal, 23 , (4), 684-696. Paulk, M., Weber, C., Curtis, B., & Chrissis, M. (1995). The Capability Maturity Model: Guidelines for Improving the Software Process. Boston: MA: Addison-Wesley. Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc. Porter, M. (1985). Competitive Advantage. New York: The New Press. Power, D. (2003). A Brief History of Decision Support Systems. Retrieved June 30, 2010, from Decision Support Systems Resources: http://dssresources.com/history/dsshistoryv28.html Prakash, N., & Gosain, A. (2008). An Approach to Engineering the Requirements of Data Warehouses. Requirements Engineering, 13 , (1), 49-72. Rahm, E., & Hai Do, H. (2000). Data Cleaning: Problems and Current Approaches. Bulletin of the Technical Committee on Data Engineering, 23 , (4), 3-13. Rangaswamy, A., & Shell, G. (1997). Using Computers to Realize Joint Gains in Negotiations: Toward an ‗Electronic Bargaining Table‘. Management Science, 43 , (8), 1147-1163. Royce, W. (1970). 
Managing the Development of Large Software Systems. Proceedings of the Western Electronic Show and Convention (WesCon). Los Angeles. Runeson, P., & Host, M. (2009). Guidelines for Conducting and Reporting Case Study Research in Software Engineering. Empirical Software Engineering, 14 , (2), 131-164. Rus, I., & Lindvall, M. (2002). Knowledge Management in Software Engineering. IEEE Software, 19 , (3), 26-38. Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative. Retrieved July 16, 2010, from HP Technical Reports: http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf Schneidewind, N. (1987). The State of Maintenance. IEEE Transactions on Software Engineering, 13 , (3), 303-310. Schwaninger, M. (2001). Intelligent Organizations: An Integrative Framework. SystemsResearch and Behavioral Science, 18 , 137-158. Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48 , (3), 79-84. Seufert, A., & Schiefer, J. (2005). Enhanced Business Intelligence - Supporting Business Processes with Real-Time Business Analytics. Proceedings of the 16th International Workshop on Database and Expert Systems Applications, (pp. 919-925). Copenhagen, Denmark. - 102 - DWCMM: The Data Warehouse Capability Maturity Model Catalina Sacu Shankaranarayanan, G., & Even, A. (2004). Managing Metadata in Data Warehouses: Pitfalls and Possibilities. Communications of the Association for Information Systems, 14 , 247-274. Simitsis, A. (2004). Modelling and Optimization of ETL Processes in Data Warehouse Environments. Athens: National Technical University of Athens. Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). Optimizing ETL Processes in Data Warehouses. Proceedings of the 21st International Conference on Data Engineering (pp. 564-575). Tokyo, Japan: IEEE Computer Science. Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). State-Space Optimization of ETL Workflows. 
IEEE Transactions on Knowledge and Data Engineering, 17 , (10), 1404-1419. Simsion, G. C., & Witt, G. C. (2005). Data Modelling Essential, 3rd Edition. San Francisco: Morgan Kaufmann Publishers. Solomon, M. (2005). Ensuring a Successful Data Warehouse Initiative. Information Systems Management, 22 , (1), 26-36. Sommerville, I. (2007). Software Engineering, 8th Edition. Harlow: Addison-Wesley. Thomas, J. (2001). Business Intelligence - Why? eAI Journal , 47-49. Tijsen, R., Spruit, M., van Raaij, B., & van de Ridder, M. (2009). BI-FIT: The Fit between Business Intelligence, End-Users, Tasks and Technologies. Utrecht: Utrecht University. Tremblay, M., Fuller, R., Berndt, D., Studnicki, & J. (2007). Doing More with More Information: Changing Healthcare Planning. Decision Support Systems, 43 , 1305-1320. Tryfona, N., Busborg, F., & Christiansen, J. (1999). Data Warehousing and OLAP. Proceedings of the 2nd ACM international workshop on Data warehousing and OLAP, (pp. 3-8). Kansas City, Missouri, United States . Turban, E., Aronson, J., Liang, T., & Sharda, R. (2007). Business Intelligence and Decision Support Systems. New Jersey: Pearson Education International. Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patters: Innovating Information and Communication Technology. Boca Raton, Florida: Auerbach Publications Taylor & Francis Group. van Bon, J. (2007). IT Service Management: An Introduction. Zaltbommel: Van Haren Publishing. van Bon, J. (2000). World Class IT Service Management Guide . The Hague: ten Hagen & Stam Publishers. Vanichayobon, S., & Gruenwald, L. (2004). Indexing Techniques for Data Warehouses’ Queries. Retrieved July 3, 2010, from Univerisyt of Oklahoma Database: http://www.cs.ou.edu/~database/documents/vg99.pdf Varga, M., & Vukovic, M. (2008). Feasability of Investment in Business Analytics. Journal of Information and Organizational Sciences, 31 , (2), 50-62. Vitt, E., & Luckevich, M. &. (2002). 
Business Intelligence: Making Better Decisions Faster. Redmond: Microsoft Press. Walker, D. (2006). Overview Architecture for Enterprise Data Warehouses. Retrieved July 23, 2010, from Data Management & Warehousing : http://www.datamgmt.com/ - 103 - DWCMM: The Data Warehouse Capability Maturity Model Catalina Sacu Watson, H., Ariyachandra, T., & Matyska, R. (2001). Data Warehousing Stages of Growth. Information Systems Management, 18 , (3), 42-50. Winter, R., & Stauch, B. (2003). A Method for Demand-driven Information Requirements Analysis in Data Warehousing Projects. Proceedings of the 36th Hawaii International Conference on System Sciences. Big Island: IEEE Computer Society. Wixom, B., & Watson, H. (2001). An Empirical Investigation of the Factors Affecting Data Warehousing Success. MIS Quarterly, 25 , (1). Yin, R. (2009). Case Study Research Design and Methods. Thousand Oaks, California: SAGE Inc. Young, C. (2004). An Introduction to IT Service Management. Research Note, COM-10-8287 . Zeithaml, V., & Bitner, M. (1996). Service Marketing. New York: McGraw-Hill. Zins, C. (2007). Conceptual Approaches for Defining Data, Information, and Knowledge. Journal of the American Society for Information Science and Technology, 58 , (4), 479-493. 
Appendix A: DW Detailed Maturity Matrix

DW Technical Solution

Architecture
Initial (1): Desktop data marts (e.g.: Excel sheets)
Repeatable (2): Multiple independent data marts
Defined (3): Multiple independent data warehouses
Managed (4): A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
Optimized (5): A DW/BI service that federates a central enterprise DW and other data sources via a standard interface

Business rules
Initial (1): No business rules defined or implemented
Repeatable (2): Few business rules defined or implemented
Defined (3): Some business rules defined or implemented
Managed (4): Most business rules defined or implemented
Optimized (5): All business rules defined or implemented

Metadata management
Initial (1): No metadata management
Repeatable (2): Non-integrated metadata by solution
Defined (3): Central metadata repository separated by tools
Managed (4): Central up-to-date metadata repository
Optimized (5): Web-accessed central metadata repository with integrated, standardized, up-to-date metadata

Security
Initial (1): No security implemented
Repeatable (2): Authentication security
Defined (3): Independent authorization for each tool
Managed (4): Role-level security at database level
Optimized (5): Integrated companywide authorization security

Data sources
Initial (1): CSV files
Repeatable (2): Operational databases
Defined (3): ERP and CRM systems; XML files
Managed (4): Unstructured data sources (e.g.: text or documents)
Optimized (5): Various types of unstructured data sources (e.g.: images, videos) and Web data sources

Performance
Initial (1): No methods to increase performance
Repeatable (2): Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization)
Defined (3): Hardware performance tuning (e.g.: DW server)
Managed (4): Software and hardware tuning
Optimized (5): Specialized DW appliances

Infrastructure
Initial (1): Desktop platform
Repeatable (2): Shared OLTP systems and DW environment
Defined (3): Separate OLTP systems and DW environment
Managed (4): Separate servers for OLTP systems, DW, ETL and BI applications
Optimized (5): Specialized DW appliances

Update frequency
Initial (1): Monthly update or less often
Repeatable (2): Weekly update
Defined (3): Daily update
Managed (4): Inter-daily update
Optimized (5): Real-time update

Data Modelling

Data modelling tool
Initial (1): No data modelling tool
Repeatable (2): Data modelling tools used only for design
Defined (3): Data modelling tools used also for maintenance
Managed (4): Standardized data modelling tool used for design
Optimized (5): Standardized data modelling tool used for design and maintaining metadata

Synchronization between data models
Initial (1): No synchronization between data models
Repeatable (2): Manual synchronization of some of the data models
Defined (3): Manual or automatic synchronization depending on the data models
Managed (4): Automatic synchronization of most of the data models
Optimized (5): Automatic synchronization of all the data models

Data model levels
Initial (1): No differentiation between data model levels
Repeatable (2): Logical and physical levels designed for some data models
Defined (3): Logical and physical levels designed for all the data models
Managed (4): Conceptual level also designed for some data models
Optimized (5): All data models have conceptual, logical and physical levels designed

Data model standards
Initial (1): No standards defined or implemented for data models
Repeatable (2): Solution-dependent standards defined or implemented for some of the data models
Defined (3): Enterprise-wide standards defined or implemented for some of the data models
Managed (4): Enterprise-wide standards defined or implemented for most of the data models
Optimized (5): Enterprise-wide standards defined or implemented for all the data models

Data model documentation
Initial (1): No documentation for any data models
Repeatable (2): Non-standardized documentation for some of the data models
Defined (3): Standardized documentation for some of the data models
Managed (4): Standardized documentation for most of the data models
Optimized (5): Standardized documentation for all the data models

Fact table granularity
Initial (1): Very few fact tables have their granularity at the lowest level possible
Repeatable (2): Few fact tables have their granularity at the lowest level possible
Defined (3): Some fact tables have their granularity at the lowest level possible
Managed (4): Most fact tables have their granularity at the lowest level possible
Optimized (5): All fact tables have their granularity at the lowest level possible

Conformed dimensions
Initial (1): No dimensions conformed
Repeatable (2): Conformed dimensions for few business processes
Defined (3): Conformed dimensions for some business processes
Managed (4): Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high-level design technique such as an enterprise bus matrix
Optimized (5): Enterprise-wide standardized conformed dimensions for all business processes

Dimension design
Initial (1): Few dimensions designed; no hierarchies or surrogate keys designed
Repeatable (2): Some dimensions designed with surrogate keys and basic hierarchies
Defined (3): Most dimensions designed with surrogate keys and complex hierarchies
Managed (4): Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
Optimized (5): Besides regular dimensions and slowly changing dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)

ETL

ETL tooling
Initial (1): Only hand-coded ETL
Repeatable (2): Hand-coded ETL and some standard scripts
Defined (3): ETL tool(s) for all the ETL design and generation
Managed (4): Standardized ETL tool and some standard scripts for better performance
Optimized (5): Complete ETL generated from metadata

ETL capabilities
Initial (1): Simple ETL that just extracts and loads data into the data warehouse
Repeatable (2): Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
Defined (3): Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
Managed (4): More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
Optimized (5): Optimized ETL for a real-time DW (real-time ETL capabilities)

Automation and data quality
Initial (1): Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
Repeatable (2): Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
Defined (3): Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: no
Managed (4): Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: yes
Optimized (5): Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes

Restart, recovery and monitoring
Initial (1): Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
Repeatable (2): Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
Defined (3): Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: no
Managed (4): Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: no
Optimized (5): Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes

ETL standards
Initial (1): No standards
Repeatable (2): Few standards defined or implemented for ETL
Defined (3): Some standards defined or implemented for ETL
Managed (4): Most standards defined or implemented for ETL
Optimized (5): All the standards defined or implemented for ETL

ETL metadata
Initial (1): No metadata management
Repeatable (2): Business and technical metadata for some ETL
Defined (3): Business and technical metadata for all ETL
Managed (4): Process metadata is also managed for some ETL
Optimized (5): All types of metadata are managed for all ETL

BI Applications

Application types
Initial (1): Static and parameter-driven reports and query applications
Repeatable (2): Ad-hoc reporting; online analytical processing (OLAP)
Defined (3): Visualization techniques: dashboards and scorecards
Managed (4): Predictive analytics: data and text mining; alerts
Optimized (5): Closed loop BI applications; real-time BI applications

BI tools
Initial (1): BI tool related to the data mart
Repeatable (2): More than two tools for mainstream BI (i.e.: reporting and visualization applications)
Defined (3): One tool recommended for mainstream BI, but each department can use their own tool
Managed (4): One tool for mainstream BI, but each department can use their own tool for specific BI applications (e.g.: data mining, financial analysis, etc.)
Optimized (5): One tool for mainstream BI and one tool for specific BI applications

BI standards
Initial (1): No standards
Repeatable (2): Few standards defined or implemented for BI applications
Defined (3): Some standards defined or implemented for BI applications
Managed (4): Most standards defined or implemented for BI applications
Optimized (5): All the standards defined or implemented for BI applications

Objects and templates
Initial (1): Objects defined for every BI application
Repeatable (2): Some reusable objects for similar BI applications
Defined (3): Some standard objects and templates for similar BI applications
Managed (4): Most similar BI applications use standard objects and templates
Optimized (5): All similar BI applications use standard objects and templates

Report delivery
Initial (1): Reports are delivered manually on paper or by email
Repeatable (2): Reports are delivered automatically by email
Defined (3): Direct tool-based interface
Managed (4): A BI portal with basic functions: subscriptions, discussion forums, alerting
Optimized (5): Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and BI portals)

Metadata for end users
Initial (1): No metadata available
Repeatable (2): Some incomplete metadata documents that users ask for periodically
Defined (3): Complete up-to-date metadata documents sent to users periodically or available on the intranet
Managed (4): Metadata is always available through a metadata management tool, different from the BI tool
Optimized (5): Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)

DW Organization & Processes

Development Processes

Development processes
Initial (1): Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
Repeatable (2): Repeatable development processes based on experience with similar projects; some development phases clearly separated
Defined (3): Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
Managed (4): Development processes continuously measured against well-defined and consistent goals
Optimized (5): Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects

Environments
Initial (1): No separation between environments
Repeatable (2): Two separate environments (i.e.: usually development and production) with manual transfer between them
Defined (3): Some separation between environments (i.e.: at least three environments) with manual transfer between them
Managed (4): Some separation between environments (i.e.: at least two environments) with automatic transfer between them
Optimized (5): All the environments are distinct with automatic transfer between them

Development standards
Initial (1): No standards defined or implemented
Repeatable (2): Few standards defined or implemented
Defined (3): Some standards defined or implemented
Managed (4): A lot of the standards defined or implemented
Optimized (5): A comprehensive set of standards defined or implemented

Quality assurance
Initial (1): No quality assurance activities
Repeatable (2): Ad-hoc quality assurance activities
Defined (3): Standardized and documented quality assurance activities done for all the development phases
Managed (4): Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Sponsorship
Initial (1): No project sponsor
Repeatable (2): Chief information officer (CIO) or an IT director
Defined (3): Single sponsor from a business unit or department
Managed (4): Multiple individual sponsors from multiple business units or departments
Optimized (5): Multiple levels of business-driven, cross-departmental sponsorship including top-level management sponsorship (BI/DW is integrated in the company process with continuous budget)

Project management
Initial (1): No project management activities
Repeatable (2): Project planning and scheduling
Defined (3): Some of the main project management activities (project planning and scheduling; project risk management; project tracking and control)
Managed (4): Some project management activities; standard and efficient procedure and documentation
Optimized (5): Project planning and scheduling; project risk management; project tracking and control; standard and efficient procedure and documentation; evaluation and assessment

Roles and responsibilities
Initial (1): No formal roles defined
Repeatable (2): Defined roles, but not technically implemented
Defined (3): Formalized and implemented roles and responsibilities
Managed (4): Level 3) + periodic peer reviews (i.e.: review of each other's work)
Optimized (5): Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

Knowledge management
Initial (1): Ad-hoc knowledge gathering and sharing
Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.), and also through training and mentoring programs
Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
Optimized (5): Continuously improving interorganizational knowledge sharing

Requirements definition
Initial (1): Ad-hoc requirements definition; no methodology used
Repeatable (2): Methodologies differ from project to project; interviews with business users for collecting the requirements
Defined (3): Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
Managed (4): Level 3) + qualitative assessment and measurement of the phase; requirements document also published
Optimized (5): Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes

Testing
Initial (1): Only unit testing is done; no standards or documentation
Repeatable (2): Other types of testing are beginning to be done (some of the following: unit testing by another person; system integration testing; regression testing; acceptance testing)
Defined (3): Diverse types of testing; some standards
Managed (4): Diverse types of testing; standard procedure and documentation
Optimized (5): All the main types of testing (unit testing by another person; system integration testing; regression testing; acceptance testing); user training; standard procedure and documentation; external assessments and reviews

Service Processes

Service quality management
Initial (1): No service quality management activities
Repeatable (2): Ad-hoc service quality management
Defined (3): Proactive service quality management including a standard procedure
Managed (4): Level 3) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Knowledge management
Initial (1): Ad-hoc knowledge gathering and sharing
Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs
Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
Optimized (5): Continuously improving interorganizational knowledge management

Service level agreements
Initial (1): Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
Repeatable (2): Some customer and supplier service needs documented and formalized based on previous experience
Defined (3): All the customer and supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
Managed (4): SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
Optimized (5): Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)

Incident management
Initial (1): Incident management is done ad-hoc, with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
Repeatable (2): A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
Defined (3): A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
Managed (4): Level 3) + standard reports concerning the incident status including measurements and goals (e.g.: response time) are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
Optimized (5): Level 4) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the services provided to them

Change management
Initial (1): Change requests are made and solved in an ad-hoc manner
Repeatable (2): A ticket handling system is used for storing and solving the requests for change and some procedures are followed, but nothing is standardized or documented
Defined (3): A standard procedure is used for approving, verifying, prioritizing and scheduling changes
Managed (4): Level 3) + standard reports concerning the change status including measurements and goals (e.g.: response time) are regularly produced for all the involved teams and customers; standards established for documenting changes
Optimized (5): Level 4) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them

Resource management
Initial (1): Ad-hoc resource management activities (only when there is a problem)
Repeatable (2): Resource management is done following some procedures, but nothing is standardized or documented
Defined (3): Resource management is done constantly following a standardized documented procedure
Managed (4): Level 3) + standard reports concerning performance and resource management including measurements and goals are done on a regular basis
Optimized (5): Level 4) + resource management trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services

Availability management
Initial (1): Ad-hoc availability management
Repeatable (2): Availability management is done following some procedures, but nothing is standardized or documented
Defined (3): Availability management documented and done using a standardized procedure (all elements are monitored)
Managed (4): Level 3) + risk assessment to determine the critical elements and possible problems
Optimized (5): Level 4) + availability management trend analysis and planning to determine the most common bottlenecks and make sure that all the elements are available for the agreed service level targets

Release management
Initial (1): Ad-hoc changes solving and implementation; no release naming and numbering conventions
Repeatable (2): Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions
Defined (3): Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
Managed (4): Level 3) + standard reports concerning release management including measurements and goals are done on a regular basis; master copies of all software in a release secured in a release database
Optimized (5): Level 4) + release management trend analysis, statistics and planning

Appendix B: The DW Maturity Assessment Questionnaire (Final Version)

Data Warehouse (DW) Maturity Assessment Questionnaire

Filling in the questionnaire takes approximately 50 minutes; at the end, a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions from the first part of the questionnaire (i.e.: DW General Questions) are not scored; their answers serve as input for shaping a better image of the DW solution maturity.
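The per-category and overall maturity scores that the assessment produces can be sketched in code. The following is a hypothetical illustration, assuming a simple averaging scheme over 1-5 answer scores; the category names and answer values are made up, and the thesis itself defines the actual computation.

```python
# Hypothetical sketch of the questionnaire scoring: each scored question
# yields a maturity value from 1 (Initial) to 5 (Optimized). A category
# score is the average of its question scores, and the overall score is
# the average over all scored questions. Names and data are illustrative.

def maturity_scores(answers):
    """answers: dict mapping a category name to a list of 1-5 question scores."""
    category_scores = {
        category: sum(scores) / len(scores)
        for category, scores in answers.items()
    }
    all_scores = [s for scores in answers.values() for s in scores]
    overall = sum(all_scores) / len(all_scores)
    return category_scores, overall

# Example with made-up answers for two benchmark sub-categories:
example = {
    "Architecture": [3, 4, 2, 3],
    "Data Modelling": [2, 2, 3],
}
per_category, overall = maturity_scores(example)
```

Under this assumed scheme, a respondent sees both where each sub-category stands and how the solution scores as a whole.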
The questions from the second and third parts of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and they are multiple-choice questions with only one possible answer (except questions 3.1 - 11 and 3.2 - 1, where more answers may be circled).

1 DW General Questions

1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center - An IT system needed to run the business
b) Tactical resource - Tools to assist decision making
c) Mission-critical resource - A system that is critical to running business operations
d) Strategic resource - Key to achieving performance objectives and goals
e) Competitive differentiator - Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) What percentage of the IT department is taking care of BI (i.e.: how many people from the total number of IT employees)?
7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the invoice)?
8) Which technologies do you use for developing the BI/DW solution in your organization? For each developing category, name the technology used:
- Data Modelling:
- Extract/Transform/Load (ETL):
- BI Applications:
- Database:
9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized modelling, data vault, etc.)?

2 DW Technical Solution

2.1 General Architecture and Infrastructure

1) What is the predominant architecture of your DW?
a) Multiple independent data marts
b) A virtual integrated DW or real-time DW
c) Multiple independent data warehouses
d) A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Desktop data marts (e.g.: Excel sheets)
2) To what degree have you defined and documented definitions and business rules for the necessary transformations, key terms and metrics?
a) No business rules defined
b) Most of the business rules defined and documented
c) Few business rules defined and documented
d) All business rules defined and documented
e) Some business rules defined and documented
3) To what degree have you implemented definitions and business rules for the necessary transformations, key terms and metrics?
a) No business rules implemented
b) Most of the business rules implemented
c) Few business rules implemented
d) All business rules implemented
e) Some business rules implemented
4) To what degree is your metadata management implemented?
a) Web-accessed central metadata repository with integrated, standardized, up-to-date metadata
b) Non-integrated metadata by solution
c) Central up-to-date metadata repository
d) No metadata management
e) Central metadata repository separated by tools
5) To what degree is security implemented in your DW architecture?
a) No security implemented
b) Integrated companywide security
c) Independent authorization for each tool
d) Authentication security
e) Role-level security at database level
6) What types of data sources does your DW support at the highest level?
a) CSV files
b) Operational databases
c) ERP and CRM systems; XML files
d) Unstructured data sources (e.g.: text or documents)
e) Various types of unstructured data sources (e.g.: images, videos) and Web data sources
7) To what degree do you use methods to increase the performance of your DW?
a) Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing
b) No methods to increase performance
c) Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization)
d) Hardware performance tuning (e.g.: DW server)
e) Software and hardware tuning

8) To what degree is your infrastructure specialized for a DW?
a) Desktop platform
b) Specialized DW appliances (e.g.: Netezza, Teradata)
c) Separate OLTP systems and DW environment
d) Separate servers for OLTP systems, DW, ETL and BI applications
e) Shared OLTP systems and DW environment

9) Which answer best describes the update frequency for your DW?
a) Daily update
b) Monthly update or less often
c) Real-time update
d) Intra-daily update
e) Weekly update

2.2 Data Modelling

1) Which answer best describes the usage of a data modelling tool in your organization?
a) No data modelling tool
b) Scattered data modelling tools used only for design
c) Standardized data modelling tool used for design and maintaining metadata
d) Standardized data modelling tool used only for design
e) Scattered data modelling tools also used for maintenance

2) Which answer best describes the degree of synchronization between the following data models that your organization maintains and the mapping between them: ETL source and target models; DW and data marts models; BI semantic or query object models?
a) Automatic synchronization of all of the data models
b) Manual synchronization of some of the data models
c) No synchronization between data models
d) Manual or automatic synchronization depending on the data models
e) Automatic synchronization of most of the data models

3) To what degree do you differentiate between data model levels: physical, logical and conceptual?
a) No differentiation between data model levels
b) All data models have conceptual, logical and physical levels designed
c) Logical and physical levels designed for some data models
d) Conceptual level also designed for some data models
e) Logical and physical levels designed for all the data models

4) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) No standards defined for data models
b) Enterprise-wide standards defined for some of the data models
c) Enterprise-wide standards defined for most of the data models
d) Solution-dependent standards defined for some of the data models
e) Enterprise-wide standards defined for all the data models

5) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) No standards implemented for data models
b) Enterprise-wide standards implemented for some of the data models
c) Enterprise-wide standards implemented for most of the data models
d) Solution-dependent standards implemented for some of the data models
e) Enterprise-wide standards implemented for all the data models

6) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality, etc.) in your data models?
a) No documentation for any data models
b) Standardized documentation for some of the data models
c) Standardized documentation for all the data models
d) Non-standardized documentation for some of the data models
e) Standardized documentation for most of the data models

If you use dimensional modelling, please answer the following three questions:

7) What percentage of all your fact tables have their granularity at the lowest level possible?
a) Very few fact tables have their granularity at the lowest level possible
b) Few fact tables have their granularity at the lowest level possible
c) Some fact tables have their granularity at the lowest level possible
d) Most fact tables have their granularity at the lowest level possible
e) All fact tables have their granularity at the lowest level possible

8) To what degree do you design conformed dimensions in your data models?
a) No conformed dimensions
b) Conformed dimensions for few business processes
c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high-level design technique such as an enterprise bus matrix
d) Conformed dimensions for some business processes
e) Enterprise-wide standardized conformed dimensions for all business processes

9) Which answer best describes the current state of your dimension tables modelling?
a) Few dimensions designed; no hierarchies or surrogate keys designed
b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Besides regular dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)

2.3 ETL

1) Which answer best describes the usage of an ETL tool in your organization?
a) Only hand-coded ETL
b) Complete ETL generated from metadata
c) Hand-coded ETL and some standard scripts
d) Standardized ETL tool and some standard scripts
e) ETL tool(s) for all the ETL design and generation

2) Which answer best describes the complexity of your ETL?
a) Simple ETL that just extracts and loads data into the data warehouse
b) Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
e) Optimized ETL for a real-time DW (real-time ETL capabilities)

3) Which answer best describes the data quality system implemented for your ETL?
a) Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: no
b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
c) Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: yes
d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes

4) Which answer best describes the management and monitoring of your ETL?
(Definitions: Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.); Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time- or event-based ETL execution, events notification; data lineage and analyzer system))
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes/no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes/no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes

5) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) No standards defined
b) Few standards defined for ETL
c) Some standards defined for ETL
d) Most standards defined for ETL
e) All the standards defined for ETL

6) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) No standards implemented
b) Few standards implemented for ETL
c) Some standards implemented for ETL
d) Most standards implemented for ETL
e) All the standards implemented for ETL

7) To what degree is your metadata management implemented for your ETL?
a) No metadata management
b) Business and technical metadata for some ETL
c) All types of metadata (i.e.: business, technical, process) are managed for all ETL
d) Process metadata is also managed for some ETL
e) Business and technical metadata for all ETL

2.4 BI Applications

1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Static and parameter-driven reports and query applications
b) Ad-hoc reporting; online analytical processing (OLAP)
c) Visualization techniques: dashboards and scorecards
d) Predictive analytics: data and text mining; alerts
e) Closed-loop BI applications; real-time BI applications

2) Which answer best describes your current BI tool usage?
(Definitions: main stream BI applications (i.e.: reporting and visualization applications); specific BI applications (i.e.: data mining, financial analysis, etc.))
a) One standardized tool for main stream BI and one standardized tool for specific BI applications
b) BI tool related to the data mart
c) One tool recommended for main stream BI, but each department can use their own tool
d) More than two tools for main stream BI
e) One standardized tool for main stream BI, but each department can use their own tool for specific BI applications

3) To what degree have you defined and documented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications

4) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications

5) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) implemented in your BI applications?
a) Objects defined for every BI application
b) All similar BI applications use standard objects and templates
c) Some reusable objects for similar BI applications
d) Most similar BI applications use standard objects and templates
e) Some standard objects and templates for similar BI applications

6) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Reports are delivered manually on paper or by email
b) Reports are delivered automatically by email
c) Direct tool-based interface
d) A BI portal with basic functions: subscriptions, discussion forums, alerting
e) Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and BI portals)

7) Which answer best describes the metadata accessibility to users?
a) No metadata available
b) Some incomplete metadata documents that users ask for periodically
c) Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)
d) Complete up-to-date metadata documents sent to users periodically or available on the intranet
e) Metadata is always available through a metadata management tool, different from the BI tool

3 DW Organization and Processes

3.1 Development Processes

1) Which answer best describes the DW development processes in your organization?
a) Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
b) Repeatable development processes based on experience with similar projects; some development phases clearly separated
c) Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
d) Development processes continuously measured against well-defined and consistent goals
e) Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects

2) To what degree is there a separation between the development/test/acceptance/deployment environments in your organization?
a) No separation between environments
b) Two separate environments (i.e.: usually development and production) with manual transfer between them
c) All the environments are distinct with automatic transfer between them
d) Some separation between environments (i.e.: at least two environments) with automatic transfer between them
e) Some separation between environments (i.e.: at least three environments) with manual transfer between them

3) To what degree has your organization defined and documented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) No standards defined
b) Few standards defined
c) Some standards defined
d) A lot of the standards defined
e) A comprehensive set of standards defined

4) To what degree has your organization implemented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) No standards implemented
b) Few standards implemented
c) Some standards implemented
d) A lot of the standards implemented
e) A comprehensive set of standards implemented

5) Which answer best describes the DW quality management?
a) No quality assurance activities
b) Ad-hoc quality assurance activities
c) Standardized and documented quality assurance activities done for all the development phases
d) c) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

6) Which answer best describes the sponsor for your DW project?
a) Multiple levels of business-driven, cross-departmental sponsorship including top level management sponsorship (BI/DW is integrated in the company process with continuous budget)
b) No project sponsor
c) Single sponsor from a business unit or department
d) Chief information officer (CIO) or an IT director
e) Multiple individual sponsors from multiple business units or departments

7) Which answer best describes your DW project management?
(Definitions: project planning and scheduling (i.e.: work breakdown structure; time, costs and resources estimates; planning and scheduling); project tracking and control (i.e.: milestone tracking, change control))
a) Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes

8) Which answer best describes the role division for the DW development process?
a) No formal roles defined
b) Defined roles, but not technically implemented
c) Formalized and implemented roles and responsibilities
d) c) + periodic peer reviews (i.e.: review of each other's work)
e) d) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

9) Which answer best describes the knowledge management in your organization for the DW development processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Continuously improving inter-organizational knowledge management

10) Which answer best describes the requirements definition phase for your DW project?
a) Ad-hoc requirements definition; no methodology used
b) Methodologies differ from project to project; interviews with business or IT users for collecting the requirements
c) Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
d) c) + qualitative assessment and measurement of the phase; requirements document also published
e) d) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes

11) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance

3.2 Service Processes (Maintenance and Monitoring Processes)

1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?
a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory management, physical disk storage space utilization, processor usage, BI applications usage, number of completed queries by time slots during the day, time each user stays online with the data warehouse, total number of distinct users per day, etc.)
b) BI applications maintenance and monitoring
c) User support
d) ETL monitoring and management
e) Data reconciliation and data growth management
f) Security administration
g) Resource monitoring and management
h) Infrastructure management
i) Backup and recovery management
j) Performance monitoring and tuning

2) Which answer best describes the DW service quality management in your organization?
a) No service quality management activities
b) Ad-hoc service quality management
c) Proactive service quality management including a standard procedure
d) c) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

3) Which answer best describes the knowledge management in your organization for the DW service processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Continuously improving inter-organizational knowledge management

4) Which answer best describes the DW service level management in your organization?
a) Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
b) Some customer and supplier service needs documented and formalized based on previous experience
c) All the customer and supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
d) SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
e) Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)

5) Which answer best describes the DW incident management in your organization?
a) Incident management is done ad-hoc with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
b) A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
c) A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
d) c) + standard reports concerning the incident status including measurements and goals (e.g.: response time) are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
e) d) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the services provided to them

6) Which answer best describes the DW change management in your organization?
a) Change requests are made and solved in an ad-hoc manner
b) A ticket handling system is used for storing and solving the requests for change and some procedures are followed, but nothing is standardized or documented
c) A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) c) + standard reports concerning the change status including measurements and goals (e.g.: response time) are regularly produced for all the involved teams and customers; standards established for documenting changes
e) d) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them

7) Which answer best describes the DW technical resource management in your organization?
a) Ad-hoc resource management activities (only when there is a problem)
b) Resource management is done following some procedures, but nothing is standardized or documented
c) Resource management is done constantly following a standardized documented procedure
d) c) + standard reports concerning performance and resource management including measurements and goals are done on a regular basis
e) d) + resource management trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services

8) Which answer best describes the availability management in your organization?
a) Ad-hoc availability management
b) Availability management is done following some procedures, but nothing is standardized or documented
c) Availability management documented and done using a standardized procedure (all elements are monitored)
d) c) + risk assessment to determine the critical elements and possible problems
e) d) + availability management trend analysis and planning to determine the most common bottlenecks and make sure that all the elements are available for the agreed service level targets

9) Which answer best describes the release management in your organization?
a) Ad-hoc change solving and implementation; no release naming and numbering conventions
b) Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions are used
c) Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
d) c) + standard reports concerning release management including measurements and goals are done on a regular basis; master copies of all software in a release secured in a release database
e) d) + release management trend analysis, statistics and planning

Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)

Data Warehouse (DW) Maturity Assessment Questionnaire

Filling in the questionnaire takes 50 minutes, and at the end a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions from the first part of the questionnaire (i.e.: DW General Questions) are not scored and their answers will serve as input for shaping a better image of the DW solution maturity.
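The scoring scheme used by the questionnaire (each scored answer maps to a maturity level from 1 to 5, aggregated into sub-category scores and an overall score) can be sketched as follows. This is an illustrative sketch only: the simple averaging, the equal weighting of sub-categories, and the example answer values are assumptions, not part of the DWCMM itself.

```python
# Illustrative sketch of the questionnaire's 1-5 scoring scheme.
# Assumption: a sub-category score is the mean of its question scores,
# and the overall score is the unweighted mean of sub-category scores.

def subcategory_score(answers):
    """Average the 1-5 scores of the answered questions in a sub-category."""
    if not answers:
        raise ValueError("a sub-category needs at least one answered question")
    if any(a < 1 or a > 5 for a in answers):
        raise ValueError("each answer must be scored from 1 to 5")
    return sum(answers) / len(answers)

def overall_score(scores_by_subcategory):
    """Overall maturity score: unweighted mean of the sub-category scores."""
    scores = list(scores_by_subcategory.values())
    return sum(scores) / len(scores)

# Hypothetical answers for two technical sub-categories of the benchmark.
scores = {
    "General Architecture and Infrastructure": subcategory_score([4, 3, 5, 4]),
    "Data Modelling": subcategory_score([2, 3, 3]),
}
print(round(overall_score(scores), 2))  # mean of 4.0 and 2.67
```

A weighted variant (e.g., weighting sub-categories by question count) would only change `overall_score`; the per-question 1-5 mapping stays the same.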
The questions from the second and third part of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and are multiple-choice questions with only one possible answer (except questions 3.1 – 11 and 3.2 – 1, where more answers may be circled).

1 DW General Questions

1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?

2) How long has your organization been using BI/DW?

3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption

4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource – Tools to assist decision making
c) Mission-critical resource – A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share

5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?

6) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the invoice)?

7) Which technologies do you use for developing the BI/DW solution in your organization?

Developing Category | Technology
Data Modelling |
Extract/Transform/Load (ETL) |
BI Applications |
Database |

8) What data modelling technique do you use for your BI/DW solution?

2 DW Technical Solution

2.1 General Architecture and Infrastructure

1) What is the predominant architecture of your DW?
a) Level 1 – Desktop data marts (e.g.: Excel sheets)
b) Level 2 – Multiple independent data marts
c) Level 3 – Multiple independent data warehouses
d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Level 5 – A virtual integrated DW or real-time DW

2) To what degree have you defined, documented and implemented definitions and business rules for the necessary transformations, key terms and metrics?
a) Very low – No business rules defined
b) Low – Few business rules defined and implemented
c) Moderate – Some business rules defined and implemented
d) High – Most of the business rules defined and implemented
e) Very high – All business rules defined and implemented

3) To what degree is your metadata management implemented?
a) Very low – No metadata management
b) Low – Non-integrated metadata by solution
c) Moderate – Central metadata repository separated by tools
d) High – Central up-to-date metadata repository
e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata

4) To what degree is security implemented in your DW architecture?
a) Very low – No security implemented
b) Low – Authentication security
c) Moderate – Independent authorization for each tool / Target audience authorization
d) High – Role-level security at database level
e) Very high – Integrated company-wide authorization security

5) What types of data sources does your DW support at the highest level?
a) Level 1 – CSV files
b) Level 2 – Operational databases
c) Level 3 – ERP and CRM systems; XML files
d) Level 4 – Unstructured data sources (e.g.: text or documents)
e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources

6) To what degree do you use methods to increase the performance of your DW?
a) Very low – No methods to increase performance
b) Low – Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization)
c) Moderate – Hardware performance tuning (e.g.: DW server)
d) High – Software and hardware tuning
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) / cloud computing

7) To what degree is your infrastructure specialized for a DW?
a) Very low – Desktop platform
b) Low – Shared OLTP systems and DW environment
c) Moderate – Separate OLTP systems and DW environment
d) High – Separate servers for OLTP systems, DW, ETL and BI applications
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata)

8) Which answer best describes the update frequency for your DW?
a) Level 1 – Monthly update or less often
b) Level 2 – Weekly update
c) Level 3 – Daily update
d) Level 4 – Intra-daily update
e) Level 5 – Real-time update

2.2 Data Modelling (Data quality?)

1) Which answer best describes the usage of a data modelling tool in your organization?
a) Level 1 – No data modelling tool
b) Level 2 – Scattered data modelling tools used only for design
c) Level 3 – Scattered data modelling tools also used for maintenance
d) Level 4 – Standardized data modelling tool used only for design
e) Level 5 – Standardized data modelling tool used for design and maintaining metadata

2) Which answer best describes the degree of synchronization between the following data models that your organization maintains and the mapping between them: ETL source and target models; DW and data marts models; BI semantic or query object models?
a) Level 1 – No synchronization between data models
b) Level 2 – Manual synchronization of some of the data models
c) Level 3 – Manual or automatic synchronization depending on the data models
d) Level 4 – Automatic synchronization of most of the data models
e) Level 5 – Automatic synchronization of all of the data models

3) To what degree do you differentiate between data model levels: physical, logical and conceptual?
a) Very low – No differentiation between data model levels
b) Low – Logical and physical levels designed for some data models
c) Moderate – Logical and physical levels designed for all the data models
d) High – Conceptual level also designed for some data models
e) Very high – All data models have conceptual, logical and physical levels designed

4) To what degree have you defined and implemented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) Very low – No standards defined for data models
b) Low – Solution-dependent standards defined for some of the data models
c) Moderate – Solution-dependent standards defined for most of the data models / Enterprise-wide standards defined for some of the data models
d) High – Enterprise-wide standards defined for most of the data models
e) Very high – Enterprise-wide standards defined for all the data models

5) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality, etc.) in your data models?
a) Very low – No documentation for any data models
b) Low – Non-standardized documentation for some of the data models
c) Moderate – Standardized documentation for some of the data models
d) High – Standardized documentation for most of the data models
e) Very high – Standardized documentation for all the data models

6) What percentage of all your fact tables have their granularity at the lowest level possible?
a) Very low – Very few fact tables have their granularity at the lowest level possible
b) Low – Few fact tables have their granularity at the lowest level possible
c) Moderate – Some fact tables have their granularity at the lowest level possible
d) High – Most fact tables have their granularity at the lowest level possible
e) Very high – All fact tables have their granularity at the lowest level possible

7) To what degree do you design conformed dimensions in your data models?
a) Very low – No conformed dimensions
b) Low – Conformed dimensions for few business processes
c) Moderate – Conformed dimensions for some business processes
d) High – Enterprise-wide standardized conformed dimensions for most business processes, also making use of a high-level design technique such as an enterprise bus matrix
e) Very high – Enterprise-wide standardized conformed dimensions for all business processes

8) Which answer best describes the current state of your dimension table modelling?
a) Level 1 – Few dimensions designed; no hierarchies or surrogate keys designed
b) Level 2 – Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Level 3 – Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Level 4 – Slowly changing dimension techniques (i.e.: type 2, 3 and more) also designed
e) Level 5 – Besides regular and slowly changing dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)

2.3 ETL

1) Which answer best describes the usage of an ETL tool in your organization?
a) Level 1 – Only hand-coded ETL
b) Level 2 – Hand-coded ETL and some standard scripts
c) Level 3 – ETL tool(s) for all the ETL design and generation
d) Level 4 – Standardized ETL tool and some standard scripts for better performance
e) Level 5 – Complete ETL generated from metadata

2) Which answer best describes the complexity of your ETL?
a) Level 1 – Simple ETL that just extracts and loads data into the data warehouse
b) Level 2 – Basic ETL with simple transformations (e.g.: format changes, sorting, filtering, joining, deriving new calculated values, aggregation) and a surrogate key generator
c) Level 3 – Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
d) Level 4 – More advanced ETL capabilities: error event table creation, audit dimension creation, late-arriving data handler, hierarchy manager, special dimensions manager
e) Level 5 – Real-time ETL capabilities / ETL optimized for an agile DW

3) Which answer best describes the data quality system implemented for your ETL?
a) Very low – Daily automation: no; specific data quality tools: no; identifying data quality issues: no; solving data quality issues: no
b) Low – Daily automation: no; specific data quality tools: no; identifying data quality issues: yes; solving data quality issues: no
c) Moderate – Daily automation: yes/no; specific data quality tools: yes/no; identifying data quality issues: yes; solving data quality issues: no
d) High – Daily automation: yes/no; specific data quality tools: yes/no; identifying data quality issues: yes; solving data quality issues: yes
e) Very high – Daily automation: yes; specific data quality tools: yes; identifying data quality issues: yes; solving data quality issues: yes

4) Which answer best describes the management and monitoring of your ETL?
a) Level 1 – Restart and recovery system: no; simple monitoring (i.e.: an ETL workflow monitor with statistics regarding ETL execution such as pending, running, completed and suspended jobs, MB processed per second, summaries of errors, etc.): no; advanced monitoring (i.e.: statistics on infrastructure performance such as CPU usage, memory allocation, database performance and server utilization during ETL; a time- or event-based job scheduler with event notification; a data lineage and analyzer system): no; real-time monitoring: no
b) Level 2 – Restart and recovery system: no; simple monitoring: yes; advanced monitoring: no; real-time monitoring: no
c) Level 3 – Manual restart and recovery system: yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: no
d) Level 4 – Manual and automatic restart and recovery system: yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: no
e) Level 5 – Completely automatic restart and recovery system (manual or automatic as needed): yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: yes

5) To what degree have you defined and implemented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) Very low – No standards defined
b) Low – Few standards defined for ETL
c) Moderate – Some standards defined for ETL
d) High – Most standards defined for ETL
e) Very high – All the standards defined for ETL

6) To what degree is metadata management implemented for your ETL?
a) Very low – No metadata management
b) Low – Business and technical metadata for some ETL
c) Moderate – Business and technical metadata for all ETL
d) High – Process metadata also managed for some ETL
e) Very high – All types of metadata managed for all ETL

2.4 BI Applications

1) Which types of BI applications best describe the highest-level purpose of your DW environment?
a) Level 1 – Static and parameter-driven reports and query applications
b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)
c) Level 3 – Visualization techniques: dashboards and scorecards
d) Level 4 – Predictive analytics: data and text mining; alerts
e) Level 5 – Closed-loop BI applications; real-time BI applications

2) Which answer best describes your current BI tool usage?
a) Level 1 – BI tool related to the data mart
b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)
c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool
d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific BI applications (i.e.: data mining, financial analysis, etc.)
e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications

3) To what degree have you defined and implemented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) Very low – No standards defined
b) Low – Few standards defined for BI applications
c) Moderate – Some standards defined for BI applications
d) High – Most standards defined for BI applications
e) Very high – All the standards defined for BI applications

4) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) implemented in your BI applications?
/ To what degree are generic components used for your BI applications?
a) Very low – Objects defined separately for every BI application
b) Low – Some reusable objects for similar BI applications
c) Moderate – Some standard objects and templates for similar BI applications
d) High – Most similar BI applications use standard objects and templates
e) Very high – All similar BI applications use standard objects and templates

5) Which answer best describes the delivery method for your BI applications?
a) Level 1 – Reports are delivered manually on paper or by email
b) Level 2 – Reports are delivered automatically by email
c) Level 3 – Direct tool-based interface
d) Level 4 – A BI portal with basic functions: subscriptions, discussion forum, alerting
e) Level 5 – Highly interactive, business-process-oriented, up-to-date portal (no differentiation between operational and BI portals)

6) Which answer best describes the metadata accessibility to users?
a) Very low – No metadata available
b) Low – Some incomplete metadata documents that users ask for periodically
c) Moderate – Complete, up-to-date metadata documents sent to users periodically or available on the intranet
d) High – Metadata always available through a metadata management tool, separate from the BI tool
e) Very high – Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)

3 DW Organization and Processes

3.1 Development Processes

1) Which answer best describes the DW development processes in your organization?
a) Level 1 – Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
b) Level 2 – Repeatable development processes based on experience with similar projects; some development phases clearly separated
c) Level 3 – Standard documented development processes; iterative and incremental development with all the development phases clearly separated
d) Level 4 – Development processes continuously measured against well-defined and consistent goals
e) Level 5 – Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing defects

2) To what degree is there a separation between the development/test/acceptance/deployment environments in your organization? (note: the time to market is too long if each environment is separate)
a) Very low – No separation between environments
b) Low – Two separate environments (i.e.: usually development and production) with manual transfer between them
c) Moderate – Some separation between environments (i.e.: at least three environments) with manual transfer between them
d) High – Some separation between environments (i.e.: at least two environments) with automatic transfer between them
e) Very high – All the environments are distinct, with automatic transfer between them

3) To what degree has your organization defined, documented and implemented standards for developing, testing and deploying DW functionality (i.e.: ETL and BI applications)?
a) Very low – No standards defined
b) Low – Few standards defined
c) Moderate – Some standards defined
d) High – Many standards defined
e) Very high – A comprehensive set of standards defined

4) Which answer best describes the DW quality management?
a) Level 1 – No quality assurance activities
b) Level 2 – Ad-hoc quality assurance activities
c) Level 3 – Standardized and documented quality assurance activities for all the development phases
d) Level 4 – Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequently eliminate them; quality management certification

5) Which answer best describes the sponsor for your DW project?
a) Level 1 – No project sponsor
b) Level 2 – Chief information officer (CIO) or an IT director
c) Level 3 – Single sponsor from a business unit or department
d) Level 4 – Multiple individual sponsors from multiple business units or departments
e) Level 5 – Multiple levels of business-driven, cross-departmental sponsorship, including top-level management sponsorship (BI/DW is integrated in the company process with a continuous budget)

6) Which answer best describes your DW project management?
a) Level 1 – Project planning and scheduling (i.e.: work breakdown structure; time, cost and resource estimates; planning and scheduling): no; project risk management: no; project tracking and control (i.e.: milestone tracking, change control): no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Level 2 – Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Level 3 – Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Level 4 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Level 5 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes

7) Which answer best describes the role division for the DW development process?
a) Level 1 – No formal roles defined
b) Level 2 – Defined roles, but not technically implemented
c) Level 3 – Formalized and implemented roles and responsibilities
d) Level 4 – Level 3) + periodic peer reviews (i.e.: review of each other's work)
e) Level 5 – Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

8) Which answer best describes the knowledge management in your organization for the DW development processes?
a) Level 1 – Ad-hoc knowledge gathering and sharing
b) Level 2 – Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.), and also through training and mentoring programs
c) Level 3 – Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs
d) Level 4 – Central business unit for knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Level 5 – Continuously improving inter-organizational knowledge management

9) Which answer best describes the requirements definition phase for your DW project?
a) Level 1 – Ad-hoc requirements definition; no methodology used
b) Level 2 – Methodologies differ from project to project; interviews with business users for collecting the requirements
c) Level 3 – Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
d) Level 4 – Level 3) + qualitative assessment and measurement of the phase; requirements document also published
e) Level 5 – Level 4) + causal analysis meetings to identify common bottleneck causes and subsequently eliminate them

10) Which answer best describes the testing and acceptance phase for your DW project?
(note: answers hard to match)
a) Level 1 – Unit testing by another person: yes; system integration testing: no; user training: no; acceptance testing: no; standard procedure and documentation for testing and acceptance: no; external assessments and reviews of testing and acceptance: no
b) Level 2 – Unit testing by another person: yes; system integration testing: no; user training: yes; acceptance testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments and reviews of testing and acceptance: no
c) Level 3 – Unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments and reviews of testing and acceptance: no
d) Level 4 – Unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments and reviews of testing and acceptance: no
e) Level 5 – Unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments and reviews of testing and acceptance: yes

3.2 Service Processes (Maintenance and Monitoring Processes)

1) Which answer best describes the DW service quality management in your organization?
a) Level 1 – No service quality management activities
b) Level 2 – Ad-hoc service quality management
c) Level 3 – Proactive service quality management including a standard procedure
d) Level 4 – Level 3) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequently eliminate them; service quality management certification

2) Which answer best describes the knowledge management in your organization for the DW service processes?
a) Level 1 – Ad-hoc knowledge gathering and sharing
b) Level 2 – Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.), and also through training and mentoring programs
c) Level 3 – Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs
d) Level 4 – Central business unit for knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Level 5 – Continuously improving inter-organizational knowledge management

3) Which answer best describes the DW service level management in your organization? (note: SLAs with the suppliers of data?)
a) Level 1 – Customer service needs documented in an ad-hoc manner; no service catalogue compiled
b) Level 2 – Some customer service needs documented and formalized based on previous experience
c) Level 3 – All the customer service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
d) Level 4 – SLAs reviewed with the customer on both a periodic and an event-driven basis
e) Level 5 – Actual service delivery continuously monitored and evaluated with the customer on both a periodic and an event-driven basis for continuous improvement (SLAs including penalties)

4) Which answer best describes the DW incident management in your organization?
a) Level 1 – Incident management is done ad-hoc, with no specialized ticket-handling system or service desk to assess and classify incidents prior to referring them to a specialist
b) Level 2 – A ticket-handling system is used for incident management; some policies and procedures for incident management are established, but nothing is standardized
c) Level 3 – A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
d) Level 4 – Standard reports concerning the incident status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
e) Level 5 – Trend analysis of incident occurrence and also of customer satisfaction and value perception of the services provided

5) Which answer best describes the DW change management in your organization?
a) Level 1 – Change requests are made and solved in an ad-hoc manner
b) Level 2 – A change management system is used for storing and solving the requests for change; some policies and procedures for change management are established, but nothing is standardized
c) Level 3 – A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) Level 4 – Standard reports concerning the change status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; standards established for documenting changes
e) Level 5 – Trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided

6) Which answer best describes the DW technical resource management in your organization?
a) Level 1 – Ad-hoc resource management activities (only when there is a problem)
b) Level 2 – Resource management is done following some procedures, but nothing is standardized or documented
c) Level 3 – Resource management is done constantly, following a standardized, documented procedure
d) Level 4 – Standard reports concerning performance and resource management, including measurements and goals, are produced on a regular basis
e) Level 5 – Resource management trend analysis and monitoring to make sure that there is sufficient capacity to support planned services

7) Which answer best describes the availability management in your organization?
a) Level 1 – Ad-hoc availability management
b) Level 2 – Availability management is done following some procedures, but nothing is standardized or documented
c) Level 3 – Availability management is documented and done using a standardized procedure (all elements are monitored)
d) Level 4 – Risk assessment to determine the critical elements and possible problems
e) Level 5 – Availability management trend analysis and planning to make sure that all the elements are available for the agreed service level targets

8) Which answer best describes the release management in your organization?
a) Level 1 – Ad-hoc change solving and implementation; no release naming and numbering conventions
b) Level 2 – Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions are in place
c) Level 3 – Release management is documented and done following a standardized procedure; release management roles and responsibilities are assigned
d) Level 4 – Standard reports concerning release management, including measurements and goals, are produced on a regular basis; master copies of all software in a release are secured in a release database
e) Level 5 – Release management trend analysis, statistics and planning

Appendix D: Expert Interview Protocol

Interviewee:        Date:
Organization:       Place:
Start Time:         End Time:

Interviewer Instructions

Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check if the recorder works correctly! Start with the following introduction and continue with the questions:

General information: As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of Business Informatics at Utrecht University. I am currently writing my thesis under the supervision of dr. M.R. Spruit and dr. J. M.
Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from both a technical and an organizational and processes point of view.

Research: In today's economy, organizations have a lot of information to gather and process in order to be able to make the best decisions as fast as possible. One of the solutions that can improve the decision-making process is the use of Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in order to turn data into information and information into knowledge that can optimize business actions. However, even though organizations spend a lot of money on developing these solutions, more than 50 percent of BI/DW projects fail to deliver the promised results (Gartner Group, 2007). This was the trigger for my research, which aims at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we will be able to offer some guidelines on future DW improvements that will lead to better organizational performance.

Goal: As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This interview is part of my research and its main objective is to get expert validation for the model I have developed from theory and the case study done at Inergy. The interview will contain questions regarding the following aspects:
- Your organization and role
- The Data Warehouse Capability Maturity Model

Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end of my research, you will have the chance to see the results and the final model. The interview will last for about two hours.
Before we start, are there any questions? OK, let's start! Start the recorder!

Questions:

Organization and Role
1. Could you give a short introduction to your organization (including products, markets, customers)?
2. Could you explain your role in the organization (including your experience in BI)? On a scale from 1 to 5, how would you judge your knowledge on BI (business vs. technical)?

The Data Warehouse Capability Maturity Model
1. In my model, I consider several benchmark variables/categories that have to be taken into consideration and assessed when analyzing the maturity of an organization's DW. Which categories would you recommend?
Show and explain the DW Capability Maturity Model (with all its components).
2. Do you think the chosen categories are representative, and if not, what changes would you make?
Let's take a look at each category and the questions I chose in order to do the assessment.
3. Do you think the chosen questions are representative, and if not, what changes would you make?
Let's take a closer look at two categories you prefer.
4. Do you think the chosen answers are representative, and if not, what changes would you make?
5. In my model, I consider each question to have five possible answers weighted from 1 to 5. Each answer is also specific to one of the five possible maturity stages. In this way, after getting all the answers, we can sum up all the weightings for each category and divide them by the number of questions per category (e.g.: a score for architecture, one for data modelling, etc.). In the end, an overall score can be obtained by summing up the scores for all the categories and dividing them by six (the number of categories). What is your opinion on the scoring method? Should we add weightings for each category (e.g.: architecture – 0.2; data modelling – 0.3; etc.)? What other changes would you make?

Final Questions:
1.
What are the current trends in DW in your opinion?
2. What are the situational factors (if any) in your opinion that can influence the development of a DW and hence the applicability of the DW Capability Maturity Model?

Thank you for your time and cooperation! Are there any additional comments or questions?
Turn off the recorder!

Appendix E: Case Study Interview Protocol

Interviewee:        Date:
Organization:       Place:
Start Time:         End Time:

Interviewer Instructions

Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check if the recorder works correctly! Start with the following introduction and continue with the questions:

General information: As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of Business Informatics at Utrecht University. I am currently writing my thesis under the supervision of dr. M.R. Spruit and dr. J. M. Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from both a technical and an organizational and processes point of view.

Research: In today's economy, organizations have a lot of information to gather and process in order to be able to make the best decisions as fast as possible. One of the solutions that can improve the decision-making process is the use of Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in order to turn data into information and information into knowledge that can optimize business actions. However, even though organizations spend a lot of money on developing these solutions, more than 50 percent of BI/DW projects fail to deliver the promised results (Gartner Group, 2007).
This was the trigger for my research, which aims at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we will be able to offer some guidelines on future DW improvements that will lead to better organizational performance.

Goal: As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This interview is part of my research and its main objective is to test the model in an organization, see if it works in practice and get feedback for future improvements of the model. The interview will contain questions regarding the following aspects:
- Your organization and role
- The Data Warehouse Maturity Assessment

Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end of my research, you will have the chance to see the results and the final model. The interview will last for about 1.5 hours. Before we start, are there any questions? OK, let's start! Start the recorder!

Questions:

Organization and Role
1. Could you give a short introduction to your organization (including products, markets, customers)?
2. Could you explain your role in the organization (including your experience in BI/DW) and the BI/DW project? On a scale from 1 to 5, how would you judge your knowledge on BI (business vs. technical)?

The Data Warehouse Maturity Assessment Questionnaire
Show and explain the Data Warehouse Capability Maturity Model (with all its components). Please see the attached questionnaire and fill in the answers.

Thank you for your time and cooperation! Are there any additional comments or questions?
Turn off the recorder!

Appendix F: Case Study Feedback Template

1.
Maturity Scores
Short overview of the maturity assessment questionnaire. Tables with maturity scores and a radar graph.

2. Feedback
Strengths regarding the current DW solution
Feedback regarding the DW technical solution
Feedback regarding the DW organization & processes

Appendix G: Paper

The paper will be submitted to the Journal of Database Management.

DWCMM: The Data Warehouse Capability Maturity Model
Catalina Sacu1, Marco Spruit1, Frank Habers2
1 Institute of Information and Computing Sciences, Utrecht University, 3508 TC, Utrecht, The Netherlands.
2 Inergy, 3447 GW Woerden, The Netherlands.

Abstract: Data Warehouses and Business Intelligence have been part of a very dynamic and popular field of research in recent years, as they help organizations make better decisions and increase their profitability. This paper aims at creating a Data Warehouse Capability Maturity Model (DWCMM) focused on the technical and organizational aspects involved in developing a data warehouse environment. This model and its associated maturity assessment questionnaire can be used to help organizations assess their current DW solution and provide them with guidelines for future improvements. The DWCMM was evaluated empirically through multiple expert interviews and case studies to enrich and validate the theory we have developed.

Keywords: Data Warehousing, Business Intelligence, Maturity Modelling.

Introduction and Problem Definition

In today's economy, organizations are part of a very dynamic environment due to continuously changing conditions and relationships. As Kaye (1996) notes, "organizations must collect, process, use, and communicate information, both external and internal, in order to plan, operate and take decisions" (p. 20). The ongoing demand for profits, increasing competition and demanding customers all require organizations to make the best decisions as fast as possible (Vitt et al., 2002).
One of the solutions that can narrow the gap between the moment information is acquired and the moment the right results reach the decision-making process is the implementation of Data Warehouses and Business Intelligence (BI) applications. Over the years, data warehouses (DWs) and BI solutions have become a cornerstone of the information systems used to support decision-making initiatives. Most large companies have already established DW systems as a component of their information systems landscape. According to Gartner (2007), BI and DWs are at the forefront of the use of IT to support management decision-making. DWs can be thought of as the large-scale data infrastructure for decision support. BI can be viewed as the data analysis and presentation layer that sits between the DW and the executive decision-makers (Arnott & Pervan, 2005). In this way, DW/BI solutions can transform raw data into information and then into knowledge.

However, a DW is not only a software package. The adoption of DW technology requires massive capital expenditure and a considerable amount of implementation time. DW projects are hence very expensive, time-consuming and risky undertakings compared with other information technology initiatives, as cited by prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Solomon, 2005). Moreover, it is often believed that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007). Gartner (2007) estimates that more than fifty percent of DW projects have limited acceptance or fail. Therefore, it is crucial to have a thorough understanding of the critical success factors and variables that determine the efficient implementation of a DW solution. These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI.
In this research, we will focus on the former, as we consider that it represents the foundation for a solid DW solution that can achieve a high rate of usage and adoption. First, it is critical to properly design and implement the databases that lie at the heart of the DW. The right architecture and design can ensure performance today and scalability tomorrow. Second, all components of the DW solution (e.g.: data repository, infrastructure, user interface) must be designed to work together in a flexible, easy-to-use way. A third task is to develop a consistent data model and establish what source data will be extracted and how. In addition to these factors, the DW needs to be created and developed quickly and efficiently so that the organization can gain the business benefits as soon as possible (AbuAli & Abu-Addose, 2010).

As can be seen, a DW project can unquestionably be complex and challenging, and there is usually no single successful solution that can be applied to all organizations. Therefore, it is very important for organizations to be aware of their current situation and know the steps they need to take for continuous improvement. However, an objective assessment often proves to be a difficult task. Maturity models can be helpful in this situation. They essentially describe the development of an entity over time, where the entity can be anything of interest: a human being, an organizational function, an organization, etc. (Klimko, 2001). Maturity models have a number of sequentially ordered levels, where the bottom stage stands for an initial state that can be characterized, for example, by an organization having little capability in the domain under consideration. In contrast, the highest stage represents a conception of total maturity. Advancing on the evolution path between the two extremes involves a continuous progression in the organization's capabilities or process performance.
The maturity model serves as an assessment of the position on the evolution path, as it offers a set of criteria and characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009). With the help of maturity modelling, we will gain insight into the technical and organizational variables that determine the successful development of a DW solution and analyze these variables. Therefore, in order to assess the most important aspects that influence a DW project, this paper develops a Data Warehouse Capability Maturity Model (DWCMM) which provides an answer to the following research question: How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?

Research Methodology

The main goal of this research is to develop a DWCMM that depicts the maturity stages of a DW project. For this purpose, a design research approach is used, as its main philosophy is to generate scientific knowledge by building and validating a previously designed artifact (Hevner et al., 2004). In this research, the artifact is the DWCMM, which is developed according to the five steps in developing design research artifacts as described by Vaishnavi and Kuechler (2008): problem awareness, suggestion, development, evaluation and conclusion. Awareness of the problem was raised in discussions with DW/BI practitioners and a literature study on data warehousing and maturity modelling. A detailed problem description was provided in the previous section. Based on this, it has become clear that DW projects often fail or do not bring the expected results and that organizations sometimes need guidelines for improvement. As a solution to this problem, we developed the DWCMM, which can be used to assist organizations in performing a maturity assessment of the DW technical aspects and in providing guidelines for future improvements.
First, an overview of the model and its main components will be presented. Then, the results of the evaluation phase are presented. The DWCMM has been evaluated by carrying out five expert interviews and a multiple case study within four organizations, following Yin's (2009) case study approach. Finally, the last section contains conclusions regarding our model and an agenda for future research.

DWCMM: The Data Warehouse Capability Maturity Model

In the literature, many maturity models have been developed (de Bruin et al., 2005), but only some of them managed to gain global acceptance. There are also several information technology and/or information system maturity models dealing with different aspects of maturity: technological, organizational and process maturity. Some of them are specific to the data warehousing/BI field. The most important maturity models that served as a source of inspiration for our research can be seen in table 1.

Table 1: Overview of Maturity Models.
Authors | Model | Focus
Nolan (1973) | Stages of Growth | IT Growth Inside an Organization
Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software Development Processes
Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data Warehousing
Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business Intelligence
The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business Intelligence
Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business Intelligence and Performance Management

Each of these models has a different way of assessing maturity, but there are some common elements across all the models. All the models have interesting elements, but also weak points that could be improved. Moreover, all the models developed for the data warehousing/BI field focus on multiple variables involved in such a project, but they do not go deep into analyzing the technical aspects.
The maturity model which served as the main foundation for this research is the CMM (Paulk et al., 1995). It has become a recognized standard for rating software development organizations. The CMM is a framework that describes the key elements of an effective software process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development, the CMM has become a universal model for assessing software process maturity. However, the CMM has often been criticized for its complexity and difficulty of implementation. That is why we simplified it by keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process capabilities and the key process areas, which in our model translate to the chosen benchmark variables/categories for the DW maturity assessment.

Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many maturity models have been created, none actually focuses on the technical aspects of the DW/BI solution and the organizational processes that sustain them. Hence, this is the research gap we would like to fill by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the DW technical solution and the DW organization and processes. The DWCMM is depicted in figure 1. A short overview of the model and its components will be provided in the next paragraphs.

When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the current moment in time. Therefore, in order to perform a valuable assessment, it is important to include in the maturity analysis the most representative dimensions involved in the development of a DW solution.
Several authors describe the main phases usually involved in a DW project lifecycle as (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements definition, design, development, testing and acceptance, deployment, growth and maintenance. All of these phases and processes refer to the implementation and maintenance of the actual DW technical solution, which includes: the general architecture and infrastructure, data modelling, ETL and BI applications. These categories can be analyzed from many points of view, which will be depicted in our model and the maturity assessment we developed.

Therefore, the DWCMM will be restricted to assessing the technical aspects, without taking into consideration the DW/BI usage and adoption or the DW/BI business value. It considers two main benchmark variables/categories for analysis, each of them having several sub-categories. Firstly, the DW Technical Solution consists of the following four components: General Architecture and Infrastructure, Data Modelling, Extract-Transform-Load (ETL) and BI Applications. Secondly, the DW Organization & Processes dimension comprises the following two aspects: Development Processes and Service Processes.

Figure 1: Data Warehouse Capability Maturity Model (DWCMM).

As can be seen from figure 1, the DWCMM performs a maturity assessment which provides a maturity score for each benchmark sub-category. In order to create a complete image of the current DW solution for an organization, the DWCMM has several components:

A DW maturity assessment questionnaire: The whole DW maturity assessment questionnaire has been published in (Sacu et al., 2010). Emphasis should be put on two aspects regarding the DW maturity assessment questionnaire. Firstly, it performs a high-level assessment of an organization's DW solution and is limited strictly to the DW technical aspects.
Secondly, the model assesses "what" and "if" certain characteristics and processes are implemented, not "how" they are implemented. The DW maturity assessment questionnaire has 60 questions divided into the following three categories:

DW General Questions (9 questions) – comprises several questions about the DW/BI solution which are not scored. Their purpose is to offer a better picture of the drivers for implementing the DW environment, the budget allocated for data warehousing and BI, the DW business value, end-user adoption, etc. This will be useful in creating a complete picture of the current DW solution and its maturity. Also, once the questionnaire is filled in by more organizations, this data will serve as input for statistical analysis and comparisons between organizations from the same industry or across industries.

DW Technical Solution (32 questions) – comprises several scored questions for each of the following sub-categories: General Architecture and Infrastructure (9 questions), Data Modelling (9 questions), ETL (7 questions) and BI Applications (7 questions). More details on this part will be given in the next sections.

DW Organization & Processes (19 questions) – comprises several scored questions for each of the following sub-categories: Development Processes (11 questions) and Service Processes (8 questions). More details on this part will be given in the next sections.

Each question in the questionnaire has five possible answers which are scored from 1 to 5, 1 being characteristic of the lowest maturity stage and 5 of the highest one.
When an organization takes the survey, it first receives a maturity score for each sub-category, computed as the average value of the weightings (i.e.: sum of the weightings / number of questions); then, an overall score for each of the two main categories is given by computing the average value of the scores obtained for each sub-category; and finally, an overall maturity score is produced following the same principle applied to the two main category scores. We believe that the maturity scores for the sub-categories can give a good overview of the current DW solution implemented by the organization. This is the reason why, after computing the maturity scores for each sub-category, a radar graph such as the one depicted in figure 1 is drawn to show the alignment between these scores. In this way, the organization will have a clearer picture of their current DW project and will know which sub-category is the strongest and which one is left behind.

Moreover, after reviewing the maturity scores and the answers given by a specific organization, some general feedback and advice for future improvements will be provided. Each organization that takes the assessment will receive a document with a short explanation of the scoring method, a table with their maturity scores and the radar graph, and then some general feedback consisting of: a general overview of the maturity scores; an analysis of the positive aspects already implemented in the DW solution; and several steps that the organization should take in order to improve their current DW application.
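The scoring procedure described above is simple enough to sketch. The following is an illustrative implementation; the answer values are hypothetical, not taken from an actual assessment:

```python
def average(values):
    return sum(values) / len(values)

def maturity_scores(answers):
    """Compute DWCMM scores: a sub-category score is the mean of its question
    weightings (sum of weightings / number of questions), a category score is
    the mean of its sub-category scores, and the overall score is the mean of
    the category scores."""
    sub_scores = {sub: average(weights)
                  for category in answers.values()
                  for sub, weights in category.items()}
    cat_scores = {cat: average([sub_scores[sub] for sub in category])
                  for cat, category in answers.items()}
    overall = average(list(cat_scores.values()))
    return sub_scores, cat_scores, overall

# Hypothetical questionnaire answers, each question scored 1 (lowest) to 5:
answers = {
    "DW Technical Solution": {
        "General Architecture and Infrastructure": [3, 4, 2, 3, 3, 4, 2, 3, 3],
        "Data Modelling": [2, 3, 3, 2, 4, 3, 3, 2, 3],
        "ETL": [3, 3, 4, 2, 3, 3, 4],
        "BI Applications": [2, 2, 3, 3, 2, 3, 3],
    },
    "DW Organization & Processes": {
        "Development Processes": [3, 2, 3, 3, 4, 2, 3, 3, 2, 3, 3],
        "Service Processes": [2, 3, 2, 3, 3, 2, 3, 2],
    },
}
subs, cats, overall = maturity_scores(answers)
```

The per-sub-category values in `subs` are what the radar graph plots, while `overall` is the single maturity score reported back to the organization.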
A condensed DW maturity matrix: As our model measures the maturity of a DW solution, we also created two maturity matrices – a condensed maturity matrix and a detailed one – each of them having five maturity stages inspired by the CMM: Initial (1); Repeatable (2); Defined (3); Managed (4); Optimized (5). The initial stage describes an incipient DW development, while the optimized level shows a very mature solution, obtained by an organization with a lot of experience in the field, where everything is standardized and monitored. An organization will usually be situated at different stages of maturity for each sub-category, which together determine the overall maturity level. The condensed DW maturity matrix gives a short overview of the most important characteristics for each sub-category at each maturity level. This offers a better picture of the main goal of the DWCMM and of what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 2.

DW Technical Solution:
Architecture – Initial (1): Desktop data marts; Repeatable (2): Independent data marts; Defined (3): Independent data warehouses; Managed (4): Central DW with/without data marts; Optimized (5): DW/BI service that federates a central DW and other sources via a standard interface.
Data Modelling – Initial (1): No data model synchronization or standards; Repeatable (2): Manually synchronized data models; Defined (3): Manually or automatically synchronized data models; Managed (4): Automatic synchronization of most data models; Optimized (5): Enterprise-wide standards and automatic synchronization of all the data models.
ETL – Initial (1): Simple ETL with no standards that just extracts and loads data into the DW; Repeatable (2): Basic ETL with simple transformations; Defined (3): Advanced ETL (e.g. slowly changing dimensions manager, data quality system, reusability, etc.); Managed (4): More advanced ETL (e.g. hierarchy manager, special dimensions manager, etc.); Optimized (5): Optimized ETL for real-time DW with all the standards defined.
BI Applications – Initial (1): Static and parameter-driven reports; Repeatable (2): Ad-hoc reporting; OLAP; Defined (3): Dashboards & scorecards; Managed (4): Predictive analytics; data & text mining; Optimized (5): Closed-loop & real-time BI applications.
DW Organization & Processes:
Development Processes – Initial (1): Ad-hoc, non-standardized development processes with no defined phases; Repeatable (2): Some development processes, policies and procedures established, with some phases separated; Defined (3): Standardized development processes with all the phases separated and all the roles formalized; Managed (4): Quantitative development processes management; Optimized (5): Continuous development processes improvement.
Service Processes – Initial (1): Ad-hoc, non-standardized service processes; Repeatable (2): Some service processes, policies and procedures established; Defined (3): Standardized service processes with all the roles formalized; Managed (4): Quantitative service processes management; Optimized (5): Continuous service processes improvement.
Figure 2: DWCMM Condensed Maturity Matrix.

A detailed DW maturity matrix: We will give a short overview of the detailed DW maturity matrix in this paragraph. First, the characteristics for each maturity stage are usually obtained by mapping the corresponding answers of each question from the maturity assessment questionnaire (except for several characteristics such as project management and testing and acceptance, whose answers are formulated in a different way). In this way, an organization will be able to see its maturity stage by category (e.g.: General Architecture and Infrastructure) and by main category characteristics (e.g.: metadata, standards, infrastructure, etc.).
The matrix has two dimensions: the columns show each benchmark sub-category (i.e.: General Architecture and Infrastructure, Data Modelling, ETL, BI Applications; Development Processes, Service Processes) with their maturity stages from Initial (1) to Optimized (5); the rows show the main analyzed characteristics (e.g.: for General Architecture and Infrastructure – conceptual architecture, business rules, metadata, security, data sources, performance, infrastructure, update frequency) for each sub-category, divided by maturity stage. Moreover, the matrix can be interpreted in two ways. First, one could take each stage and see the specific characteristics of each sub-category for that particular stage. Second, one could take each sub-category and see its specific characteristics for each stage or for a particular stage. As the developed questionnaire performs an assessment for each benchmark sub-category, a specific organization will most likely follow the second interpretation. They would probably like to know what steps to take to improve each sub-category and hence the overall maturity score, which will lead to a higher maturity stage.

It is also very unlikely that an organization will have all the characteristics for all the sub-categories at the same maturity stage at the same moment in time. Therefore, if a company gets a maturity score of 3, this does not mean that all the characteristics for all the sub-categories are at stage three. Depending also on the standard deviation and the answers themselves, we can find out more information about the actual situation.

Now that the main components of the DWCMM have been identified, we will continue by taking a closer look at the main categories and sub-categories of the model and their analyzed characteristics. These are depicted in the maturity assessment questionnaire and the detailed maturity matrix. We will start with the DW technical solution and continue with the DW organization and processes.
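The remark above about the standard deviation can be illustrated with a small sketch (the answer values are invented): two sub-categories can share the same average score of 3 while describing very different situations, and the spread of the answers reveals the difference.

```python
from statistics import mean, stdev

# Hypothetical answer sets for two sub-categories, both averaging 3:
balanced = [3, 3, 3, 3, 3]   # every characteristic at stage three
uneven = [1, 5, 1, 5, 3]     # mature and immature characteristics mixed

assert mean(balanced) == mean(uneven) == 3
# A high standard deviation flags that some characteristics lag far behind
# others even though the sub-category score looks identical.
print(stdev(balanced), stdev(uneven))  # 0.0 2.0
```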
DW Technical Solution Maturity

As mentioned earlier, the main components that need to be analyzed when assessing the DW technical solution are: general architecture and infrastructure, data modelling, ETL and BI applications.

General Architecture and Infrastructure

DW architecture includes: three main components (i.e.: data modelling, ETL, BI applications), several data storage components (e.g.: source systems, data staging area, DW database, operational data store, data marts) and the way they are assembled together (Ponniah, 2001), as well as underlying elements such as infrastructure, metadata and security that support the flow of data from the source systems to the end-users (Kimball et al., 2008; Chaudhuri & Dayal, 1997). This is connected to the conceptual approach of designing and building the DW (e.g.: conformed data marts – Kimball, or enterprise-wide DW – Inmon, etc.). Therefore, in this research we consider architecture and infrastructure as a separate sub-category for assessing maturity, for which the main characteristics are further analyzed.

Conceptual architecture and its layers (question 1) – encompasses the conceptual approach of designing and building the DW with all its data storage layers.

DW data sources (question 6) – the types of data sources that the DW extracts data from (e.g.: Excel files, text files, relational databases, ERP & CRM systems, unstructured data: text documents, e-mails, images, videos, Web data sources).

Infrastructure (question 8) – provides the underlying foundation that enables the DW architecture to be implemented (Ponniah, 2001), and includes elements such as: hardware platforms and components, operating systems, database platforms, connectivity and networking (Kimball et al., 2008).
Metadata management (question 4) – metadata can be seen as all the information that defines and describes the structures, operations and contents of the DW system in order to support the administration and effective exploitation of the DW. The main elements that influence its maturity are: the types of implemented metadata (i.e.: business, technical or process) and the integration of metadata repositories (Moss & Atre, 2003; Kimball et al., 2008).

Security management (question 5) – user access security is usually implemented through several methods, presented here in hierarchical order of difficulty of implementation (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001): authentication, tool-based security, role-based security, authorization.

Business rules (questions 2 & 3) – abstractions of the policies and practices of a business organization (Kaula, 2009), used to capture and implement precise business logic in processes, procedures, and systems (manual or automated).

Performance optimization (question 7) – encompasses the various methods needed to improve DW performance (Ponniah, 2001): software performance improvement (e.g.: index management, data partitioning, parallel processing, view materialization); hardware performance improvement; and specialized DW appliances or cloud computing, which are characteristic of a very high maturity stage.

Update frequency (question 9) – one of the characteristics that differentiate classical DW solutions built for strategic and tactical BI from newer DWs that process data in real time.

Data Modelling

Data modelling is the process of creating a data model. A data model is "a set of concepts that can be used to describe the structure of and operations on a database" (Navathe, 1992, pp. 112-113).
Data modelling is very important for creating a successful information system, as it defines not only data elements, but also their structures and the relationships between them. The most important characteristics which should be taken into consideration when assessing the maturity of data modelling are described below.

Synchronization between all the data models found in the DW (question 2) – establishing consistency among data from a source to a target data storage and vice versa, and the continuous harmonization of the data over time.

Design levels (question 3) – encompasses all the data model design levels: conceptual design, logical design and physical design.

Tool (question 1) – data models can be created by simply drawing the models in spreadsheets and documents. However, the more mature solution is to use a data modelling tool that can make the design itself and metadata management easier and more efficient.

Standards (questions 4 & 5) – standards in a DW environment are necessary and cover a wide range of objects, processes, and procedures. All the maturity assessments related to standards address general aspects such as the definition and documentation of standards and their actual implementation. Most often, standards related to data modelling refer to naming conventions for the objects and attributes in the data models.

Metadata management (question 6) – encompasses the common subset of business and technical metadata components as they apply to data (Moss & Atre, 2003): data names, definitions, relationships, identifiers, types, lengths, policies, ownership, etc.

Dimensional modelling (questions 7, 8 & 9) – there are several data modelling techniques that can be applied for data warehousing: relational (or normalized), dimensional, data vault, etc. In this research we focused on dimensional modelling. For more information on dimensional modelling, see (Kimball, 1996).
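To make the dimensional modelling technique concrete, the sketch below builds a minimal star schema; all table and column names are illustrative, not prescribed by the DWCMM. A fact table holds measures at a declared grain (here, one row per sale) and joins to dimension tables through surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive context used for analysis.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- The fact table holds numeric measures at the declared grain, linked to
-- the dimensions via surrogate keys.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    revenue REAL
);
""")
conn.execute("INSERT INTO dim_date VALUES (1, '01', 'Jan', 2010)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 4, 100.0)")

# A typical dimensional query: aggregate a measure by a dimension attribute.
row = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category
""").fetchone()
```

This join-then-aggregate pattern is what the BI applications discussed later (ad-hoc reporting, OLAP) ultimately execute against a dimensionally modelled warehouse.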
Extract-Transform-Load (ETL)

As the name shows, the Extract-Transform-Load (ETL) process mainly involves the following activities: extracting data from outside sources; transforming data to fit the target's requirements; and loading data into the target database. The ETL system is very complex and resource-demanding (Kimball et al., 2008), and hence 60 to 80 percent of the time and effort of developing a DW project is devoted to the ETL system (Nagabhushana, 2006). The main characteristics that we included in our ETL maturity assessment are described below.

Complexity (question 2) – refers to the maturity and performance of each ETL component (i.e.: extract, transform, load). For example, the extraction phase should include a data profiling system, a change data capture system and the extract system itself. The transformation step usually includes cleaning and transforming data according to the business rules and standards that have been established for the DW. The DW load system takes the load images created by the extraction and transformation subsystems and loads these images directly into the DW.

Data quality system (question 3) – data quality is critical for the success of a DW. Therefore, we decided to include a question that depicts its main characteristics for each maturity stage regarding: daily automation, specific data quality tools, identifying data quality issues and actually solving them.

Management and monitoring (question 4) – encompasses all the capabilities necessary for the ETL processes to run consistently to completion and be available when needed (e.g.: an ETL job scheduler; a backup system; a recovery and restart system, which can be manual or automatic; a workflow monitor, etc.).
Tool (question 1) – there is a constant debate about whether an organization should deploy custom-coded ETL solutions or buy an ETL tool suite (Kimball & Caserta, 2004). A company that uses hand-coded ETL usually does not have a very complex ETL process, which indicates a low level of maturity regarding ETL capabilities.

Metadata management (question 7) – ETL is responsible for the creation and use of much of the metadata describing the DW environment. Therefore, it is important to capture and manage all possible types of metadata for ETL: business, technical and process metadata.

Standards (questions 5 & 6) – includes ETL-specific standards related to: naming conventions, set-up standards, the recovery and restart system, etc.

BI Applications

BI applications, sometimes referred to as "front-end" tools (Chaudhuri & Dayal, 1997), are what the end-users see and hence are very important for a DW to be considered successful. According to March & Hevner (2007), a crucial point for achieving DW implementation success is the selection and implementation of appropriate end-user analysis tools, because the business benefits of BI are only gained when the system is adopted by its intended end-users. The main aspects that determine the maturity of BI applications are analyzed below.

Types of BI applications (question 1) – encompasses the main types of BI whose complexity contributes to the maturity of a DW environment. According to Azvine et al. (2006), traditional BI applications fall into the following categories, sorted by ascending complexity: report what has happened – standard reporting and query applications; analyze and understand why it has happened – ad-hoc reporting and online analytical processing (OLAP); visualization applications (i.e.: dashboards, scorecards); predict what will happen – predictive analytics (i.e.: data and text mining).
In the last couple of years, due to the development of real-time data warehousing, a new category of BI applications has emerged, called operational BI and closed-loop applications (Kimball et al., 2008).

Delivery method (question 6) – includes the main BI application delivery methods. As end users are interested only in the results they get from the BI applications, the ease of accessing and delivering these results is critical for the success of the DW solution.

Tool (question 2) – defines the usage of BI application tools, which can really make a difference for the DW solution.

Metadata management (question 7) – encompasses the main metadata accessibility methods. As BI applications are what the end user sees, this is an important aspect for DW success (Moss & Atre, 2003).

Standards (questions 3 & 4) – includes standards specific to BI applications, such as: naming conventions, generic transformations, the logical structure of attributes and measures, etc.

DW Organization and Processes Maturity

When assessing the maturity of a DW technical solution, the processes and roles involved in the project also need to be analyzed. A good technical solution cannot be developed without the processes surrounding it, as there is a strong interconnection between the two parts. The necessary processes for a DW project are: development processes and service processes.

DW Development Processes

A DW solution can be considered a software engineering project with some specific characteristics. Therefore, as any software engineering project, it will go through several development stages (Moss & Atre, 2003).
Since DW/BI is an enterprise-wide, evolving environment that is continually improved and enhanced based on feedback from the business community, the best approach for its development is iterative and incremental development, with agile techniques for the development of BI applications (Kimball et al., 2008; Ponniah, 2001). The high-level phases and tasks required for an effective DW implementation are (Kimball et al., 2008; Moss & Atre, 2003): project planning and management; requirements definition; design; development; testing and acceptance; deployment/production. The main characteristics which might influence the maturity of DW development processes can be seen below.

CMM levels (question 1) – as it is hard to judge which software development paradigm is better and more mature, the first maturity question on development processes is a more general one, referring to how the DW development processes map to the CMM levels.

Project planning and management (question 7) – encompasses the main elements that determine the maturity of this characteristic (Lewis, 2001): project planning and scheduling; project risk management; project tracking and control; standard procedures and documentation; and evaluation and assessment.

DW/BI sponsor (question 6) – defines the extent of organizational support and sponsorship for the DW environment. Strong support and sponsorship from senior business management is critical for a successful DW initiative (Ponniah, 2001).

DW project team and roles (question 8) – encompasses how DW project roles and responsibilities are formalized and implemented to solve skill-role mismatches (Humphries et al., 1999; Nagabhushana, 2006).

Requirements definition (question 10) – encompasses how requirements definition is done. In a DW, users' business requirements represent the most powerful driving force (Ponniah, 2001), as they impact virtually every aspect of the project.
Testing and acceptance (question 11) – this is a critical phase for DW success, as it includes several important activities that are not always implemented. The degree of implementation influences the success of a DW project and, hence, its maturity.

Development/testing/acceptance/production environments (question 2) – encompasses the way organizations set up different environments for different purposes to support all the development phases (Moss & Atre, 2003).

DW quality management (question 5) – its purpose is to provide management with appropriate visibility into the development process being used by the DW project and the products being built (Paulk et al., 1995).

Knowledge management (question 9) – encompasses all the knowledge management activities and the way they are implemented.

Standards (questions 3 & 4) – analyzes the standards used for successfully developing, testing, and deploying DW functionality.

DW Service Processes

In the last two decades, software maintenance has come to be treated as a sequence of activities rather than as the final stage of a software development project (April et al., 2004). These processes are very important after a DW has been deployed, in order to keep the system up and running and to manage all the necessary changes. Lately, IT organizations have made a transition from being pure technology providers to being service providers. This service-oriented perspective on IT organizations applies particularly well to the software maintenance field, as maintenance is an ongoing activity, whereas software development is more project-based (Niessink & van Vliet, 2000). Over the years, various IT service frameworks have been proposed, but the one that acts as the de facto standard for the definition of best practices and processes for service support and service delivery is the Information Technology Infrastructure Library (ITIL) (Salle, 2004).
Therefore, we take the service components from ITIL as a starting point for our analysis of the DW service processes. Moreover, two maturity models related to IT maintenance and service also served as a foundation for this part of our DW maturity model: the Software Maintenance Maturity Model (April et al., 2004) and the IT Service CMM (Niessink et al., 2002). Taking these models and the changing nature of a DW into consideration, we considered the following components when assessing the maturity of DW service processes.

Service quality management (question 2) – similar to DW quality management, but applied to the service processes.

Knowledge management (question 3) – also similar to its counterpart in the DW development processes, but in the context of service processes.

Service level management (question 4) – negotiates service level agreements (SLAs) with suppliers and customers and ensures that they are met through continual monitoring and reviewing (Cater-Steel, 2006).

Incident management (question 5) – its main objective is to provide continuity by restoring the service in the quickest way possible, by whatever means necessary (Salle, 2004).

Change management (question 6) – a regular task for the immediate and efficient handling of changes that might occur in a DW environment.

Technical resource management (question 7) – the purpose of resource management is to maintain control of the hardware and software resources needed to deliver the agreed DW service level targets (Niessink & van Vliet, 1999).

Availability management (question 8) – manages risks and ensures that all DW infrastructure, processes, tools and roles comply with the SLAs by using appropriate means and techniques (Colin, 2004).
Release management (question 9) – as a DW is continuously changing and evolving over time, the objective of release management is to ensure that only authorized and correct versions of the DW are made available for operation (Salle, 2004).

Evaluation of the DWCMM

In order to validate the DWCMM, two methods were chosen – expert validation and multiple case studies – which we elaborate on in this section.

Expert Validation

To evaluate the utility of the DWCMM and revise it further, expert validation was applied. An "expert" is defined by Hoffman et al. (1995) as a person "highly regarded by peers, whose judgements are uncommonly accurate and reliable and who can deal effectively with rare or tough cases. Also, an expert is one who has special skills or knowledge derived from extensive experience with subdomains" (p. 132). Eliciting knowledge from experts is therefore very important and useful, and can be done using several methods, one of them being structured and unstructured interviews (Hoffman et al., 1995). Accordingly, five experts in data warehousing and BI were interviewed and asked to give their opinions about the content of the model we have developed. The interviews were structured but consisted of open questions, in order to capture the knowledge of the respondents. This enabled the experts to freely state their opinions and ideas for improvement. The expert panel consists of five experts from practice, each of them having at least 10 years of experience in the DW/BI field. An overview of the experts and their affiliations is depicted in table 2. All of them are DW/BI consultants at different organizations (local or multinational) in The Netherlands.
Respondent ID | Job Position | Industry | Market | Employees
1 | CI/BI consultant | DW/BI Consulting | B2B | ≈ 45
2 | Principal consultant / BI consultant | IT Services | B2B | ≈ 49000
3 | Thought leader BI/CRM | BI Consulting | B2B | ≈ 38000
4 | Principal consultant BI | IT Services | B2B | ≈ 1
5 | BI consultant | DW Consulting | B2B | ≈ 35

Table 2: Expert Overview.

The experts were asked to give their opinions regarding the DWCMM structure, the DWCMM condensed maturity matrix and the DW maturity assessment questionnaire. All reviewers gave positive feedback on their first impression of all three deliverables, said they made sense, and stated that the model could be applied for assessing an organization's current DW solution. Valuable insights and criticism were provided that resulted in several (mostly minor) improvements. For example, the category "Architecture" was renamed "General Architecture and Infrastructure", as the former created some confusion among the interviewees. Some adjustments were also made to the ETL characterization for each stage of the DWCMM condensed maturity matrix. However, most feedback was received regarding the maturity assessment questionnaire. This resulted in two categories of changes: proposed changes that, due to time constraints and scope limitations, were not implemented in the final version of the model but should be considered in future research; and implemented improvement suggestions that involved rephrasing some questions and rephrasing or changing some answers.

Multiple Case Studies

Depending on the nature of a research topic and the goal of the researcher, different research methods (qualitative and quantitative) are appropriate (Benbasat et al., 1987; Yin, 2009). One of the most widely used qualitative research methods in information systems (IS) research is case study research.
It can be used to achieve various research aims: to describe phenomena, to develop theory and to test theory (Darke et al., 1998). In our research, we use it to test theory, which in this case is the DWCMM we developed. The theory is usually either validated or found to be inadequate in some way, and may then be further refined on the basis of the case study findings. Case study research may adopt single or multiple case designs. As, according to Benbasat et al. (1987) and Yin (2009), multiple case studies are preferred over single ones to obtain better results and analytic conclusions, we decided to conduct multiple case study research following Yin's (2009) case study approach. In this way, we can achieve a twofold goal: test the model in practice to see whether the chosen benchmark variables/categories and the maturity assessment questions and answers match the organizations' specific solutions; and receive feedback and knowledge from the respondents regarding the DWCMM in order to make future improvements. Although all individual cases are interesting, this section focuses on the overall results.

Case Overview

The case studies were conducted at four organizations of different sizes, operating in several types of industries and offering a wide variety of products and services. An overview of the case study organizations (figures are taken from 2009 annual reports) and respondents is depicted in table 3. The main criterion used in the search for suitable organizations was that each approached organization had a professional DW/BI system in place whose maturity could be assessed by applying the DWCMM. Furthermore, an important selection criterion for the respondent in each case was that the interviewed respondent had an overall view of the technical and organizational aspects of the DW/BI solution implemented in their organization.
A short analysis of the maturity scores each organization obtained after taking the assessment is also given further on in this section.

Organization | Industry | Market | Revenue | Employees | Respondent Function
A | Retail | B2C | 19.94 billion € | ≈ 138000 | BI consultant
B | Insurance | B2B & B2C | 4.87 billion € | ≈ 4500 | DW/BI technical architect
C | Retail | B2C | 780 million € | ≈ 3660 | BI manager
D | Maintenance & Servicing | B2B | NA | ≈ 3500 | BI consultant & DW lead architect

Table 3: Case and Respondent Overview.

Case Study Analysis

In this section, a short analysis of the results obtained by the organizations after filling in the assessment questionnaire is given. The maturity scores the organizations obtained for their implemented DW solutions can be seen in the table below.

Benchmark Category | Organization A | Organization B | Organization C | Organization D
Architecture | 2.67 | 2.56 | 3.89 | 3.55
Data Modelling | 2.17 | 3.44 | 3.00 | 4.11
ETL | 3.14 | 3.29 | 3.71 | 2.86
BI Applications | 2.71 | 2.71 | 3.43 | 3.57
Development Processes | 2.90 | 3.19 | 3.66 | 3.02
Service Processes | 2.63 | 3.00 | 2.87 | 3.12

Table 4: Organizations' Maturity Scores.

As in the picture depicting our model, a better way to see the alignment between the maturity scores for the six categories is to draw a radar graph. As an example, we show the radar graph for organization A.

Figure 3: Alignment Between Organization A's Maturity Scores. (Radar graph with one axis per benchmark category, on a 0-5 scale, comparing organization A's scores with the ideal situation.)

Some more information regarding the maturity scores for all four case studies is provided in the table below.
Maturity Score | Organization A | Organization B | Organization C | Organization D
Total Maturity Score for DW Technical Solution | 2.67 | 3.00 | 3.51 | 3.52
Total Maturity Score for DW Organization & Processes | 2.77 | 3.10 | 3.26 | 3.07
Overall Maturity Score | 2.72 | 3.05 | 3.38 | 3.29
Highest Score | ETL - 3.14 | Data Modelling - 3.44 | Architecture - 3.89 | Data Modelling - 4.11
Lowest Score | Data Modelling - 2.17 | Architecture - 2.56 | Service Processes - 2.87 | ETL - 2.86

Table 5: Maturity Scores Analysis.

As can be seen from table 4, the maturity scores for each sub-category usually lie between 2 and 4, with one exception: organization D scored 4.11 for Data Modelling. Consequently, the overall maturity scores and the total scores per category also range between 2 and 4, which shows that most organizations are probably somewhere between the second and fourth stage of maturity. The highest overall maturity score was obtained by organization C, and the lowest by organization A. Apparently, an overall score close to 4 or 5 is quite difficult to achieve. This is normal in maturity assessments, as in practice nobody is very close to the ideal situation. It will be interesting to see the range of scores after the questionnaire has been filled in by a large number of organizations. From table 5 it can be seen that the categories with the highest and lowest scores differ per organization. For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most mature variable for organization D. Interesting conclusions can also be drawn by comparing the scores of organizations A and C, as they are part of the same industry. The former is an international food retailer with more experience in this industry, whereas the latter is a local retailer with less experience. Nevertheless, organization A obtained a rather low DW maturity score. Thus, experience in an industry does not necessarily mean maturity in data warehousing.
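The totals in table 5 appear consistent with simple averaging: the technical score matches the mean of the four technical categories, the organization & processes score matches the mean of the two process categories, and the overall score matches the mean of those two totals. The following is a minimal sketch of that aggregation scheme, assuming this inferred averaging (the thesis does not publish its scoring code), applied to organization A's sub-category scores from table 4:

```python
# Sub-category maturity scores for organization A (table 4).
scores = {
    "Architecture": 2.67, "Data Modelling": 2.17, "ETL": 3.14,
    "BI Applications": 2.71, "Development Processes": 2.90,
    "Service Processes": 2.63,
}

# Category grouping as used throughout the DWCMM.
TECHNICAL = ["Architecture", "Data Modelling", "ETL", "BI Applications"]
PROCESSES = ["Development Processes", "Service Processes"]

def mean(values):
    return sum(values) / len(values)

def aggregate(scores):
    """Roll sub-category scores up into the three totals of table 5
    (assumed scheme: plain averages, then the average of the two totals)."""
    technical = mean([scores[c] for c in TECHNICAL])
    processes = mean([scores[c] for c in PROCESSES])
    overall = (technical + processes) / 2
    return technical, processes, overall

technical, processes, overall = aggregate(scores)
highest = max(scores, key=scores.get)  # most mature category
lowest = min(scores, key=scores.get)   # first candidate for improvement
```

Under this scheme, organization A's totals come out at roughly 2.67, 2.77 and 2.72, matching table 5 within rounding, with ETL as the highest and Data Modelling as the lowest category.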
Of course, more factors can influence this difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the percentage of the IT budget allocated to BI, etc. However, the goal of our model is not only to give an organization a maturity score, but also to provide it with feedback and the steps necessary to reach a higher maturity stage. For example, the overall maturity score for organization A is 2.72, which leaves a lot of room for improvement. Moreover, as its lowest score is for Data Modelling, this category would be a good starting point. For confidentiality reasons, more details regarding the maturity scores and feedback cannot be offered here.

Benchmarking

As already mentioned in the previous sections, the DWCMM can serve as a benchmarking tool for organizations. The DW maturity assessment questionnaire provides a quick way for organizations to assess their DW maturity and, at the same time, to compare themselves objectively against others in the same industry or across industries. Of course, the benchmarking results will improve as more organizations take the maturity assessment. Nevertheless, to give a better impression of what a benchmarking graph can look like, we provide an example for organization A using the data from the case studies we performed.

Figure 4: Benchmarking for Organization A. (Bar chart comparing, per benchmark category, organization A's score with the average score and the best practice, on a 0-5 scale.)

To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We generally received positive feedback on the questionnaire from the case study interviewees.
In this way, we could test whether the questions and their answers are representative for assessing the current DW solution of a specific organization and whether they can be mapped to any organization depending on the situational factors. Respondents generally had no problems recognizing the proposed benchmark categories and understanding the questions and answers of the survey. We also had the chance to apply the scoring method and give appropriate feedback for each case study. Finally, we combined all the feedback received from the case studies and made some minor but valuable improvements to several questions and answers, in order to make them more representative for the analyzed characteristics and to better fit the maturity stages.

Conclusions and Further Research

This research was triggered by the estimates made by Gartner (2007) and other researchers that more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data Warehouse Capability Maturity Model (DWCMM) to help organizations assess the technical aspects of their current DW solution and to provide them with guidelines for future improvements. In this way we attempted to answer the main research question of our study: How can the maturity of a company's data warehouse technical aspects be assessed and acted upon? The main conclusion of our study is that, even though our maturity model can help organizations improve their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The DWCMM provides a quick way for organizations to assess their DW/BI maturity and to compare themselves objectively against others in the same industry or across industries. It received positive feedback from the five experts who reviewed and validated it, and it also resonated well with the audiences of our four case studies. Several (mostly minor) improvements were made after the validation process.
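The benchmarking comparison applied in the case studies – setting one organization's category scores against a peer group – can be sketched in a few lines. The gap formula below is an illustrative assumption (the thesis does not prescribe one); the peer group here is simply the four case study organizations from table 4:

```python
# Per-category maturity scores of the four case organizations (table 4).
cases = {
    "Architecture":          {"A": 2.67, "B": 2.56, "C": 3.89, "D": 3.55},
    "Data Modelling":        {"A": 2.17, "B": 3.44, "C": 3.00, "D": 4.11},
    "ETL":                   {"A": 3.14, "B": 3.29, "C": 3.71, "D": 2.86},
    "BI Applications":       {"A": 2.71, "B": 2.71, "C": 3.43, "D": 3.57},
    "Development Processes": {"A": 2.90, "B": 3.19, "C": 3.66, "D": 3.02},
    "Service Processes":     {"A": 2.63, "B": 3.00, "C": 2.87, "D": 3.12},
}

def benchmark(cases, org):
    """Gap of one organization against the peer average, per category.
    A positive gap means the organization is below the peer average."""
    gaps = {}
    for category, scores in cases.items():
        peer_average = sum(scores.values()) / len(scores)
        gaps[category] = peer_average - scores[org]
    return gaps

gaps = benchmark(cases, "A")
# The category with the largest positive gap is the natural starting
# point for improvement, mirroring the feedback given to organization A.
priority = max(gaps, key=gaps.get)
```

For organization A this yields Data Modelling as the priority category (about one full maturity point below the peer average), which agrees with the feedback reported in the case study analysis.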
However, our model is not without limitations. First of all, it is important to emphasize that the model only performs a high-level assessment. In order to truly assess the maturity of their DW/BI solutions and discover the strong and weak variables, organizations should use our assessment as a starting point for a more thorough analysis. In the future, several questions could be added to our model to enable a more detailed analysis of the current DW/BI environment and more valuable feedback to organizations. Second, a limitation of this study is that it is based on design science research, which answers research questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity might arise. Another limitation is related to the validation process for our model. Due to time constraints and the difficulty of finding experts, it was reviewed by only five of them. Therefore, more experts should be interviewed in the future to enrich the structure and content of the model. Also, because the model was tested in only four cases, it is not possible to generalize the findings to any given similar situation. For further research, it would be interesting to validate the model using quantitative research methods. In this way, we would be able to perform statistical analysis on the data, more valuable benchmarking, and improvements to the whole structure of the model. Another future extension that would increase the value of the model could include questions and analysis for other types of data modelling (e.g., normalized modelling, data vault, etc.) because, as stated earlier in this paper, we limited our maturity assessment to dimensional modelling only. Last but not least, more work is also needed to extend our model to the analysis of DW/BI end-user adoption and business value.
New benchmark categories and maturity assessment questions could be added regarding these two aspects.

References

AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific Research, 42(2), 326-335.
Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik (Ed.), Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.
Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information Technology, 20(2), 67-87.
Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information Management: http://www.information-management.com/issues/20030201/6287-1.html
Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th Australasian Conference on Information Systems. Adelaide, Australia.
Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information Systems Journal, 6, 227-242.
Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen: Empirische Untersuchung auf Basis des Business Intelligence Maturity Model [Integration trends in business intelligence systems: An empirical study based on the Business Intelligence Maturity Model]. Wirtschaftsinformatik, 46(2), 119-128.
Chaudhuri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM SIGMOD Record, 26(1), 65-74.
Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.
Colin, R. (2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.
de Bruin, T., Freeze, R., Kulkarni, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney, Australia.
Eckerson, W. (2004).
Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2
Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-MagicQuadrant-for-Datawarehouse-Systems-2010.pdf
Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda. Retrieved June 24, 2010, from Gartner: http://www.gartner.com/DisplayDocument?id=500835
Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information Systems (pp. 3190-3199). Tampa, Florida, USA.
Hakes, C. (1996). The Corporate Self Assessment Handbook (3rd ed.). London: Chapman & Hall.
Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management Information Systems Quarterly, 28(1), 75-106.
Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological Analysis. Organizational Behaviour and Human Decision Processes, 62(2), 129-158.
Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37, 1-21.
Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.
Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5, 58-66.
Kaye, D. (1996). An Information Model of Organization. Managing Information, 3(6), 19-21.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.). Indianapolis: Wiley Publishing, Inc.
Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding.
Proceedings of the 2nd European Conference on Knowledge Management (pp. 269-278). Bled, Slovenia.
Lewis, J. (2001). Project Planning, Scheduling and Control (3rd ed.). New York: McGraw-Hill.
Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management: http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1
March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision Support Systems, 43(3), 1031-1043.
Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Boston: Addison Wesley.
Nagabhushana, S. (2006). Data Warehousing: OLAP and Data Mining. New Delhi: New Age International Limited.
Navathe, S. B. (1992). Evolution of Data Modelling for Databases. Communications of the ACM, 35(9), 112-123.
Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16(7), 399-405.
Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc.
Sacu, C., Spruit, M., & Habers, F. (2010). Data Warehouse (DW) Maturity Assessment Questionnaire. Utrecht: Utrecht University.
Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative Analysis and their Impact on Utility Computing. Retrieved July 16, 2010, from HP Technical Reports: http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf
Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48(3), 79-84.
Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patterns: Innovating Information and Communication Technology. Boca Raton, Florida: Auerbach Publications, Taylor & Francis Group.
Yin, R. (2009). Case Study Research: Design and Methods. Thousand Oaks, California: SAGE Inc.