Information Quality Balanced Scorecard
Dublin City University
M.Sc. in Electronic Commerce Project
MEC 2006
Submission Date: 01/08/2006

Ronan Flannery, Na Xing, Yuan Shi, Richard Howlett
55138187 MECT1, 52142264 MECB1, 52141233 MECB1, 55231957 MECT1

Declaration

We hereby declare that this report is entirely our own work and has not been submitted as an exercise for a degree at any other university.

Acknowledgements

We would like to thank our supervisor, Dr. Markus Helfert, for all his effort and guidance over the last two months. His support and patience have been second to none and are truly appreciated. He has been particularly instrumental during the past few difficult weeks of this project. We also acknowledge Mr. Paul Davis, who helped us comprehend our practicum from a business viewpoint. We also appreciate the effort of Dr. Cathal Gurrin, who has spent a lot of time coordinating and organising the many projects which are being submitted. We must also acknowledge a number of our friends who have taken the time and effort to proofread this business plan before its submission. Thank you to all of them.

Abstract

Business data is becoming ever more valuable as its uses expand. Corporate databases are filled with important data that reflects their business processes and activities. Its usefulness, however, goes far beyond the operational applications which generate it: data can be used in data warehouses and decision-making systems, which can be integrated with other applications that are in turn connected to external applications over a corporate extranet or indeed the Internet. Data problems have always existed in databases. In the past, measures were put in place to ensure the accuracy of the most significant data while less important data was handled in the correct way.
However, with the applications of data increasing all the time, these methodologies need to be substantially altered and new techniques introduced. Accurate data is now a must-have for businesses as their customer databases grow rapidly. Only four types of organisations need to worry about data quality: those that care about their customers, those that care about profit and loss, those that care about their employees and those that care about their future (Redman 1996). The main challenge for corporations is to make data quality requirements and data quality assessments an integral part of their work. Data quality is vital to companies and organisations today. The costs relating to bad data can be immense. In a recent survey conducted as part of this E-commerce Practicum, 61.9% of respondents said that data quality was critical to their organisation. We also found that 57.1% of the companies do not currently use any data quality analysis software, and that the reasons for this included a lack of resources for looking after data quality in the company and related financial issues. As a result of this research we believe that there is a definite market for a data quality service. The questionnaire which we developed can be found in the appendix of this business plan.

Table of Contents

1 Executive Summary
2 Strategy
2.1 Service Concept
2.2 The Balanced Scorecard
2.2.1 Introduction
2.2.2 ADOscore Toolkit
2.2.3 Benefits of Balanced Scorecard
2.2.4 Summation
2.3 Business Model
2.4 Risk Assessment
2.5 Product Milestones
2.6 Competitive Advantage
3 Marketing
3.1 Market Research
3.1.1 The Data, Information, Knowledge, Wisdom (DIKW) Chain
3.1.2 Data Quality
3.1.3 The Effect of Poor Data Quality
3.1.4 The Importance of Data Quality
3.2 SWOT Analysis
Strengths
Weaknesses
3.3 Market Analysis
3.3.1 The Current Market of Data Quality and Tools
3.3.2 Data Quality in Ireland
3.3.3 Target Market
3.3.4 Market Forecast
3.4 Porter’s Five Forces
3.5 Competitors
3.6 Marketing Strategy
3.6.1 Overall Strategy
3.6.2 Positioning Strategy
3.6.3 Pricing Strategy
3.6.4 Sales Strategy
3.7 Marketing Mix
3.7.1 Product
3.7.2 Price
3.7.3 Promotion
3.7.4 Place
3.7.5 People
3.7.6 Process
4 Technical Report
4.1 Software Overview
4.2 System Architecture
4.2.1 Overview
4.2.2 Teradata University Network Database
4.3 Creating the Balanced Scorecard
4.3.1 Technical Background
4.4 IQ Balanced Scorecard
4.4.1 The Strategy Model
4.4.2 Success Factors Model
4.4.3 Cause & Effect Model
4.4.4 Indicator Model
4.4.5 Indicator Pool
4.4.6 The Balanced Scorecard Map
4.4.7 Reporting
4.5 The Guideline of ADOscore Controlling Cockpit
4.6 Technology Evaluation
4.6.1 Data Cleaning
4.6.2 Data Cleaning Software
4.6.3 Other Technology
5 Management and Labour Requirements
5.1 Structure
5.2 Salaries
6 Finance
6.1 Income
6.2 Expenditure
6.3 Future IQ Market
Appendix
Appendix A. Information Quality Survey
Appendix B. Balanced Scorecard Steps
References

List of Figures

Figure 1 Perspectives of Balanced Scorecard (Source: Arveson 1998)
Figure 2 The DIKW Hierarchy (Source: Bellinger 2004)
Figure 3 The Intelligent Learning Organisation (Source: Larry 1999)
Figure 4 The Dysfunctional Learning Organisation (Source: Larry 1999)
Figure 5 SWOT Analysis
Figure 6 Data Quality Tools (Source: METAspectrum 2004)
Figure 7 Results of Survey (Source: Agosta 2004)
Figure 8 Questionnaire Results 1
Figure 9 Questionnaire Results 2
Figure 10 Visibility into Data in Real Time (Source: INFORMATIC2 2005)
Figure 11 Porter’s Five Forces
Figure 12 Survey Results (Source: TDWI Conference 2004)
Figure 13 Overall Strategy
Figure 14 Projected Sales
Figure 15 Overview of the ADOscore System Components (Source: ADOscore1)
Figure 16 Overview of the Controlling Cockpit (Source: ADOscore2)
Figure 17 System Overview
Figure 18 Extract of the Data from the TUN Database
Figure 19 Strategy Model for the IQ Balanced Scorecard
Figure 20 Success Factors Model
Figure 21 Cause and Effect Model
Figure 22 Setting the Variables within the Performance Measure
Figure 23 Indicator Model
Figure 24 Indicators Pool
Figure 25 The Indicators Pool
Figure 26 Overview of the Controlling Cockpit
Figure 27 Drilldown on the Consistency Dimension
Figure 28 Drilldown on the Accuracy Dimension
Figure 29 Drilldown on the Completeness Dimension
Figure 30 AlphaMiner Screenshot: Replacing Missing Values
Figure 31 SQL Data Compare Screenshot
Figure 32 Management Structure

List of Tables

Table 1 Annual Salaries
Table 2 Sales Figures for First Three Years

1 Executive Summary

This business plan gives a detailed discussion of the implementation of a Balanced Scorecard to manage the accuracy, consistency and completeness of data within a typical organisation. The Balanced Scorecard is created with a software tool called ADOscore and is constructed in such a way that it can be provided as a service to our customers. Client corporations supply data, taken from a database, spreadsheet or any typical data storage device, and our Balanced Scorecard performs the necessary functions to determine the accuracy, consistency and completeness of the specified data. A Balanced Scorecard typically works with numerical values; our service therefore provides the operations needed to convert typical database values to their numerical equivalents. When the results are finalised, we provide the output in several forms, such as PDF files, RTF files or simple Word documents. This gives customers an efficient source from which to examine problematic data entries. Further development of our product will enable us to expand on the three dimensions which we currently use and to provide more capabilities and more efficiency to our clients. As a supplementary service we will also provide a data cleaning tool to our clients, available at their discretion.

The objective of this project is to examine the commercial potential of the Balanced Scorecard tool. By developing a working prototype at the outset we hope to establish the markets which could exist for such a tool. As data quality control is relatively immature in the Irish market this could prove to be a very daunting task.
However, by engaging with various companies and introducing them to the Balanced Scorecard software, its functionality and indeed its competitive advantage, we hope to raise awareness so that prospective clients might be interested in the long term. By conducting extensive market research, primarily through a questionnaire aimed at highly experienced personnel in this domain, the key markets can be identified.

In summary, what we offer is a unique solution in the data quality domain, providing our clients with a new and efficient service in this exciting and extremely important sector of information technology. Initially we had viewed the Balanced Scorecard as a product which would be sold to our customers, but further research and development of the software led us to believe that providing a service would be the best approach. The business plan which follows is concerned with the strategic approach deemed most appropriate for our company to succeed. We will take a detailed look at the marketing analysis and the sales plan whilst paying particular attention to the technological characteristics of our service. We hope to provide enough evidence, and a fully functional end service, to illustrate the competitive advantage our service will provide over existing software tools in this domain.

2 Strategy

2.1 Service Concept

The service utilises a newly developed Information Quality Balanced Scorecard data analysis technique. We provide a flexible analysis of companies’ data based around an agreed set of data quality dimensions. This is important as ultimately it is the final users who are able to judge the appropriateness of data in relation to the process in which they are involved. Different users can perceive the quality of a data set differently and, according to their requirements, have different acceptable quality levels.
In the customised set-up of the Balanced Scorecard we meet with the organisation and determine the required levels for their data quality dimensions. We then take the data to be analysed and use the customised Balanced Scorecard to provide the company with feedback on the quality of the data. The current strategic trend for organisations is to offer personalised services (Cappiello 2005) such as our data quality service; this, together with the fact that we remove some of the extra pressure on company resources, will help to make our business a success.

2.2 The Balanced Scorecard

2.2.1 Introduction

A lot of research has been carried out on data quality in various domains. The idea of data quality has been widely investigated across a range of topics, including data warehousing, data mining, data cleaning, quality management on the web and data quality in information systems. All of these have research targets aimed at enhancing their respective capabilities. In every field, different data quality dimensions exist for the measurement and improvement of information quality.

The Balanced Scorecard is a management system that allows businesses or organisations to clarify their strategy and translate it into action. It measures corporate performance using a set of balanced attributes based on the relevant strategy. The Balanced Scorecard is primarily concerned with internal processes within an organisation and provides feedback to improve organisational performance and goals (Arveson 1998). The external environment is also included in this management system. In short, the Balanced Scorecard illustrates the key characteristics a company should measure in order to balance the financial perspective. Drs. Robert Kaplan and David Norton, who developed the Balanced Scorecard approach, describe it as follows:

The Balanced Scorecard retains traditional financial measures. But financial measures tell the story of past events, an adequate story for industrial age companies for which investments in long term capabilities and customer relationships were not critical for success. These financial measures are inadequate, however, for guiding and evaluating the journey that information age companies must make to create future value through investment in customers, suppliers, employees, processes, technology, and innovation (Arveson 1998).

Typically, but not always, the Balanced Scorecard suggests that we view an organisation from four different perspectives, and develop measurements, collect sample data and analyse the data with respect to each of these perspectives. The four perspectives are as follows:

Financial Perspective
Customer Perspective
Business Process Perspective
Learning and Growth Perspective

The diagram which follows illustrates how business strategy is linked to the four perspectives:

Figure 1 Perspectives of Balanced Scorecard (Source: Arveson 1998)

This is the main idea behind the Balanced Scorecard data analysis technique. For our service, however, we have identified different perspectives; indeed, any number of perspectives could be defined as appropriate to suit one’s particular requirements. The three characteristics were chosen because they best illustrate the dimensions which an organisation should monitor in order to manage its data quality effectively. The dimensions, or perspectives, are:

(1) Accuracy Perspective
(2) Completeness Perspective
(3) Consistency Perspective

Accuracy

Data accuracy can be viewed as the fundamental characteristic of data quality.
The basis of this is that if data is incorrect or unreliable then the other dimensions matter very little. We can therefore say that the accuracy of data is the extent to which data is correct and reliable. For this measure to function correctly it is necessary to have strict rules governing the precise meaning of incorrect data. In some cases the absence of a single character would be accepted whilst in others it would be unacceptable. Thomas C. Redman gives a good definition of accuracy in his book Data Quality for the Information Age: accuracy is a measure of the degree of agreement between a data value or collection of data values and a source agreed to be correct (Redman 1996).

Completeness

Completeness is defined as the degree of presence of data in a given collection of data. A more precise definition is the extent to which data are of sufficient breadth, depth and scope for the task at hand (Cappiello 2005). In the most obvious sense, completeness relates to the percentage of data missing from a given data collection. Completeness is a measure of the degree to which data values are present for required attributes.

Consistency

Consistency can be considered a representational dimension of data quality. Consistency means that two or more data values do not conflict with one another. Inconsistency arises when two or more collections overlap with one another. The expectation is that values are the same for overlapping attributes; when they are not, severe inconsistencies can arise. A major consequence of inconsistent data values is redundancy or duplication, which can have a detrimental effect with regard to time.
A strict definition of consistency is then a measure of the degree to which a set of data satisfies a set of constraints; two data values are consistent if they do not disagree (Redman 1996). These dimensions are discussed in a more technical and mathematical manner in the Technical Report of this business plan.

2.2.2 ADOscore Toolkit

Many software tools are available for creating a management system for an organisation. The tool on which this business plan focuses is the ADOscore software. The ADOscore toolkit offers an application library describing the many model types, classes, functions and relational features which it provides. The Balanced Scorecard provides a controlling cockpit to manage and control the most important performance indicators and the status of the strategic objectives. These goals are visualised through a “Traffic Light” system which helps to establish the negative implications of some goals. Typically the controlling cockpit is HTML based; however, other output representations can be obtained, including RTF files, PDF files and basic Word documents. Feedback is given to our clients through these files. A thorough evaluation and synopsis of ADOscore is given in the Technology and Operations section of this business plan.

2.2.3 Benefits of Balanced Scorecard

(1) It allows project managers to identify and analyse the best practices in an organisation and expand their usage elsewhere.
(2) The Balanced Scorecard is not solely concerned with financial measures but also takes company strategy into account.
(3) It allows organisations to compare their business processes against those of competitors.
(4) It improves the bottom line by reducing process cost and improving productivity and mission effectiveness.
(5) A performance measurement system such as the Balanced Scorecard allows an agency to align its strategic activities to the strategic plan.
(6) It represents a unique modelling technique for managing an organisation’s processes.

2.2.4 Summation

The Balanced Scorecard approach is distinct from other strategic measurement systems in that it is more than a mere collection of financial and non-financial measures. The Balanced Scorecard is intended not only as a strategic measurement system but as a strategic control system which aligns departmental and personal costs to strategy. In short, the Balanced Scorecard overcomes the inherent weaknesses and vagueness of other management approaches and provides a clear definition of what organisations should do in order to stabilise the aforementioned perspectives.

2.3 Business Model

The business model for this concept is largely a service provided to organisations that store large quantities of data in order to maintain operational efficiency. A typical example would be a large retail company that maintains large collections of data about its customers and its sales. Originally our primary market was the retail domain, where we considered that a lot of business could be acquired. The many businesses which exist, and indeed the ever-expanding industry, make this a relatively easy market to identify. However, our service is not confined to this sector and could be used by any business which keeps large database records. Our service will utilise the software which we provide, including our Balanced Scorecard, together with a range of suitable personal computers, and will operate on the data which our customers provide to us. In this way our service provides all the tools necessary to analyse where the problems exist for our clients. As already mentioned, our Balanced Scorecard works on three characteristics only and is merely a prototypical piece of software.
This prototype will provide the foundation on which new and improved versions could be based. The dimensions which we provide could be expanded to include many more of the relevant dimensions discussed in the literature. The initial set-up of the business, combined with monthly expenditure and the development of the Balanced Scorecard, will result in a fairly substantial start-up cost for our service. Rates for use of the software will be charged based upon the extent of the data which is measured. Target sectors include the financial industry, the retail industry and other large organisations which maintain large quantities of stored data. We will also need a licence agreement from ADOscore to use their Balanced Scorecard software application.

2.4 Risk Assessment

As this is a new venture the risks involved are quite substantial. Financial backing needs to be acquired in the early stages of development in order to obtain the software necessary to undertake such a venture. Capital will need to be raised in order to purchase the processors on which our service will manage and measure data quality. The capital acquired will also help to mitigate the initial financial burden of start-up, which includes monthly salaries, sales and marketing, and further research and development. We must ensure that these costs do not run over budget and can be recovered within a short space of time. The main financing risk will be achieving a worthwhile licensing agreement with ADOscore. A possible agreement could be reached whereby we provide them with a percentage of the earnings we obtain from every company which we service, combined with a small initial fee for the use of their software. However, a licence agreement which includes full use of their software is our preferred option. Renewals can be obtained after a finite period of time.
Employees will be kept to a minimum at the start, which will somewhat curtail our level of expenditure. As this service is relatively new within the Irish market, a slow uptake could be encountered on many levels. Businesses might be reluctant to use a relatively unknown piece of software to analyse their systems; therefore an appropriate marketing plan will be put in place. The corporations which currently use data quality tools have their own systems for data management and measurement. We must break into the market and encourage corporations to use our tool. We must create a strong brand name in the data quality sector, providing a range of capabilities to our clients. This is critical to the overall success of our company. The ability to hire experienced and knowledgeable staff is also a key area which needs to be monitored. Early development of the company will be based in Dublin, where the labour market has been extremely competitive in the last few years. Failure to hire the right staff on time is a key risk to company plans. Finally, as there are existing competitors in this domain, there is a possibility that our service will never catch on and establish itself. Customers may also be disappointed by the functionality of our service and may look elsewhere, whilst warning their trading partners to seek alternative guidance in solving their data quality problems. All of the above aspects need to be seriously considered if we are to remain in business.

2.5 Product Milestones

The Balanced Scorecard prototype is still in the very early stages of development but a working example is available.

(1) The Balanced Scorecard prototype, which currently analyses the accuracy, consistency and completeness of data, has been tested on a sample database (Walmart Retail Corporation) and functions correctly.
The sample data which is provided from the respective database is totalled and the output figures which are produced are entered into the ADOscore application. An SQL statement is used to format the figures from the database correctly and enter them into the Balanced Scorecard.
(2) The service prototype will be launched and used by our client corporations. Research will be done to investigate the success of our service prototype.
(3) Currently the working service prototype measures the three aforementioned characteristics of data quality; however, subject to demand, various other dimensions can be measured. The Balanced Scorecard tool can be expanded to include dimensions such as accessibility, reliability, timeliness and access security. Only a minor change to the scorecard is required to provide this additional functionality.
(4) Judging by past results and market demand, a fully functional Balanced Scorecard tool measuring intrinsic dimensions, contextual dimensions, representational dimensions and accessibility dimensions will be created. This will provide a more optimised service to our current clients and attract more business.
(5) A data cleansing tool will be developed and supplied to our clients as appropriate.

2.6 Competitive Advantage
Areas of Application
(1) Health Industry.
(2) Financial Industry.
(3) Retail Industry.
(4) Corporations with large collections of data.

Service Benefits
(1) Easy to Use.
(2) Accurate Comprehensible Feedback to Clients.
(3) Possible Development of Product.
(4) No Business Disruption to Clients.
(5) Data Quality Highly Important.
(6) Personalised Service.

Data Quality Highly Important
Data quality control is by all accounts immature in the Irish market. However, a lot of emphasis is placed on business organisations holding accurate representations of their data in order to operate successfully.
By advertising our product through media such as our website and television, awareness of the importance of data quality can be raised. As the market is still immature, our service hopes to promote itself and become a leading force in this domain.

Easy to Use
The ease of use of our product is highly beneficial to us and our clients alike. Companies will simply send us the data which they want analysed (such as names, addresses and financial data) and we will enter it into our Balanced Scorecard. The company will be aware of the dimensions under which we measure data through our company website. By running SQL queries on the received data we can acquire the summary figures to put into the Balanced Scorecard.

Comprehensible Feedback to Clients
When the results of the Balanced Scorecard have been finalised our system will generate reports and analysis on the data and will provide feedback to the company in the form which they desire. The formats could include HTML, XML, RTF, PDF or simply a Microsoft Word document.

Possible Development of Product
Companies can learn of our service through the advertisements on our company website. It will illustrate to them the dimensions which we currently use to analyse data. As our Balanced Scorecard can easily be expanded to include more dimensions, there is a possibility for further development of the service. Our website will be updated accordingly to illustrate our current capabilities. As a minor service we will also offer data cleaning to our clients. We will utilise a cleaning tool based on the Alphaminer open source software to clean up data inconsistencies. This will be in the later stages of our service development.

No Business Disruption to Clients
Little or no disruption on the client side will be apparent as they merely duplicate their figures from their database and provide them to us.
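As a minimal sketch of the SQL summary step described under Easy to Use, the example below computes one percentage figure per dimension (completeness, consistency and accuracy) from a copy of client data. Everything specific in it is an assumption made for illustration and is not taken from the prototype: the sales and valid_stores tables, their columns, the three rules used to score each dimension, and the use of Python with an in-memory SQLite database. Accuracy in particular is only approximated here, by checking store identifiers against a trusted reference list.

```python
import sqlite3

# Hypothetical schema and scoring rules, chosen only for illustration.
# completeness: share of rows with no missing value in the scored columns
# consistency:  share of rows where quantity * unit_price matches line_total
# accuracy:     share of rows whose store_id appears in a trusted reference list
SCORE_SQL = """
SELECT
    100.0 * SUM(CASE WHEN store_id IS NOT NULL AND quantity IS NOT NULL
                      AND unit_price IS NOT NULL AND line_total IS NOT NULL
                     THEN 1 ELSE 0 END) / COUNT(*) AS completeness,
    100.0 * SUM(CASE WHEN ROUND(quantity * unit_price, 2) = ROUND(line_total, 2)
                     THEN 1 ELSE 0 END) / COUNT(*) AS consistency,
    100.0 * SUM(CASE WHEN store_id IN (SELECT store_id FROM valid_stores)
                     THEN 1 ELSE 0 END) / COUNT(*) AS accuracy
FROM sales
"""


def score_sales(rows, valid_store_ids):
    """Return percentage scores for the three dimensions as a dict."""
    # Work on an in-memory copy, never on the client's live database.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (sale_id INTEGER, store_id TEXT, "
            "quantity INTEGER, unit_price REAL, line_total REAL)"
        )
        conn.execute("CREATE TABLE valid_stores (store_id TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
        conn.executemany(
            "INSERT INTO valid_stores VALUES (?)",
            [(store_id,) for store_id in valid_store_ids],
        )
        completeness, consistency, accuracy = conn.execute(SCORE_SQL).fetchone()
    finally:
        conn.close()
    return {
        "completeness": completeness,
        "consistency": consistency,
        "accuracy": accuracy,
    }


# Invented sample rows: (sale_id, store_id, quantity, unit_price, line_total)
SAMPLE_ROWS = [
    (1, "S1", 2, 1.50, 3.00),  # complete, consistent, known store
    (2, "S2", 1, 4.00, 4.00),  # complete, consistent, known store
    (3, "S9", 3, 2.00, 7.00),  # total does not match; store not in reference list
    (4, "S1", 2, 5.00, None),  # line_total missing
]

scores = score_sales(SAMPLE_ROWS, ["S1", "S2"])
print(scores)
```

The three resulting figures are the kind of summary values that would then be entered into the Balanced Scorecard; how ADOscore accepts them is not covered by this sketch.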
Our feedback will be based on supplementary files and will have no impact on the client’s live database.

Personalised Service
Further development of the Balanced Scorecard will enable us to offer a personalised service to our clients. They can simply check our website and see what functionalities we currently offer. This personalised service could prove to be very effective as we build the Balanced Scorecard around our clients’ needs.

3 Marketing
3.1 Market Research
3.1.1 The Data, Information, Knowledge, Wisdom (DIKW) Chain
There are many approaches to information: some people think there is no distinction between data and information; others see a significant difference. The knowledge management school draws a line between information and knowledge, so that data, information and knowledge can be seen as a hierarchical structure. There is also a hierarchy, a distinct relationship, between wisdom and knowledge and between knowledge and information: each can affect the other and can even be transformed into the other. Data represents a fact without relation to anything else. Information embodies the understanding of data in a relationship, such as cause and effect. Knowledge represents a pattern which connects what is described and can reliably predict what is going to happen next. Wisdom is systemic, and embodies more than a basic understanding of the fundamental principles behind the knowledge being what it is. The diagram below shows this DIKW hierarchy.

Figure 2 The DIKW Hierarchy (Source: Bellinger 2004)

Data
Data is the raw material from which information is derived and on which intelligent actions and decisions are based. Data represents things or entities in the real world; it can exist in any form, usable or not, and has no meaning of itself. Data is only the raw material from which information is produced.
Information
Information is the finished product: data in context. Data becomes usable and meaningful, and facts become understandable. A clear definition or meaning of data, correct values, and an understandable presentation are the three components of information quality. Non-quality components can cause a failure of business processes or result in a wrong decision. From a business point of view, information may be well defined, accurate in its values and meaningfully presented, and yet still not be a valuable resource to the organisation. Quality information leads to value when it is understood by people.

Knowledge
Quality information becomes a powerful resource that can make a significant contribution to mankind. Knowledgeable workers combined with quality information have the potential to create value. Knowledge is more than information in context; it is an understanding of the significance of the information. The development of information technology allows organisations to capture knowledge electronically and share it across the entire enterprise. Moreover, the use of the Internet, intranets, the World Wide Web and data mining is expanding the data shared in both data warehouses and operational databases.

Wisdom
The goal of any organisation is to gain maximum value from its available resources. The information resource is maximized if it is managed in such a way that quality information is easy to use when needed. Personnel resources are maximized when staff are well trained and provided with efficient resources. Information quality aims to support knowledgeable workers and enable the intelligent learning organisation through a strategic resource. The intelligent learning organisation can maximize both experience and information resources in the learning process.
According to Figure 3 below, the intelligent learning organisation enables high quality information to be shared in a way which maximizes value for the entire organisation.

Figure 3 The Intelligent Learning Organisation (Source: Larry 1999)

Figure 4 shows the dysfunctional learning organisation, that is, an organisation with “impaired or abnormal functioning”. Non-quality information hampers dysfunctional organisations by preventing the sharing of information and knowledge. Non-quality information keeps these organisations from being effective and competitive because “it hinders knowledge of markets, customers, technologies, and processes that help any organisation grow. Knowledge gains added power when it is the primary ingredient of a business” to facilitate learning as a competitive weapon. (Larry 1999)

Figure 4 – The Dysfunctional Learning Organisation (Source: Larry 1999)

3.1.2 Data Quality
Data quality or information quality is an increasingly important issue for organisations of all sizes. Data quality takes a broad view, covering both the accuracy of a particular set of data and the way data enters and flows through the entire organisation. Organisations may not realize the impact of poor or unknown data quality if they do not define the term broadly. To evaluate their data quality, organisations should consider existence, validity, consistency, timeliness, accuracy and relevance. Ensuring data quality requires more than just finding and fixing missing or inaccurate data in the organisation. It means delivering comprehensive, consistent, relevant, and timely data to the business regardless of its application, use, or origin. (INFORMATICA1 2005)

3.1.3 The Effect of Poor Data Quality
Data needs to be analyzed and cleansed before it can be used in a data warehouse, in customer relationship management or in enterprise resource planning applications.
Poor data quality often has a greater effect on organisations than they realize. Low quality information imposes many kinds of costs on organisations. In the information age, many organisations ignore the handling of poor quality data, and this may cause monetary losses in a variety of ways. There are many cases worldwide which show that low quality data has cost millions. For example, General Motors (GM) was forced to increase its 2005 annual losses by $2bn because of accounting errors. As a result, it was going to delay filing its 2005 annual report. GM will also restate its results for the years 2000 to 2004 due to the same error. Another example comes from the ‘BBC climate model’, where an error has been discovered in the climate prediction project. A fault in a Climateprediction.net model launched in February 2006 causes temperatures in past climates to rise more quickly than seen in real observations.

From a business perspective, data quality is very important and is managed by a data steward in many large organisations. According to Gartner Inc., 25% of critical data within Fortune 1000 companies will continue to be inaccurate through 2007. Gartner also says that US businesses spend an estimated $611 billion a year on postage, printing, and staff overhead as a result of poor quality customer data. These results could have been avoided if the organisations had made an effort to evaluate the detrimental effects of poor data quality. Having a concrete way to measure the cost of poor data quality allows organisations to determine the extent to which low quality information affects the bottom line. This process also highlights opportunities to improve customer relations, optimize the production stream and enhance employee satisfaction by analyzing and improving data quality.
(Loshin 2001)

3.1.4 The Importance of Data Quality
Achieving and maintaining high-quality data is important to efficient IT and business operations, to successful strategic business initiatives and to the organisation’s long-term competitive advantage. For example, Ireland’s largest telecommunications company, eircom, implemented a rules-based data standardization process, which improved accuracy and saved costs when transitioning legacy telephone directory data between systems. This reduced the amount of manual rekeying by 75%. Most importantly, the data quality product enabled eircom’s business team to develop those processes and to support a business-oriented information management system. The key issues are listed as follows:

• Minimizing IT project risk. Data quality is crucial for IT organisations, especially when handling data integration projects. Addressing data quality as a key requirement prevents project delays and overruns, which minimizes project risk. It also saves costs. Furthermore, high-quality data streamlines IT operations and ensures the best use of limited resources.
• Making timely business decisions. The executives rely on the IT organisation to deliver high-quality data so that they can make their strategic business decisions promptly. Poor data quality may cause companies to suffer productivity losses, incur significant costs, or lose hard-earned competitive advantage. Time means everything in business; inaccurate, incomplete, inconsistent, or outdated data can impair executives’ ability to make quick and informed business decisions.
• Ensuring regulatory compliance. High-quality data supports operations by helping organisations meet the requirements of the Sarbanes-Oxley Act in the United States and Basel II in Europe, which demand control and accurate reporting of business performance.
(INFORMATICA 2005) Reliable, auditable, high-quality data gives the organisation the transparency that it needs to comply with the regulations.
• Expanding the customer base. Gathering, presenting, and maintaining high-quality data is important in all customer interactions, from correctly spelling a customer’s name to ensuring that up-to-the-minute product and price data are listed on the web site, and its value should not be underestimated. Accurate, current data helps the organisation to achieve superior customer service.

3.2 SWOT Analysis
SWOT analysis is a general technique, used here for assessing the data quality evaluation service and its environment. The SWOT analysis involves the generation and recording of the strengths, weaknesses, opportunities and threats.

Strengths
• Accurate, consistent, and complete information to increase data quality
• Comprehensive feedback to clients
• Easy to use and integrate
• Possible development of products
• Service-oriented to increase flexibility
• Suitable for companies of all sizes
• Well-qualified staff for development team

Weaknesses
• Limited measurement of dimensions
• Lack of resources
• Immature Irish market
• Financial costs
• No assistance
• Lack of management support in Data Quality

Opportunities
• Huge potential for data quality market
• Awareness of data quality
• Also attractive to small sized companies

Threats
• Political issues
• Competition from mature data quality software vendors
• Lack of software tools

Figure 5 – SWOT Analysis

3.3 Market Analysis
3.3.1 The Current Market of Data Quality and Tools
The current data quality market is growing and gaining popularity. As IT companies continue to spend large amounts of money on business intelligence applications such as customer relationship management, data warehousing, data mining and so on, the quality of the data which feeds these applications has become increasingly important.
The overall information quality market is maintaining its current 14% annual growth rate, which is higher than the 7% average forecast for other IT segments. Additionally, Forrester found that 20% of data quality applications now address non-customer data, which includes product information, inventory, pricing, order management and business administration. The market has also been attracting a number of software vendors who offer solutions used in the identification and remediation of problems with enterprise information assets, as shown in Figure 6. To ensure accuracy, consistency, and completeness of data, data quality vendors offer different kinds of products and platforms for profiling, standardizing, matching, cleansing, and enriching data.

Figure 6 Data Quality Tools (Source: METAspectrum 2004)

Medium-sized software vendors are the main players for data quality and data integration technologies in this market. In addition, there are a few small-sized software vendors that focus on specific markets or technologies. Since the vendors possess different visions and market strategies, market presence criteria become important factors in the vendor selection process. Performance criteria are then more focused on technology and product functionality, even though potential buyers should also pay close attention to services, execution, and finance.

• Leaders. Leaders have mature products that are supported by proven technology and contain almost all aspects of data quality functionality in this market. Furthermore, leaders have larger market shares in relation to others in the market.
• Challengers. Challengers come in different varieties in the data quality market, and some, such as DataFlux, are even close to breaking into the leader category. The gap between leaders and challengers is significant, but surmountable.
All challengers are committed to data quality technologies and provide sufficient solutions. However, challengers do not have the resources or customer base to drive market share or dominate other areas of the presence and performance criteria.
• Followers. No software vendors that could accurately be described as followers were evaluated in Figure 6. The evaluation focuses on the established market leaders, their competitors, and emerging vendors seeking to gain a share of this market.

However, most enterprises still do not own a data quality tool, even though data scrubbing and profiling tools have been available in the form of data processing technology for many years. The reasons are as follows:
• Information quality is not a standalone application. Information or data quality is not a separate application, but can be part of a combination of applications such as order entry, accounting and marketing trend analysis. It can also be a criterion for both transactional and business intelligence applications.
• Organisations do not have a consistent approach to information quality. A centralized or even federated approach could improve the automation of the information quality process, which could in turn improve low quality data by motivating the acquisition of additional technology.
• Data quality tools have lacked integration. Data quality tools have not yet integrated the minimum essential functionalities, such as data profiling, standardization and matching, into a single code base.

According to the Data Warehousing Institute's TDWI-Forrester Quarterly Technology Survey (Figure 7), 61% of participants still do not have any data quality tool. This indicates that there is a huge potential market for software providers and end users alike.
Figure 7 Results of Survey (Source: Agosta 2004)

3.3.2 Data Quality in Ireland
Under the guidance of Dr. Markus Helfert in DCU, our team carried out research into the information quality field and how it impacts organisations in Ireland. The surveys examined the influence and effect that information quality has on organisations today and looked to determine how best to control and measure it. According to our survey, healthcare is the sector in which Irish organisations most commonly practise data quality (Figure 8).

Figure 8 Questionnaire Results 1

We also found that 42% of Irish companies use data quality software tools to control and analyse data. The pie chart below shows the share of each data quality software tool used in Ireland. The Informatica Data Quality software tool is the leader in the current Irish market.

Figure 9 Questionnaire Results 2

Regardless of the cost, 86% of Irish companies would consider using an information quality control system to improve their data quality. However, companies have three other major concerns about using this kind of system:
• Lack of resources
• Lack of proper plan for data quality and information control
• Impractical/extremely difficult to keep data fully up to date

Overall, data quality is still immature in Ireland. Our data quality service has the potential to meet the needs of the Irish market.

3.3.3 Target Market
Data quality provides industry solutions to sectors including energy and utilities, financial services, government and public sector, healthcare, life sciences, manufacturing, retail, telecommunications and transportation. Initially, the data quality evaluation service will target the retail industry in Ireland. From Figure 8 above, we know that no data quality tools are in operation in the Irish retail industry.
With increasing competition between retailers, complex information and a fast-changing market, retailers need to maximize their operational efficiency.

Figure 10 Visibility into Data in Real Time (Source: INFORMATICA2 2005)

The diagram above demonstrates the complexity of integrating and synchronizing global data from disparate sources, including external data from global trading partners. Data must be well managed in order to achieve the greatest return on investment (ROI) and positive business impact. Unfortunately, retailers often fail to form a unified view of business and operational data. Therefore, there is huge potential for the data quality market to grow by helping retailers achieve supply chain efficiency.

3.3.4 Market Forecast
According to Agosta (2005), the overall information quality market is on course to pass the $1 billion mark in 2008, an important milestone that will validate its maturity and importance. We also expect growth of 20% to 30% annually in this market. Some of the existing software vendors are trying to develop new viable market strategies so that they can take advantage of recent technology trends, which could open new markets in the field. New entrants will need customer bases, a long-term vision and substantial backing in order to compete. It is becoming popular for existing vendors to partner with established data integration vendors. Meanwhile, data quality leaders will continue to develop new technologies and add new functionalities for emerging data integration and analysis technologies, improving integration with other markets as technologies develop and market trends evolve. From a business perspective, information accuracy is an essential enterprise asset, and data quality solutions ensure the value of that asset while reducing risk and identifying opportunities through quality information.
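The forecast above combines a base figure with an annual growth range. As a rough illustration only, the sketch below compounds a base market size forward at the lower and upper growth rates using size = base × (1 + growth)^years. The base of exactly $1 billion in 2008, the three-year horizon and the function name are assumptions made for the example; they are not figures from Agosta or from our own research.

```python
def project_market_size(base_size, annual_growth, years):
    """Compound base_size forward: base_size * (1 + annual_growth) ** years."""
    return base_size * (1.0 + annual_growth) ** years


BASE_YEAR = 2008
BASE_SIZE_BN = 1.0  # assumption: market size of exactly $1 billion in the base year

# Lower and upper bounds of the expected annual growth range.
for growth in (0.20, 0.30):
    for offset in range(4):
        size = project_market_size(BASE_SIZE_BN, growth, offset)
        print(f"{BASE_YEAR + offset} at {growth:.0%} growth: ${size:.2f}bn")
```

Because the growth compounds, the gap between the two scenarios widens each year, which is why the choice of rate matters more than the exact base figure over a longer horizon.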
3.4 Porter’s Five Forces

Figure 11 Porter’s Five Forces

Rivalry among existing competitors
• No direct competitors are present in the Irish market. Five main players dominate the data quality market worldwide. Strong competition is expected.
• A good reputation and strong brand are hard to compete against, making it more difficult to gain competitive advantage.

Threat of new entrants
• The barriers to entering the market are reasonably high. New entrants would need time and resources to copy the idea and enter the market.
• We hold first mover advantage over later entrants in the Irish market.

Threat of new substitutes
• Existing substitutes for our service (‘IQ Scorecard’) include products or services used in the data warehousing, management and quality control fields in Retail, Healthcare, Banking and Insurance, etc.
• Potential customers need to be convinced of the simplicity, high quality and advantages associated with IQ Scorecard to reduce the threat of substitutes.

Bargaining power of the supplier
• The power of the supplier in our case is reasonably low.

Bargaining power of the buyers
• The power of the buyers in our target market is quite strong.
• There will be a substantial market for data quality products and services in the near future. On a large scale, we feel, some products or services are hindered by their cumbersome and complicated design. We will service needs not currently served in the marketplace. It is up to the buyers to determine whether they require those products or services and whether they would be beneficial to them.
• There is a risk that buyers could develop similar in-house products. Large international companies have more money to spend on research and development.

3.5 Competitors
In this section we will discuss the current competitive environment surrounding the information quality market. We will also profile the main competitors in the data quality market and the main products offered by them.
According to a survey from Forrester, the overall information quality market will pass the $1 billion mark in 2008 (Beal 2006). This market includes data related software, professional services and data enhancement. Indeed the overall spending on data warehousing itself is also expected to rise (Rajan 2004). Before data can be used effectively in a data warehouse, in business analysis applications, in customer relationship management, or in enterprise resource planning, it needs to be analysed and cleansed. Poor data quality costs businesses vast amounts of money and can lead to breakdowns in the supply chain, poor business decisions and inferior customer relationship management. Defective data can also hamper efforts to meet regulatory compliance responsibilities and legal responsibilities. Many factors are driving the growth of the information quality market, including increasing volumes of data, a lower tolerance towards data latency, the increasing complexity of data, security issues and increased regulatory requirements imposed by government and industry. Companies have always been concerned with issues of poor data quality, for example incorrect customer details or data missing from databases for reports. However, as recent studies indicate, awareness of the importance of data quality, and of the financial fallout that can result from poor data quality, is really beginning to hit home with business and IT management. It is therefore clear that there is a substantial market for data quality solutions. There are a number of companies that are considered to be the major players in this market, for example Informatica, Trillium, Firstlogic, DataFlux and Acxiom. Many of these companies have adopted strategies of diversifying into many different areas of data warehousing, management and quality control, as well as pursuing takeovers and buyouts of smaller companies.
One example of this can be seen with the Irish company Similarity Systems, a Dublin based data quality tools provider which was acquired by the California based Informatica in a deal valued at $55 million in January 2006 (Smith 2006). Informatica acquired Similarity Systems in a cash-based deal and then incorporated the Similarity Systems technology into the latest version of its PowerCenter product. Similarity was only founded in 2001 and so its quick success in the data quality market did not go unnoticed. Sohaib Abbasi, the CEO of Informatica, has said that data quality is one of its customers’ top challenges within data integration projects (Smith 2006) and that the company is dedicated to providing the best possible solution to its customers.

Figure 12 Survey Results (Source: TDWI Conference 2004)

The above diagram shows the results of a survey taken in the third quarter of 2004, asking companies about their expected increases in data warehouse spending in the next quarter (TDWI Conference 2004). As can be seen, the vast majority expected an increase of some kind.

The Company: Informatica primarily delivers data integration software and services to solve the problem of data fragmentation across disparate systems, with a view to helping organizations gain greater business value from their information assets. Informatica's software is platform-neutral and in the words of the company “reduces costs, speed time to results, and scales to handle data integration projects of any size or complexity” (Informatica 2006).

Financial: In 2005, total revenues grew 22% to $267.4 million and the company generated net income of $33.8 million (Informatica 2005 Annual Report). Revenues are generated primarily from licenses sold for Data Integration software products and other services provided by Informatica.
Future Strategies include: Expand from Data Warehousing to Broad Enterprise Data Integration. Focus on Horizontal Data Integration Solutions: Migration and Consolidation. Expand further research and development.

Data Quality Tool: Informatica Data Quality is a business-focused data quality software tool that allows data owners to design, manage, deploy, and control enterprise-wide data quality solutions. It provides a single solution for tackling data quality at multiple points across an organization, while maintaining centralised control and management of data quality standards. Informatica Data Quality can also link in with the company's other solutions, which include data warehousing, data migration/consolidation, data synchronisation, data governance, master data management, and cross-enterprise data integration.

The Company: Trillium Software, a division of Harte-Hanks, provides solutions for data integration dealing with Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Business Performance Management (BPM), and Supply Chain Management (SCM) projects (Harte-Hanks Corporation 2006). Trillium Software's solutions aim to provide global information integrity regardless of how data is captured, collected, stored, and manipulated within the enterprise, essentially turning raw, unreliable data into usable data. Trillium Software solutions support the complete lifecycle of data management, which includes data assessment and data profiling; data standardisation; information enrichment; data linking; and perpetual data quality monitoring.

Financial: (Source: Harte-Hanks Annual Report 2005)
The above financial figures are not specific to Trillium Software alone, as Trillium is integrated into the wider Harte-Hanks group, which also includes marketing and support solutions.
Data Quality Tool: The Trillium Total Data Quality software system provides an environment for turning raw, chaotic business and customer data into more reliable, usable, and valuable information across multiple applications. A data quality methodology built into the software helps to design the necessary data improvement processes around proven best practices:
• Data Investigation
• Data Standardisation
• Information Enrichment
• Data Linking

The Company: Firstlogic was acquired in 2006 by Business Objects in a $69 million deal (Business Objects 2006). Firstlogic is a global provider of enterprise data quality solutions and services. For over 20 years, Firstlogic has provided technology solutions, services, education, and consulting to help businesses establish confidence in the data driving their businesses. Firstlogic offers enterprise data quality management solutions and support for the full breadth of data quality capabilities including initial data assessment and measurement, data cleansing, data enrichment, matching and consolidation, and continuous monitoring.

Financial: In 2005 Firstlogic had revenues in excess of $50 million. Firstlogic has over 6,000 customers worldwide.

Company Goal: In addition to developing commercial solutions, Firstlogic looks to partner with many systems integrators, consultants, and original equipment manufacturers to provide its unique technology to their end-user customers.

Data Quality Tool: Information Quality Suite offers a comprehensive environment with centralised data quality services. It was first introduced in 2004 and provides a solution that expands to meet evolving data quality needs while minimising total cost of ownership. It offers many features specifically designed to streamline the data quality integration process, including its centralised approach to data quality and business rule management, an intuitive user interface, Unicode framework and relational database support.
The Company: DataFlux provides solutions to help companies analyse, improve and control their data from a single data quality integration platform (DataFlux 2006). DataFlux can be used to build a comprehensive set of business rules that can help to create a unified view of enterprise data. DataFlux aims to ensure that high-quality data reaches its final destination.
Financial: DataFlux is part of the SAS group, a large company serving sectors such as manufacturing and financial services. SAS achieved overall revenue of US$1.34 billion in 2003, a 13.5 percent increase in worldwide revenue and the 27th consecutive year of profitability for the world’s largest privately held software company.
Data Quality Tool: DataFlux Power Server & Integration Studio. DataFlux solutions operate on different platforms, including Windows, Unix and Linux. They allow the use of development languages such as C and Java to integrate DataFlux functionality into existing third-party applications. All DataFlux solutions rely on the same shared data management repository, providing a cost-effective way to monitor, maintain and extend information quality initiatives.
The Company: Acxiom creates and delivers customer and information management solutions by blending data, technology and services to provide the most advanced customer information infrastructure available (Acxiom Corporation 2006).
Financial:
• Revenue of $1.011 billion in 2004, up 5.5 percent from $958 million in 2003.
• Operating cash flow of $259.9 million.
• Free cash flow of $187.8 million.
• New contracts that are expected to contribute an annual value of $107 million.
• Contract renewals that are expected to generate $90 million in annual revenue.
Strategy: Acxiom aims to provide customer and information management solutions that help companies with marketing, risk and IT.
Data Quality Tools:
Opticx: A customer data quality analysis tool.
Opticx provides a statistical assessment of customer databases against 40-plus measures of database quality.
CMAT: Provides an assessment of a client's ability to manage customers. This tool helps establish performance strengths and weaknesses and recommends solutions in the form of an action plan.
3.6 Marketing Strategy
3.6.1 Overall Strategy
We have decided to adopt a mix of Porter’s differentiation and focus strategies to market our service, offering it at a low cost level and on a broad scope. To compete with existing competitors, we will initially set our price slightly lower to attract customers' attention; then, if sales start to pick up, we will raise the price to a level similar to that of existing competitors to gain more profit and market share. By using a differentiation strategy, we will highlight the uniqueness of our service, which can be used by various companies in different industries. At the current stage, our service is only a prototype. The final version of the service can be personalised to meet customer demands (e.g. the customer can choose additional features to suit their special needs; the service for health care and the service for insurance are completely different). By using our service to measure the accuracy, completeness and consistency of the data in their organization, the customer can detect incorrect data, reduce the number of data errors, minimise the risks associated with incorrect data, and improve their overall business performance. We will also try to continually develop new services based on our original innovation.
Figure 13 Overall Strategy (Source: http://www.marketingteacher.com/Lessons/lesson_generic_strategies.htm)
3.6.2 Positioning Strategy
We would like to position our service as the leader in the data quality field in Ireland, and then expand into mainland Europe.
Data quality measurement and the Balanced Scorecard were first implemented by an early adopter (Cigna Property & Casualty) in 1993. However, it was not until the late 1990s and the early 21st century that the development and implementation of the Balanced Scorecard took off and the product and the service became available in the USA. No such services are yet available on the Irish market. By developing the data quality measurement service before any other competitor in Ireland, we hope to gain a first mover advantage and take as much of the market as possible. As our service becomes more mature and well known, we hope to expand across Europe. This expansion may be driven by the fact that some of our clients are international organizations and may ask us to extend our service to the rest of their international markets.
3.6.3 Pricing Strategy
In order to determine our price, we have to carefully design our financial objectives (the expected profit) and analyse the different costs involved with the service (sales and marketing costs, salaries, and the costs of future research and development). Our service has been implemented using the ADOscore toolkit. The existing IP licensing costs us about €18,000 per annum. The details of our pricing strategy can be seen in the Marketing Mix section of this business plan.
3.6.4 Sales Strategy
More and more organizations now realise the importance of data quality. As a result of bad data quality, companies may spend millions trying to solve the problem or, even worse, may end up bankrupt. Our data quality offering is therefore well placed to meet this need. Our biggest selling point is that we offer a customised data quality service: different customers can specify their own data quality needs and requirements according to their organizations. The initial market for which we will develop our service is the Republic of Ireland.
As sales of our service pick up and it becomes more popular, we will extend it to Northern Ireland in 2008 and then, if it succeeds, to Europe in 2009. The projected sales are: 2007 €160,000, 2008 €208,000, and 2009 €312,000. Further information about sales forecasts and financial issues can be found in the Finance section.
Figure 14 Projected Sales (chart of Sales (€) by year, 2007 to 2009)
3.7 Marketing Mix
3.7.1 Product
We will be offering a data quality analysis service to our clients. The service provides a flexible analysis of companies’ data based on a set of data quality dimensions. Three dimensions (accuracy, completeness and consistency) make up the basic service that allows the client to analyse their data. The uniqueness of our service is that it can be customised to meet customers’ special needs and requirements. The service was implemented using ADOscore. Using the Balanced Scorecard data quality technique, the service takes the data to be analysed and uses the customised Balanced Scorecard to provide the company with feedback on the quality of the data.
3.7.2 Price
We would expect the service to be valued at a market price of €500 to €5,000.
Option 1: Price for small businesses: €500 to €999.
Option 2: Price for medium and large businesses: €1,000 to €5,000.
Option 3: Price for enhanced service could stretch as far as €10,000.
In order to set the price, there are a number of factors that need to be considered:
• Competitors: they have been in the business longer than us, have excellent reputations, and have strong relationships with customers. They dominate the data quality field. Therefore, as a small start-up business, we really need to emphasise the uniqueness of our service and try to find an edge with which to enter the market.
The price of the service must be reasonable, neither too high nor too low.
• Market capacity: the data quality market is still growing and immature. According to a survey from Forrester, the overall information quality market will pass the $1 billion mark in 2008 (Harte-Hanks Corporation 2006), which indicates an opportunity for a competitively priced service.
• Size of the data set and different features: we would expect different versions of the service to be developed for different industries, accounting for price differences. It will take much longer to provide the data quality analysis service to companies with large data sets. We will translate companies’ special needs and requirements into additional features, which will enable us to charge more for the service.
• Future research and development cost: in order to stay in business and compete with our competitors, we will spend more on data quality assessment and data cleaning in future development. The price of the service must take this into account.
• Marketing and sales cost: our director of marketing and sales will launch campaigns in Ireland and Europe.
3.7.3 Promotion
The survey that we carried out on data quality indicates that companies are highly interested in a data quality service. However, the promotion of the final service could still be a costly factor. A licensing partnership with one of the major market players, e.g. Informatica, would therefore have considerable advantages on the promotion front. A company like Informatica would have adequate funding to promote the service to the markets and would have considerable promotional experience. Our main promotional methods could include advertisements at selected events and conferences, promotional emails/newsletters to interested parties, and search engine submissions. Internet marketing and advertising are effective ways to develop a recognised brand. Banner ads are a common tool for advertising.
They must be placed where potential customers browse. For instance, we can buy keyword advertising (e.g. ‘data quality analysis service’) on the Google search engine, so that our ads will be displayed when someone searches for ‘data quality analysis service’. This is likely to increase click-through rates, because the user has already shown some interest in finding sites on that particular subject.
Main Conferences:
• “Information Quality Improvement” Seminar, Seattle, Washington, USA, 10-11 August 2006
• 2006 Information and Data Quality Conference, San Francisco, California, USA, 16-19 October 2006
• Data Management & Information Quality Conference (7th Annual Conference), London, UK, 30 October - 2 November 2006, www.irmuk.co.uk/dm2006
• ICIQ 11th International Conference on Information Quality, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, 10-12 November 2006
• The TDWI Conference, http://www.tdwi.org/education/conferences/sandiego2006/index.aspx
Also, in order to promote the service, our marketing and sales person could deliver speeches at major events and conferences, or arrange meetings with companies that are interested in our service.
3.7.4 Place
We will try to promote our service directly to companies. Our salesperson could arrange to meet them either at their own premises or over a business lunch to discuss our service. She could also give demonstrations at major events and conferences and show them how the service works. For the service launch, we will start in Ireland for the first year; if everything goes well, we will extend the service to Northern Ireland from the second year; and if we reach our target, we will enter the European market in the third year. Hopefully, we will succeed.
3.7.5 People
The people employed in our organization are highly important to the service. They will determine the quality of service our customers receive.
We have skilled and highly motivated staff on board who will fully satisfy customer demands. We will offer a high level of pre-sales and after-sales support and advice, e.g. technical support personnel are available by phone and email to answer queries about the service. Again, this can have an impact on the price we set, as customers are likely to pay more for a higher level of service. In order to improve the quality of our service, we need to identify those staff who come into contact with customers, whether face-to-face, by phone or by email, and then:
1) Set standards for customer service by involving those staff.
2) Provide appropriate training to those staff based on their different needs.
3.7.6 Process
The process involved in delivering our service will affect the way that our customers perceive us. There are several processes associated with our service, e.g. making first contact with customers, discussing special needs and system requirements with customers, transforming those requirements into a set of data dimensions, analysing the data set using the customised Balanced Scorecard, and finally providing feedback on the quality of the data to the customer. There are a number of suggestions for management to improve the quality of our service:
• Look at all the processes involved in providing the service to the customer, e.g. after-sales support. We can carry out email surveys after deliveries are made, ensuring that customer expectations continue to be met.
• Recruit high quality staff, treat them well and communicate clearly with them: they are the key to service quality.
• Set up a simple and quick response facility in order to deal with customer problems and complaints.
• Employ new technology to provide better services at lower costs.
4. Technical Report
4.1 Software Overview
ADOscore is an Object Oriented Performance Management and Balanced Scorecard Tool.
The software we used to create the Information Quality Balanced Scorecard is ADOscore (BOC IT consulting GmbH 2006). The diagram below shows an overview of the various components of the ADOscore system.
Figure 15 Overview of the ADOscore System Components (Source: ADOscore1)
The toolkit is broken up into the Strategy component and the Cockpit component. In the Strategy component the user can model the Balanced Scorecard using various graphical implementation options as well as provide the input data to be evaluated by the scorecard. The data to be evaluated by the scorecard can be taken from spreadsheet documents, from relational databases such as Oracle or MS SQL, or through the manual entry of data. This is done during the setup of the scorecard. We will discuss the creation of the Information Quality scorecard in more detail later in the technical report.
Figure 16 Overview of the Controlling Cockpit (Source: ADOscore2)
The other major component of the software is the controlling cockpit and reporting component, as can be seen above. After the Balanced Scorecard is created and run, the user has the option to view the controlling cockpit. This is a JavaScript-based HTML web page that is generated once the controlling cockpit system is run. The cockpit gives a visual summary of the outcome of the scorecard analysis. It uses a traffic light system (generally: green = good performance, red = bad performance, yellow = in between) to represent the outcome visually. In addition, various reports can be generated based on the outcome of the scorecard, and these can be exported in many formats for later analysis.
ADOscore System Guidelines [ADOscore 2006]:
Operating System: ADOscore runs on Windows NT, 2000 and XP.
Database Connection: All ODBC data sources, versions 3.x and up, are supported.
Controlling Cockpit: For optimal display of the information, the ADOscore Controlling Cockpit requires Microsoft Internet Explorer 5.0 or higher.
4.2 System Architecture
4.2.1 Overview
The system is essentially made up of two components: the scorecard software itself and the external data input component (i.e. the database or spreadsheet that contains the data to be analysed). This data input component is linked into the system through two parts of the Balanced Scorecard system: the indicators pool model and the scorecard map model. From here the connection to the relevant database can be specified (Oracle, MS SQL, MSDE, DB2 etc.) or the details of the relevant spreadsheet data can be entered. For database connections the system supports user-entered SQL statements for specifying indicator model values, and for Excel it supports Excel formulae. The diagram below gives an overview of the system, from the various steps in the Balanced Scorecard (which will be discussed in more detail later), through the data connection, to the controlling cockpit, which provides a representation of the outcome of the scorecard analysis.
Figure 17 System Overview (diagram components: Strategy Overview, Success Factors, Cause & Effect Model, Indicators Model, Indicators Pool, Scorecard Map, Data Input, Cockpit)
4.2.2 Teradata University Network Database
As part of the prototype Balanced Scorecard system created for this practicum we used data taken from a Teradata University Network (TUN) database. The TUN provides access to databases of information for university-related research purposes. For this project we had access to a retail database containing information on Wal-Mart stores and customers in the USA.
Although the data related to the retail industry, this did not affect the creation of the IQ Balanced Scorecard, for two reasons: (1) we were creating a general prototype relevant to data quality across a wide range of industries, and (2) the Balanced Scorecard is highly customisable and can be tailored to meet companies’ specific needs as required. Access to this TUN database was provided online, and the system allows SQL queries to be run on the database to return information. For this project we ran some queries and exported the results to MS Excel spreadsheets, which we used as input to the scorecard.
Sample Query:
SELECT Street_Addr, City, State, ZIP_Code, Manager_Name, Phone_Nbr, Store_Code, Store_Name, Store_Nbr, Store_Type FROM store_information
An example extract of the database can be seen in the figure below. Note that some of the information in the database has been deliberately altered for this project, in line with our research, to introduce the inaccuracy, incompleteness and inconsistency to be measured by the scorecard.
Figure 18 Extract of the Data from the TUN Database
4.3 Creating the Balanced Scorecard
4.3.1 Technical Background
TDQM (Total Data Quality Management) is a research project started in 1991 at MIT with the objective of applying the principles and practices of Total Quality Management to data management (Wikipedia source definition 2006). The project is sponsored by several corporations, research groups, and US government departments, including the US Department of Defense and the US Navy, and it has grown from ever increasing industry needs for high quality data. The three major components of TDQM are data quality definition, analysis, and improvement (MIT TDQM Program 2006). The definition component focuses on defining and measuring data quality. The analysis component identifies and calculates the impacts of poor quality data, and the benefits of high quality data, on an organisation's effectiveness.
Finally, the improvement component involves redesigning business practices and implementing new technologies in order to significantly improve the quality of corporate information. For the purpose of this project we looked at how part of the TDQM process could be applied to the creation of an Information Quality Balanced Scorecard, and then at how the application of the rest of the TDQM process could be effective in the overall data quality service provision. The first and most important step in setting up an IQ Balanced Scorecard is the definition of the data quality dimensions under which the data will be assessed. Under the TDQM research framework the first component, data quality definition, deals with the identification of the key dimensions of data quality and the methods of measuring each dimension for the data. In this step we identified what we believed to be the key dimension under each of the main data quality dimensional headings, together with the method of measuring it:
1. Intrinsic Dimension: Accuracy.
2. Contextual Dimension: Completeness.
3. Representational Dimension: Consistency.
• Accuracy. Although it is difficult to tie down a specific definition of accuracy in terms of data quality, a generally accepted definition is the proximity of some data value x to some other value x’ that is considered correct (Capiello 2005). In terms of what the Balanced Scorecard would evaluate, the inaccuracy of a data set is defined as the total number of incorrect data divided by the total number of data in the set, multiplied by 100 to give the percentage level of inaccuracy in the data set. The Balanced Scorecard takes in these figures, performs the calculation and returns the inaccuracy percentage along with a score level as defined by user parameters.
These parameters, defined when setting up the scorecard, determine in the report whether the level of inaccuracy is acceptable or not.
Accuracy = 1 – (Number of Incorrect Data / Total Amount of Data)
• Completeness. Completeness is simply defined as the degree of presence of data in a particular data set, i.e. whether or not there are any missing data values. Our Balanced Scorecard system measures this contextual dimension by taking in values representing both the total number of missing values in the data set and the total number of values that there should be overall in the set. The Balanced Scorecard measures the incompleteness of the data set by dividing the number of missing values by the total number of values and multiplying this figure by 100 to get an incompleteness percentage. As with accuracy, this percentage is returned along with a score determining whether this level is acceptable or not.
Completeness = 1 – (Number of Missing Values / Total Number of Values)
• Consistency. Consistency in this context was defined as consistency of both the format of the data and the actual entities themselves, e.g. a consistent date format. The measure of inconsistency within a data set is defined in the Balanced Scorecard as the number of inconsistent values divided by the total number of values, multiplied by 100 to get the percentage level of inconsistency. As with the other two dimensions above, a score for inconsistency is displayed along with the measure, determining whether the level is acceptable or not.
Consistency = 1 – (Number of Inconsistent Values / Total Number of Values)
The other two main parts of the TDQM framework, the analysis component and the improvement component, are relevant for the analysis part of our service.
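The three measures above share one form, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ADOscore implementation: the function names `quality_score` and `defect_percentage` and the sample counts are hypothetical, and in the prototype the counts themselves are supplied through the spreadsheet or SQL inputs.

```python
def quality_score(defect_count: int, total_count: int) -> float:
    """Return 1 - defects/total, the form shared by the Accuracy,
    Completeness and Consistency formulas (hypothetical helper)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= defect_count <= total_count:
        raise ValueError("defect_count must lie between 0 and total_count")
    return 1 - defect_count / total_count


def defect_percentage(defect_count: int, total_count: int) -> float:
    """Return defects/total * 100, the inaccuracy, incompleteness or
    inconsistency percentage described in the text (hypothetical helper)."""
    quality_score(defect_count, total_count)  # reuse the input validation
    return 100 * defect_count / total_count


if __name__ == "__main__":
    # Illustrative counts only: 12 incorrect values out of 100.
    print(quality_score(12, 100))      # accuracy as a fraction
    print(defect_percentage(12, 100))  # inaccuracy as a percentage
```

The same two functions serve all three dimensions; only the meaning of the defect count changes (incorrect, missing or inconsistent values).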
For example, our service, based on the feedback from the Balanced Scorecard, would help the organisation to identify and calculate the impact of poor quality data and the benefits of high quality data. By using the TDQM improvement component, we could help the organisation to redesign its business practices and provide advice on techniques to help it significantly improve the quality of its data.
4.4 IQ Balanced Scorecard
In this section I will document the setting up of the Information Quality Balanced Scorecard using the ADOscore software. As mentioned above, ADOscore is an Object Oriented Performance Management and Balanced Scorecard Tool that supports data taken in from Oracle, MS SQL Server, MSDE and DB2 as well as from MS Excel spreadsheets and manual data input. ADOscore allows the user to plan and implement a Balanced Scorecard through the various steps, from the initial Strategy model through to the implementation phase. The scores returned from the scorecard are based on the data taken into the application and used in the formulas defined for the calculations. I will briefly describe each step that we used in creating an Information Quality Balanced Scorecard.
4.4.1 The Strategy Model
In this step the strategic variables are defined. This is done simply by selecting the elements from the side of the screen. In the case of an IQ Balanced Scorecard these represent the dimensions under which the data will be assessed. As discussed previously, the dimensions we defined were Accuracy, Completeness and Consistency.
Figure 19 Strategy Model for the IQ Balanced Scorecard
4.4.2 Success Factors Model
In this step the derived goals are defined, as well as the success factors relating to these goals. For the information quality scorecard, these goals are derived from the main dimensions as defined in the first step.
We defined goals of Ensuring Accuracy, Measuring Completeness and Measuring Consistency, as well as the success factors of reducing inaccuracy, inconsistency and incompleteness. As with the first step, these factors can be modelled easily using the new model functionality, this time selecting the success factors model.
Figure 20 Success Factors Model
4.4.3 Cause & Effect Model
In this step the strategic goals from the previous step are modelled together with their relevant performance indicators. Here we set up strategic goals for Ensuring Accuracy, Measurement of Completeness and Measurement of Consistency. The performance indicators for these were also set up: an inaccuracy measure, an incompleteness measure and an inconsistency measure. Within each of the measures, score thresholds, periodicity and the corresponding success factor, amongst other variables, have to be defined (see Figure 22, Setting the Variables within the Performance Measure).
Figure 21 Cause and Effect Model
Figure 22 Setting the Variables within the Performance Measure
4.4.4 Indicator Model
The strategic goals are quantified by defining performance measures in this section. The indicator models created here illustrate how the performance measures above are calculated, so for each performance measure an indicator model needs to be defined. It is important to work out the formula to be used first so that it can then be efficiently modelled in this step. For example, with our IQ scorecard, to determine the level of inaccuracy of a set of data we defined the inaccuracy model based on the formula: number of inaccurate data divided by total amount of data, multiplied by 100 to get an inaccuracy percentage. The same applies to all the performance indicators that have been defined previously: each is based on the formula used to measure the indicator.
As can be seen below, each model is generally made up of a number of referenced indicators and a composed indicator, as well as the numerical operations specific to the formula.
Figure 23 Indicator Model
4.4.5 Indicator Pool
The indicator pool defines all the elementary indicators used in the previous indicator model. For example, in the above diagram, there would be indicator pool models for total data and number of incorrect data (i.e. all the elementary indicators used for calculations in the formula). The data inputs are also defined in the indicator pool models: for each model in the pool, the source of the data that is used in the scorecard is specified. Data can come from database systems such as Oracle or MS SQL, from spreadsheets, or from manual entry. As the Balanced Scorecard uses numerical values to perform the various calculations, any SQL statement or spreadsheet formula used as an input must return a numerical value that can be used in the scorecard system. The screenshot below shows how the data is set up to be taken from an Excel file.
Figure 24 Indicators Pool
Figure 25 The Indicators Pool
4.4.6 The Balanced Scorecard Map
After performing all the above steps, each of these models will have been defined for the scorecard. The models interlink with one another to provide the final score; therefore, after the initial creation of the models it is important to go back through the setup of each model to ensure the correct references are made. For example, in the indicator model you need to define which indicator pool the data is coming from, and in the cause and effect model you need to define the relevant success factors model and composed indicator. After completing this process the IQ Balanced Scorecard can be run by creating one more model: the Balanced Scorecard map.
Within the scorecard map the cause and effect model is referenced, and the Balanced Scorecard can then be run from the calculation menu at the top of the screen. The outcome of a successful run of the Balanced Scorecard can be seen in the cause & effect model figure above. The indicators have returned percentages and traffic light values (red, yellow or green) based upon the thresholds defined in step three. These thresholds are the values that determine whether the calculation shows as successful (green), unsuccessful (red) or in between (yellow, a warning). These values are entirely customisable depending on what level of strictness the user wants to apply to the dimensions, i.e. what level of inaccuracy is acceptable in the data. In the above example the acceptable level of inaccuracy was set at 10% or less and the unacceptable level at 15% and above. As the value 12% falls between the two, it shows up as a yellow warning light.
4.4.7 Reporting
After the Balanced Scorecard is run and the scores have been generated, a number of reporting options are available. The controlling cockpit generates an interactive JavaScript-based HTML page that displays the outcome of the Balanced Scorecard visually using the above-mentioned traffic light system. Drilldown options are available for further analysis. In addition, various reports can be generated, and there are options to export this information in various formats such as HTML and .rtf.
4.5 The Guideline of ADOscore Controlling Cockpit
The controlling cockpit, when run from the ADOscore application, creates a series of dynamic HTML pages using JavaScript and ActiveX controls. The cockpit contains traffic lights for each of the dimensions defined in the scorecard and an option to drill down on a traffic light to see further information. The figures below show various stages of the Controlling Cockpit, which we ran based on the prototype Information Quality Scorecard we created.
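Returning to the threshold example in section 4.4.6, the traffic light logic can be sketched as a small Python function. This is a hedged illustration of the rule described in the text, not ADOscore's own code: the function name `traffic_light` and its parameter names are hypothetical, and the defaults of 10% and 15% are simply the values used in the prototype example.

```python
def traffic_light(inaccuracy_pct: float,
                  green_max: float = 10.0,
                  red_min: float = 15.0) -> str:
    """Map an inaccuracy percentage to a traffic light colour.

    green_max: highest percentage still considered acceptable (assumed 10%).
    red_min:   lowest percentage considered unacceptable (assumed 15%).
    Values strictly between the two thresholds give a yellow warning.
    """
    if green_max >= red_min:
        raise ValueError("green_max must be lower than red_min")
    if inaccuracy_pct <= green_max:
        return "green"
    if inaccuracy_pct >= red_min:
        return "red"
    return "yellow"


if __name__ == "__main__":
    for value in (8.0, 12.0, 15.0):
        print(value, traffic_light(value))
```

Keeping the thresholds as parameters mirrors the point made in the text that the level of strictness is set per customer rather than fixed by the tool.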
Figure 26 Overview of the Controlling Cockpit
Once the Controlling Cockpit option is selected, the application generates the visual system, as seen above, in the form of interactive HTML pages.
Figure 27 Drilldown on the Consistency Dimension
Users have the option to drill down on dimensions and measures to see a more detailed breakdown of the figures.
Figure 28 Drilldown on the Accuracy Dimension
Figure 29 Drilldown on the Completeness Dimension
4.6 Technology Evaluation
• Scalability. Scalability depends largely on the processing power of the system on which the software is run. There are no major issues with this, as our systems are high-end machines with adequate processing power.
• Reliability. Tests have indicated that the system is reliable. It interfaces with various database systems and spreadsheets. We have created a variety of dimensions for the Balanced Scorecard that can be reused if necessary, and dimensions can also be tailored to the customer's requirements.
• Availability. The Balanced Scorecard software itself is expensive, around the €10,000 mark for the commercial license. Due to price constraints it is unlikely that the software is widely used by small and medium sized businesses.
• Security. There are no issues with system security, as the system is not “live” on the web and is not Internet based.
4.6.1 Data Cleaning
As part of our research into the overall data quality process we also investigated the data cleaning process as well as some data cleaning tools. Data cleaning goes hand in hand with the overall data quality process and is relevant to the Information Quality Balanced Scorecard.
Within the scope of providing a wide scale data quality service, the provision of a data cleaning service could logically be the next step after providing the Balanced Scorecard analysis, i.e. after measuring whether the data is accurate, we would provide a method of cleaning bad data. This data cleaning service would deal primarily with databases of names, addresses and financial data. The cleaning process would deal with duplicate entries, missing entries, inaccurate data etc.

The three major steps towards ensuring a high level of data quality in an organisation are error prevention, error detection and error cleaning (Chapman 2005). Where possible, preventing data errors in the first place is by far the best way to achieve high data quality: it is cheaper and more efficient to prevent errors than to find and correct them at some later stage. However, regardless of how efficient the process of data entry is, errors will still occur, and therefore data checking and correction cannot be ignored. Redman (1996) (as quoted in Chapman 2005) has suggested that an error rate of 15% with data entry should be expected.

Generally the data cleaning process is broken down into six steps (Kimball 1996): elementising, standardising, verifying, matching, householding, and documenting.

• Elementising. This deals with the parsing of the data, e.g. ensuring that an address is broken down into separate fields (line 1, line 2, line 3 of the address etc.).
• Standardising. This involves putting the elements of the data input into a standard form, e.g. St. > Street.
• Verifying. This step involves verifying the standardised elements above. Any possible errors are flagged.
• Matching. This involves attempting to match the standardised records with other available records to check that they are the same.
• Householding.
This step involves matching multiple elements that may constitute part of a larger element and flagging this.
• Documenting. In this step the results of the previous five steps are documented. The purpose of this is to improve and assist future cleaning efforts.

4.6.2 Data Cleaning Software

As part of our research into the data cleaning area we examined two relevant tools: AlphaMiner and SQL Data Compare.

AlphaMiner is open source software developed by the E-Business Technology Institute (ETI) at the University of Hong Kong. It provides an open source data mining and data cleaning platform. AlphaMiner contains (from E-Business Technology Institute, University of Hong Kong 2005):

Pluggable component architecture providing extensibility for adding new Business Intelligence capabilities in data import and export, data transformations, modeling algorithms, model assessment and deployment.

Versatile data mining functions offering powerful analytics to conduct industry specific analysis including customer profiling and clustering, product association analysis, classification and prediction.

Figure 30 AlphaMiner Screenshot – Replacing Missing Values

AlphaMiner allows data to be modelled and facilitates a cleaning process such as the one in the above screenshot, where missing data can be assigned a replacement value.

SQL Data Compare (retail price $295) works by comparing any two SQL databases and highlighting the differences between them. It also facilitates the synchronisation of databases. In conjunction with the Balanced Scorecard tool it could be used to pinpoint differences between two sets of data and to synchronise them.

Figure 31 SQL Data Compare Screenshot

Features (Redgate Software Ltd 2005):

Comparison and synchronization of Microsoft SQL Server databases, eliminating hours of tedious, manual work.
Comparison and synchronization of large databases – works on databases with thousands of tables and gigabytes of data.

4.6.3 Other Technology

To create general awareness of our business, our company will have a website set up on a “.ie” domain. This will allow customers to familiarise themselves with our business and the service we provide, give them a way to contact us, and act as an advertisement for our business. Customers would also be able to complete an online scorecard creation request, in which they could submit the data to be analysed and specify the dimensions under which they wish to have it analysed. They could then pay securely online for the service, which we would carry out for the company, provided of course that their submission met all of the requirements for creating a Balanced Scorecard.

The website could be developed in-house using PHP/MySQL/Apache open source software. The technical team has experience in setting up similar e-commerce websites, so this would not be a major difficulty.

5 Management and Labour Requirements

5.1 Structure

Currently the four members working on this project will supply the basic labour requirements for the development of the service. As we near commercialisation, additional personnel will be recruited as deemed necessary. We hope the following management structure will be in place by the end of the third year. All employees will be highly experienced in their respective domains.

Figure 32 Management Structure

As our initial funds are somewhat limited, some of our existing team members will take on the aforementioned roles in our company. As we offer a unique solution to organisations, our developer/chief technical officer will be one of our founder members.
He/she will work for the first three years on the development of the service and, as agreed, will take a pay cut if financial problems arise. In the short term our Chief Executive Officer and Chief Financial Officer will also be members of the current team. However, we will recruit an experienced CEO as soon as possible. These assignments should significantly reduce personnel expenditure. We will recruit a sales representative as soon as possible, as our team is not experienced enough to undertake such a role. The sales representative would be responsible for making contact with potential clients and for promoting and selling our service to them. However, by gaining knowledge of the issues involved and the factors to consider in selling our service, we hope to be in a strong position after the initial three years to take on the role ourselves.

Chief Executive Officer

Initially one of the current developers of the Balanced Scorecard tool will become the CEO. He/she will be responsible for the start-up of the business and for the success or failure of the company. The CEO should:
• Set the strategy and vision for the company.
• Position the service correctly in the market.
• Build a strong organisational culture.
• Examine business processes and outcomes, then take action and react effectively and efficiently.
• Be a visionary, a problem solver and a communicator.
• Implement and maintain the corporation's objectives through unexpected as well as foreseen threats and opportunities.

Chief Financial Officer

The Chief Financial Officer position requires exceptional leadership qualities, including strong communication skills and attention to planning, process and detail. In addition to the CFO’s basic responsibility of developing annual financial plans, other roles can be easily and efficiently assigned, such as:
• Projections and forecasts.
• Assistance in tax efficient structuring of a sale or acquisition.
• Budgeting.
• Assistance in lease versus buy decisions.
• Selecting tax entities for new ventures.
• Technology needs analysis.

Sales Representative

The sales representative is responsible for making contact with potential clients and promoting our service to them. The challenge of this position is to create awareness of the company and build a brand name strong enough to hold its own in the information quality domain. The list of tasks includes:
• Making contact with journals and media companies to promote our service.
• Integrating online and offline campaigns.
• Introducing our data quality service to companies in different industries.
• Hiring a legal adviser to deal with patent and licence issues.
• Monitoring sales and trying to increase them by adopting suitable marketing campaigns and promotions.

Developer

The Service Developer plays a significant role in strategic business decisions; they act as the technology specialist and can also recognise profitable applications of our data quality service. They are responsible for further research and development of our prototype service and are required to have a certain level of technical experience. The main responsibilities include:
• Monitoring the data quality service and the entire process.
• Providing technical assessments of the service.
• Examining the service in different industries.
• Developing the service prototype.

5.2 Salaries

CEO Year 1 Year 2 Year 3 €40,000 €40,000 €40,000 €38,000 €38,000 CFO DEVELOPER SALES REP. €30,000 + COMMISSION €30,000 + COMMISSION €30,000 + COMMISSION

Table 1 Annual Salaries

6 Finance

Cash flow Analysis: 2006 – 2009 (3 years, commencing September 2006)

The cash flow projections for the Balanced Scorecard service for the next three years are illustrated in this section of our business plan.
These costs are estimates and can be revised up until the preliminary commercialisation of the Balanced Scorecard service (as our project is still in the very early stages of development, the following figures are subject to change).

6.1 Income

The commercialisation of our service will depend on initial funding from Enterprise Ireland. We hope to secure a figure of around €150,000 by providing concrete evidence of a secure investment. Our second source of income will be our estimated sales for the first three years.

(1) Enterprise Ireland Funding: €150,000
(2) Sales for period September 2006 – August 2009

Year 1:
Small to Medium Sized Businesses: 10 purchases at €1,000 each
Large Corporations: 30 purchases at €5,000 each
Total Sales: €160,000

Year 2:
Small to Medium Sized Businesses: 18 purchases at €1,000 each
Large Corporations: 38 purchases at €5,000 each
Total Sales: €208,000

Year 3:
Small to Medium Sized Businesses: 21 purchases at €2,000 each
Large Corporations: 27 purchases at €10,000 each
Total Sales: €312,000

Table 2 Sales Figures for First Three Years

The increase in the price of our service is due to the fact that more functionality and capabilities will be added to our service by the end of the second year of operations. As a result sales should increase. However, servicing more corporations might require us to have a larger workforce. This factor will be examined and monitored closely.

6.2 Expenditure

Salaries: The salaries are €78,000, €108,000 and €108,000 for Year 1, Year 2 and Year 3 respectively. The total for the three years is €294,000.
ADOscore License: €18,000 per annum.
Broadband: €400 per annum.
Insurance: €900 per annum. This insurance will cover business liability, property damage and most other types of cover.
Telephone: €1,304.96 per annum.
Office Equipment: €12,100.
One server: €3,500.
4 Personal Computers: €1,000 each, Total €4,000.
1 Laptop Computer: €2,000.
Printer: €100.
Scanner: €150.
Photocopier: €300.
Fax Machine: €200.
Furniture: €1,350.
Rent of Building: €21,000 per annum.
Audit/Accounting: €4,000 per annum. This includes bookkeeping and other accounting services.
Travel and Subsistence: €5,500 per annum.
Legal Fees: €3,200 in second year.
Payroll Taxes: estimated at 10.75% of annual wages.
Website: development and maintenance estimated at €500 per annum.
Research and Development: €14,400 per annum.
Sales and Marketing: €15,600 per annum.

6.3 Future IQ Market

Initially the questionnaire we developed was intended for seventy recipients, all of whom have strong experience in the information quality area. We received in the region of twenty to thirty responses. The vast majority deemed data quality to be highly important and regarded accuracy, completeness and consistency as highly important aspects of data quality. From this we concluded that a strong market exists for our unique service. As already stated, our target markets include financial industries, retail industries, the health sector, large corporations and any organisations that store large quantities of data. We strongly believe there are enormous opportunities for our service.

According to Doug Laney, vice president and director with Meta Group’s technology research services, data quality is highly important and should be integrated with a company’s IT portfolio. Enterprises are starting to understand that data is an asset like their financial or material assets, and that data needs to be managed and given the same sort of treatment as the traditional assets (Picarille 2003).
A recent survey conducted by Meta Group concluded that the size of the information quality market was somewhere between $500 million and $1 billion, while data quality adoption rates for organisations are expected to grow by approximately 25% during the next three or four years. The information quality market is expected to exceed the $1 billion mark by 2008, an important milestone that will validate its importance and maturity (Agosta 2005). This is strong evidence that data quality management has huge potential at present and will continue to have it in the years ahead.

Businesses must realise that, whilst having data is fine, having correct and meaningful data is key to doing something worthwhile with the information. Completeness of data is another potential market. If data coming in to an organisation is useless, then in all likelihood the data going out is also useless. Data that contains gaps or is wrong is of no use at all.

Typically data quality has comprised three categories: name, address, identification and matching; data standardisation; and data cleansing. However, new categories are on the horizon, including data enrichment from data sources and data profiling (basically analysing where holes occur in data). As previously stated, we fully intend to extend our service not only to provide data quality management through the Balanced Scorecard tool but also to provide other functionality based on the aforementioned data quality categories. A financially vast market lies here.

The main emphasis for us would be to become a strong market leader in this domain, as organisations are more inclined to deal with well known and well established data quality brand names. We must develop our strategy side by side with our service functionality. Vendors attempting to break into the market sometimes struggle because their product is functionally inferior to others and is sometimes platform specific.
Having a data quality service which can offer a broad range of capabilities, is platform independent, can handle large corporations and doesn’t underestimate the cost of its service is fundamental to prevailing in the data quality sector. Laney concludes:

Unlike other technology genres, it's not just new functionality that wins the day. It takes years and years to generate and expose a complex set of rules. This technology is not revolutionary, but evolutionary, and that is why it is not easy for upstarts to come in and gain ground (Picarille 2003).

Appendix

Appendix A. Information Quality Survey

This survey was carried out as part of our M.Sc. Ecommerce Practicum at Dublin City University. For this practicum, under the guidance of Dr. Markus Helfert, we are carrying out research into the area of Information Quality and how it impacts organisations. The purpose of this survey was to examine the influence and effect that information quality has on organisations today and to determine how best to control and measure information quality. The survey contained questions relating to the general area of Information Quality, the importance of Information Quality to companies today and the information quality problems that impact companies.

Analysis:

This survey ran for 10 days, from the 7th to the 16th July 2006. We received 21 responses. The results show an almost even split between respondents who classified themselves as working in primarily business based roles and those who described themselves as working in primarily IT based roles. The sectors in which the respondents’ organisations operated were also widely distributed, with the healthcare/medical industry having the highest percentage at 19%.
Unsurprisingly the majority of people think that information quality control is important to their organisation. 61.9% classified it as critical. 42.9% of the respondents use some kind of data quality control/analysis software. Of those that use software of some kind, Informatica was the most used, with 33% of respondents using Informatica's Data Quality package. In question 6 we asked people to rate each of the data quality dimensions. The top three were Accuracy, Completeness and Relevance. The vast majority of respondents (73.7%) stated that they had encountered some problems as a result of poor data quality. A detailed breakdown of the problems and any given solutions can be read below in the answers to questions 7 & 8. Another important issue for us in this survey was determining what people felt were the main obstacles to maintaining a high level of data quality in organisations today. From question 9 it can be seen that the top two obstacles identified were lack of resources and lack of a proper data quality plan in the organisation. 85.7% of people said they would use an Information Quality tool to help them increase their data quality. A detailed breakdown of the questions can be seen below.

INFORMATION QUALITY SURVEY

You are invited to participate in our survey on Information Quality. We are a research group conducting a survey as part of an Ecommerce Practicum at Dublin City University. The purpose of this survey is to examine the influence and effect that information quality has on organisations today and to determine how best to control and measure information quality. The survey asks questions about Information Quality in general and the importance of Information Quality to your company. It will take less than 10 minutes to complete the questionnaire. Your participation in this study is completely voluntary; you are free to withdraw from it at any point.
Your survey response will be kept strictly confidential. If you have any questions please feel free to contact us by email at ronan.flannery3@mail.dcu.ie. Thank you very much for your time and support.

Results:

Question 1: How would you primarily describe your role in the organisation?
Responses received: 21
Answers:
Business based: 47.6%
IT based: 52.4%

Question 2: In which sector does your company operate?
Responses received: 21
Answers:
Healthcare/Medical: 19%
IT: 14.3%
Financial/Banking: 14.3%
Communications: 9.5%
Insurance: 4.8%
Other: 38.1% (this included education, media, government, marketing and professional services)

Question 3: How would you rate the importance of information quality control (i.e. ensuring accurate, consistent and complete data) to your company/organisation?
Responses received: 21
Answers:
Critical: 61.9%
Very Important: 28.6%
Important: 9.5%
(Chart legend: A: Critical, B: Very Important, C: Important, D: Not that important)

Question 4: Do you currently use any software for controlling and analysing the quality of the data used by your organisation?
Responses received: 21
Answers:
Yes: 42.9%
No: 57.1%

Question 5: If yes to the above, please select any of the software below that you use.
Responses received: 12
Answers:
Informatica Data Quality: 33.3%
Trillium Total Data Quality: 25%
FirstLogic IQS: 8.3%
DataFlux: 8.3%
Other: 25% (includes Commix and proprietary software)

Question 6: On a scale of 1 to 5, please rate each of these data quality dimensions in terms of importance (5 being very important, 1 being not too important).
Responses received: 20
Answers (average scores):
Accuracy: 4.55 – 91%
Relevance: 4.15 – 83%
Completeness: 4.20 – 84%
Accessibility: 4 – 80%
Timeliness: 3.90 – 78%
Consistency: 3.89 – 77.8%
Interoperability: 3.45 – 69%

Question 7: Have you encountered instances where poor or inaccurate information has had an adverse effect on your organisation?
(For example, missing data for reports or invoices etc.)
Responses received: 19
Answers:
Yes: 73.7%
No: 26.3%

Question 8: If yes to the above, please briefly state the problem and any solution used to overcome the problem.
Answers: Problems included:
• Multiple instances involving incomplete or "non standard data entry" of customer data leading to incorrect ordering and billing. Solution is largely to clean up locally however we are increasingly using Data Quality Tools to target priority areas.
• Duplicate data in accounting system; delay in audit, delay in accounts collections.
• Inaccurate recording of sales data. This was addressed through process revision.
• Problem: Sarbanes Oxley compliance. Solution: TIQM based projects to build a comprehensive measurement and management function, coupled with a process improvement and data re-engineering capability to fix legacy issues and continuously improve processes.
• Inability to launch a new service to business partners, because the data recorded in a reporting data repository was not an accurate reflection of the transaction processing system from which it was derived.
• Poor data on patients can mean that improvements are not taking place. It is hard to manage what you do not measure. The data system I use records information on patients and their procedures and drugs. There is evidence that certain medical procedures and drugs are beneficial and the system I manage records if this is happening and actions take place if it is not.
• Ambulance being sent to wrong address!
• Missing data for reports, had to call customers to confirm missing data.

Question 9: What, in your opinion, are the main obstacles to maintaining a high level of data quality in an organisation?
(select any as appropriate)
Responses received: 47 (note this was a multiselect question)
Answers:
Lack of resources: 21.3%
Lack of proper plan for data quality and information control: 21.3%
Impractical/extremely difficult to keep data fully up to date: 19.1%
Financial Cost: 14.9%
Lack of personnel training: 8.5%
Company Size: 4.3%
Other: 10.6% (includes political issues, lack of support from management, no assistance, focus on new development rather than maintenance)
(Chart legend: A: Financial Cost, B: Impractical/Extremely difficult to keep up to date, C: Lack of Resources, D: Lack of Proper DQ plan, E: Lack of training, F: Company Size, G: Other)

Question 10: Regardless of the cost, would you consider using an Information Quality control system to improve your data quality?
Responses received: 21
Answers:
Yes: 85.7%
No: 14.3%
Relevant comments left: IQ systems tend in themselves to be expensive to purchase and to implement. This makes them less attractive to a smaller organisation.

Appendix B. Balanced Scorecard Steps

References

1. Acxiom Corporation. 2006. Company Information – What We Do. [Online]. Available from: http://www.acxiom.com/ [Accessed 16th July 2006]
2. ADOscore1. [Online]. Available from: http://www.boceu.com/bochp.jsp;jsessionid=5DF5A64022E67E09DA4263B96A D13BF1?file=WP_582571cc1ed802de.46e381.f59775478f.7f2c [Accessed 20th July 2006]
3. ADOscore2. [Online]. Available from: http://www.boc eu.com/bochp.jsp?file=WP_582571cc1ed802de.46e381.f59775478f.7f27 [Accessed 20th July 2006]
4. Agosta, L. 2004.
The Data Strategy Advisor: The Information Quality Tools Glass Appear Half Empty. [Online]. Available from: http://www.researchandmarkets.com/reports/337831/ [Accessed 10th July 2006]
5. Arveson, P. 1998. What is the Balanced Scorecard? [Online]. Available from: http://www.balancedscorecard.org/basics/bsc1.html [Accessed 3rd July 2006]
6. Beal, B. 2006. Information Quality Market to Reach $1 billion. [Online]. Available from: http://searchcrm.techtarget.com/originalContent/0,289142,sid11_gci1076430,00.ht ml [Accessed 8th July 2006]
7. Bellinger, G. 2004. Data, Information, Knowledge, and Wisdom. [Online]. Available from: http://www.systemsthinking.org/dikw/dikw.htm [Accessed 7th July 2006]
8. BOC IT consulting GmbH. 2006. ADOscore – The Balanced Scorecard Toolkit. [Online]. Available from: http://www.boceu.com/ [Accessed 17th July 2006]
9. Business Objects. 2006. Data Quality Functionality. [Online]. Available from: http://www.firstlogic.com/home.asp [Accessed 16th July 2006]
10. Capiello, C. 2005. “Data Quality and Multichannel Services, PhD Thesis”
11. Chandras, R. 2004. ReEnter the Leader? [Online]. Available from: http://www.intelligententerprise.com/print_article.jhtml?articleID=22102279 [Accessed 7th July 2006]
12. Chapman, Arthur D. 2005. “Principles and Methods of Data Cleaning”. [Online]. Available from: http://www.gbif.org/prog/digit/data_quality [Accessed 18th July 2006]
13. DataFlux Corporation. 2006. About Us. [Online]. Available from: http://www.dataflux.com/ [Accessed 16th July 2006]
14. E-Business Technology Institute, University of Hong Kong. 2005. AlphaMiner Technologies. [Online]. Available from: http://www.eti.hku.hk/alphaminer/ [Accessed 18th July 2006]
15. Harte-Hanks Corporation. 2006. Trillium Software Solution. [Online]. Available from: http://www.trilliumsoftware.com/ [Accessed 16th July 2006]
16. Harte-Hanks Corporation. 2005. Annual Report 2005. [Online].
Available from: http://www.hartehanks.com/Interior.aspx?CategoryID=202 [Accessed 16th July 2006]
17. INFORMATICA1. 2005. Addressing Data Quality at the Enterprise Level. White Paper. [Online]. Available from: http://www.informatica.com/info/addressingdqwp/ [Accessed 7th July 2006]
18. INFORMATICA2. 2005. Informatica in Retail. [Online]. Available from: http://www.informatica.com/solutions/industry/retail/infa_retail_6621_lo.pdf [Accessed 20th July 2006]
19. Informatica Corporation. 2006. Company Overview. [Online]. Available from: http://www.informatica.com/ [Accessed 16th July 2006]
20. Informatica Corporation. 2005. Annual Report 2005. [Online]. Available from: http://www.informatica.com/company/investors/sec_filings/default.htm [Accessed 7th July 2006]
21. Kimball, R. 1996. “Dealing with Dirty Data” [Online]. Available from: http://www.dbmsmag.com/9609d14.html [Accessed 17th July 2006]
22. Loshin, D. 2001. The Cost of Poor Data Quality. [Online]. Available from: http://www.dmreview.com/article_sub.cfm?articleId=3605 [Accessed 10th July 2006]
23. Larry, P. 1999. Improving Data Warehousing and Business Information Quality. New York: Wiley.
24. Marketing Teacher. 2006. [Online]. Available from: http://www.marketingteacher.com/Lessons/lesson_generic_strategies.htm [Accessed 24th July 2006]
25. Massachusetts Institute of Technology – TDQM program. 2006. TDQM – Total Data Quality Management, MIT initiative. [Online]. Available from: http://web.mit.edu/tdqm/ [Accessed 16th July 2006]
26. METAspectrum. 2004. Market Summary. [Online]. Available from: www.firstlogic.com/pdfs/DQmetaspectrum04.pdf [Accessed 10th July 2006]
27. Picarille, L. 2003. Data Quality Market to Grow at 30 Percent. [Online]. Available from: http://www.destinationcrm.com/articles/default.asp?ArticleID=3450 [Accessed 23rd July 2006]
28. Redgate Software Ltd (2006) SQL Data Compare Software Features. [Online].
Available from: http://www.redgate.com/products/SQL_Data_Compare/features.htm [Accessed 19th July 2006]
29. Redman, T. 1996. Data Quality for the Information Age. Boston; London: Artech House.
30. Smith, G. 2006. Similarity Systems bought by California firm for US$55m. [Online]. Available from: http://www.siliconrepublic.com/news/news.nv?storyid=single5974 [Accessed 8th July 2006]
31. TDWI Conference Paper. 2004. Post Conference Review. [Online]. Available from: http://www.dmreview.com/ & download.101com.com/tdwi/LV04_Trip_Report.doc Forrester Research [Accessed 10th July 2006]
32. Wikipedia definition of TDQM. 2005. TDQM. [Online]. Available from: http://www.wikipedia.org [Accessed 16th July 2006]