Data Cleansing and Analysis as a Prelude to Model Based Decision Support

Mark W. Isken
Dept. of Decision and Information Sciences
School of Business Administration
Oakland University
isken@oakland.edu

ABSTRACT

We describe a loosely integrated set of teaching modules that allow students to explore decision support systems (DSS) topics ranging from data warehousing to model based decision support. The modules are used as part of a spreadsheet based modeling approach to teaching DSS. We describe one particular module in detail in which students are forced to deal with a muddy dataset as a prelude to analysis, modeling and decision support system development.

Editor's note: This is a pdf copy of an html document which resides at http://ite.pubs.informs.org/Vol3No3/Isken/

INFORMS Transactions on Education 3:3 (23-75), © INFORMS, ISSN: 1532-0545

EVOLUTION OF A DSS COURSE

The huge umbrella of decision support systems (DSS) has long provided a welcome gathering spot for those interested in building software applications based on a mixture of models, data analysis, and powerful interfaces. DSS attracts practitioners, scholars and students from a range of fields including information systems, operations research/management science, computer science, psychology and other business disciplines. While this breadth is a strength, it also poses challenges to those teaching DSS courses.

The sheer range of potential topics poses a challenge, as does the fact that while management information systems (MIS) is a strong, popular major at many business schools, the elimination of the required management science course at numerous universities (Powell, 1998; Grossman, 2001) makes it more difficult to introduce model based DSS to students at a deeper than superficial level. However, there has been a positive side effect of the pressure put on management science courses: the advent of the spreadsheet based modeling course. There has been a virtual revolution in terms of spreadsheet based management science and operations management courses that seems to have stuck in business schools (Winston, 1996; Powell, 1998, 2001). Spreadsheets have evolved into a quite capable platform for end-user decision support modeling (Grossman, 1997; Ragsdale, 2001). Within Microsoft Excel, this evolution has resulted in the inclusion of Solver (Fylstra, Lasdon, Watson, and Waren, 1998) for optimization, Pivot Tables, database connectivity, numerous mathematical and statistical functions, and the Visual Basic for Applications (VBA) programming language. Sophisticated add-ins such as @Risk, Crystal Ball and PrecisionTree allow analysts to do Monte Carlo simulation and decision analysis within the familiar confines of Excel. Relational database management systems such as Microsoft Access provide a viable platform for creating decision support tools with the inclusion of VBA, database functionality and connectivity, as well as simple yet powerful interface development tools. Taken together, these developments suggest there exists tremendous opportunity for business analysts in the role of end-user modelers and end-user developers to pair models with rich data sources and create sophisticated decision support tools using readily available software components.

While this is all quite exciting for those with a passion for both systems development and mathematical modeling, there are challenges involved in effective teaching of such concepts and skills in today's business schools. At Oakland University, MIS is the largest major in the business school; there is no operations management major nor is there a management science course taught regularly. Students take six credit hours of business statistics in their sophomore year. The MIS department has offered a traditional survey DSS course for a number of years. In 2001, we created a spreadsheet based version of the course. Given the heavy MIS emphasis at our school, we decided to develop a course that integrates MIS, DSS, and management science concepts with the goal of helping students become technically adept problem solvers in practical business settings. Having spent many years as a practicing operations research and decision support specialist in the healthcare industry, the author is convinced of the value of equipping business students with this integrated tool set.

The course is taught in a computer teaching lab and is extremely hands-on. Currently it is cross-listed and attracts senior undergraduates, MBA students, and MS in Information Technology Management students. No computer programming experience is assumed, though all students are taught to program as part of the course. While we will give a brief overview in this paper, the course web site (http://www.sba.oakland.edu/faculty/isken/mis646/) contains detailed information about the entire course.

The primary focus of this paper will be on a particular set of modules used to integrate concepts of data based and model based decision support. These modules were designed to provide a natural progression from the world of corporate transaction system data to data based decision support and finally to model based decision support. This multi-part module begins with muddy data, then OLAP, and concludes with queueing and optimization models. The application thread running through all three modules is that of analyzing call center data for purposes of assessing staffing capacity and customer service related questions. In this paper we will describe the muddy data module in some detail as it is somewhat unique, the OLAP module in less detail, and just briefly describe the follow-on modeling topics.
The progression of the modules parallels a situation commonly faced by business analysts. In response to some business problem or managerial inquiry, the analyst might begin by trying to better understand the problem through preliminary data analysis and exploration. Unfortunately, at this stage one is often faced with formidable data gathering and cleansing problems that must be tackled before the data is usable for analysis. Next, relatively unstructured data exploration and analysis might be done in an attempt to quantify the problem, explore relationships between variables, and to gain better insight into the nature of the business problem. Depending on the specific problem at hand and the set of questions to be answered, data analysis may simply not be sufficient and the analyst must move on to modeling and more structured data analysis. This linear progression is idealized in the sense that, certainly, one can do modeling in parallel with (or even in the absence of) data analysis. However, it is a common scenario in practice and has pedagogical value in that business students are usually more comfortable with concrete data than with more abstract modeling topics.

MOTIVATION FOR THE TEACHING MODULES

Many businesses have realized the tremendous value of their internal operational data for supporting decision making. This has played out in the software marketplace in the growth of the data warehousing, OLAP (online analytical processing), and data mining fields. Data warehouses provide a foundation of data that has already been merged from multiple systems, validated, transformed for purposes of facilitating analysis, stored according to a logical organizational scheme, and which is easily accessible using various client side tools such as query builders, spreadsheets and OLAP systems (Kimball, 1996). Multidimensional data modeling provides an alternative way of structuring databases for business analysis (Thomsen, 2002). OLAP tools such as Microsoft Excel's Pivot Table and Pivot Chart functionality provide a powerful front-end for flexible data exploration. However, no matter how good the data warehouse and data analysis tools, many common managerial decision making problems require models to aid the decision maker in arriving at a defensible decision.

We created a set of loosely integrated teaching modules that allows students to explore the inextricably linked concepts of data warehousing, OLAP, and decision modeling. In doing so, the students also become familiar with a host of advanced Excel tools and techniques that they will find extremely valuable in the business world.

MODULE 1 - MUDDY DATA

The perceived value of data buried in corporate information systems and the growth in enterprise resource planning (ERP), workflow and e-commerce systems, along with the availability of inexpensive computing resources, have contributed to massive growth in the data warehousing and OLAP markets (Dhond et al., 2000). Typically, such efforts involve a significant amount of effort to get data out of one place and into another. This phase, optimistically called extract-transform-load (ETL), can be an unbelievably difficult exercise in dealing with missing, contradictory, ambiguous, undefined, and just plain messy data (Pyle, 1999; Mallach, 2000). In general, much time is spent on getting data out of systems for:

• Analysis (spreadsheet, database, statistics package)
• Transformation and input into a new system (ERP, data warehouse)

There are commercial products such as Monarch (from Data Watch (http://www.datawatch.com/index.asp)) and Content Extractor (Data Junction (http://www.datajunction.com/)) which address the general problem of extracting usable data from formatted electronic reports. However, doing this task with a general purpose tool such as Excel gives students a better understanding of data manipulation concepts such as data types, string parsing, exploiting structure in data, and automation of repetitive tasks, which is often required even with commercial ETL tools. Furthermore, virtually every business analyst will have access to a spreadsheet and/or database package.

Ideally, systems can export data in nice comma delimited, tab delimited, spreadsheet or database formats. This is not always the case with legacy systems. Often the fastest way to get data from such systems is to redirect a printed report to disk. Now the business analyst is faced with a huge electronic file that needs to be manipulated in order to make it useful. This is a typical and important aspect of working with data from corporate information systems. It is often dismissed as tedious yet mindless work. This is far from the truth, as it provides an opportunity for clever problem solving as well as for saving time during the analysis process.

We have created an Excel based teaching module entitled Dealing with Muddy Data Using Excel to introduce this real business problem to our students. The module has multiple learning objectives:

1. Let students experience the difficulty of ETL activities,

2. Allow students to learn some of Excel's useful data manipulation functions such as string parsing and date/time functions and to generally raise their power level in Excel (which translates to most database and spreadsheet applications),

3. Help the students start to learn how to make guesses about function existence, to hunt down useful functions and to figure out how to use them from online help systems and wizards,

4. Provide students with their first look at VBA via small user defined functions to aid the ETL process,

5. Provide a bridge for our predominantly MIS audience from the comfortable world (for them) of information systems to the uncomfortable world of data analysis and modeling,

6. Cultivate an MIS mindset that abhors manual data cleaning/transformation/manipulation and is always looking for structure in data files that can be exploited to create automated solutions,

7. Remind students that knowledge work often involves getting down and dirty with data and that their value as analysts is increased by developing proficiency in handling any data set thrown at them.

STRUCTURE OF THE CLASS

We begin the session with a brief introduction to the general problem and stress the ubiquitous nature of such problems and the undesirability of brute force manual approaches due to time constraints and the likelihood of error. Then the specific case that we are going to use is introduced: data generated from an automatic call distributor (ACD) from a call center at a large tertiary care hospital. We take the perspective of analysts interested in studying call center performance and suggesting staffing changes that would improve customer service and/or reduce operating costs. The ACD system generated many summary reports but it was difficult to get at this data in an electronic fashion. The system was capable of sending reports to disk. So we had a large number of reports for different areas (called splits) in the hospital and different dates generated and redirected to disk. A small set of slides is used during the discussion: DataMud Presentation (http://ite.pubs.informs.org/Vol3No3/Isken/DataMud.ppt).

The tutorial is an HTML document, MuddyDataTutorial (http://ite.pubs.informs.org/Vol3No3/Isken/index.php#MuddyDataTutorial), and contains hyperlinks to all of the data files and spreadsheets used. The tutorial is written in an informal tone.
The tutorial begins with some background to the general problem and then has the student open the primary data file in a text editor such as WordPad. They get a chance to explore the structure of the file before attempting to import it into Excel. An annotated snapshot of the file (Figure 1) is displayed and we discuss structural elements such as report headers and footers which contain useful information (e.g. report date), blank and other unnecessary lines, and totals sections. The notion of looking for exploitable structure is emphasized. For example, subtotal lines follow a line with a string of hyphens partially spanning the page, the Split number seems to follow the heading Split:, and the phrase Daily Split Report signals the beginning of a new report.

Figure 1: Sample ACD Report

EXAMPLE 1: SINGLE REPORT

We start by working with a text file that contains a single ACD report for one date. The Text Import feature of Excel, treating the file as Fixed Width (as opposed to Delimited), is used to import the file. Surprisingly, most students, though regular users of Excel, had little experience even with simple text file importing. Figure 2 shows the result of this step.

Figure 2: Result of First Import

While the numeric data in the report falls nicely into columns and rows, one important piece of information, the Split Number and Split Name, has gotten somewhat garbled:

Figure 3: Garbled Split Name

The students now are told that we want to create a formula in cell A3 that will return the split number, i.e. 2. They can assume that the split number is always <=100 and they are given a hint that the Value() and Mid() functions might be useful.
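In Python terms, the hinted approach amounts to locating a marker character and slicing around it. The following sketch mimics the Find()/Mid()/Value() style of attack; the concatenated cell text below is invented for illustration, since this excerpt does not reproduce the actual garbled cells:

```python
# Python sketch of the Excel approach. The exact garbled cell layout is not
# shown in this excerpt, so we assume an illustrative concatenated string.
concatenated = "Split: 2     (OUTPAT CLINIC REG.)"  # hypothetical join of the cells

# Analogous to Excel's FIND("(", ...): position of the left parenthesis
paren_pos = concatenated.find("(")          # 0-based in Python, 1-based in Excel

# Analogous to VALUE(MID(...)): pull out the digits before the parenthesis
# and convert the text to a number.
split_number = int("".join(ch for ch in concatenated[:paren_pos] if ch.isdigit()))

# Analogous to MID(...) starting just past the parenthesis: the split name.
split_name = concatenated[paren_pos + 1:].rstrip(")").strip()

print(split_number)  # 2
print(split_name)    # OUTPAT CLINIC REG.
```

In Excel the same steps become small FIND(), MID() and VALUE() formulas; the point in both settings is to exploit a reliable marker in otherwise messy text.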
Step 1. Concatenate A4:G4 with a formula in B3. Use the & operator.

Step 2. Find the numeric position of the left parenthesis in the result in cell B3. Use the Find() function in cell F3.

Next we want to create a formula in cell J3 that will return the split name, i.e. OUTPAT CLINIC REG. They get the following guidance:

Step 3. Use the result of Step 2 along with the Mid() function to get OUTPAT CLINIC REG. in cell G3.

Figure 4: Fixing the Split Name

Of course, several of these steps could be combined. However, students are urged to break problems like this into small formulaic chunks, as it is much easier to debug multiple simple formulas than one enormous formula containing five or six nested Excel functions. After getting the desired result, one can always build a single function by copying and pasting the pieces. We also use this first example to talk about things like using the concatenation operator to create unique field names in a single row in preparation for creating Pivot Tables or importing the data into a database.

Finally, to get ready for the second part of the tutorial and to illustrate another useful approach to text manipulation, we re-import the text file into a single column and then use the Text to Columns tool available in Excel to parse just the data rows into columns. This is all explained in detail in MuddyDataTutorial (http://ite.pubs.informs.org/Vol3No3/Isken/index.php#MuddyDataTutorial).

A LITTLE DATE/TIME PROBLEM

It turns out that the date has luckily made it unscathed into a cell, but the beginning and end of each time period in column A appear together as one string in a cell. The students are challenged to create formulas in two separate cells that contain the actual date/time values corresponding to the beginning and end of the period.
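One way to see what the challenge involves: a spreadsheet date/time is a serial number (whole days, with the time of day as the fractional part), so the period text must be turned into two genuine date/time values before any date arithmetic works. The Python sketch below shows the equivalent logic; the "start- end" period format is an assumption made for illustration, as the tutorial's actual string is not reproduced here:

```python
from datetime import datetime, timedelta

# Hypothetical inputs: a report date that imported cleanly, and a period
# string as it might appear in column A (format assumed for illustration).
report_date = datetime(1998, 3, 12)
period = "8:00- 8:30"

start_txt, end_txt = [p.strip() for p in period.split("-")]

def to_datetime(day, hhmm):
    """Combine a date with an 'H:MM' time-of-day string."""
    hours, minutes = map(int, hhmm.split(":"))
    return day + timedelta(hours=hours, minutes=minutes)

period_start = to_datetime(report_date, start_txt)
period_end = to_datetime(report_date, end_txt)

# Excel-style serial representation: time of day as a fraction of a day,
# e.g. 8:00 AM is 8/24 = 1/3 of a day.
fraction = (period_start - report_date) / timedelta(days=1)
print(period_start, period_end, fraction)
```

Once the values are true date/times rather than strings, the whole library of built-in date/time functions becomes usable, which is exactly the discussion the tutorial aims to provoke.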
This leads to a discussion about how dates and times are stored in most spreadsheet and database packages. Our experience has been that few students are aware of how they are stored and thus cannot effectively manipulate date/time values in formulas nor take advantage of the wealth of date/time functions built into standard office productivity applications. A related discussion then is likely to ensue about the differences between date/time values, strings that look like dates, and different date/time cell formats.

EXAMPLE 2: MANY DAYS OF REPORTS IN ONE FILE

Now that the students have some familiarity with data importing and cleaning, we tackle a case of over sixty ACD reports from three different splits and multiple dates, all contained in a single text file. The size of the file drives home the point that we really do not want to manually wade through this file to prepare it for analysis. Certainly, if the data files will need to be dealt with on an ongoing basis or if the file is enormous, the value of automation is clear. However, even if the file is somewhat of a one-time obstacle and of manageable size, the value of the learning that takes place in figuring out how to automate the process usually outweighs any extra time spent doing so.

The goal is to eventually create a spreadsheet that looks like Figure 5 (many more columns to the right).

Figure 5: Our Goal

In other words, we have gotten rid of all the extraneous rows and we have the report date and split number on each row so that we know where the data came from. This file is now in good shape for pivoting or importing into a database or statistics package.
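The kind of automation the students build with formulas and VBA can be sketched in a few lines of Python. The miniature report text below is invented (the real ACD files are far larger and have more columns), but it shows the same exploitable structure the tutorial emphasizes: a header line that starts each report, a Split: line, and data rows that begin with a time period:

```python
# Sketch of the transformation: keep only data rows, each tagged with its
# report date and split number. The report text is a made-up miniature.
raw_report = """\
Daily Split Report          Date: 03/12/98
Split: 2 (OUTPAT CLINIC REG.)

 8:00- 8:30   42   12   0:55
 8:30- 9:00   51   15   1:10
----------------
TOTAL         93   27
Daily Split Report          Date: 03/13/98
Split: 3 (PEDIATRICS)

 8:00- 8:30   17    4   0:40
"""

rows = []
report_date = split_number = None
for line in raw_report.splitlines():
    if "Daily Split Report" in line:            # header signals a new report
        report_date = line.split("Date:")[1].strip()
    elif line.strip().startswith("Split:"):     # split number follows "Split:"
        split_number = int(line.split("Split:")[1].split("(")[0])
    elif line.strip()[:1].isdigit() and "-" in line:  # data rows start with a time period
        rows.append([report_date, split_number] + line.split())

# Every surviving row now carries its report date and split number,
# ready for pivoting or loading into a database.
for r in rows:
    print(r)
```

In the tutorial the same tests are expressed with cell formulas such as ISNUMBER() and FIND(), and later encapsulated in small VBA user defined functions; the underlying idea of keying off structural markers is identical.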
The tutorial walks the students through one general approach for tackling this problem. A combination of instruction, hints, screen shots, and questions is used to guide the learning experience. Solutions to the questions can be found in the file MuddyDataTutorial-Notes (http://ite.pubs.informs.org/Vol3No3/Isken/index.php#MuddyDataTutorial-Notes). The students are reminded that this is certainly not the only way to proceed and they are urged to look for opportunities for other, perhaps better, approaches. Along the way they encounter concepts and tactics like:

• inserting a column containing a unique sequence number so that the original row order can be regained as needed,
• use of functions such as FIND(), ISBLANK(), ISERROR(), and ISNUMBER() to test cells for certain conditions,
• using VBA to create user defined functions to encapsulate complex, nested formulas,
• using data filtering to hide unwanted data rows,
• copying and Paste Special|Values to freeze intermediate results of complex calculations.

APPLICATIONS

This tutorial has never failed to elicit examples from students from their places of employment. One student, a network manager for an auto supplier, used these ideas to create a network utility that summarized the file content of their LAN by processing the output of a VBA automated stream of DOS DIR commands from within an Excel application. In addition to data cleansing, the utility also included pivot table based analytics (the subject of the next section) which were made available via their corporate intranet. Another student was faced with a difficult data transformation problem as part of an ERP implementation. The project involved a mainframe report directed to a text file with over 50,000 lines of data describing information security access levels for several thousand employees. Also, a graduate student who was working in the University's finance department gave the class a demonstration of a commercial ETL product and described numerous muddy data problems that they deal with routinely. Finally, a former graduate student contacted me and described her new position with a major consulting firm: "Lots of modeling, muddy data problems, and working with OLAP tools and data warehousing. I know the muddy data area was a particular area of interest to you and it seems that it is a really BIG issue for many businesses - actually bigger than I imagined."

MODULE 2 - OLAP STYLE ANALYSIS WITH EXCEL PIVOT TABLES AND CHARTS

The muddy data tutorial is followed by a module focusing on OLAP style analysis using Excel Pivot Tables and Pivot Charts. While a precise definition of OLAP software technology is elusive, an oft cited definition is the one given by Nigel Pendse, co-founder of The OLAP Report (http://www.olapreport.com/), an industry research group. OLAP is characterized as tools providing fast analysis of shared multidimensional information (FASMI (http://www.olapreport.com/fasmi.htm)). The multidimensional part, arguably the defining feature of OLAP (Thomsen, 2002), refers to the approach of organizing and accessing business facts, such as sales volume, along natural dimensions such as customers, products, and time. The notion of a data cube arises from this approach. Numerous software tools exist for creating, managing, and analyzing multidimensional data to support business analysis. A few good sources for information related to data warehousing and OLAP include The Data Warehousing Information Center (http://www.dwinfocenter.org/), DSS Resources (http://dssresources.com/), and Intelligent Enterprise (http://www.intelligententerprise.com/).

Excel Pivot Tables, while certainly not considered a full-featured OLAP analysis tool, provide a readily accessible and simple way to introduce OLAP analysis concepts in the classroom. Through polling our students, many of whom are working and are routine spreadsheet users, we have found that very few are familiar with Pivot Tables and Pivot Charts.

This session begins with a discussion of basic data warehousing concepts including star schemas and multidimensional data models. Differences are highlighted between databases designed for analysis and databases designed to support transaction processing.

A warning is in order: this tutorial contains material related to Access. At our university, since many of the students taking the course are MIS majors, they are familiar with Access (often more so than with Excel). However, even those students with limited or no experience with Access are usually able to follow along even if they cannot do all of the Access portions of the tutorial. I believe it is important for business analysts to have a working knowledge of database concepts and Access is a very good package for meeting this objective. It also offers a perfect opportunity to discuss the pros and cons of spreadsheets versus databases for various business computing tasks. For those interested in an Excel only version of this tutorial, see ExcelPivotTutorial-NoAccess (http://ite.pubs.informs.org/Vol3No3/Isken/index.php#ExcelPivotTutorial-NoAccess).

For this module we have created a fictitious call center that responds to technical support calls for Microsoft Office products. The application itself, a computer technical support call center, was chosen partly because it falls under the MIS managerial domain - MIS students may end up managing such call centers.
The data model, implemented in the database management package Microsoft Access, consists of time, customer, application, and problem dimensions and two facts: time spent on hold and time to service the technical support call. The dataset consists of approximately 26,000 rows of data generated from a simulation model of a call center. Each row is a single phone call. Like the previous module, the students are given a tutorial consisting of a background section and a combination of instruction, hints, screen shots, and questions. Again, the tutorial is an HTML document, ExcelPivotTutorial (http://ite.pubs.informs.org/Vol3No3/Isken/index.php#ExcelPivotTutorial), and contains hyperlinks to all of the data files and spreadsheets used.

After becoming familiar with the data model, we use the MS Access QBE (Query by Example) tool to create basic queries and quickly find that SQL is far from ideal for ad-hoc exploratory analysis. This motivates the need for flexible OLAP tools such as Pivot Tables. The students are provided with an Excel workbook containing the results of a query that combines fields from the dimension and fact tables. The bulk of the tutorial focuses on the basics of creating Pivot Tables and Charts to answer business questions such as the volume of calls by various combinations of customer, application, problem and time values. Specific topics include creating the pivot table, sections of pivot tables, pivoting, changing display types, drilling down to detail data, formatting, general table options, controlling subtotals, filtering of field values and creating pivot charts.

We recently used this tutorial with a group of healthcare executive MBA students (using data from a surgical recovery room) and it was received extremely well. They are frustrated by the lack of data to support their decision making and quickly saw the potential for these types of tools.
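Stepping back to the pivot table mechanics themselves: the core operation of a Pivot Table is counting or summarizing facts over chosen combinations of dimension values, which can be illustrated in a few lines. The records below are hypothetical stand-ins for the call fact table described above:

```python
from collections import Counter

# A hypothetical slice of the fact table: one record per call, with a few
# dimension values (the real dataset has ~26,000 rows and more dimensions).
calls = [
    {"application": "Excel",  "problem": "Formulas", "day": "Mon"},
    {"application": "Excel",  "problem": "Charts",   "day": "Mon"},
    {"application": "Access", "problem": "Queries",  "day": "Tue"},
    {"application": "Excel",  "problem": "Formulas", "day": "Tue"},
]

# "Pivoting" call volume by application and day: group on the two chosen
# dimensions and count the facts landing in each cell, as a Pivot Table does.
volume = Counter((c["application"], c["day"]) for c in calls)

print(volume[("Excel", "Mon")])   # 2
print(volume[("Access", "Tue")])  # 1
```

A real Pivot Table would also sum or average numeric facts such as time on hold within each cell, and swapping which dimensions define rows and columns is exactly the "pivoting" the tutorial practices.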
Many of their questions focused on how to go about getting access to such data from healthcare information systems. This provided a nice lead-in to the section in the tutorial where we see how to connect Excel's Pivot Table to data in an external database such as Access. Discussion of commercial OLAP tools (e.g. EssBase, WebFocus, Business Objects, Oracle Express, SQL Server Analysis Services) complements this topic since such tools usually are capable of working with data in a myriad of database and file formats.

As the tutorial nears the end, the students are presented with their task. They are placed in the role of call center manager with staffing and scheduling responsibilities and challenged to analyze the data in response to customer complaints. To begin to address this issue, they are first guided to look for relationships between time on hold and staffing levels by time of day and day of week. They use pivot charts to illustrate such relationships. Then the tutorial closes with a series of managerial questions and a preview of things to come:

In this module, you've analyzed call center data using pivot tables and pivot charts. You've taken the perspective of the manager of the call center. As the manager, certain questions would likely arise based on the analysis.

1. It seems like there are times where staff utilization is really low. If we reduce staffing, how will the average time on hold be affected?

2. There are times when average time on hold and staff utilization are both pretty high. I bet we could reduce average time on hold by adding staff (thus reducing staff utilization). How much staff would we need to reduce average time on hold to 30 seconds? 15 seconds?

3. Average time on hold isn't the only important performance measure. We're also interested in the percentage of customers who have to hold and the percentage of customers who hold less than 1 minute. How could we calculate these measures from the current data and how could we figure out how they will be affected by changes in staff levels?

4. We currently just have our tech support staff working 8-hour shifts. The midnight shift is from 12a-8a, the day shift is from 8a-4p, and the afternoon shift is from 4p-12a. If we were better able to estimate our staffing needs by hour of day and day of week, how would we figure out how to schedule our staff (e.g. shift start times and shift lengths) to meet these staffing needs?

These are not easy questions, but they are the type routinely faced by managers of a myriad of customer service related systems including software tech support, retail sales (L.L. Bean, Lands' End), healthcare, police and emergency services, fast food, insurance companies, and product support. While historical data is useful for analyzing current system performance, we need something else to handle "What if?" or "What's best?" type questions. We need models. Try to think a bit about what makes questions 1-4 difficult and we'll discuss modeling with spreadsheets next time.

FOLLOW-ON ANALYTICAL POSSIBILITIES

The previous two modules have set the stage to discuss the use of queueing models to explore the relationship between capacity and system performance and customer service measures such as average and upper percentiles of time on hold, employee utilization, and lost calls. We use the spreadsheet modeling based textbook Practical Management Science (Winston and Albright, 2001), which includes a chapter on basic queueing models. The students also receive a tutorial which introduces queueing models in the context of our call center example. The basic physics of the M/M/s queue are discussed and the students work with a pre-developed spreadsheet model that calculates performance measures for user defined parameter values such as the call arrival rate and service time. The objective is to have students discover that: 1) retrospective data analysis is simply not sufficient to answer these kinds of basic operational questions, 2) mathematical models can provide a way to address such questions, 3) models are inherently based on a host of simplifying assumptions, and 4) they should have paid better attention in their math and probability/statistics courses.

The last objective is a real issue (Groleau, 1999). It is valuable for students to see what's going on underneath the hood of these mysterious things called mathematical models. As one graduate MIS student recently commented, it was nice to learn that there was no man behind the green curtain. However, the mathematical background of many of the students is sparse and care must be taken to present the material so they do actually gain insight from it. By exploring the queue length and wait time distribution formulas for the M/M/s queue, students see that everything in business cannot be described by linear regression lines, and that very powerful and informative statements about real business systems can be made by equations containing nothing more complex than summations, exponentiation, and factorials.
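For readers who want to see those equations in action, the following sketch implements the standard M/M/s formulas. The numerical scenario is invented, and the tutorial's own spreadsheet may organize the calculations differently:

```python
from math import exp, factorial

def mms_measures(lam, mu, s):
    """Textbook M/M/s formulas: lam = call arrival rate, mu = service rate
    per agent, s = number of agents. Requires offered load lam/mu < s."""
    a = lam / mu                    # offered load in erlangs
    rho = a / s                     # agent utilization
    # P(0): probability of an empty system -- an iterative summation
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(s))
                + a**s / (factorial(s) * (1 - rho)))
    # Erlang C: probability an arriving call must wait on hold
    c = a**s / (factorial(s) * (1 - rho)) * p0
    lq = c * rho / (1 - rho)        # mean number of calls on hold
    wq = lq / lam                   # Little's Law: mean wait = Lq / lambda
    def pr_wait_exceeds(t):         # P(wait > t), t in the same time units
        return c * exp(-s * mu * (1 - rho) * t)
    return rho, c, wq, pr_wait_exceeds

# Hypothetical scenario: 90 calls/hour arriving, each agent handles
# 20 calls/hour, 6 agents on duty (so utilization is 75%).
rho, p_wait, wq, tail = mms_measures(90, 20, 6)
```

With a function like this, question 2 above becomes an experiment: increase s until wq falls below 30/3600 hours. Nothing in it is more complex than summations, exponentiation, and factorials.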
In addition to the topics discussed above, students are taught to program in VBA (using Albright, 2001), develop basic user forms that communicate with their spreadsheet models, do spreadsheet simulation with @Risk, and do decision analysis with PrecisionTree (specific topics have varied slightly over semesters). Obviously, in a single semester, we can only explore each topic to a limited degree. Ideally this would be a two course sequence, with the VBA and other application development topics moved into the second course. The lack of a required management science course makes this difficult to accomplish, and so we have decided to err on the side of packing a bit too much into this one course. At the very least, students leave the class as power users of Excel and with a newfound appreciation and respect for this unassuming tool.

Students then answer basic questions using the M/M/s spreadsheet model concerning the relationship between wait time and changes in call volume, staffing to meet service level goals, and use of Little's Law. They are also asked to examine the formula for the probability of an empty system in the M/M/s queue and to discuss the relative merits of cell based formulas versus procedures written in VBA for such iterative calculations. This brings MIS development issues back into a discussion about modeling. Like others, we have struggled with the difficulty of illustrating the dynamics of queueing systems within the spreadsheet environment. We usually demonstrate a discrete event simulation model of a simple queueing system to try to augment the spreadsheet based queueing models. Recently, this issue has been addressed by contributors to this journal (see Ashley (2002) and Grossman and Ingolfsson (2002)) and we are actively working to integrate these ideas into the queueing module. Finally, we discuss the fact that though we can use this queueing model to estimate staffing needs by time of day and day of week, our job is far from over.
Now we have to figure out how to develop work schedules that meet our staffing targets and conform to policies concerning shift lengths, start times, days worked per week, and other employee desires and constraints concerning schedules. This provides a way to introduce optimization models using Solver in Excel. Over several class sessions, the students are introduced to the basics of linear and integer programming using Solver. They explore and enhance a basic employee scheduling model (Albright, 2001). We then use this model to provide a bridge to talking about creating full blown decision support systems on the Excel platform, complete with a graphical user interface that hides complex mathematical models from the user (see Albright (2001), Zobel, Ragsdale and Rees (2000), and Ragsdale (2001)).

SUMMARY

We have developed a number of related modules that take students on a journey through a range of data and model based DSS topics using Excel.

REFERENCES

Albright, S.C. (2001), VBA for Modelers, Duxbury, Pacific Grove, CA.

Ashley, D.W. (2002), "An Introduction to Queuing Theory in an Interactive Text Format", INFORMS Transactions on Education, Vol. 2, No. 3, http://ite.pubs.informs.org/Vol2No3/Ashley/

Dhond, A., Gupta, A., and S. Vadhavkar (2000), "Data Mining Techniques for Optimizing Inventories for Electronic Commerce", Proceedings of KDD 2000, Boston, MA.

Foulds, L. and C. Thachenkary (2001), "Empower to the People: How Decision Support Systems and IT Can Aid the Analyst", OR/MS Today, Vol. 28, No. 3, pp. 32-37.

Fylstra, D., L. Lasdon, J. Watson, and A. Waren (1998), "Design and Use of the Microsoft Excel Solver", Interfaces, Vol. 28, No. 5, pp. 29-55.

Groleau, T.G. (1999), "Spreadsheets Will Not Save OR/MS!", OR/MS Today, Vol. 26, No. 1, p. 6.

Grossman, T.A. Jr. (1997), "End User Modeling", OR/MS Today, Vol. 24, No. 5, p. 10.

Grossman, T.A. Jr. (2001), "Causes of the Decline of the Business School Management Science Course", INFORMS Transactions on Education, Vol. 1, No. 2, http://ite.pubs.informs.org/Vol1No2/Grossman/

Grossman, T.A. Jr. and A. Ingolfsson (2002), "Graphical Spreadsheet Simulation of Queues", INFORMS Transactions on Education, Vol. 2, No. 2, http://ite.pubs.informs.org/Vol2No2/IngolfssonGrossman/

Kimball, R. (1996), The Data Warehouse Toolkit, John Wiley & Sons, NY.

Mallach, E.G. (2000), Decision Support and Data Warehouse Systems, McGraw-Hill, Boston, MA.

Powell, S.G. (1997), "The Teachers' Forum: From Intelligent Consumer to Active Modeler, Two MBA Success Stories", Interfaces, Vol. 27, No. 3, pp. 88-98.

Powell, S.G. (1998), "The Teachers' Forum: Requiem for the Management Science Course?", Interfaces, Vol. 28, No. 2, pp. 111-117.

Powell, S.G. (2001), "Teaching Modeling in Management Science", INFORMS Transactions on Education, Vol. 1, No. 2, http://ite.pubs.informs.org/Vol1No2/Powell/

Pyle, D. (1999), Data Preparation for Data Mining, Academic Press, San Diego, CA.

Ragsdale, C.T. (2001), "Teaching Management Science With Spreadsheets: From Decision Models to Decision Support", INFORMS Transactions on Education, Vol. 1, No. 2, http://ite.pubs.informs.org/Vol1No2/Ragsdale/

Thomsen, E. (2002), OLAP Solutions: Building Multidimensional Information Systems, Wiley Computer Publishing, NY.

Winston, W. (1996), "The Teachers' Forum: Management Science with Spreadsheets for MBAs at Indiana University", Interfaces, Vol. 26, No. 2, pp. 105-111.

Winston, W. (2001), "Executive Education Opportunities: Millions of Analysts Need Training in Spreadsheet Modeling, Optimization, Monte Carlo Simulation and Data Analysis", OR/MS Today, Vol. 28, No. 4, pp. 36-39.

Winston, W. and S.C. Albright (2001), Practical Management Science, Duxbury, Pacific Grove, CA.

Zobel, C., Ragsdale, C.T., and L. Rees (2000), "Hands-on Learning Experience: Teaching Decision-Support Systems Using VBA", OR/MS Today, Vol. 27, No. 6, pp. 28-31.

APPENDICES

DEALING WITH MUDDY DATA USING EXCEL

Background

The perceived value of data buried in corporate information systems, the growth in enterprise resource planning (ERP), workflow, and e-commerce systems, along with the availability of inexpensive computing resources, have contributed to massive growth in the data warehousing and OLAP markets. Typically, such efforts involve a significant amount of effort to get data out of one place and into another. This phase, optimistically called extraction-transform-load (ETL), can be an unbelievably difficult exercise in dealing with missing, contradictory, ambiguous, undefined, and just plain messy data. In general, much time is spent on getting data out of systems for:

• Analysis (spreadsheet, database, statistics package, etc.)
• Transformation and input into a new system (ERP, data warehouse, etc.)

Ideally, systems can export in nice comma delimited, tab delimited, spreadsheet, or database formats. That's not always the case with legacy systems. Often the fastest way to get data from such systems is to redirect a printed report to disk. Now the business analyst is faced with a huge electronic file that needs to be manipulated in order to make it useful. Similarly, automated parsing or mining of log files, web pages, and other irregularly formatted text files leads to similar problems. This is a necessary and important part of working with information systems. It is often dismissed as tedious yet mindless work, a kind of "dogs could do it" mentality. This is far from the truth. We don't all start out as strategic IS visionaries, and even if we do, small companies require people who are willing to sweep the floor as well as envision the future of the Internet, in the same day.
Today we are going to sweep the floor (or more appropriately, take out the garbage).

Objective

In this tutorial we will learn techniques for dealing with ASCII (text) data files that we want to get into Excel and/or Access. We will learn about string parsing and other string functions, date/time functions, and the Text Import and Text to Columns Wizards. We will also learn some tactics for dealing with really large text files that have been created from mainframe reports being dumped to an ASCII file. Our MIS mindset abhors manual data cleaning/transformation/manipulation and is always looking for structure in data files that we can exploit to create automated approaches to data cleaning and transformation.

The Tutorial

We will use data from an ACD system at a major tertiary care hospital. As analysts, we are interested in studying call center performance and suggesting staffing changes that would improve customer service and/or reduce operating costs. The ACD system generates many summary reports, but it is difficult to get at this data in an electronic fashion. Fortunately, the system was capable of sending reports to disk. So we had a large number of reports for different areas in the hospital (each area is called a Split) and different dates generated and dumped to disk. Here's what one report for one split for one day looks like.

Hmm, doesn't look too bad. We'll look at two different scenarios to illustrate a whole host of techniques for dealing with data in Excel. In the first example, we'll start simple by just dealing with one report from one date, as in the screen shot above. Our goal is to get the report into multiple columns in Excel so we can analyze the data. We won't do any analysis here; we're just concerned with the data loading, cleaning, and transformation problem.
Since there's just one report, we don't need a ton of elaborate techniques to deal with the report header and footer info. In the second example, we will deal with one huge text file that has many such reports, for different splits and different dates, all in the same file. We'll use more sophisticated techniques to deal with this very common scenario.

Example 1: One report for one date

The file example1.txt is available in the Downloads section of the course web. It contains the report you see above. Let's look at some quick ways to get the data in that report into different columns in Excel. First, open the file in WordPad and take a look around the file. OK, close the file.

Method 1 - Fixed Width Multi-Column Text Import

In Excel, open example1.txt (http://ite.pubs.informs.org/Vol3No3/Isken/Example1.txt) as a text file. You'll see the first screen of the Text Import Wizard. Notice there are two main types of imports: Delimited and Fixed Width. Let's do Fixed Width, since we saw in WordPad that the columns are lined up nicely in the data section. You can use the horizontal and vertical scroll bars to view your data as you go through the Wizard. Also, notice the Start Import at Row spinner. You can use this to avoid importing the report header information. Let's leave it at row 1 for now.

Click Next and Excel will guess at where you want the column breaks. Here's the result of its guess (notice I've scrolled down a bit to look at the data). Excel will often do a nice job of setting the column breaks with nicely formatted data. You can modify the breaks by clicking and dragging/dropping per the on-screen instructions. In this case, Excel has done a fine job (look for yourself) and we'll click Next. The next screen lets you go column by column, setting data types and/or skipping columns.
This can be very useful, but we'll just click Finish. Ta da. Notice that each data value in the body and totals section of the report is in a separate column, which means we can actually analyze the data. Notice also that we got lucky in that the date in Q2 fell into a column. Now let's look at a few basic techniques for dealing with the not so nice parts of this spreadsheet. This is a warm-up for Example 2, in which we'll really have to take advantage of Excel's rich function library as well as our own ingenuity.

Split number and name

Here's a look at A4:D4, and a table showing the values in those cells (the split name got chopped across columns by the fixed width import):

A4: SPLIT:
B4: 2 (OUT
C4: PAT. CLI
D4: NIC REG. )

We want to create a formula in cell A3 that will return the split number, i.e. 2. Assume that the split number is always <=100. HINT: Use the Value() and Mid() functions.

We also want to create a formula in cell J3 that will return the split name, i.e. OUTPAT CLINIC REG. (assume the name is always surrounded by parentheses). HINT: Do this in multiple steps. After the four steps below, a screen shot is shown with the results.

Step 1. Concatenate A4:G4 with a formula in B3. Use the & operator or the Concatenate() function. Going out to column G should catch even very long split names.
Step 2. Find the numeric position of the left parenthesis in the result in cell B3. Use the Find() function in cell F3.
Step 3. Use the result of Step 2 along with the Mid() function to get OUTPAT CLINIC REG.) in cell G3.
Step 4. Now we just need to get rid of the right parenthesis. Use Left() and Len() to do this. The final result is in J3.

See if you can figure out how to do these.
If you can't, see the file Example1-splitnum.xls and look at the formulas in A3, B3, F3, G3 and J3.

Field Names

Notice that the column headers in the report are spread over multiple rows. If we want to eventually get this data into a database such as Access, or if we want to create a Pivot Table, we'll need unique field names on one row. Use the & operator to do this.

A Little Date/Time Problem

Notice that column A contains time periods that look like 07:00-07:30AM. This is a string representing a range of time, stored in a single cell. How might you create formulas in two separate cells that contain the actual date/time values corresponding to the beginning and end of the period? Recall that the date is up in cell Q2. So, you want to turn 07:00-07:30AM into 11/2/92 7:00 AM and 11/2/92 7:30 AM. Give it a try. HINT: Use the Mid() and TimeValue() functions on the end time first, and understand how Excel (and most other software, for that matter) stores dates and times in order to get the beginning time. If you get stuck, check out the formulas in A43 and B43. How do those formulas work?

Unprintable Character Codes

Often when reports are sent to disk, you'll see unprintable characters, which show up as empty boxes. For example, see A39 in the following screen shot. Take a look at the formula in B39, =CODE(A39). The Excel function CODE() returns the ASCII code number for the given character. So it turns out that our unprintable character in A39 is a ^Z (or Ctrl-Z), which I believe is an end of file character.

A Different Approach Using Text to Columns

A similar approach to importing this file can be used when you really don't want the header or footer information to get chopped up.
Close all of your worksheets and try to reopen example1.txt. This time, select Delimited in the first screen of the Text Import Wizard. Leave it set to Tab delimited. Since there are no tabs in our file, all of the data will come into a single column in the spreadsheet. Just click Finish and here's what you'll see.

A useful trick at this point is to change the font from Arial (which is a variable width font) to a fixed width font such as Courier New so you can see if the columns are naturally aligned. Much nicer, no? However, the data is still not usable since it's embedded in the strings in column A. Select the range A10:A36 (the data rows) and pick Text to Columns from the Data menu. You'll get a Wizard that looks a lot like the Text Import Wizard. You can now either select Delimited (by what?) or Fixed Width, and Excel will parse column A into multiple columns. Try it. This ends the warm up. Now let's tackle something a little more difficult.

Example 2: Many days of reports in one file

Completing this second part of the tutorial is your job. There are seven questions within the tutorial that you should answer in addition to completing the overall task outlined in the tutorial. A common scenario is to get a huge report which has been directed to an electronic file. We'll use the file called example2.txt. It has about sixty ACD reports all in one file. Open it in WordPad and scroll through it a bit. Each report is for some date and some split. There are three different splits. Our goal is to eventually create a spreadsheet that looks like the following (with many more rows and columns).
In other words, we've gotten rid of all the extraneous rows and we have the report date and split number on each row. This file is now in good shape for pivoting or for importing into a database or statistics package. Let's start by opening it in Excel. Open it as a tab delimited text file so it all comes in one column (A), just like we did at the end of Example 1. Change the font to Courier New so we can better see the structure of the file. You'll see it's just a sequence of the report we dealt with in Example 1, for different dates and different split numbers. Save it as an XLS file called acddata-[your name].xls.

Step 0: Add a Sequence Number column

It's not a bad idea to insert a column at the far left of the sheet and sequentially number each row in the spreadsheet. I also added a row at the top and created the field names Sequence and Line. It looks like this. Notice that the second report starts on Sequence=36. The Sequence field provides a means of recovering the order of the rows in the report and a reference back to the original report after we do a bunch of data gymnastics. This sheet is the sheet named Step0 in acddata-finished.xls (http://ite.pubs.informs.org/Vol3No3/Isken/acddatafinished.xls). Note: This file contains VBA code. Make sure you select Enable Macros when opening the file and that you have Macro Security set to Medium (Tools | Macro | Security).

The basic strategy used to get this file into the desired format is summarized in the following table. Each step is described in a bit more detail following the table. Each step has a corresponding tab in acddata-finished.xls (http://ite.pubs.informs.org/Vol3No3/Isken/acddata-finished.xls) so you can see what was done.
Step 0: Add a record sequence number.

Step 1: Identify unneeded rows using indicator formulas. Sort the records by Keep (descending) and Sequence (ascending) so the unneeded rows sort to the bottom of the worksheet but all other rows are kept in correct order.

Step 2: Classify all the remaining rows into a small number of record type categories. We used a little VBA function called GetRecType() which contains a sequence of If-Then type rules to classify each record.

Step 3: Copy the results (not including the unneeded rows) from Step 2 and Paste Special - Values into a new worksheet to freeze all the formulas. Insert two new columns: one for the report date and one for the report split number. Write formulas to fill these columns.

Step 4: Copy the results (not including the unneeded rows) from Step 3 and Paste Special - Values into a new worksheet to freeze all the formulas. Sort it by split, date, and then sequence. Use Data Filter to see just the record type we're interested in (the data rows).

Step 5: Copy and paste the filtered data into a new worksheet and delete unneeded columns. Then do a Data - Text to Columns on the Line field to get the data into separate columns. Finally, insert two columns and use formulas to break up the time bin string into separate start and end times as we did in Example 1.

Step 6: Paste Special - Values the column headings that we created in Example 1.

Step 1 - Identify unneeded rows using indicator formulas

Scanning the data usually results in identification of several types of unneeded rows. In this example, there are three of these:

1. DAILY SPLIT REPORT headers
2. Blank rows
3. Rows with many hyphens used as lines

In sheet Step 1 we added three columns, one for each of these unneeded row types. They are indicator columns B, C and D, labeled Daily, Blanks, and Hyphens.
In each indicator column there is a formula that returns a 0 if the row is that type of unneeded row and returns a 1 if not. Notice that Sequence=1 is a DAILY SPLIT REPORT header, so the value in the Daily column is 0, indicating that we don't want this record for that reason. Likewise, Sequence=3 is a blank row and Sequence=8 is a hyphen row. So, if a row has a 0 in column B, C or D, we know we don't want it. Then a fourth column was inserted immediately to the right of column D. The formula in this new column E (labeled Keep) is just =PRODUCT(B1:D1), which will equal 0 if any of the indicators in that row is equal to 0. Now we can sort the data by column E (descending) and column A (ascending) to move the unneeded rows to the bottom. By cleaning out obviously unneeded stuff, it can be easier to find structure in the remaining records.

Question 1: How does the formula in cell B2 work? The formula is: =IF(ISERROR(FIND("DAILY",F2)),1,0)

Question 2: How does the formula in cell D2 work? The formula is: =IF(LEN(TRIM(F2))=0,0,1)

Step 2 - Record Type Identification Function

Usually you can categorize the remaining records into a few types (in this case, seven):

1. Line with report date
2. Line with split number and split name
3. First line of the data column header
4. Second line of the data column header
5. Third line of the data column header
6. A data detail line (this is the type we're really interested in)
7. The totals line

While we could take the indicator column approach again to try to automatically classify each row, we will instead use a small user-defined VBA function called GetRecType() to do this. It's important to remember that you can write your own functions in VBA for Excel (and Access, and any other product, for that matter, that supports VBA). These functions can then be used just like any other Excel function.
Like Access, the VBA code lives in the file but is only visible through the Visual Basic Editor (Tools | Macro | Visual Basic Editor). Here's the code:

Step 3 - Date and Split Formulas

Question 6: How does the formula in cell G2 work on sheet Step3? The formula is: =IF(F2=1,DATEVALUE(RIGHT(I2,8)),G1)

Question 7: How does the formula in cell H2 work on sheet Step3? The formula is: =IF(F2=1,VALUE(MID(I3,8,3)),IF(F2=2,VALUE(MID(I2,8,3)),H1))

Yep, these are a bit disgusting, but they are pretty typical.

Step 4 - Data Filter

Put your cursor anywhere within a list of data with column headings (so Excel guesses to treat it as a database) and select Data | Filter | AutoFilter. Now you can control which data rows are displayed by making selections in one or more of the drop down boxes that Excel has added to the header row. If you make selections in multiple columns, Excel treats it as a logical AND condition, and only rows meeting all filter conditions will be displayed. Data filtering is a powerful data exploration tool that we are using here to help with data cleansing. By selecting RecType 6, we filter out all of the unneeded rows in the data set, i.e. all rows which are not data detail lines. Check out the Advanced Filter feature to see how to create more complex filter conditions combining logical ANDs and ORs.

Step 5 - Parse and Transform

Copy and paste the filtered data into a new worksheet. You can delete unneeded columns such as the indicators in B-E. Then do a Data | Text to Columns on the Line field to get the data into separate columns. Now, insert two columns to the right of the Line column and use formulas to break up the time bin string into separate start and end times as we did in Example 1.
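The time-bin split at the end of Step 5 can also be sketched outside Excel. The following Python fragment mirrors the Mid()/TimeValue() approach described in Example 1, under the assumption that every bin is a fixed half-hour range like 07:00-07:30AM whose single AM/PM suffix applies to the end time; parse_time_bin and its arguments are illustrative names, not anything from the workbook.

```python
from datetime import date, datetime, timedelta

def parse_time_bin(report_date, bin_text, bin_minutes=30):
    """Turn a report date plus a bin string like '07:00-07:30AM' into
    (start, end) datetimes.

    Assumes fixed half-hour bins whose single AM/PM suffix belongs to the
    end time, which is why we parse the end time first and back up one bin
    to get the start -- the same trick the Excel hint describes.
    """
    end_text = bin_text[6:]                              # e.g. '07:30AM'
    end_time = datetime.strptime(end_text, "%I:%M%p").time()
    end = datetime.combine(report_date, end_time)
    start = end - timedelta(minutes=bin_minutes)
    return start, end

start, end = parse_time_bin(date(1992, 11, 2), "07:00-07:30AM")
print(start, "->", end)  # 1992-11-02 07:00:00 -> 1992-11-02 07:30:00
```

Parsing the end time first also handles bins like 11:30-12:00PM correctly, since the PM suffix applies to the noon endpoint rather than the 11:30 start.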
Step 6 - Field Headings

It is important that each column (or field, in database terminology) have a unique heading (field name) if one wants to begin exploring the data using Pivot Tables/Charts or a statistics package, or to move the data into a database. We did this back in Example 1 with the concatenation operator. You can copy and paste from that example or type in new field names if you wish. After completing these last two steps, your data should look something like the tab labeled Step5-6 in acddata-finished.xls. Finally, we are ready to analyze the data.

Summary

This concludes our brief journey into the wonderful world of muddy data. You might be surprised to find out how much time is wasted in the corporate world doing tasks like this manually. With very little programming (which we could have avoided if we wished), we were able to use built in Excel tools to extract useful data embedded in electronic reports. If we had to routinely process this file, it would be relatively easy to automate the entire process within Excel with a little VBA. While this type of work may not be glamorous, it is a fact of life in the world of information systems and business analysis. It is well worth your time to become more familiar with the huge number of functions available in Excel (and Access) for manipulating all types of data. If you need to do this kind of thing all the time on the same report, you might want to look into commercial products such as Content Extractor from Data Junction or Monarch from Datawatch.

DEALING WITH MUDDY DATA USING EXCEL - SOLUTIONS

Question 1: How does the formula in cell B2 work? The formula is: =IF(ISERROR(FIND("DAILY",F2)),1,0)

The FIND() function will look for the string DAILY in cell F2. If it finds it, it will return the character position of the letter D in F2.
If it does not find it, it will return a #VALUE! error. The ISERROR() function is used to trap this error. If DAILY is in F2, FIND() will not return an error, ISERROR() will return FALSE, and the IF() will return 0 (the else part of the if-then-else), indicating that this is a row we do not want to keep.

Question 2: How does the formula in cell D2 work? The formula is: =IF(LEN(TRIM(F2))=0,0,1)

We are checking to see if F2 is empty. The TRIM() function gets rid of any leading or trailing spaces in the cell, and the LEN() function returns the length of the trimmed text in F2. If LEN() returns 0, the IF() will return 0, indicating that this row is blank and we do not want to keep it.

Step 2 - Record Type Identification Function

Usually you can categorize the remaining records into a few types (in this case, seven):

1. Line with report date
2. Line with split number and split name
3. First line of the data column header
4. Second line of the data column header
5. Third line of the data column header
6. A data detail line (this is the type we're really interested in)
7. The totals line

While we could take the indicator column approach again to try to automatically classify each row, we will instead use a small user-defined VBA function called GetRecType() to do this. It's important to remember that you can write your own functions in VBA for Excel (and Access, and any other product, for that matter, that supports VBA). These functions can then be used just like any other Excel function. Like Access, the VBA code lives in the file but is only visible through the Visual Basic Editor (Tools | Macro | Visual Basic Editor). Here's the code:

User defined functions are a nice way to introduce VBA programming. They can be used to encapsulate complex logic and avoid giant, multiply nested IF() type formulas.
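The GetRecType() listing itself appears in the article only as a screen shot. As a stand-in, here is a hypothetical Python rendering of the rule sequence described in Questions 3-5 below; the helper names and the assumed m/d/yy date format are ours, and the three column-header types (3-5) are omitted for brevity, with everything unmatched falling through to type 6.

```python
from datetime import datetime

def looks_like_date(text):
    """True if text parses as a report date; m/d/yy is an assumed format."""
    try:
        datetime.strptime(text.strip(), "%m/%d/%y")
        return True
    except ValueError:
        return False

def looks_like_number(text):
    try:
        float(text.strip())
        return True
    except ValueError:
        return False

def get_rec_type(line):
    """Classify one report line using the same rule order as GetRecType().

    1 = report date line, 2 = split number/name line, 7 = totals line.
    The original VBA also distinguishes the three column-header lines
    (types 3-5) by their text; those rules are omitted here, so anything
    unmatched falls through to type 6 (a data detail line).
    """
    text = line.strip()
    if looks_like_date(text[:10]):     # VBA: IsDate(Trim(Left(strLine, 10)))
        return 1
    if "SPLIT" in line:                # VBA: InStr(strLine, "SPLIT")
        return 2
    if looks_like_number(text[:6]):    # VBA: IsNumeric(Left(Trim(strLine), 6))
        return 7
    return 6
```

As in the VBA version, the rules fire in order and the first match wins, which keeps the logic readable compared to one giant nested IF() formula.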
The passed-in input parameter, strLine, is a string containing the cell contents used as the argument to this user defined function. Questions 3-5 deal with the various parts of the user defined function used to check whether the string contains certain interesting information.

Question 3:

If IsDate(Trim(Left(strLine, 10))) Then
    intRecType = 1 'Date
    GetRecType = intRecType
    Exit Function
End If

If the leftmost 10 characters of the cell (after removal of leading and trailing spaces via the Trim() function) can be interpreted as a valid date, then the function returns a 1 to indicate that this is a record of type 1, the report date. The IsDate() function is a VBA function, not an Excel worksheet function.

Question 4:

If InStr(strLine, "SPLIT") Then
    intRecType = 2 'Split
    GetRecType = intRecType
    Exit Function
End If

The VBA function InStr() is used to search the input string for an occurrence of the word SPLIT. InStr() will return a 0 if it does not find it. If it does, then we return a 2 to indicate that this is a line with the split number and split name.

Question 5:

If IsNumeric(Left(Trim(strLine), 6)) Then
    intRecType = 7 'Totals line
    GetRecType = intRecType
    Exit Function
End If

If the leftmost 6 characters of the cell (after removal of leading and trailing spaces via the Trim() function) can be interpreted as a valid number, then the function returns a 7 to indicate that this is a record containing the totals for the report. The IsNumeric() function is a VBA function, not an Excel worksheet function. There is an Excel worksheet function called ISNUMBER() which serves a similar purpose.

Step 3 - Date and Split Formulas

The next two questions deal with how we fill in the repeating dates and split numbers for each line of each report.

Question 6: How does the formula in cell G2 work on sheet Step3?
The formula is: =IF(F2=1,DATEVALUE(RIGHT(I2,8)),G1)

The basic strategy is to determine whether a new report is starting on the current line. If it isn't, the date can be obtained from the row immediately above the current row. One way to determine if a new report is starting is to check if the RecType column has a value of 1, a report date line. This is the IF(F2=1 part of the formula. If this is true, we extract the date as a string using RIGHT(), and then convert it to a valid Excel date with the DATEVALUE() function. This is a useful function that can convert strings that look like dates into actual dates.

Question 7: How does the formula in cell H2 work on sheet Step3?

The formula is: =IF(F2=1,VALUE(MID(I3,8,3)),IF(F2=2,VALUE(MID(I2,8,3)),H1))

For the split numbers, we use a strategy similar to the one used for the dates. If it's a report date line (F2=1), we extract the split number from the next row. If it's a split name and number line (F2=2), we extract the split number from the current row. If F2>2, the split number can be obtained from the row immediately above the current row. The string function MID() pulls out the split number by starting at position 8 of the text in column I and taking the next 3 characters. The VALUE() function converts the resulting string into an actual number.

EXCEL PIVOT TABLE TUTORIAL: AN EXAMPLE OF AN OLAP-TYPE ANALYSIS

Objective

Introduce OLAP-type analysis via Excel's Pivot Table and Pivot Chart features. We will also use some more advanced Access and Excel functions which are useful for data analysis. Our perspective is that of a business analyst using data for decision support.

The Tutorial

Introduction

We will use the call center example database discussed in class. The call center handles technical support calls for MS Office products.
We are interested in using concepts from multidimensional data modeling, OLAP (online analytical processing), and data analysis functions to do some quantitative business analysis. The file is called CallCenter-DataWarehouse.mdb and can be found in the Downloads section of the course web. Make a copy of the database for yourself. Since it's a multidimensional data warehouse, the data model is a star schema. The main fact table, tblFACT Scen01 CallLog, contains over 25,000 records from 10/3/99-12/25/99. Each record is a single phone call. See the table design for descriptions of each of the fields. The two measures included in the fact table are:

OnHoldMins - time in minutes the customer spends on hold before talking to a tech support person.
ServiceTimeMins - time in minutes spent on the phone by the tech support person with the customer.

There are four dimension tables in the star schema. Look inside each table in both datasheet and design view in Access to learn more about them.

tblDIM Customer - about the customers.
tblDIM Application - about the MS Office applications supported.
tblDIM Problem - about the nature of the problem.
tblDIM Time - about each hour of each date in 10/3/99-12/25/99 (12 weeks).

Date/Time Analysis with Access Queries

Before getting into OLAP with Excel Pivot Tables and Pivot Charts, let's explore some of the capabilities in Access for analyzing date/time data. Both Access and Excel have a wide range of functions to facilitate date/time analysis, as this is a very common business analysis need. Some of the Access functions include Month(), Year(), Weekday(), and the very general DatePart() function. Using these and/or other date/time related functions in Access, create the following queries using ONLY the fact table tblFACT Scen01 CallLog (i.e., you can't use tblDIM Time).
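Access SQL cannot be run here, but the grouping idea behind these queries can be sketched with Python's built-in sqlite3, where strftime() plays the role of Access's Month()/Weekday()/DatePart(). The table and field names below merely mimic the fact table; the timestamps and values are made-up illustrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFACT_Scen01_CallLog (CallTimestamp TEXT, OnHoldMins REAL)")
conn.executemany(
    "INSERT INTO tblFACT_Scen01_CallLog VALUES (?, ?)",
    [("1999-10-04 09:15:00", 2.5),   # a Monday morning call
     ("1999-10-04 10:40:00", 4.0),   # another Monday call
     ("1999-11-02 09:05:00", 1.0)],  # a Tuesday call in November
)
# Problem 1a analogue: number of phone calls arriving by year and month
volume = conn.execute("""
    SELECT strftime('%Y', CallTimestamp) AS Yr,
           strftime('%m', CallTimestamp) AS Mo,
           COUNT(*) AS NumCalls
    FROM tblFACT_Scen01_CallLog
    GROUP BY Yr, Mo
    ORDER BY Yr, Mo
""").fetchall()
# Problem 1b analogue: average time on hold by day of week and hour of day
avg_hold = conn.execute("""
    SELECT strftime('%w', CallTimestamp) AS DayOfWeek,  -- '0' = Sunday
           strftime('%H', CallTimestamp) AS HourOfDay,
           AVG(OnHoldMins) AS AvgOnHold
    FROM tblFACT_Scen01_CallLog
    GROUP BY DayOfWeek, HourOfDay
""").fetchall()
```

The GROUP BY structure is the same one you would build in the Access QBE grid, with DatePart() expressions as the grouping columns.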
Save your query with the same name that appears in the title bar of each query result.

Problem 1a: Number of phone calls arriving by year and month

Problem 1b: Average time on hold by hour of day and day of week

Looking through the previous query, do you see some potential problem times and days during the week?

Problem 2: General Exploratory Analysis with SQL

The type of problem for which multidimensional models were created is queries such as summing volume over different categories. For example, create the following query. As you scroll through the table, notice that it can be difficult to see patterns. Also, the QBE window in Access is OK, but not great, for interactive, exploratory analysis.

Problem 3: Limitations of SQL

A common business analysis problem is to compare some measure, such as average time customers spent on hold, across different time periods. In Access, try to figure out how to create queries using just tblFACT Scen01 CallLog that will eventually give the following query result, where the numbers are the average of the OnHoldMins field for the month in the column heading. This is not terribly easy (if you get stuck, move on) and is meant to show the limitations of SQL for relatively simple business questions.

OLAP Style Analysis with Excel Pivot Tables and Pivot Charts

Excel's Pivot Table and Pivot Chart tools are among the most sophisticated features in Excel. They provide a simple, yet powerful, method for interactively slicing, dicing, and displaying data. Multidimensional data is particularly well suited to Pivot Table analysis.
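As a point of comparison with Problem 3 above: the months-as-columns summary that takes several chained queries in SQL is a single pass in a procedural language, which is part of why pivot tools exist. A small illustrative Python sketch (the input shape is hypothetical):

```python
from collections import defaultdict

def monthly_avg_on_hold(calls):
    """calls: list of (month, on_hold_mins) pairs.
    Returns {month: average on-hold minutes} in one pass -- the
    comparison that needs several intermediate queries in plain SQL."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum, count]
    for month, mins in calls:
        totals[month][0] += mins
        totals[month][1] += 1
    return {m: s / n for m, (s, n) in totals.items()}
```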
Pivot Tables can be used to analyze data that resides in Excel or in an external data source such as Access or some other database. There is an Excel file called CallCenterPivot.xls (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenterPivot.xls) which you can get from the Downloads section of the course web. Make a copy of it for yourself in your folder to work in. To facilitate the analysis, the data you'll analyze is in the sheet entitled Data. We'll use it for pivoting. All of the fields included in the sheet are defined in the file OLAP-DataDictionary.xls (http://ite.pubs.informs.org/Vol3No3/Isken/OLAP-DataDictionary.xls). I've also created a query called qselDataCube in the call center database, CallCenter-DataWarehouse.mdb (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenter-DataWarehouse.mdb), that draws fields from the fact table as well as the four dimension tables. We'll use this query to see how Excel can pivot on data in Access.

Start Excel and Read About Pivot Tables

No sense repeating the material in the help files. Click on the Assistant, type "pivot table", and select "About PivotTable reports: interactive data analysis" from the help menu when the Assistant answers. Read it.

First, we'll look at basic Pivot Table functionality using the data within CallCenterPivot.xls (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenterPivot.xls).

Creating a Pivot Table

Select Pivot Table and Pivot Chart from the Data menu and you'll get the first screen of the Wizard.
Select the Data to Use for the Pivot Table

Select the Location to Put the Pivot Table

The Pivot Table Interface

Now you'll see:

You build your pivot table by dragging and dropping fields from the Pivot Table toolbar into the different sections of the pivot table: the row, column, page, and data areas. Let's revisit the query we did in Problem 2. Drag Customer into the Page area, AppName into the Row area, and ProbName into the Column area. In order to count the number of calls, we can drag any non-null field into the Data area. Drag the CallID field into the Data area. Excel will create the pivot table and will guess that you want to sum the CallID field. The first result will look like this:

There are many ways to modify the data field to change the summation to a count. On the Pivot Table toolbar there is a Wizard button. Push it (you can also right-click anywhere in the pivot table and select Wizard from the menu). When you get into the Wizard, push the Layout button. You'll see:

Obviously you can drag and drop fields in this screen as well. Notice that Excel has summed the CallID field. Double-click that button and you'll get:

You can get this same dialog box if you select the field in your pivot table, right-click, and select Field Settings from the menu. Here's the revised pivot table:

The drop-down arrow on the Page field controls which data is displayed. Your choices are any one value or all. The drop-down arrows on the row and column fields let you selectively exclude certain values.
Try them to create the following:

There's no law against having multiple fields in any section, so here's another variation on this layout. Try it.

Subtotals for each field can be controlled by selecting the field, right-clicking, and selecting Field Settings. You'll see:

There is also a whole host of options for the entire pivot table. Activate any cell in the pivot table, right-click, Table Options. You'll see:

See online help for a description of each of these options. Some are a bit complex.

Formatting the Pivot Table

You can pretty up your pivot table with various built-in Report Formats. Yep, right-click.

Refreshing the Pivot Table

If you make changes to the underlying data of a pivot table, you need to refresh the pivot table. As always, there are several ways to do this. See the Pivot Table toolbar or right-click while anywhere in the Pivot Table.

Normal and Special Display of Values

So far we have just displayed data in its normal form. Pivot tables also let you show data in many interesting ways. For example, we might be interested in the percentage of total calls placed by each customer. First, modify the pivot table so just total calls for each customer is displayed. Then activate a cell in the data area, right-click, and select Field Settings. Then push the Options>> button and click the drop-down arrow. Notice that I've selected % of Column.
The result is:

Pivot Charts

It is easy to create a Pivot Chart from a pivot table. Just select the pivot table, right-click, Pivot Chart (or click the Pivot Chart button on the Pivot Table toolbar). I changed our Pivot Table to display the data normally again (as opposed to % of column) and then created a Pivot Chart. Here's what you get:

Pivot charts are live charts in the sense that you can manipulate them just like pivot tables. You can drag new fields in, reorient the axes, change the formatting, etc. For example, create the following pivot chart:

Pivoting Using an External Database

Pivot Tables aren't just for Excel-based data. For example, let's see how to connect to an Access database. Start a new pivot table in an Excel worksheet. In the first screen of the Wizard, select External data source. Push the Get Data button. Select Access. Browse to the call center database we've been using, CallCenter-DataWarehouse.mdb (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenter-DataWarehouse.mdb). You'll then see a list of all the tables and queries in that database. Select the query qselDataCube and push the > button. You'll see:

Click Next and then Next two more times. On the last screen of the Wizard, you have a few options regarding what to do with the data. By default, the first option, Return Data to Microsoft Excel, is selected. Leave it and just click Finish. You'll be returned to the Pivot Table Wizard where you left off. Now just keep going as you would with any other pivot table.
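Under the hood, the Count-of-CallID pivot we built earlier (AppName down the rows, ProbName across the columns) just counts records per (row value, column value) pair. A toy Python sketch of that idea, with hypothetical records:

```python
from collections import Counter

def pivot_count(records, row_field, col_field):
    """Count records per (row, column) pair -- the same tally you get by
    dragging CallID into the Data area and switching Sum to Count."""
    return Counter((rec[row_field], rec[col_field]) for rec in records)

# Hypothetical call log records for illustration
records = [
    {"AppName": "Word",  "ProbName": "Install", "CallID": 1},
    {"AppName": "Word",  "ProbName": "Crash",   "CallID": 2},
    {"AppName": "Excel", "ProbName": "Install", "CallID": 3},
    {"AppName": "Word",  "ProbName": "Install", "CallID": 4},
]
counts = pivot_count(records, "AppName", "ProbName")
```

Rebuilding the table by dragging a different field into the Row area corresponds to nothing more than re-running the count with a different row_field, which is why pivoting feels instantaneous.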
Drilling Down to Underlying Data

Double-clicking on a data cell in a pivot table will create a new worksheet with the underlying detail records that correspond to that cell. Try it.

We have only briefly touched on the basics of pivot tables and pivot charts. There are many more advanced features which you can explore as you find the need. For example, there are ways to create calculated custom fields and data items within pivot tables. Also, in the last screen of the Wizard for getting external data that we just used, there was an option to create an OLAP Cube from the data. This is a slightly more sophisticated way of structuring the data in which hierarchical relationships such as time periods (minutes, hours, days, weeks, months, quarters, years) are modeled explicitly, which facilitates automatic roll-ups of data. Read "About creating an OLAP cube from relational data" in help if you're interested.

Your Job

Now that you're a bit familiar with pivot tables, let's use them to do some exploratory analysis. You are the manager of this call center. You've heard some grumbling about long waits on hold from some customers. Since you don't want to overreact to anecdotal evidence, you decide to explore the data warehouse a bit to find out more, since OnHoldMins is a field in the database. Also, the actual staffing level of the call center is summarized by day of week and hour of day in tblStaffLevelSummary. That table includes a field called WeeklyHourBin which ranges from 1-168 and represents each hour of the week starting on Sunday at midnight. Some basic questions you might want to address are:

1. Does overall average time on hold differ by time of day, by day of week, by both?

2. Do different customer types experience different hold times on average? What about by time of day and day of week?
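The WeeklyHourBin encoding is simply a flattening of (day of week, hour of day) into the range 1-168. A one-line sketch of the mapping, assuming day 0 = Sunday as the field's description implies:

```python
def weekly_hour_bin(day_of_week: int, hour_of_day: int) -> int:
    """Map (day 0-6, Sunday=0; hour 0-23) to the 1-168 WeeklyHourBin,
    where bin 1 is Sunday 12a-1a and bin 168 is Saturday 11p-12a."""
    return day_of_week * 24 + hour_of_day + 1
```

Joining call records to tblStaffLevelSummary on this bin is what lets hold times and staffing levels be compared hour by hour.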
As the manager, your primary way of controlling caller time on hold is through staffing and scheduling. You'd like to see a summary showing average hold time and average staff utilization by day of week and time of day. Seems like a pretty straightforward thing for a manager of a call center to want. It turns out to be a bit of a bear to create this using SQL in Access. Here's part of the final summary query result (see qsumUtilization Step4 FinalSummary in the call center database). If you want to see how this was done, see qsumUtilization Step1, 2, and 3.

Your job is to get the results of this query from the call center database into Excel somehow (obviously, without retyping it) and create a graph or two (using either Pivot Charts or regular old Charts) which provide some insight into the relationship between staff utilization, time on hold, time of day, day of week, etc. Use your imagination. Here are two things I did.

It would be nice to see a line showing staff utilization on a second y-axis on the previous graph as well. See if you can figure out how to do that. Here's a scatter graph (not a Pivot Chart) relating average time on hold to staff utilization for each of the 168 hours of the week. Hmm, seems to be some sort of pattern? What do you think?

From Data Analysis to Modeling

In this module, you've analyzed some call center data using pivot tables and pivot charts. You've taken the perspective of the manager of the call center. As the manager, certain questions would likely arise based on the analysis.

1. It seems like there are some times where staff utilization is really low. I wonder, if we reduce staffing, how will the average time on hold be affected?

2.
There are times when average time on hold and staff utilization are both pretty high. I bet we could reduce average time on hold by adding staff (thus reducing staff utilization). How much staff would I need to reduce average time on hold to 30 seconds? 15 seconds?

3. Average time on hold isn't the only important performance measure. We're also interested in the percentage of customers who have to hold and the percentage of customers who hold less than 1 minute. How could we calculate these measures from the current data, and how could we figure out how they will be affected by changes in staff levels?

4. We currently just have our tech support staff working 8-hour shifts. The midnight shift is from 12a-8a, the day shift is from 8a-4p, and the afternoon shift is from 4p-12a. If we were better able to estimate our staffing needs by hour of day and day of week, how would we figure out how to schedule our staff (e.g., shift start times and shift lengths) to meet these staffing needs?

These are not easy questions, but they are the type routinely faced by managers of a myriad of customer-service-related systems, including software tech support, retail sales (L.L. Bean, Lands' End), healthcare, police and emergency services, fast food, insurance companies, and product support. While historical data is useful for analyzing current system performance, we need something else to handle "What if?" or "What's best?" type questions. We need models. Try to think a bit about what makes questions 1-4 difficult, and we'll discuss modeling with spreadsheets next time.

EXCEL PIVOT TABLE TUTORIAL: AN EXAMPLE OF AN OLAP-TYPE ANALYSIS

Objective

Introduce OLAP-type analysis via Excel's Pivot Table and Pivot Chart features. We will also use some more advanced Access and Excel functions which are useful for data analysis.
Our perspective is one of a business analyst using data for decision support.

The Tutorial

Introduction

We will use the call center example discussed in class. The call center handles technical support calls for MS Office products. We are interested in using concepts from multidimensional data modeling, OLAP (online analytical processing), and data analysis functions to do some quantitative business analysis. The data consists of over 25,000 records from 10/3/99-12/25/99. Each record is a single phone call. The two measured facts are:

OnHoldMins - time in minutes the customer spends on hold before talking to a tech support person.
ServiceTimeMins - time in minutes spent on the phone by the tech support person with the customer.

OLAP Style Analysis with Excel Pivot Tables and Pivot Charts

Excel's Pivot Table and Pivot Chart tools are among the most sophisticated features in Excel. They provide a simple, yet powerful, method for interactively slicing, dicing, and displaying data. Multidimensional data is particularly well suited to Pivot Table analysis. Pivot Tables can be used to analyze data that resides in Excel or in an external data source such as Access or some other database. There is an Excel file called CallCenterPivot.xls (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenterPivot.xls) which you can get from the Downloads section of the course web. Make a copy of it for yourself in your folder to work in. To facilitate the analysis, the data you'll analyze is in the sheet entitled Data. We'll use it for pivoting. All of the fields included in the sheet are defined in the file OLAP-DataDictionary.xls (http://ite.pubs.informs.org/Vol3No3/Isken/OLAP-DataDictionary.xls).

Start Excel and Read About Pivot Tables

No sense repeating the material in the help files. Click on the Assistant, type "pivot table", and select "About PivotTable reports: interactive data analysis" from the help menu when the Assistant answers. Read it.
First, we'll look at basic Pivot Table functionality using the data within CallCenterPivot.xls (http://ite.pubs.informs.org/Vol3No3/Isken/CallCenterPivot.xls).

Creating a Pivot Table

Select Pivot Table and Pivot Chart from the Data menu and you'll get the first screen of the Wizard.

Select the Data to Use for the Pivot Table

Select the Location to Put the Pivot Table

The Pivot Table Interface

Now you'll see:

You build your pivot table by dragging and dropping fields from the Pivot Table toolbar into the different sections of the pivot table: the row, column, page, and data areas. Let's revisit the query we did in Problem 2. Drag Customer into the Page area, AppName into the Row area, and ProbName into the Column area. In order to count the number of calls, we can drag any non-null field into the Data area. Drag the CallID field into the Data area. Excel will create the pivot table and will guess that you want to sum the CallID field. The first result will look like this:

There are many ways to modify the data field to change the summation to a count. On the Pivot Table toolbar there is a Wizard button. Push it (you can also right-click anywhere in the pivot table and select Wizard from the menu). When you get into the Wizard, push the Layout button. You'll see:

Obviously you can drag and drop fields in this screen as well. Notice that Excel has summed the CallID field.
Double-click that button and you'll get:

You can get this same dialog box if you select the field in your pivot table, right-click, and select Field Settings from the menu. Here's the revised pivot table:

The drop-down arrow on the Page field controls which data is displayed. Your choices are any one value or all. The drop-down arrows on the row and column fields let you selectively exclude certain values. Try them to create the following:

There's no law against having multiple fields in any section, so here's another variation on this layout. Try it.

Subtotals for each field can be controlled by selecting the field, right-clicking, and selecting Field Settings. You'll see:

There is also a whole host of options for the entire pivot table. Activate any cell in the pivot table, right-click, Table Options. You'll see:

See online help for a description of each of these options. Some are a bit complex.

Formatting the Pivot Table

You can pretty up your pivot table with various built-in Report Formats. Yep, right-click.

Refreshing the Pivot Table

If you make changes to the underlying data of a pivot table, you need to refresh the pivot table. As always, there are several ways to do this. See the Pivot Table toolbar or right-click while anywhere in the Pivot Table.

Normal and Special Display of Values

So far we have just displayed data in its normal form. Pivot tables also let you show data in many interesting ways. For example, we might be interested in the percentage of total calls placed by each customer.
First, modify the pivot table so just total calls for each customer is displayed. Then activate a cell in the data area, right-click, and select Field Settings. Then push the Options>> button and click the drop-down arrow. Notice that I've selected % of Column.

The result is:

Pivot Charts

It is easy to create a Pivot Chart from a pivot table. Just select the pivot table, right-click, Pivot Chart (or click the Pivot Chart button on the Pivot Table toolbar). I changed our Pivot Table to display the data normally again (as opposed to % of column) and then created a Pivot Chart. Here's what you get:

Pivot charts are live charts in the sense that you can manipulate them just like pivot tables. You can drag new fields in, reorient the axes, change the formatting, etc. For example, create the following pivot chart:

Drilling Down to Underlying Data

Double-clicking on a data cell in a pivot table will create a new worksheet with the underlying detail records that correspond to that cell. Try it.

We have only briefly touched on the basics of pivot tables and pivot charts. There are many more advanced features which you can explore as you find the need. For example, there are ways to create calculated custom fields and data items within pivot tables. Also, in the last screen of the Wizard for getting external data that we just used, there was an option to create an OLAP Cube from the data. This is a slightly more sophisticated way of structuring the data in which hierarchical relationships such as time periods (minutes, hours, days, weeks, months, quarters, years) are modeled explicitly, which facilitates automatic roll-ups of data.
Read "About creating an OLAP cube from relational data" in help if you're interested.

Your Job

Now that you're a bit familiar with pivot tables, let's use them to do some exploratory analysis. You are the manager of this call center. You've heard some grumbling about long waits on hold from some customers. Since you don't want to overreact to anecdotal evidence, you decide to explore the data warehouse a bit to find out more. As the manager, your primary way of controlling caller time on hold is through staffing and scheduling. You'd like to see a summary showing average hold time and average staff utilization by day of week and time of day. The file WeeklySummary.xls (http://ite.pubs.informs.org/Vol3No3/Isken/WeeklySummary.xls) was created from the data warehouse and contains 168 rows, one for each hour of the week. Columns I and J contain fields called StaffUtil and AvgOnHoldMins, which give the overall percentage of time the staff is utilized fielding calls and the average customer time on hold, respectively. Some basic questions you might want to address are:

1. Does overall average time on hold differ by time of day, by day of week, by both?

2. Do different customer types experience different hold times on average? What about by time of day and day of week?

Your job is to create a graph or two (using either Pivot Charts or regular old Charts) which provide some insight into the relationship between staff utilization, time on hold, time of day, day of week, etc. Use your imagination. Here are two things I did.

It would be nice to see a line showing staff utilization on a second y-axis on the previous graph as well. See if you can figure out how to do that. Here's a scatter graph (not a Pivot Chart) relating average time on hold to staff utilization for each of the 168 hours of the week.
Hmm, seems to be some sort of pattern? What do you think?

From Data Analysis to Modeling

In this module, you've analyzed some call center data using pivot tables and pivot charts. You've taken the perspective of the manager of the call center. As the manager, certain questions would likely arise based on the analysis.

1. It seems like there are some times where staff utilization is really low. I wonder, if we reduce staffing, how will the average time on hold be affected?

2. There are times when average time on hold and staff utilization are both pretty high. I bet we could reduce average time on hold by adding staff (thus reducing staff utilization). How much staff would I need to reduce average time on hold to 30 seconds? 15 seconds?

3. Average time on hold isn't the only important performance measure. We're also interested in the percentage of customers who have to hold and the percentage of customers who hold less than 1 minute. How could we calculate these measures from the current data, and how could we figure out how they will be affected by changes in staff levels?

4. We currently just have our tech support staff working 8-hour shifts. The midnight shift is from 12a-8a, the day shift is from 8a-4p, and the afternoon shift is from 4p-12a. If we were better able to estimate our staffing needs by hour of day and day of week, how would we figure out how to schedule our staff (e.g., shift start times and shift lengths) to meet these staffing needs?

These are not easy questions, but they are the type routinely faced by managers of a myriad of customer-service-related systems, including software tech support, retail sales (L.L. Bean, Lands' End), healthcare, police and emergency services, fast food, insurance companies, and product support.
While historical data is useful for analyzing current system performance, we need something else to handle "What if?" or "What's best?" type questions. We need models. Try to think a bit about what makes questions 1-4 difficult, and we'll discuss modeling with spreadsheets next time.