WebTrends 7 Implementation Guide
WebTrends 7 Implementation Guide
July 2004 Edition
© 2004 NetIQ Corporation

Disclaimer

This document and the software described in this document are furnished under and are subject to the terms of a license agreement or a non-disclosure agreement. Except as expressly set forth in such license agreement or non-disclosure agreement, NetIQ Corporation provides this document and the software described in this document “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose. Some states do not allow disclaimers of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This document and the software described in this document may not be lent, sold, or given away without the prior written permission of NetIQ Corporation, except as otherwise permitted by law. Except as expressly set forth in such license agreement or non-disclosure agreement, no part of this document or the software described in this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, or otherwise, without the prior written consent of NetIQ Corporation. Some companies, names, and data in this document are used for illustration purposes and may not represent real companies, individuals, or data.

This document could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein. These changes may be incorporated in new editions of this document. NetIQ Corporation may make improvements in or changes to the software described in this document at any time.

© 1996-2004 NetIQ Corporation. All rights reserved.

U.S. Government Restricted Rights: If the software and documentation are being acquired by or on behalf of the U.S. Government or by a U.S.
Government prime contractor or subcontractor (at any tier), in accordance with 48 C.F.R. 227.7202-4 (for Department of Defense (DOD) acquisitions) and 48 C.F.R. 2.101 and 12.212 (for non-DOD acquisitions), the government’s rights in the software and documentation, including its rights to use, modify, reproduce, release, perform, display or disclose the software or documentation, will be subject in all respects to the commercial license rights and restrictions provided in the license agreement.

Trademarks

WebTrends is a registered trademark of NetIQ Corporation. Additional trademarks of NetIQ Corporation include: FastTrends, WebTrends SmartView, WebTrends Report Exporter, GeoTrends, WebTrends Express Viewer, WebTrends SmartSource Data Collector, WebTrends SmartReports, WebTrends On Demand, WebTrends Tech Tools, Log Analyzer, WebTrends Live, and WebTrends Reporting Center. Other brands and their products are trademarks or registered trademarks of their respective holders.

Support

Sales and General Contact Information

NetIQ Corporation
3553 N. First St.
San Jose, CA 95134
Phone: 1-408-856-3000
Fax: 1-408-273-0578
Sales: 1-888-323-6768
Email: info@netiq.com

WebTrends
Portland, Oregon
851 SW 6th Ave. Suite 700
Portland OR 97204
Phone: 1-503-294-7025
Fax: 1-503-294-7130
US Toll Free: 1-888-932-8736
Email: info@webtrends.com
Web: http://www.webtrends.com

Service and Support

Direct Technical Support:
Americas: +1 503-223-3023
Asia Pacific, Australia, New Zealand: +1 503-223-3023
Europe, Middle East, Africa: +353 (0) 91 -782 677
http://www.netiq.com/support

WebTrends Consulting and Training:
http://www.netiq.com/services/analytics.asp

Online Resources

Customer Resource Center. A portal to resources that can help you make the most of your on-line web initiatives.
http://www.netiq.com/webtrends/resourcecenters.asp

Knowledge Base. Answers to questions most commonly asked:
http://www.netiq.com/support/kb/default.asp

Table of Contents

Chapter 1 Introducing Web Analysis ... 11
  The Purpose of This Book ... 11
    WebTrends Edition Icons ... 12
  Who Should Read This Guide? ... 13
  What is Web Analysis? ... 13
    Developing intelligence about web customers ... 14
  How This Guide Fits with Your Strategy ... 15
  Measurable Improvement Cycle ... 18
  Problems That You Will Solve ... 20

Chapter 2 Defining Your Objectives and Critical Metrics ... 27
  Your Site’s Higher-Level Goals ... 27
  Your Site’s Specific Objectives ... 27
  Your Site’s Business Metrics ... 30
    Content sites ... 32
    Commerce sites ... 33
    Lead-generation sites ... 34
    Self-service sites ... 36
    Intranet sites ... 37
    Branding sites ... 37
  Summary ... 38
  Objectives and Critical Metrics Worksheet ... 39

Chapter 3 Collecting Your Web Activity Data ... 41
  Data Collection Methods ... 41
    Using web server logs ... 42
    Using client-side tagging ... 49
    Combining web server logs and client-side tagging ... 52
  Hosted Versus Installed Software Solutions ... 52
  Choosing a Data Collection Method ... 53
  Data Collection Worksheet ... 54

Chapter 4 Visitor Identification ... 57
  Defining Web Activity ... 57
  Determining Unique Visitors ... 59
  Sessionizing Your Visits ... 59
  Visitor Identifiers ... 61
    Client IP address or domain name ... 62
    Combination of IP address and agent information ... 63
    Cookies ... 64
    Session IDs or IDs embedded in URLs ... 67
    Authenticated username ... 68
  Summary ... 70
  Finding the Features in WebTrends Products ... 71
  Visitor Identification Worksheet ... 72

Chapter 5 Defining Behaviors ... 73
  Focusing the Scope of Analysis ... 75
    URL classification ... 75
    WebTrends methods of URL classification ... 77
    Other site structure issues ... 87
  Summary ... 90
  Finding the Features in WebTrends Products ... 91
  Defining Behaviors Worksheet ... 92

Chapter 6 Filtering and Analyzing Your Data ... 93
  Setting Up Your Profile—Initial Filtering ... 94
  Hit and Visit Filters ... 95
    Hits ... 95
    Visits ... 95
    Hit filter criteria ... 96
    Visit filter criteria ... 104
  Handling Multiple Filters ... 108
    Data aggregation ... 109
    Table filtering ... 110
  Custom Reports ... 112
    Parent-child profiles—a structural alternative to custom reports and/or filters ... 115
  Summary ... 116
  Finding the Features in WebTrends Products ... 116
  Filtering Worksheet ... 117

Chapter 7 Acquisition Metrics ... 119
  Introduction ... 119
  What the Business Person Wants to See ... 120
    Entry/Landing page ... 120
  Collecting the Right Data ... 122
    Referrers ... 123
    Ad campaigns ... 126
    Search engines ... 130
    Email marketing ... 134
  Summary ... 136
  Finding the Features in WebTrends Products ... 136
  Acquisition Metrics Worksheet ... 137

Chapter 8 Conversion Metrics ... 139
  Introduction ... 139
  Understanding Navigation Measurement ... 141
    Path analysis ... 142
    Scenario analysis ... 147
  Internal Search ... 152
  Exit Page and Exit Ratio Analysis ... 152
    Visit-to-exit ratio ... 153
  Dead-End Paths ... 154
  Gleaning Demographic Information Through Registration Forms ... 154
  Evaluating Visitor Behavior by Browsing Your Site ... 156
  Summary ... 157
  Finding the Features in WebTrends Products ... 157
  Conversion Worksheet ... 158

Chapter 9 Retention Metrics ... 159
  Introduction ... 159
  Visitor Segmentation and Behavior Segmentation ... 160
  Lifetime Value ... 162
  Visitor History ... 164
  Unique Visitors, Unique Buyers ... 167
  Finding the Features in WebTrends Products ... 168
  Retention Worksheet ... 169

Chapter 10 Data Integration and Exploration ... 171
  Data Integration and a Web Data Warehouse ... 172
    Tying your data to external databases ... 172
    Reporting from a web data warehouse ... 175
  Deeper Reporting and Exploration Using Excel ... 176
    Drill Down capability ... 177
    Working with dimensions and measures ... 179
    Overhead and monetary costs ... 183
    Using reports for continuous improvement ... 184
  Data Integration and Exploration Worksheet ... 185

Chapter 11 Optimizing Your Analysis Environment ... 187
  Physical Data Storage Issues ... 187
    Log file rotation/rollover ... 187
    Storage and performance issues ... 189
    Performance issues ... 196
  Finding the Features in WebTrends Products ... 198
  Optimizing Worksheet ... 199

Glossary ... 201
Index ... 235

Chapter 1
Introducing Web Analysis

Like any integral part of your business that requires dedicated time, money, and employees, your web site needs to prove its worth. You need more information such as who is visiting your web site, which web pages they are visiting, the order of web pages they are visiting, and which pages they are ignoring. Fortunately, the answers to these questions are available through a technology called web analysis. But web analysis is more than just a sophisticated software package and some hardware that runs it. You will need to apply a fair amount of thought and work to implement and make effective use of web analysis to improve your web site.
The Purpose of This Book

This book helps to demystify the mechanics of web analysis, removing the barriers that have long kept organizations from reaping its benefits. This book discusses:

• How to collect web traffic data
• How to set up your web analysis solution to give you the answers you need
• How to work with your software to get optimal performance with your web analysis
• What to consider when setting up your organization to run web analysis

These topics cover most of what any organization needs to know when choosing and implementing a web analysis solution. An in-depth discussion of these topics will give you an overall understanding of the field of web analysis and will help you initiate the process of analyzing your web site. By reading this book, you will obtain a comprehensive overview of your options, so that you can make the choices that best suit your organization’s needs.

You will also find a “Finding the Features in WebTrends Products” section in Chapters 4 through 11 that links many of each chapter’s topics directly to WebTrends products.

As an additional benefit, worksheets with pertinent questions are provided at the end of Chapters 2 through 11 to help you in your quest to find the right web analytic solution. Also, please consult the Glossary on page 201 for a brief explanation of many terms used in this book.

WebTrends Edition Icons

You will find icons for each WebTrends edition throughout the documentation. If a feature or content section applies to your edition of WebTrends, you will find the appropriate icon at the beginning of the section.
For example, if you are licensed as a WebTrends On Demand, Small Business Edition user, features and content areas applicable to you are marked with the icon for that edition. If the content does not apply to your WebTrends edition, you will see a “not applicable” version of your product icon.

Important: Note that while your edition of WebTrends may include a feature, your ability to use it may be restricted by either licensing or your WebTrends Administrator. If you do not have access to a feature that is included in your edition, please see your WebTrends Administrator.

Table 1-1. Edition Icons (icon images not reproduced)

Products represented by an edition icon:
• WebTrends Small Business Edition
• WebTrends Professional Edition
• WebTrends Enterprise Edition
• WebTrends On Demand Small Business Edition
• WebTrends On Demand Professional Edition
• WebTrends On Demand Enterprise Edition

Who Should Read This Guide?

You should read this guide if you have purchased or are considering purchasing WebTrends products and you manage or are tasked with making these products work with your company’s web servers, customer relationship management databases, and other decision-making support tools. This guide contains a wealth of technical information that you will not find anywhere else.

If you work in marketing, product management, business development, sales, or related fields, you might not be interested in many of the technical details involved in setting up and using WebTrends products, but you probably do want to know how to use web analytics to achieve insight into who your customers are, what they do on your site, and what goods, services, or information they want from you. This guide discusses what kind of information you can get from web analytics, and how you can fit that information into the larger context of your e-commerce efforts.

What is Web Analysis?

The answer is not simple, because web analysis means a lot of different things to different people.
Consider the following examples:

• To an executive, web analysis can help to determine if the web site has been worth the financial investment. Does the site produce results (defined at a high level) and are these results improving over time, especially after a redesign?
• To product managers, web analysis can help to reveal customer interest in an array of products and, consequently, affect product offering and pricing.
• To an IT manager, web analysis involves determining how much traffic the site experiences so that he or she can ensure that web servers can deliver web content flawlessly.
• To a technical support person, web analysis involves discovering whether a new series of online technical papers reduced customer support calls on a particular topic.
• To a marketing professional, web analysis means finding out whether ad space purchased on an external site was actually effective.
• To a web site programmer, web analysis means understanding which browsers and browser versions most visitors use so that the site can be designed to work optimally in those versions.
• To a web content developer, web analysis is discovering traffic patterns that influence his or her design improvements.
• To a sales person, web analysis is tracking which individual customers and prospects have been visiting the web site in order to narrow the sales approach for a given customer or prospect.

Yet these perspectives are actually the applied definition of web analysis. The mechanics of web analysis are a little different. From a mechanics perspective, web analysis is a three-step process in which you:

1. Collect web activity data.
2. Analyze the data that interests you.
3. Create meaningful reports on that data.

The catch is that you can accomplish these three steps in many different ways. In the end, though, each method arrives at a similar place: reports that help you determine whether your web site or a part of your web site is meeting its objective.
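The three steps above can be sketched in a few lines of Python. This is only an illustration of the collect, analyze, report sequence, not how WebTrends implements it. The log format (NCSA Common Log Format), the sample log lines, and the function names collect, analyze, and report are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed input: web server log lines in NCSA Common Log Format.
# The format and all names below are illustrative assumptions.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def collect(lines):
    """Step 1: turn raw log lines into structured hit records."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            hits.append(match.groupdict())
    return hits


def analyze(hits, prefix):
    """Step 2: keep only successful requests under the path of interest."""
    return [h for h in hits
            if h["status"] == "200" and h["path"].startswith(prefix)]


def report(hits):
    """Step 3: summarize as (path, count) rows, most requested first."""
    return Counter(h["path"] for h in hits).most_common()


# Hypothetical sample data (documentation IP addresses, made-up pages).
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Jul/2004:10:00:00 -0700] "GET /products/shoes.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Jul/2004:10:00:05 -0700] "GET /products/shoes.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Jul/2004:10:01:00 -0700] "GET /products/hats.html HTTP/1.0" 200 1811',
    '192.0.2.3 - - [10/Jul/2004:10:02:00 -0700] "GET /missing.html HTTP/1.0" 404 512',
]

rows = report(analyze(collect(SAMPLE_LOG), "/products/"))
for path, count in rows:
    print(f"{path}\t{count}")
```

Each step can be replaced independently. For example, the collection step could read client-side tagging data instead of server logs while the analysis and reporting steps stay the same.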
But why is web analysis so frequently misunderstood? According to a Forrester Research report, only 23 percent of companies use web analysis to improve their online operations. The most likely reason for this low adoption is that the basic concepts of web analysis and its implementation have never been fully discussed. Web analysis is often viewed as black magic that only a few gifted individuals know how to perform. In fact, many organizations have web analysis applications but experience so much frustration when using them that they abandon them altogether. Still other organizations find that the solutions they chose are either not comprehensive enough or are too comprehensive for their needs.

Developing intelligence about web customers

By using WebTrends, you can develop more sophisticated and customer-centric information about your customers. Figure 1-1 shows how this intelligence can lead you on a path from vague, general statistics to a sharp picture of who your customers really are.

Figure 1-1. The evolution of web customer intelligence

How This Guide Fits with Your Strategy

The overall strategy for your web site probably involves a combination of quantitative, data-driven, “factual” approaches along with subjective judgements, gut feelings, and emotional reactions. Equally, in your efforts to improve your web site, it is important to combine the “soft” approach with the “hard” to get the best results. This means involving a range of people in coming up with site enhancement propositions.

From a strategy perspective, this guide discusses how to measure and analyze your web site data. The results of this analysis then feed back into the soft and hard sides of your strategy, which in turn drive recommendations and improvements in your web site.

Figure 1-2 shows an overview of how this guide relates to your overall web site strategy.

Figure 1-2. Overview of web site strategy and this guide

As part of your web site strategy, you need to identify the following:

• The primary goals of your organization
• The primary goals of your site
• Goals of individual sections of the site
• Successful visit profiles
• The drivers to successful visits

WebTrends Consulting and Training

Your strategy may require the help of WebTrends Consulting and Training so that your organization can implement, manage, and understand your WebTrends solution. WebTrends has expert consultants and trainers to help you meet the business requirements of your organization. WebTrends consulting engagements are focused on helping you implement your WebTrends solution and enabling you to manage WebTrends successfully on a day-to-day basis. Specialized training courses help your organization explore what WebTrends has to offer and gain a common foundation and understanding of how WebTrends applies to you. Together, WebTrends Consulting and Training helps you make the most of your WebTrends solution.

What you’ll get:

• Consulting and training from experienced web analytics industry experts
• Faster return on investment and reduced time and resources required by your organization
• Valuable knowledge transfer on how to manage WebTrends successfully

Figure 1-3 demonstrates the phased approach that WebTrends recommends to optimize your use of the WebTrends reports. Defining your eBusiness strategy and performance metrics is a key starting point. With key metrics and reporting requirements defined, a reporting solution is then implemented which provides information critical to guiding and strengthening your eBusiness strategies.

Figure 1-3. WebTrends Consulting and Training

Measurable Improvement Cycle

You can implement a simple process that will help you to improve your web site by following a few proven steps.
This process is called the Measurable Improvement Cycle, and creates a continuous improvement loop in which efforts are repeatedly refined through measurement. Figure 1-4 shows the Measurable Improvement Cycle.

Figure 1-4. The Measurable Improvement Cycle

Applying this process to all web site decisions helps you to focus your benchmarks and make critical adjustments to your web site, so that you improve each time you complete the cycle.

Stage 1: Report

Report on the key metrics for each of your site’s objectives:

• Define the measurements you need.
• Configure your analysis solution and web site according to your measurements.
• Process and assemble your site’s raw data into analysis reports.
• Provide analysis reports to the appropriate departments and individuals as needed.

Stage 2: Analyze

Use WebTrends to determine the performance of key metrics and site goals. Analysis in the form of reports allows you to:

• Set baseline performance.
• Evaluate the impact of site changes.

Stage 3: Decide

Determine what to do based on what the measurements tell you. Decisions might involve:

• Changing your web site.
• Altering marketing efforts.
• Revising content strategy.
• Updating your business model.

Stage 4: Act

Armed with the tables and graphs of your reports, you can optimize your site to improve performance of key metrics.

• Change your web site’s pages according to your data. For example, you might tweak the steps in the shopping cart scenario. Remember that small incremental improvements are the goal.
• Try A/B testing. On the web this means that you are sending 50% of your traffic to one page and 50% of your traffic to another page. However, A/B testing may reduce the desired action that you want from your visitors, such as registering or purchasing.
• As an alternative to A/B testing, filter x% of traffic to test against. Just divert a small percentage of visitors to the alternate web page that you want to test.
This may allow you to gather more accurate testing results.
• Perform usability testing on the changes you made to your web site.

Stage 5: Visitors React

The visitors to your web site may behave differently than you expected. For example, in tweaking your shopping cart scenario, you may have caused some visitors to drop out of the process. You respond by measuring their reaction.

Ongoing process

You will experience more success as you keep with the improvement cycle. Effective incremental changes involve a process rather than an end result. Sometimes you may need to change only one or two things before you do another analysis. Incrementally refining your changes might help you more than making wholesale alterations.

Problems That You Will Solve

This section looks at some sample problems that you might want to solve and directs you to sections of this manual that contain relevant information.

Web Site Goal: Sell more products online

The following concepts allow you to understand who is looking at your products and buying them, who is abandoning the buying process, and when that happens. To sell more products online, you want to streamline the navigation through your site, so that people can see the products and offers that you intended for them.

• Path analysis
Path analysis will tell you if people are easily navigating to your products or if they are showing some confusion in getting to your products. See “Path analysis” on page 142 for more information.

• Scenario analysis
A more specialized type of path analysis is scenario analysis. This type of analysis helps you discover if people are visiting all the pages in a scenario that you intended for them. For example, you can analyze a checkout sequence to see whether people complete the sequence or abandon it.
One typical problem that scenario analysis helps to identify is when shipping information is only available within the checkout process.
In such cases, you’ll see a high number of abandonments on the page showing the shipping charge. These abandonments are from customers who are simply browsing and want to compare shipping charges with the competition. See “Scenario analysis” on page 147 for more information.
• Filters
Filtering allows you to understand which segments of people are looking at your products and buying them. See Chapter 6 “Filtering and Analyzing Your Data” on page 93 for more information.

Web Site Goal: Find resellers for my products
If you are a company that manufactures products such as clothing, you can use web analytics to help you identify resellers for your products.
• Registration location and scenario analysis
If your web site has one or more pages with a special link or button that allows visitors to sign up as resellers, then you can do two things. 1) You can vary the location of the registration link or button on the web page to determine whether its location affects how many clicks it gets. 2) You can use scenario analysis if the registration process has a sequence of pages. If potential resellers abandon the registration process at a certain point, then perhaps it is too complicated, and you may need to simplify the process. See “Scenario analysis” on page 147 for more information.
• Filters
If you are looking for a reseller in a certain part of the world, then you can filter your web traffic based on geography. See Chapter 6 “Filtering and Analyzing Your Data” on page 93 for more information.

Web Site Goal: Distribute international leads
If your company is getting online sales leads from many parts of the world, you will want to distribute those leads to the appropriate salespeople.
• Content groups
You could look at products that can be grouped together because they are of a similar type and then look at the people who selected those products.
You might find that some products are being heavily selected from a certain part of the world and then assign those sales leads to the appropriate salespeople. See “Content groups” on page 77 for more information.
• Filters
You can filter web traffic based on geography. For example, you might look at sales opportunities that came from the United Kingdom and simply forward those leads to your UK salespeople. See Chapter 6 “Filtering and Analyzing Your Data” on page 93 for more information.
• Custom reports
You may want to develop custom reports after compiling information about visitor history and/or looking at registration information. The resulting custom reports can be tailored to the needs of your sales teams. See “Visitor History” on page 164 for more information.

Web Site Goal: Sign up for newsletter
Companies that have a newsletter can use a variety of tools to track how effective they are in getting people to view and sign up for that newsletter.
• Ad views and clicks
If you want to find out how many visitors have viewed or clicked on the link to your newsletter, you should understand the concepts of “ad views” and “ad clicks.” See “Advertising views” on page 85.
• Reverse path analysis
You can see what route people took to get to your newsletter. You can examine which sections or pages are so inspiring that people decide they want to stay in touch with you. Using path analysis, you can also see where those people go after viewing the newsletter. See “Path analysis” on page 142 for more information.
• Parameter analysis
If you allow visitors to sign up for a variety of news topics (like a graduated opt-in), you could use parameter analysis to report on the topics in which visitors are most interested. Additionally, you could correlate those topics of interest with other web site activity, such as Content Groups.
• Scenario analysis
If you have a specific set of steps that you want your visitors to take, and one of those steps (such as in a checkout sequence) offers the visitor an opportunity to sign up for your newsletter, then you will want to use scenario analysis to determine if the offer is placed in the correct step of the sequence. If people abandon the sequence at the point at which they should sign up for your newsletter, then perhaps the web page needs to be designed differently. See “Scenario analysis” on page 83 for more information.
• Content groups
You might want to find out which product or set of products has been visited the most over the past few months and then make that product or product set a centerpiece of an upcoming newsletter. To find out how groups of products are faring, you’ll use a concept called content groups. See “Content groups” on page 77 for more information.

Web Site Goal: Optimize for search engines
Search engines are often a catalyst that drives visitors to your web site. They play an increasingly important role in the web environment.
• Search engine analysis
Examine search phrases and keywords to see which words are bringing visitors to your site, and to learn whether your site is getting traffic from all the terms that you expect visitors to use. The results will let you take action on your weak keywords. Which search engines are the most successful and least successful? You might also want to evaluate the quality of the traffic that the search engines brought to the site. Did various conversions occur? Did visitors spend a lot of time on the site? How many calls to action have been followed? See “Search engines” on page 130 for more information.
• Ad campaigns
If you set up an ad campaign that is tied to a specific search engine as a referrer, landing page, or landing page parameter, you can examine how effective that campaign is. This could help you to determine which “paid” search engines are most effective.
Which ad campaigns are the most successful and least successful? You might also want to evaluate the quality of the traffic that the ad campaign generated. Did various conversions occur? Did visitors spend a lot of time on the site? How many calls to action have been followed? See “Ad campaigns” on page 126 for more information.
• Spider and robot report
You can determine how much of your raw traffic is attributed to spiders, which ones are indexing your site, and how deep in your site they are going. Spiders and robots are automated programs that crawl through the Internet to collect and index information, usually on behalf of a search engine or a monitoring company. You can use the report analysis to decide which spiders and robots to block from your web site.

Web Site Goal: Increase customer retention
Using web analysis, you can determine how well your web site is retaining customers. Consider these concepts:
• New vs. returning visitors
Learn about your new visitors and repeat visitors. Find the conversion rate of new vs. returning visitors. See “Determining Unique Visitors” on page 59 for more information.
• Visitor behavior: frequency, recency, and latency
Identify which visitors frequently return to your web site, how quickly they return to your site, and how much time elapses between visits. Once you understand your visitors’ behavior, you can present them with appropriate advertising and thereby increase their monetary value. Understanding the visit cycle length can also influence how often you change the look of your home page, rotate featured products, and add new products. See “Visitor Segmentation and Behavior Segmentation” on page 160 for more information.
• Path analysis
Compare the navigation of visitors who purchase products to those who do not, and then fine-tune your web site according to what you’ve learned. See “Path analysis” on page 142 for more information.
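The frequency, recency, and latency measures mentioned under visitor behavior can be illustrated with a small sketch. It is not WebTrends code: the function name, the visit dates, and the exact definitions used (frequency as the number of visits, recency as days since the last visit, latency as the average gap between visits) are assumptions chosen for illustration.

```python
from datetime import date


def visit_behavior(visit_dates, today):
    """Summarize one visitor's visit history.

    Assumed definitions (illustrative, not taken from WebTrends):
      frequency - number of visits in the period
      recency   - days since the most recent visit
      latency   - average number of days between consecutive visits
    """
    visits = sorted(visit_dates)
    if not visits:
        raise ValueError("at least one visit date is required")
    # Days elapsed between each pair of consecutive visits.
    gaps = [(later - earlier).days for earlier, later in zip(visits, visits[1:])]
    return {
        "frequency": len(visits),
        "recency_days": (today - visits[-1]).days,
        "latency_days": sum(gaps) / len(gaps) if gaps else None,
    }


# Hypothetical visit dates for a single visitor.
visits = [date(2004, 6, 1), date(2004, 6, 8), date(2004, 6, 22)]
summary = visit_behavior(visits, today=date(2004, 7, 1))
print(summary)
```

Under these assumed definitions, a visitor with high frequency and a small recency value is an engaged, recently active visitor, which is the kind of segment the advertising and product-rotation decisions above would target.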
Chapter 2
Defining Your Objectives and Critical Metrics

Your Site’s Higher-Level Goals
Every web site has primary goals. Some sites sell products or provide information. Others specialize in games to provide entertainment. For example, the ultimate goal of a self-service web site is to save costs associated with other methods of customer service (such as email and technical support) rather than to generate revenue.

Many sites serve a combination of purposes. For example, a symphony orchestra’s web site typically provides information about the organization and sells tickets to its concert season. Web sites for large companies often consist of individual sections with differing goals for each section. For example, one section might contain a series of pages devoted to commerce while another section offers customer service and a link to another section for lead generation.

Keep in mind what constitutes a successful visit to your site. For a commerce site, this usually means that a visitor purchased a product online. That visitor probably went through several steps, as defined in a shopping cart scenario, to complete the purchase.

How do you improve your web site? What would make more visitors buy your products, play your games, or complete the lead generation questionnaire? After you define what you want to improve, you can look more closely at your site’s objectives.

Your Site’s Specific Objectives
Although web sites are unique entities that serve a variety of purposes, some objectives are considered universal to nearly every site:
• Increase visitor satisfaction by making the site more convenient and valuable to visitors
• Decrease acquisition costs
• Increase conversion rates
• Improve customer/visitor retention
• Increase your web ROI

However, since no two web sites are alike, each site can have individually tailored objectives.
Table 2-1 identifies several types of web sites and some corresponding objectives.

Table 2-1. Site Objectives

Site Objective: Commerce
Business Goal:
• Increase sales and generate revenue
• Complement offline channels
• Increase average order size
Visitor Goal:
• Research products
• Buy products
Web Analysis Focus:
• Buying & research behavior
• Obstacles to purchase
• Visitor-to-buyer conversion
• Abandonment analysis
• Campaign effectiveness
• Purchase drivers

Site Objective: Lead Generation
Business Goal:
• Generate quality leads
• Increase sales opportunities
Visitor Goal:
• Research products/services
• Collect more information
• Contact a representative
Web Analysis Focus:
• Research behavior
• Visitor-to-lead ratio
• Lead quality & cost
• Campaign effectiveness
• Call to action optimization

Site Objective: Informational
Business Goal:
• Distribute information
• Enhance marketing and service
• Reduce costs
Visitor Goal:
• Find information
• Conduct research
Web Analysis Focus:
• Info-seeking behaviors
• Ease of use and success
• Electronic vs. traditional costs
• Other supporting goals
• Ad tracking, sponsorships, etc.

Site Objective: Entertainment
Business Goal:
• Develop audience loyalty
• Monetize through ads or commerce
• Brand building
Visitor Goal:
• Have fun
Web Analysis Focus:
• Frequency, depth, and length of visits
• Popular audience interests for targeting/segmenting
• Ad tracking, sponsorships, etc.
• Conversion from “entertainment” visits to other “revenue” or “branding” behavior visits

Site Objective: Portals & Media Sites
Business Goal:
• Generate revenue through ads, referrals, paid search placements, visitor services
• Build loyalty
• Increase page views per visit
• Increase visit frequency
• Subscriptions to magazine, newspaper, and online publications
Visitor Goal:
• Quickly and easily find information
• One-stop information source
Web Analysis Focus:
• Frequency and quality of visits (are they an ad clicker?)
• Advertising revenue generated
• Visitor interest in content and preferences for segmentation
• Audience growth, loyalty, engagement

Site Objective: Customer Self-Service
Business Goal:
• Provide service online
• Reduce service costs
• Speed resolution rate
• Offer problem resolution
• Offer knowledge base information
Visitor Goal:
• Quickly and easily find answers to resolve issues
Web Analysis Focus:
• Visit frequency and duration
• Issue resolution rate
• Tracking of email inquiries after reviewing help pages
• Most successful type of help content/pages

Site Objective: Corporate Portal/Intranet
Business Goal:
• Leverage Knowledge Base
• Streamline operations
• Provide access to critical applications
Visitor Goal:
• Quickly and easily perform duties
Web Analysis Focus:
• Visit frequency and duration
• Most popular content/pages
• Completion of a series of steps (scenario)
• Visitor/department-level activity

Of course, most sites have multiple objectives and consequently fall into several of the above categories. Businesses generally focus on more than just one task. For example, a company selling products will be concerned about customer service and lead generation for higher-end products. Also, large companies with multiple divisions may share portions of a web site and have numerous objectives.

The message is clear: you must look at the chief characteristics of your web site. What does your web site do? What is the handful of metrics that will tell you that you are successful?

Your Site’s Business Metrics
What are the metrics that show you whether or not you are achieving the goals for your site? You need concrete measurements to know what you can improve. Regardless of your specific site objectives, you’ll want to measure the conversion rates of some scenarios (or steps through your site) to get a high-level view of your site’s effectiveness.

For example, a commerce web site will probably examine the percentages of visitors that:
1. Visit your shopping section.
2. View a product.
3. Add a product to the shopping cart.
4. Start checking out.
5. Finish checking out.

A customer self-service web site may be interested in the percentages of visitors that:
1. Log in to the members page.
2. Visit various pages with pertinent topics.
3. Print or download information.
4. Log out.

By measuring the visitors in each step of a scenario, you can determine where in the process you are losing the most people and then take action to improve the situation.

The following subsections discuss metrics for several general web sites. The vast majority of web sites represent a combination of the following five business models, as shown in Figure 2-1.

Figure 2-1. Web site business models

Content sites
Content sites refer to media sites and specialty portals that are supported by sponsors and ads, subscriptions, premium services, and other means. Examples are Yahoo, CNN.com, Salon.com, and Consumer Reports.

Content sites are typically interested in the following metrics:

Average page views per visit
Content sites want an increasing number of page views per visit. By examining this metric in relation to content groups, you may gain more perspective on what areas are generating the most interest.

Average visits per visitor
How often are visitors returning each day, week, or month? This is an important metric that may indicate the success of a particular campaign.

Clickthroughs of onsite ads
Since many content sites are supported through advertising, monitoring the number of clickthroughs of these ads helps you gauge the value of the ad.

First-time versus returning visitors
Does the content engage visitors enough to make them return? By tracking the ratio between new and return visits over a period of time, you can determine if your site is attracting enough returning visitors.
Average visit frequency and recency
You will want frequency to be high and recency to be low to retain and grow your audience.

Content group activity and history metrics
If a content group experiences fewer and fewer visits, then you can investigate and take action.

Number of search engine referrals
The number of visits referred by search engines is usually a critical metric for most content sites.

Specialized conversion rates
Conversion rates typically explore how many visitors move from one step to the next in a scenario that you are monitoring. Media sites may want visitors to register for topical newsletters to increase ad revenues and drive repeat traffic to the site.

Commerce sites
Commerce sites are sites where companies sell their products and services. Examples are Amazon.com, WalMart, Converse, and Diamond.com.

Commerce sites are typically interested in the following metrics:

Gross margin
Companies with high gross margins (gross revenue less cost of goods) have more money to spend on business operations such as research and development.

Gross margin return on investment (GMROI)
GMROI is gross margin divided by the demand creation expense for that order. That is, gross margin dollars are divided by the cost of the demand creation activity that drove the sale. Calculating it depends on being able to track the most recent campaign.

Net profit
Represents the gross revenue minus taxes, interest, depreciation, cost of goods sold, and other expenses.

Total sales
Represents the total invoice value of sales, before deducting for customer discounts, allowances, or returns.

Average order size
Represents gross sales divided by the number of orders, which reveals the average amount spent on each order. The higher the average amount, the better you are at motivating buyers to purchase more.

Accessory attachment rate
This is the overall rate at which accessories are added to an order.
This is the number of orders that have an accessory attached, divided by the total number of orders. This measurement helps you determine how to grow the overall average order size, as well as the gross margin/profit of a single order. Accessories typically have the highest gross margin on a site and significantly increase the profitability of an order. For example, the cables on a DVD player order may bring in as many profit dollars as the player.

Sales conversion ratio
Represents the ratio of visitors to sales and visits to sales.

Customer retention rate
Represents the number of repeat customers divided by the number of total customers over a period of time. Commerce sites strive for repeat business.

Cost per sale
Represents marketing expenses divided by the number of sales during a period of time. Low cost per sale means efficient marketing and a higher net profit.

Customer acquisition cost
This is marketing expenses divided by the total number of orders from unique, first-time buyers over a period of time. If it costs a lot to acquire new customers, then you may have to retool your marketing effort.

Average lifetime value
What is the value of your customers over a period of time? Is it increasing?

Specialized conversion rates
Conversion rates typically explore how many visitors move from one step to the next in a scenario that you are monitoring. An example of a specialized conversion rate for a commerce site: your site invites visitors to register for a newsletter or sign up for a contest. Compare how many visitors see the offer with how many actually sign up.

Lead-generation sites
Lead-generation sites offer information for sales processes by actively “capturing” visitors as leads. This usually occurs after visitors register or contact a sales representative.
Examples include business-to-consumer (B2C) web sites such as those for autos and homes, and business-to-business (B2B) web sites such as Siebel, Peoplesoft, and Boeing.

Lead-generation sites are typically interested in the following metrics:

Visitor-to-lead conversion ratio
This represents the percentage of visitors that register or otherwise become a lead over a period of time. If this metric dips or peaks, you should evaluate conversion rates by acquisition source (campaigns).

Total number of leads
If the number of leads does not grow, then a site may need to be re-evaluated. Consider examining the number of leads from search engines, campaigns, partners, or the number of leads for different products or from a geographic region.

Cost per lead
Represents marketing expenses divided by the number of leads generated during a period of time. This metric contributes to understanding the cost of marketing campaigns and collateral.

Lead close ratio
This is the percentage of collected leads that ended up closing as a sale. If leads are “closed” through channels other than your web site, you may have to track lead closure manually.

Average visits or page views per visitor
If your site is seen as a resource, it may attract more leads that value the content.

Marketing campaign conversion rate
This is the general effectiveness of campaigns at driving visitors to register as leads.

Specialized conversion rates
Conversion rates typically explore how many visitors move from one step to the next in a scenario that you are monitoring. An example of a specialized conversion rate for a lead-generation site: your site wants to evaluate which methods (such as a newsletter or a webcast) lead to the highest closure rates.

Self-service sites
Self-service sites focus on helping customers resolve issues and/or learn about uses of the product or service without the aid of human interaction.
Self-service sites are often a component of another model but can stand alone. Examples are support/knowledge base sites of most manufacturers and software developers, and online banking.

Self-service sites are typically interested in the following metrics:

Average visits per visitor
An increase or decrease in average visits per visitor may be seen as positive or negative, depending on the site’s objectives. On the one hand, an increase is good for a governmental web site or an intranet maintained for employees, because it shows that visitors are performing many tasks, such as scheduling vacations, reading corporate policies, or checking on 401(k) plans. On the other hand, a software manufacturer may want the visits per visitor to decrease, indicating that people are finding what they need quickly.

Average page views per visit
The same considerations apply here as with visits per visitor. Compare average page views per visit with content groups to know whether a decrease or increase in activity is good or bad.

Knowledgebase searches per visit
How easy is it for visitors to find the information they want? If some knowledgebase articles are searched for quite often, you may have to put better explanations into your product.

Number of zero result queries
This represents how often a visitor searches on a term and receives zero search results. You need to add new content if visitors receive zero results after querying the same or similar keywords.

Online resolution rate
This rate is the percentage of site visits that resolve issues online versus those that need additional help over the phone or by email.

Percentage of total support requests handled online
This information helps to identify which support options visitors are using and to what degree. If a certain option gets more attention than others, then you might consider upgrading the corresponding part of your product.
Specialized conversion rates
Conversion rates typically explore how many visitors move from one step to the next in a scenario that is being monitored. An example of a specialized conversion rate for a self-service site: a cellular company might want to allow its customers to edit their general account information, modify their calling plans, or download new ring tones.

Intranet sites
Intranet sites are primarily company or organization sites that provide service for employees. Employees typically use intranet sites to schedule vacation, to download and print medical forms, to check up on company policies, and to perform a variety of other tasks.

Intranet sites have many of the same issues as self-service sites, except that you know your total number of visitors (the employees). Therefore, the resulting reports will accurately reflect usage in relation to a known number of visitors.

Intranet sites would use the same metrics as the self-service sites. For example, by using scenario analysis you could look at the steps in a process such as filling out a vacation request form. Perhaps you would find that some employees abandon the process at a certain step because they are still unsure about their vacation plans. This would be similar to the steps explored in the specialized conversion rate mentioned in the metrics for self-service sites.

Branding sites
Branding sites are those that seek to promote interaction with visitors and engage them with a brand. Sponsored by companies, initiatives, and/or events, branding sites intend to generate buzz, create interest in a product or company, or stimulate sales. Note that these sites do not justify their existence on sales or leads generated or on ad revenue. Examples of branding sites are absolut.com, movie sites, and Coca-Cola.
Branding sites are typically interested in the following metrics:

Unique visitors
Monitoring unique visitors by day, week, month, quarter, and year helps to evaluate the effectiveness of your online branding.

Depth of exploration
This includes measures such as average page views per visit, length of time, and content group exposure. When this metric is tied to a campaign, you can find out to what “depth” that campaign affected visitors.

Repeat/returning visitors
Successful branding sites attract multiple, continuous interactions with visitors.

Average visit frequency, recency, and latency by content area visited
These measurements continue the concept of sustained interaction with visitors. Loyal visitors, for example, are the ones that typically purchase more products.

Specialized conversion rates
The rate at which visitors play games, download coupons or screen savers, enter contests, and so on, and then register with your site is very important.

Summary
After your company has firmly determined the objectives for its web site and determined which specific metrics to track, you can use WebTrends to get the reports that you need. These reports will influence the way you change your web site. You might, for example, improve the content in a sequence of steps that leads to the purchase of an item.

In most cases, it is best to make small, incremental changes to your web site. You can then direct WebTrends to measure your visitors and get a new set of results to study. Of course, after you’ve made your changes, you may need to re-examine your site’s goals and objectives, and then add a new set of measurements. This is part of the continuous Measurable Improvement Cycle that was discussed in Chapter 1 on page 18.

To help you think through the objectives and critical metrics of your web site, you can refer to the “Objectives and Critical Metrics Worksheet” on page 39.
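As a worked illustration of the scenario conversion rates and commerce ratios defined in this chapter, the following minimal sketch computes step-to-step conversion for the five-step commerce scenario and a few of the ratio metrics. It is not WebTrends code: every function name and every number is hypothetical, and the formulas simply restate the definitions given earlier (for example, average order size as gross sales divided by the number of orders).

```python
def step_conversion(funnel):
    """Return (from_step, to_step, rate) for each consecutive pair of steps."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


def biggest_dropoff(funnel):
    """Return the transition with the lowest step-to-step conversion rate."""
    return min(step_conversion(funnel), key=lambda item: item[2])


def average_order_size(gross_sales, orders):
    """Gross sales divided by the number of orders."""
    return gross_sales / orders


def accessory_attachment_rate(orders_with_accessory, total_orders):
    """Orders that include an accessory divided by total orders."""
    return orders_with_accessory / total_orders


def cost_per_sale(marketing_expenses, sales):
    """Marketing expenses divided by the number of sales in the period."""
    return marketing_expenses / sales


# Hypothetical visitor counts for each step of the commerce scenario.
funnel = [
    ("Visit shopping section", 10000),
    ("View a product", 6000),
    ("Add a product to the shopping cart", 1500),
    ("Start checking out", 900),
    ("Finish checking out", 600),
]

for from_step, to_step, rate in step_conversion(funnel):
    print(f"{from_step} -> {to_step}: {rate:.1%}")

worst = biggest_dropoff(funnel)
overall = funnel[-1][1] / funnel[0][1]
print("Largest loss:", worst[0], "->", worst[1])
print(f"Overall conversion: {overall:.1%}")
print("Average order size:", average_order_size(45000.0, 600))
```

With these made-up counts, the weakest transition is from viewing a product to adding it to the cart, which is where a site owner following the Measurable Improvement Cycle would look first.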
To begin understanding how to collect the data that you will explore with web analytics, continue to the next chapter, Chapter 3, “Collecting Your Web Activity Data” on page 41.

Objectives and Critical Metrics Worksheet
Use the information you’ve just learned about high-level goals, specific objectives, and web site metrics to fill out this worksheet.

Consideration (left column) / Comments (right column)
• What are the high-level goals of your web site?
• What would a successful visit to your web site be?
• What business model is your site? (Commerce, Content, Self-Service, Lead Generation, or Branding/Campaign)
• How would you improve your web site?
• What are more specific objectives for your web site? (Business goals; Visitor goals)
• What do you need to measure to improve your site?

Chapter 3
Collecting Your Web Activity Data

Now that you’ve established your site objectives and critical metrics, you can start collecting web activity data to be used for analysis. It’s likely that you already have some activity data for your web site in the form of traffic logs that are routinely collected by your hosting servers.

Web activity data is a record of what web visitors clicked on, what web pages they visited, what time they visited a particular page, what browser they used to view your page, and what page referred them: basically anything about what visitors did on your site. You might also be able to obtain information about who did that activity. This is not necessarily their name, but maybe their age range, whether they’re from Lincoln, Nebraska or Ouagadougou, Burkina Faso, and what salary bracket they’re in.

Traditionally, web site analysis has relied on web server log files to provide insightful data on web activity.
In fact, web site analysis was born almost accidentally when a couple of engineers realized that they might be able to make a marketable product (maybe even a few bucks for themselves) by re-packaging and clearly presenting the data that was recorded in a web server’s activity log file. Since then, data collection methods have grown and kept pace with increasingly sophisticated and expanding web sites.

Data Collection Methods
Currently, WebTrends employs two of the leading methods to gain information about your web site.
• The first method involves using a web server log file, which contains some basic information about the activity on your site. See “Using web server logs” on page 42 for more information.
• The second method involves collecting data from the visitor’s machine by using client-side tagging to create a more detailed and customizable kind of log file than the standard logging available from server software. See “Using client-side tagging” on page 49 for more information.

Many customers use both web server logs and client-side tagging. Web server logs can help them to obtain IT-based metrics such as spiders, downloads, bandwidth, and errors. Client-side tagging can help them to get business metrics, as well as visitor details such as screen resolution and Java-enabled browsers. There are many other differences in the data collected by these two methods that may or may not be relevant to your analytics needs, and these differences are discussed in their respective subsections.

Using web server logs
Each time a visitor attempts to view something on your web site, download a file from your site, or in some other way requests something from your site, the web server (which holds and delivers the content for your site) adds a record to a log file. This record contains some basic information about the request the visitor made.
Some of this information is known directly by the server, such as the time, date, what’s requested, and the size of what’s requested. Other information is obtained through a cooperative and heavily standardized relationship between the browser and the server, in which the visitor’s browser is programmed to send certain information, such as the IP address of the computer it’s running on and specifics about the browser version and operating system of the visitor’s computer.

Most web server log files are text files that contain the following pieces of information:
• Date and time that the visitor asked for something from the web server
Required for time-sequencing records and identifying paths.
• The IP address (Internet Protocol address) or domain name of the visitor’s computer
Not required, but strongly recommended. This may be used for visitor tracking, to get the domain of the visitor, and for looking up geographical information.
• The web server’s name (on your web site)
Not required; not used.
• The web server’s IP address (on your web site, as seen from the outside world)
Not required; not used.
• The method used in the request, such as GET, POST, and HEAD
Not required, but it is used for determining the type of action the visitor took, such as a page request or an upload.
• The URL of the requested content
Required. All content-related information is derived from this field.
• Any query parameters, if additional information is needed
Not required, but strongly recommended. Used for analyzing dynamic content.
• The return code (successful or failed delivery of the request)
Not required. Used for reporting on user and system errors.
• The number of bytes sent by the web server to the client
Not required. Used for reporting on bandwidth usage.
• The number of bytes sent by the client to the web server
Not required. Used to report on the amount of data sent from visitors to the web site.
• The amount of time (in milliseconds) to fulfill the request
  Not required, but if present, this is used for reports involving server response time.
• The port on the client machine used to send requests and receive the requested data
  Not required. Not generally used.
• The client machine’s browser type and version number (also known as “the agent”)
  Not required. This is used for determining which browsers are in use, and for recognizing various types of spiders and search engine robots.
• Cookie information, if the client machine has a cookie for your site
  Not required, though very useful for tracking unique visitors. Also, cookies can contain other, site-specific information, which can be analyzed and reported on.
• Referrer information, if the visitor was sent to your site from an external site
  Not required. Used for recognizing how visitors arrived at your site, especially via search engines.

Note: This logged information and the order in which it appears are specified by the software contained in the web server that keeps the log files. For Microsoft systems, the software is called Internet Information Services (IIS). You can program the software to reorder or drop pieces of information that you might find unnecessary, but it is best to do this only after you’ve gained some expertise with web analytics.

Each log entry appears as information on one very long line in the file. The following sample log entry has been split over several lines so that you can read it more easily:

2002-09-16 00:01:58 65.70.31.3 W3SVC82 HERC 209.224.1.170 GET
/products/thingamajigger.html 200 4199 363 266 80 HTTP/1.0
Mozilla/4.72+[en]C-SBI-NC472++(Windows+NT+5.0;+U)
WEBTRENDS_ID=192.168.32.180-3425858080.29527895
http://www.awebsite.com/thingamajiggerad.html

Figure 3-1 explains this log entry by relating each bulleted item above to the corresponding information in the sample log entry.

Figure 3-1. Sample log file explanation.
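As a rough illustration of how the fields described above line up with the sample entry, the following Python sketch splits the entry on whitespace and attaches a label to each value. The field labels are illustrative assumptions, not names defined by WebTrends or IIS. The sample carries no separate query-parameter field, so none is listed, and the labels service_name and server_name are a guess at what W3SVC82 and HERC represent. A real log may use a different field order, as noted above.

```python
# Field labels are hypothetical; they follow the order of the sample entry,
# not any official WebTrends or IIS naming.
FIELDS = [
    "date", "time", "client_ip", "service_name", "server_name", "server_ip",
    "method", "url", "status", "bytes_sent", "bytes_received",
    "time_taken_ms", "port", "protocol", "agent", "cookie", "referrer",
]

# The sample entry from the text, rejoined into one line.
SAMPLE = (
    "2002-09-16 00:01:58 65.70.31.3 W3SVC82 HERC 209.224.1.170 GET "
    "/products/thingamajigger.html 200 4199 363 266 80 HTTP/1.0 "
    "Mozilla/4.72+[en]C-SBI-NC472++(Windows+NT+5.0;+U) "
    "WEBTRENDS_ID=192.168.32.180-3425858080.29527895 "
    "http://www.awebsite.com/thingamajiggerad.html"
)

NUMERIC = ("status", "bytes_sent", "bytes_received", "time_taken_ms", "port")


def parse_entry(line):
    """Split one log line on whitespace and label each value."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, found %d" % (len(FIELDS), len(values))
        )
    entry = dict(zip(FIELDS, values))
    for name in NUMERIC:
        entry[name] = int(entry[name])
    return entry


entry = parse_entry(SAMPLE)
print(entry["client_ip"], entry["method"], entry["url"], entry["status"])
```

Splitting on whitespace works here because spaces inside values (for example, in the agent string) are written as plus signs in this sample.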
Your log file can vary from this example, because you can configure your server to include the information you want. Also, the information available may vary according to the brand of server software (for example, IIS, iPlanet, or Apache). Please refer to the server software’s documentation for directions on how to activate logging. Note that enabling logging for Process Accounting in IIS may cause a lot of unnecessary headaches.

Note: For a more complete sample of a log file according to the format provided by Microsoft IIS versions 4 and 5, see NetIQ’s Knowledge Base article NETIQKB2382 (www.netiq.com/kb/esupport/consumer/esupport.asp?id=NETIQKB2382).

In cases in which odd URLs have been produced by some content management systems, you may need programmers who can write scripts (that is, special code in a language such as Perl) to preprocess log files before giving them to WebTrends software.

Note: WebTrends offers a built-in method, called conduit scripting, which can be used to massage log files from content management systems such as Vignette, BroadVision, and Macromedia Spectra.

Log file rotation/rollover

Web server logs can grow quickly and fill up your server, so you might need to transfer them (a process also known as “rotation” or “rollover”) from that server to another storage unit on an ongoing basis. Whether you keep log files on the same server or transfer them for storage, you will probably want to compare information over a period of time. This process is called historical analysis. For example, you can compare your top pages from month to month and find out if there is a trend.

For many organizations a transfer of web server logs might occur on a set interval (perhaps once a day), but if a site experiences enormous amounts of traffic, these log files may be rotated off the server even more frequently, perhaps every hour. After they’re rotated off the server, a new log file begins.
Each log file receives a name that makes it relatively easy to track. For example, the log file for a web site for October 3, 2002 might be called ex021003.log, in which the naming format is year/month/day. A busy site can produce daily log files that reach gigabytes in size. To save disk space, once a file is rotated off the server, it is often compressed with an application like PKZip or WinZip. Fortunately, because the log files may contain many repeated elements (dates, URLs, browsers and browser versions), you can compress the log files down to 5 to 10 percent of their original size. Figure 3-2 shows an example of how log files are rotated off of your servers and placed in a zipped archive or database.

Note: For more information about log file rotation/rollover, see “Log file rotation/rollover” on page 187 in Chapter 11.

Figure 3-2. Sample rotation of log files

Log file access

The use of web server logs means that you will have to tell WebTrends the location of the web log data and how to access it. Most log files are either stored on a mapped network drive or on a remote server that may be accessed via File Transfer Protocol (FTP). It is often recommended that you store log files on the hard drive of the WebTrends machine. This helps to ensure availability (for example, when a server or network is down) and efficiency.

If you use a scheduling mechanism to access the log files, you may need to provide WebTrends with the required username and password authentication information. If you choose to import the log files on a regularly scheduled basis (which is what most organizations would do), you need to realize that log files imported via FTP or HTTP are brought over in their entirety. You cannot transfer only the first 10,000 lines of a log file or the last 3,000 entries. Figure 3-3 on page 47 shows an example of how log files are rotated off of co-located servers.

Figure 3-3. Sample rotation of log files off of co-located servers

How frequently you import the log files depends on how much activity your site experiences. As a general rule, most sites bring over their log files once a day. However, if your site has high levels of activity and generates extremely large log files, you may need to transfer files more frequently. This reduces the data volume that must be handled at any given time. WebTrends is designed to recognize which files have already been imported, and only brings in files that contain new data.

In comparison, accessing your log files from a network drive is a more familiar way of obtaining your log file data because WebTrends treats it as though the log files were stored locally. Don’t be fooled though, because in reality the data still needs to come across the network from the mapped drive. This data transfer greatly slows the entire analysis process.

Note: One week’s worth of log file data will give you a snapshot of the volumes of activity on a site, but you will probably need three months’ worth of data to get a real insight into the trends. Once you understand the trends, spikes and anomalies become evident, and usually their cause can be traced and evaluated.

Benefits of log files

In general, the benefit of web server log files is that they tell you about the mechanism of delivering web pages, and (with a bit more work) they provide business metrics.
• Most web servers generate them, so they are typically easily and immediately available.
• You don’t have to decide in advance exactly what data you want to report on. Web server logs allow you to go back to the raw data at any point and change what you want to analyze, as long as the fields in the raw data were being logged initially.
• Even when a server goes down, it does not lose the web server log data, because the data collection device and the server are one and the same.
• Log files capture all downloads and non-HTML files in addition to HTML files.
• You can get lots of IT-based metrics such as reports on spiders, downloads, bandwidth, load-balancing, and errors.

Drawbacks of log files

• If an ISP hosts your site, you may not have access to your log files.
• Log files collect everything, even data you don’t care about. This may require more storage space.
• Corrupt log files: If the log file is there, but WebTrends cannot read it, then the log file might be corrupt.
• Missing log files: Are you sure that they are not written elsewhere on the system?
• Log file hell: If the web site is hosted on geographically dispersed servers, WebTrends has to collect all the log files in one place and have a means of ordering the records from all the log files. It must then determine which hits are part of the same visit. If time stamps on the various web server logs are not in sync, results can be inaccurate. You must also have a way to handle server disruption, or the results can be inaccurate.
• Log files can’t record repeat requests when a page is accessed from a caching server.
• Information can be inaccurate because of proxy servers and content delivery networks, such as AOL, AT&T, and Earthlink. (See “Proxy server buffers” on page 63 for more information.)
• Depending on the level of sophistication, the software installation and configuration may take time. The learning curve for this software is sharp and steep.
• You must maintain the equipment and software yourself, unless an ISP does this for you.
• You must write scripts (or purchase software containing ready-made scripts) to handle odd URLs that may need more processing to understand correctly.

Using client-side tagging

A second and increasingly popular method of collecting web activity data is through the use of client-side tagging.
A tag is a small segment of code, called a script, which contains instructions that you can put on the web page you want to track and analyze. Client-side tagging works like this: when a user makes a request for a page that is being tracked with a tag, one of two things happens: either a web server plug-in automatically embeds a tiny script in the page as it is delivered to the visitor, or the web site manager manually embeds a small script in any page that he or she intends to track. Either way, the page delivered to the client contains some JavaScript code, which:
1. Creates a variable that contains the value of the URL, the URL query parameters that are present, the referring page, the date, and the time of the visit.
2. Makes an HTTP request to the data collection server, which is called the WebTrends SmartSource Data Collector (SDC).

The key to data collection is in the HTTP request, which is a request for a transparent 1 pixel by 1 pixel image. In reality, the image request is just a transport vehicle for the variable, which contains the visit information. The information in the variable gets transported to the data collection server in the request. At the data collection server, the information in the variable is used to add a new record to a web activity file that you can use for web site analysis. Figure 3-4 shows a typical client-side tagging process.

Figure 3-4. Sample tagging process

Here are the basic steps of the tagging process:
1. A visitor wants to view a page on your site. This initiates a page request to your web server.
2. Your server sends the page to the visitor, and this page contains a JavaScript tag.
3. The tag triggers a request for a GIF with parameters attached.
4. The GIF file is sent to the visitor.
5. The request with the parameters is analyzed.

The tagging method can actually be hosted externally, or you may end up hosting it onsite.
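The way the image request carries the visit information (steps 3 and 5 above) can be sketched in a few lines. This is only an illustration written in Python rather than the JavaScript a real tag would use. The collector address, the image name tag.gif, and the parameter names url, query, referrer, date, and time are all hypothetical and are not the actual WebTrends SmartSource parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of a data collection server and its 1x1 image.
COLLECTOR = "http://sdc.example.com/tag.gif"


def build_tag_request(page_url, referrer, date, time):
    """What the tag does: pack the visit information into an image request."""
    parts = urlsplit(page_url)
    params = {
        "url": parts.path,      # the page that was viewed
        "query": parts.query,   # its query parameters, if any
        "referrer": referrer,   # the referring page
        "date": date,
        "time": time,
    }
    return COLLECTOR + "?" + urlencode(params)


def read_tag_request(request_url):
    """What the collector does: recover the visit information for a record."""
    query = urlsplit(request_url).query
    pairs = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in pairs.items()}


request = build_tag_request(
    "http://www.example.com/products/item.asp?id=7",
    "http://www.example.com/index.html",
    "2004-07-01",
    "12:00:00",
)
record = read_tag_request(request)
print(request)
print(record)
```

The image itself carries no information. Everything of interest travels in the query string of the request, which is why the data collection server can build a record without ever seeing the page.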
Typically, if you want deeper analysis capabilities, you would handle the data collection internally to keep the data on hand. Most external hosting companies do not hold your data for an extended period; they simply offer you standard reports on summary web activity data.

The tags put information into a dedicated data file for analysis. A typical data-file record might look like this:

2001-03-04 00:08:18 proxy7.hotmail.com W3SVC3 web1 192.168.1.1 GET
/ads/default.asp
redir=products&ad=http%3A//www.boatdealer.com&WT.mc_n=Boat%20Dealer%20Campaign&WT.mc_t=Banner&WT.mc_s=3/3/2001&WT.mc_c=60&WT.ad=P-32,%20P-58,%20P72%20Options%20Offer&WT.sv=Web%20Server%201&WT.ti=Advertising%20Redirect&WT.tz=420&WT.ul=en&WT.cd=32&WT.sr=1024x768&WT.jo=Yes&WT.js=Yes&WT.co=Yes
200 0 1 75 1 80 HTTP/1.1
Microsoft+Internet+Explorer/4.40.305beta+(Windows+95)
WEBTRENDS_ID=192.168.16.1481615253808.29527727
http://www.boatdealer.com/dealers/pacific/dealerlist.htm

The italicized text contains client-side tagging parameters, which were used to fetch the data from a database that populated the web page template, default.asp.

Note that the increasing amount of information gathered for each record may quickly fill your SDC server. Therefore, this server must be monitored closely. You may need to transfer the data files to another server, as discussed in “Log file rotation/rollover” on page 45.

Note: For its tags, WebTrends has developed special parameters called WebTrends SmartSource Parameters. In the above example, all WebTrends SmartSource Parameters begin with “WT.”

Benefits of client-side tagging

In general, client-side tagging is extremely effective for attaining business metrics but not for examining the underlying web server behavior.
• Client-side tags capture data for only the pages you want to track. This reduces the amount of data you have to store or process.
• Client-side tags act as an automatic filter, because they don’t collect images and other kinds of hit data that you don’t want to collect. This automatic filtering helps reduce the size of your data files.
• If you are using a hosted service, you can write off the cost of the service as an operating expense.
• Client-side tags can be implemented on your web pages quickly.
• Client-side tagging avoids problems of co-located servers and content served from multiple sites.
• Because the script runs each time the page loads, you have accurate visit and page counts, even when pages are loaded from a caching or proxy server.

Drawbacks of client-side tagging

• Client-side tags require additional hardware to run the data collection server.
• Client-side tags require time or software to embed the script in each page you want to track.
• Unless error pages have the script embedded in them, you cannot track errors.
• If a browser is not enabled to run the scripts, you can only get page and visitor counts, not details about what was visited.
• If a redirect page does not contain the script, it will not get counted. This could be crucial if you are using redirect pages to track advertisements.
• Downloads are very difficult to track with client-side tagging.
• If a page load is interrupted before the script is run, the visit to the page does not get recorded.
• If a crawler or spider does not run the script (which most don’t), its visit is not captured.
• Without custom configuration, client-side tags only capture HTML pages (such as .htm, .asp, .html). Consequently, downloads are very difficult to track.

Combining web server logs and client-side tagging

Companies that analyze data from web server logs and client-side tagging have the best of both worlds.
They can use the log files to get information about the web server activity, primarily IT-based metrics such as reports on spiders, downloads, bandwidth (for example, bytes delivered), load-balancing, and errors. They then use client-side tagging to get higher-level business metrics.

Hosted Versus Installed Software Solutions

After choosing whether to use web server logs or client-side tagging, you need to determine if you want to hire a service to do that for you (called a hosted solution), such as WebTrends On Demand, or if you would rather be responsible for collecting and analyzing all the data yourself (called a non-hosted solution) and purchase stand-alone software such as WebTrends Enterprise.

Using a hosting service is an attractive option for several reasons. The foremost reason is that you don’t have to maintain the web analysis software or hardware, and you can write off the service as an operating expense. Also, a hosting service arrangement doesn’t require the additional setup time that complex software solutions require. If you don’t like the service, you can easily cancel or finish out the contract and disable the data collection.

In contrast, installed software (non-hosted) solutions provide greater flexibility regarding the data you can analyze and in the way you can present that data. With data collected from web server data files (the most common kind of non-hosted solution), you can store web activity data indefinitely in raw log file format or processed in a web data warehouse. This means that at any time you can re-analyze the data, combine it with external data sources, or run deeper analyses using third party software. Another key advantage of installed software is privacy, because you control the data, which is never stored on a third party server. Privacy is especially important for financial industries, such as banking and insurance.
The main drawback of installed software is that you must maintain the software and hardware associated with your analysis solution. For this reason, the expenses are viewed by accounting as company assets, which are only depreciable and not deductible.

Traditionally, the client-side tagging model has been primarily used as a hosted solution with products such as WebTrends On Demand, and web server data file analysis has most often been used with software (non-hosted) solutions. However, with the advent of data collection servers, organizations can now use client-side tagging to collect activity data themselves (as a non-hosted solution) and either report on that data directly, or store the data in a web data warehouse.

Choosing a Data Collection Method

Which data collection method you should use really depends on the method that best meets your analysis needs and budget. If you know exactly what data you wish to analyze and you only want some basic web activity reports, using hosted client-side tagging may be the sensible choice. This method reduces the amount of data that you have to collect and minimizes web data activity file storage issues. For small businesses, hosted client-side tagging is also the least expensive method that delivers basic reports such as Pages, Visitors, and Referring Site.

On the other hand, if you think that you may want to shift your analysis approach down the road, and want to keep all your options open, collecting the raw web server log data or using a non-hosted data collection server gives you far more flexibility.

Some organizations choose to combine both web server log and client-side tagging methods. They generate standard reports using client-side tags or an .asp model, but collect and store web server log data to allow flexibility later on.
In the future, many organizations will probably find that using a non-hosted client-side tagging solution with a data collection server may be more attractive than using web server logs. They will be able to collect and store the same information that web server logs can, allowing more in-depth and flexible analysis and reporting, yet also offering immediate report generation on standard data.

Data Collection Worksheet

Use the following worksheet to understand how you want to collect data about your web site.

Consideration | Yes | No | Comments
Need access to log files? (Note: Hosted services don’t allow access to log files.) | | |
Need to keep data for an extended period of time to do comparisons? | | |
Capture information on all downloads (HTML and non-HTML files)? | | |
Use multiple or co-located servers? | | |
All servers are available at all times? | | |
Can afford up-front investment in terms of capital and training time? | | |
Can maintain additional hardware equipment and software? | | |
Need to write off costs as an operating expense? | | |
Capture data of only specific web pages? | | |
Quick install/uninstall? | | |
Can afford extra hardware? | | |
Can embed code in each page to be tracked (also redirect pages)? | | |
Only care about HTML pages and business metrics (don’t care about IT-based metrics)? | | |
Prepared for software costs related to licensing? | | |
Have the people (IT) resources? | | |
Know what kinds of information needed (business and/or IT)? | | |
Have the storage retention/space (time/how long)? | | |

Chapter 4
Visitor Identification

The main objective of web analysis is to understand how web visitors are using your site (what pages are visited and what actions are taken) so that you can determine if they are doing what you want them to do.
• Are visitors responding to ads?
• Are visitors making purchases?
• Are visitors reviewing your technical support materials rather than calling your technical support personnel?

These are questions that you can answer by using WebTrends. Your web activity data file, whether generated by the web server itself or collected and created by a data collection server, can tell you more about the activity on your site. But how can you tie activity to individual visitors? How can you tell whether a hit to a product information page and a hit to the pages of a shopping cart were all done by the same visitor? If you knew that, you could say that a particular visitor read the product’s description, decided to purchase it online, and then completed all the steps required for making a purchase.

Tracking visitor activity can be quite complex, so it is important to keep in mind that you will spend more time, effort, and resources as you strive for more clarity and accuracy in understanding who your visitors are.

Defining Web Activity

From a high level, web activity includes which areas of the web site were visited, which products were viewed, and which actions were taken with those products. Visitors typically go through a path of pages. After you determine what the actions were and who did them, you can derive meaning from the activity (presented in easy-to-understand reports from WebTrends) and take action, such as revising your web site or tailoring messages for special customers. From a low level, you will want to know the definitions of several terms that are commonly used when discussing web activity.

Hit
Represents any individual item that is delivered from the server to the client. A single visitor action could result in dozens of hits. For example, when a web page is delivered to a client’s screen, it may arrive with graphics, icons, flashing ads, sidebars with links, frames, and other items that all count as hits.
While the volume of hits is an indicator of web server traffic, it is not an accurate reflection of how much real information is being looked at.
Important: “Hit” is one of the most misunderstood terms in web analytics. Please take time to understand this term rather than assume that you already know what it means.

Page View
A hit to any file classified as a page (such as html, htm, psp, and asp pages).
Note: For sites still using frames, an actual page viewed may consist of several HTML documents.

Visit
Denotes a sequence of a visitor’s hits up until the point at which the gap between two successive hits is greater than the defined session timeout length (usually thirty minutes). Much marketing research focuses on statistics for visitor sessions to get a more accurate picture of user activity, because multiple requests can be made within a single visitor session. Visits are equal to sessions, which are explained in more detail in “Sessionizing Your Visits” on page 59.
Note: If you modify the session timeout length, you will get a different visit count. For example, shortening the timeout length will increase the number of visits.

The payoff in your analysis of the web activity is in finding the visitor.

Visitor
Represents the person or agent that generates the visits. Agent indicates a program, such as a robot or spider, that is used to visit web sites.

Determining Unique Visitors

In order to associate web site activity with the actual visitors who performed that activity, you first need to uniquely identify the visitor responsible for each hit in a web data activity file. Once you have identified the unique visitor for each hit, you can then group all of the hits from a specific visitor into a visit session. In fact, WebTrends can do this for you by assigning all of the hits in a web data activity file to the visitors responsible for those hits. WebTrends also lets you track returning visitors.
This means that when a visitor comes back for a new visit to your site, you can associate that visitor with his or her previous activity. By tracking what all your visitors are doing over time, you can establish major trends in visitor behavior on your site.

The key here is to distinguish one web visitor’s actions from all other visitors’ actions. You don’t need to identify specifics about that visitor, such as John Smith, who lives at 204 Crest Circle, Chapin, South Carolina. By understanding what group of actions each unique visitor did, you can discern how visitors in general are using your site. Questions you can answer by identifying unique visitors include:
• How many new visitors came to my site during a specific time interval?
• How many returning visitors came to my site during a specific time interval?
• Is the majority of the activity coming from new or returning visitors?
• How much time are visitors spending on my site?

So let’s look at the concept of sessionizing to understand who your unique visitors are.

Sessionizing Your Visits

Sessionizing is the process of assigning a unique visitor to one or more actions that occurred within a defined time period, or session. A session denotes a sequence of hits up until the point at which the gap between two successive hits is greater than the defined session timeout length (usually thirty minutes). The following example shows records from a typical web data activity file.
2002-01-01 00:12:12 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:19:59 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:24:43 66.67.2.10 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:29:59 24.166.12.188 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:40:46 24.166.12.188 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:41:22 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:44:00 165.91.171.109 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:44:17 24.166.12.188 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:46:13 66.67.2.10 - W3SVC3 HERC 192.168.1.1 POST
2002-01-01 00:48:24 66.67.2.10 - W3SVC3 HERC 192.168.1.1 POST
2002-01-01 00:59:59 206.213.251.31 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:01:13 38.151.150.118 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:03:02 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:04:40 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:06:32 206.213.251.31 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:09:01 206.213.251.31 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:09:18 38.151.150.118 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:10:51 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:11:30 12.47.246.6 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:12:22 38.151.150.118 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:14:48 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:17:06 12.47.246.6 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 00:29:59 24.166.12.188 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 01:19:52 12.47.246.6 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:19:59 38.151.150.118 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:21:02 12.47.246.6 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:23:29 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:25:34 12.47.246.6 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:33:55 192.11.223.116 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:39:59 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:43:08 63.232.193.82 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 03:59:59 24.140.30.88 - W3SVC3 HERC 192.168.1.1 GET
2002-01-01 04:00:00 217.194.141.67 - W3SVC3 HERC 192.168.1.1 GET

If you look at the activity of 217.194.141.67 (remember that this is a visitor’s IP address), you will notice that it has two sessions, which are separated by a gap of at least thirty minutes. Figure 4-1 shows the two sessions:

Figure 4-1. Sample of web data activity file sessions

In general, sessionizing requires two basic elements:
• A time stamp, to determine the beginning and end of a visitor's session and to order hits in a time sequence
• A visitor identifier that ties each hit in the web data activity file to the web visitor responsible for the hit

The time stamp requirement is easily handled because web servers and data collection servers can add a time stamp to any hit recorded in a web data activity file. As long as Greenwich Mean Time (GMT) is used to indicate the time, servers that are located in different time zones will not have any problem understanding the time sequence of the data. The more complicated requirement is the visitor identifier.

Visitor Identifiers

You have several different methods at your disposal for identifying the visitor associated with web site activity. These methods include:
• Client IP address or domain name
• Combination of IP address and agent information
• Cookie (persistent or session-only)
• Session IDs
• Data embedded in the URL
• Authenticated user

These methods are listed in order of increasing accuracy. The order also corresponds with the complexity of your site management. At the very minimum, you can examine the client’s IP addresses. The next best thing is the combination of IP and agent, but the very best method is authenticated users. In other words, the IP address is easy to identify while the authentication of users is much more difficult.
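To make the two requirements concrete, here is a minimal Python sketch of sessionizing that uses the simplest identifier from the list above, the client IP address, together with the thirty-minute timeout described earlier. The function name and data layout are illustrative assumptions and say nothing about how WebTrends implements sessionizing internally. The time stamps are the ones recorded for 217.194.141.67 in the sample records above.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # the usual session timeout length


def sessionize(hits, timeout=TIMEOUT):
    """Group (time stamp, visitor id) pairs into sessions per visitor.

    A new session starts when the gap between two successive hits from
    the same visitor is greater than the timeout.
    """
    sessions = {}
    for stamp, visitor in sorted(hits):
        visits = sessions.setdefault(visitor, [])
        if visits and stamp - visits[-1][-1] <= timeout:
            visits[-1].append(stamp)   # continue the current session
        else:
            visits.append([stamp])     # gap too long (or first hit)
    return sessions


# Time stamps of the hits from 217.194.141.67 in the sample records.
TIMES = [
    "00:12:12", "00:19:59", "00:41:22", "01:03:02", "01:04:40",
    "01:10:51", "01:14:48", "03:23:29", "03:39:59", "04:00:00",
]
hits = [
    (datetime.strptime("2002-01-01 " + t, "%Y-%m-%d %H:%M:%S"),
     "217.194.141.67")
    for t in TIMES
]

result = sessionize(hits)
lengths = [len(session) for session in result["217.194.141.67"]]
print(lengths)
```

With these time stamps the only gap longer than thirty minutes falls between 01:14:48 and 03:23:29, so the hits divide into two sessions, matching the description of Figure 4-1. Swapping the IP address for a better identifier (for example, a cookie value) changes only the second element of each pair.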
Though each method has its strengths and weaknesses, you may encounter such issues as:

The ambiguity of the visitor identifier
If two visitors can have the same identifier at the same time, they will appear as a single visit by the same visitor.

The problem with aliasing of a visitor identifier within a single session
If a single visitor has more than one identifier (for example, an alias) within a session, that visitor will appear to be multiple visitors, each having its own visit session.

The problem with the persistence of the identifier across multiple sessions
If a single actual visitor has two different identifiers from one session to the next, that visitor will appear to be two separate visitors. This causes an inaccurate count of unique visitors and new versus returning visitors. It also doesn’t allow you to accurately accumulate a single visitor's activity over the lifetime of that visitor.

As we discuss the various methods for identifying visitors, you will recognize how each method has one or more of these three issues to contend with.

Client IP address or domain name

The easiest method by which to identify unique visitors involves using the visitor’s IP address or domain name. The domain name is the text name corresponding to the numeric IP address.

Note: Domain Name Service (DNS) is the method that the internet uses to convert difficult-to-remember numbers, such as 10.17.243.32, to easy-to-remember names, such as www.yahoo.com (which are easier to read and comprehend than a series of numbers). The reason for this conversion is that the underlying protocol for the internet, TCP/IP, uses difficult-to-remember numbers to connect to other computers.

When a visitor comes to your site, either that machine’s IP address or the domain name of the IP address automatically gets recorded in the web server data activity file.
Which of these two identifiers gets recorded in your web data activity file depends on how your web server is configured to log hits. Web servers can be configured to perform Domain Name Service (DNS) lookups while logging entries, or to simply record the IP address. Many web servers do not perform lookups while logging because doing so slows down delivery of the web visitor’s requested content. However, if IP addresses are not resolved during creation of the web data activity file, you can always perform a DNS lookup after the file has been created.

One of the major benefits of using IP addresses and domain names to identify the visitor is that many DNS servers contain additional information about the IP address or domain name, such as the location and company. This tells you where your visitors are coming from. In general, geographical information about your visitors can contribute to your customer research and marketing database. You may even be able to discern whether web visitors are coming from direct competitors, and this additional information could be valuable for your competitive analysis database.

Pitfalls with using client IP addresses or domain names

There are a few pitfalls involving IP addresses and domain names when identifying visitor activity. These pitfalls may cause your results to be inaccurate.

Proxy server buffers

A major problem with using IP addresses and domain names as identifiers frequently arises when web visitors access web sites through ISPs or from within the network of a large corporation. When this occurs, web visitors may be routed through a proxy server before getting to the content. Consequently, it appears that the web hit comes from the proxy server rather than the actual visitor. For example, most AOL users reach the Internet via a proxy server and show up in the reports under that proxy server’s IP address instead of their own.
You can also have problems with aliasing within a single session when a service provider load balances using multiple proxy servers. The first hit by the visitor may be handled by one proxy server, while the next hit from the same visitor may be handled by a different proxy server to distribute the workload. When this happens, the IP address or domain name of each proxy server gets logged, making it appear that the hits came from separate visitors. Those apparent visitors are the proxy servers, however, not the actual client machine.

Computer usage

Similar to the problems described in the cookies section, when multiple users visit your site from the same machine, or when a single user visits your site from more than one computer, a computer’s IP address cannot accurately associate visitors with web activity.

Combination of IP address and agent information

The next best method by which to identify unique visitors involves the use of IP addresses in combination with agent information (the client’s browser type and version; see Figure 3-1 “Sample log file explanation” on page 44). IP addresses and agent information allow you to get around the problem of multiple visitors who use the same IP address through proxy servers, because the machines behind any given proxy often run different versions of the browser. Therefore, you can get a clearer picture of the visitor based on the browser type and version included in the agent information.

Cookies

Probably one of the most commonly used and most accurate methods of tracking visitor sessions is the persistent cookie. A cookie is a piece of text that a web server sends back to a client machine the first time that client machine visits a web site. This cookie text gets stored on the client machine’s hard drive, and in subsequent requests to that web site, the client machine sends the cookie to the web server.
Here’s an example of a typical cookie text:

COOKIE_ID=10.21.151.222-92873123.102983222

Figure 4-2 shows the cookie process.

Figure 4-2. Cookie Process

Here’s the process in three steps:

1. The client machine sends a request to the web server of a particular site for the first time. At this point, the client machine has no cookie information for that web site stored on its hard drive.
2. The web server processes that request and recognizes that the client request contains no cookie information. It then serves up the content requested by the client machine plus a cookie. Of course, for the cookie to function as a visitor ID, the cookie text delivered to the client machine must be unique. The web server also specifies a domain for which that cookie is valid. This way, the client machine knows which cookie to send for a given site, since client machines may have hundreds of cookies for a variety of web sites.
3. The cookie gets stored on the client machine’s hard drive, and during subsequent visits to the web site, the client sends the cookie to the server in the request. The cookie is logged into the cookie field of the web server log, and may be used later to associate the visitor with all other logged hits containing that same ID in the cookie field.

The SmartSource Data Collector (SDC) has a cookie server component that delivers a cookie to a visitor if that visitor is new. Subsequent visits by that same visitor result in the cookie, which contains the visitor identifier, being sent to the SDC along with the web activity information. The cookie is generated by SDC and consists of the IP address sent in the original request appended to a decimal-separated number based on the time stamp of the request. Because the decimal-separated number uses the time stamp down to the nanosecond level, this combination results in a number that is almost guaranteed to be unique.

Persistent vs. session cookies

You can issue two types of cookies: persistent and session. A persistent cookie is written to the disk on the client’s computer, so it can exist for an extended period of time. A session cookie is never written to the disk of the client’s computer. It lives only in memory and expires at the end of the session or when the visitor’s browser is closed.

Most companies and organizations prefer to use persistent cookies. Nevertheless, session cookies are useful because they allow web servers to track visitors throughout a session. The federal government, for example, uses session cookies because it regards putting data on a client’s computer as a privacy issue.

If you use persistent cookies, WebTrends can recognize visitors over a period of days or longer. If you use session cookies, WebTrends can still recognize visitors who are coming via proxy servers or are sharing IP addresses, because the session cookie provides a unique identifier for that session. Correlation of behavior within a visit is very accurate, but the unique visitor count will be too high because every visit will be seen as coming from a unique visitor.

Pitfalls to using cookies

There are pitfalls with using the cookie field to identify a visitor’s activity. These pitfalls may prevent you from using cookies at all or cause your results to be inaccurate.

People share computers

Consider a situation in which a family has one computer at home. Let’s say that Dad goes to a home improvement site and visits the power tools section. Let’s also assume that this is the first time anybody has used that computer to visit that home improvement site. Later on, Mom goes to visit the gardening section of that same site.
Because a cookie was created when Dad first visited the site, the cookie generated by Dad’s visit is sent with Mom’s request when she visits that site. When the web server logs are analyzed, it erroneously appears that Mom and Dad are the same visitor. (Note that this happens when people share login IDs.)

People use multiple login IDs

The previous example does not apply if each person logs out after using the machine. Sometimes family members have their own desktop icons that they use for logging in and out. Because cookies are stored per login ID, a person who uses more than one login ID can get more than one cookie on the same machine.

People use more than one computer

Bill Smith works in a cubicle for a high-tech company and occasionally surfs the web while taking a break. He’s been interested in purchasing a bicycle lately, and for the last few weeks during his breaks he has been researching several different bicycle models. He finally figures out which one he wants to buy, so when he gets home, he jumps right to the shopping cart portion of the site and immediately makes a purchase without conducting further research.

An analysis of his activity would be inaccurate, because all his research would be tied to his work computer’s cookie for that site, while his purchasing behavior would be tied to his home machine’s cookie for that site. Instead of showing one visitor who conducted a fair amount of research and then made a purchase, the analysis would show two visitors: one who did a lot of research and did not make a purchase, and another who did no research but immediately made a purchase.

Some visitors reject cookies

Many people worry that a cookie could capture information about them without their permission, so they set their browser to reject cookies. Consequently, no unique ID is recorded in the web activity log, and each repeat visit to a web site is logged as being from a new visitor, not a returning one.
Cookies can expire or be deleted

Sometimes web visitors delete their cookies, or their cookies get deleted for other reasons. When this happens, any previous site activity associated with that erased ID cannot be related to any new activity on the site carried out by the same person. This can also happen if a cookie expires before a user returns to the site.

Note: Studies have shown that the number of people rejecting cookies is unlikely to be higher than 3% of user registration and that login cookies are “good enough” for unique user identification and preferable to using IP addresses, because the margins of error are that much less.

Session IDs or IDs embedded in URLs

Certain web sites, especially those with shopping cart pages and registration pages, insert a unique web visitor ID into the URL. This ID then gets recorded in the web data activity file as part of the URL field. WebTrends can use this ID to identify the web visitor by stripping it out of the URL and then pasting it into the cookie field of your web data activity file. WebTrends can then use the cookie field to sessionize your hits. The major restriction to using this method is that every web page URL for the site must contain the unique visitor ID; otherwise, visits to pages without the visitor ID will appear to be from a new visitor. Here is an example of a visitor ID in the URL field:

/store/product/3425858080.29527895/overview.html

Some web sites attach a session ID to the user’s activity, and this ID is recorded either directly in the cookie field or in the URL query parameters of the web data activity file. Similar to processing visitor IDs, WebTrends can cut the session ID out of the query parameters field and paste it into the cookie field, but session IDs, as the name implies, are only good for a given session. They do not persist across multiple sessions.
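The idea of stripping an ID out of the URL can be sketched as follows. This is only an illustration in Python, not the way WebTrends is configured: it assumes the visitor ID is a path segment made of two runs of digits joined by a dot (as in the example above) and that a session ID, when present, travels in a query parameter named SID. Both helper functions and the second sample URL are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: a visitor ID is a whole path segment of the form digits.digits.
VISITOR_ID_SEGMENT = re.compile(r"\d+\.\d+")


def visitor_id_from_path(url):
    """Return (visitor_id, path_without_id), or (None, path) if no ID is found."""
    path = urlsplit(url).path
    segments = path.split("/")
    for index, segment in enumerate(segments):
        if VISITOR_ID_SEGMENT.fullmatch(segment):
            remaining = segments[:index] + segments[index + 1:]
            return segment, "/".join(remaining)
    return None, path


def session_id_from_query(url, parameter="SID"):
    """Return the first value of the session ID query parameter, or None."""
    values = parse_qs(urlsplit(url).query).get(parameter)
    return values[0] if values else None


visitor_id, clean_path = visitor_id_from_path(
    "/store/product/3425858080.29527895/overview.html"
)
# Hypothetical URL carrying a session ID in its query parameters.
session_id = session_id_from_query("/cart/order.asp?SID=abc123&step=2")
print(visitor_id, clean_path, session_id)
```

Returning the path with the ID removed keeps page names comparable across visitors, while the extracted ID can play the role that the cookie field plays when sessionizing.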
In some cases, a session ID may have its own place in the record of a web data activity file and look like this:

SID=jhmbobkcb111inehlpkjhopabbe

Authenticated username

Probably the most accurate way to identify visitors is by using the authenticated username that they enter into an authentication dialog to access restricted portions of a site. In this case, an authuser entry is made in the web data activity file, with the value being the username the visitor entered into the dialog. In the following example record from a web data activity file, John_Smith is the authenticated username.

2002-01-01 00:12:12 server2.att.com John_Smith W3SVC3 HERC 192.168.1.1 GET

This could be an extremely reliable method if a web site password protected its entire site. However, web sites tend to designate only portions of their site as password protected. Typically, these are areas of content that the visitor pays a subscription to access, as in the case of an online newspaper, or pages in which the user enters information that they wish to keep secure, such as credit card numbers, contact information, and other personal data. For the authenticated username method to cover every hit, the entire site would need to be password protected so that each visited page would result in the username being logged in the authuser field.

Here is another example of how authenticated usernames work. Consider the Yahoo sub-site, My Yahoo. To gain entrance to My Yahoo, you first had to register for the site. You probably entered your first name and last name, your address, your email address, your phone number, your zip code, and perhaps answered a survey with information about your background such as single versus married, income level, interests, occupation, and more. Yahoo takes the registration information that you entered and creates an external visitor database.
Each time you log in to the site, you enter your username and password. That username shows up in the authuser field for any web data activity file hit made to an authenticated area of the site. The value in the authuser field is then used as a key to tie these hits to your visitor characteristics data in the external database. Therefore, anytime a visitor visits a site, no matter what computer that visitor logs in from, his or her username remains the same.

By using authenticated usernames you can also eliminate the aliasing that occurs when two or more visitors use the same machine to get to a site. Each user must enter their unique username and password.

Figure 4-3 shows a sample report of the authenticated usernames that visited most often.

Figure 4-3. Authenticated Usernames report

Summary

In order to gain more meaningful insight into visitors’ behavior on your web site, you need to be able to assign each hit in a web data activity file to the visitor responsible for that hit. You then need to be able to look at a specific visitor’s activity and determine whether this activity occurred during one continuous visit session or over multiple visit sessions. The key to all this is how you associate a visitor with each web log record. There are several different identifiers that you may use to do this:

• Client IP address or domain name
• Combination of IP address and agent information
• Cookie (persistent or session-only)
• Session IDs
• Data embedded in the URL
• Authenticated user

A cookie, session ID, or authenticated username provides fairly accurate visitor identification, though you will likely have some background work to do in order to use these as identifiers. Your other main options are an IP address or a domain name. These two identifiers are readily available, but both are severely limited in how accurately they can identify visitors.
Determining how your visitors behave on your web site is one of the most powerful aspects of web analytics. For this reason, you may want to invest the time that it takes to employ one of the more accurate means of identifying your web visitors.

Finding the Features in WebTrends Products

You will find the topics discussed in this chapter in WebTrends. Simply highlight a sample profile and click on the menu commands as per the instructions below. Figure 4-4 shows the WebTrends Admin Console.

Figure 4-4. WebTrends Admin Console

Paths to the features:

• Session Termination Time Frame: Click on Options > Session Tracking > New Session Tracking Definition.
• Domain Name: Click on Options > Analysis > Domains.
• IP Addresses, Cookies, and Authenticated Usernames: Click on Options > Session Tracking > Edit a session tracking definition.

Visitor Identification Worksheet

Use the following worksheet to determine who your visitor really is. For each consideration, record Yes or No and any comments.

Consideration | Yes | No | Comments

• How accurate does your clients’ data need to be? Note: The more you require of clients, the more you drive traffic away.
• Do you assign cookies to clients who visit your site? Use persistent cookies? Use session cookies?
• Do you want a hosted service to handle all of the cookie information?
• Do you want to require authenticated user names from your clients?
• Do you want to keep an authenticated user database and manage it?
• If you have to migrate users from one system to another, are you prepared to migrate that authuser database?
• Do you want to use DNS or will this slow down your system too much?

Chapter 5 Defining Behaviors

After you understand how to collect activity data and what it looks like (both discussed in Chapter 3), and you understand the concepts involved in identifying your visitors (discussed in Chapter 4), you are ready to understand how to convert this raw activity data into something that matches the organization of your web site.

WebTrends web analysis provides a set of pre-defined reports on a variety of visitor behaviors, such as the top pages visited, the top visitors, the top entry pages, and the top referrers. All of this is standard information available from data files, whether captured traditionally or via a client-side tag. Figure 5-1 on page 74 shows a sample Pages report.

Figure 5-1. Pages report

To create basic measurement reports, you don’t have to do much more than tell WebTrends where the web activity data is located. Basic reports can be useful indicators of general web site activity, but there’s a lot more you can learn from WebTrends if you’re willing to put in a little effort.

The real benefits of WebTrends are found when you use it to identify and improve those areas of your site that are not working optimally or are reflecting traffic patterns far different from what you expected. For example, are people linking to a specific page on your site after viewing an advertisement that you intended for them? If not, you may want to reconsider the advertisement. Do people who begin to make a web-based purchase actually complete that purchase? If they abandon the purchasing process, then perhaps it’s time for you to examine that process more closely.

So how can you determine whether your web site provides the functionality and gets the results that you intended? The answer is by understanding how your site is designed and then focusing your web site analysis on those functional site areas.
Specifically, you need to tell WebTrends what the specific parts of your site were created to do.

Focusing the Scope of Analysis

It can be overwhelming to try to figure out what’s happening with every single page of a large web site. Most people within an organization have an interest in specific areas of the site, not the entire thing. For example, if you work for a large company that sells computer processors to consumers and businesses, but your focus is on consumer sales, your primary interest is tracking content that is related to consumer sales (unless, of course, you are comparing consumer sales with business sales). In other words, you need to focus on analyzing the parts of the web site that matter to you.

URL classification

So how do you focus your analysis on just the web site content that matters to you (or to the person who asked you to report on this content)? The answer is actually straightforward: tell WebTrends which pages, groups of pages, and other web-based content you want to examine. In WebTrends lingo, this is referred to as URL classification.

URL means Uniform Resource Locator. The URL is the address of a resource, or file, available on the Internet. The URL contains the name of the protocol required to access the resource (for example, http or ftp), a domain name that identifies a specific computer on the Internet, a directory and pathname on the computer, and sometimes, for dynamic web sites, query parameters. Figure 5-2 shows the URL format.

Figure 5-2. URL format

If the URL is the address of a static web page, then query parameters are not involved. Static pages send exactly the same response to every request. For example, a page on the Internet may be located at http://www.ietf.org/rfc/rfc2396.txt. This URL describes a web page to be accessed with an HTTP (web browser) application and located on a computer named www.ietf.org. The pathname for the specific file on that computer is /rfc/rfc2396.txt.
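The URL parts named above can be pulled apart with a few lines of Python. This is a small sketch using the standard urllib.parse module; the describe_url helper and the second, query-bearing URL are invented for illustration and are not part of WebTrends.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_url(url):
    """Split a URL into the parts discussed in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        # Empty for static pages that carry no query parameters.
        "query_parameters": dict(parse_qsl(parts.query)),
    }


static_page = describe_url("http://www.ietf.org/rfc/rfc2396.txt")
# Hypothetical URL with a query parameter, for comparison.
dynamic_page = describe_url("http://www.example.com/page.asp?id=42")
print(static_page)
print(dynamic_page)
```

For the static page the query parameters come back empty, which is the property that distinguishes it from a dynamic page in the discussion that follows.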
If the URL is the address of a dynamic web page, then query parameters are involved. These parameters, not the page names, identify the page’s content. A dynamic web page is simply a way to generate larger sites from a database architecture, making it significantly easier to maintain pages as the site grows. For example, http://clothingshopping.com/category.aspx?catID=211 indicates a specific page at clothingshopping.com that sells children’s clothing.

In URL classification, you use a page’s URL and perhaps also its URL query parameters to identify and then classify that page according to its function.

An example of URL classification

For example, on a product’s ordering page for a site that sells phone accessories (for example, Zedesco Communications), a visitor could select a cell phone cover from the products list, and then select sunburst yellow for the color option. The URL that would appear on the page might be:

www.zedesco.com/cart/order.asp

To learn which product is being selected, however, you need to examine the URL query parameters. In the example of the sunburst yellow cell phone cover, the URL followed by the URL query parameters would look something like:

www.zedesco.com/cart/order.asp?order_ID=10334&product=cellaccessories&type=cellcover&opt_type=color&opt=sunburst%20yellow

You could classify the page using only the URL stem (cart/order.asp) to collect all visits to the order page, regardless of what type of product was ordered. In this case the function of the page would be to let web visitors order products. However, to get more information, you would use the URL query parameters to classify the page visit in more detail. In this case, you would classify the page as belonging to the group of cell phone accessories items ordered. WebTrends analysis products allow you to easily associate URL query parameters with an item or a group of items ordered.
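As a rough illustration of the two levels of classification described above, the following Python sketch first matches the URL stem and then reads the query parameters for more detail. It is not WebTrends configuration; the classify_order_hit function and the field names in its result are hypothetical, and the http:// prefix is added only so that the example is a complete URL.

```python
from urllib.parse import parse_qs, urlsplit

# The example order URL from the text, with an assumed http:// prefix.
ORDER_URL = (
    "http://www.zedesco.com/cart/order.asp?order_ID=10334"
    "&product=cellaccessories&type=cellcover&opt_type=color"
    "&opt=sunburst%20yellow"
)


def classify_order_hit(url):
    """Classify a hit on the order page; return None for any other page."""
    parts = urlsplit(url)
    # Level 1: the URL stem alone identifies the page's function.
    if not parts.path.endswith("cart/order.asp"):
        return None
    params = parse_qs(parts.query)

    def first(name):
        values = params.get(name)
        return values[0] if values else None

    # Level 2: the query parameters say what was ordered.
    return {
        "page_function": "order",
        "product": first("product"),
        "type": first("type"),
        "option_type": first("opt_type"),
        "option": first("opt"),  # parse_qs decodes %20 to a space
    }


result = classify_order_hit(ORDER_URL)
print(result)
```

A hit that matches only the stem would still be counted as an order page visit, with the detail fields left empty.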
Note: This book draws on examples from a hypothetical company called Zedesco Communications that sells electronics. Consequently, this book often refers to the Zedesco Communications web site, www.zedesco.com.

URL classification and the SmartSource Data Collector

Although the concept of URL classification was developed for web server log entries, the WebTrends SmartSource Data Collector (SDC), which collects web activity data using the client-side tagging method, also relies on URL classification to track specific pages. The way it goes about doing this, however, differs from the method used by web server logs. Instead of waiting to perform URL classification on web data activity files after they have been created, SDC applies URL classification as the web data activity file is being created, which increases performance and efficiency of the data collection process.

WebTrends methods of URL classification

WebTrends offers several different types of URL classification, with each method designed to help track a specific function. Some of the types of URL classification available include:

• Content groups
• Product groups
• Scenario analysis
• Advertising views

Content groups

Content groups designate pages with related subject matter. This grouping allows you to track visitor interest in subject matter rather than in individual pages, which makes interpreting visitor interest far more intuitive. By grouping together related pages, you can also track web activity on your site from perspectives that may not be inherently possible with your site’s current organization.

Let’s look at two examples of content groups: one for a site with static web pages and another for a site with dynamic web pages.

Content group example (static site)

On a web portal that contains information such as stock quotes, news articles, and weather, you may wish to compare visitor interest in domestic versus international news.
To do this, you might create a content group called international news, which contains all international news articles, and a content group called domestic news, which contains all domestic news articles. If the content were posted on a static site, you would likely have a structure of

news/international/article1.htm
news/international/article2.htm
news/international/article3.htm

and

news/domestic/article1.htm
news/domestic/article2.htm
news/domestic/article3.htm

These content groups specify that you gather visits to pages in the international folder into one group and visits to pages in the domestic folder into the other.

Content group example (dynamic site)

A dynamic version of this site would require that you use the parameters of the requested URL to place each related article in the right content group. Visits to international and domestic articles on such a site might appear as:

default.asp?div=news&type=international&article=1
default.asp?div=news&type=international&article=2
default.asp?div=news&type=international&article=3

and

default.asp?div=news&type=domestic&article=1
default.asp?div=news&type=domestic&article=2
default.asp?div=news&type=domestic&article=3

In this case, you would track the page default.asp that had the parameter div with a value of news and the parameter type with a value of domestic or international.

With web server logs, you have to tell WebTrends which pages belong in each content group. As WebTrends parses the records, it looks for entries that belong to a given content group. By contrast, when using a data collection server, content group information is accumulated as the pages are served. This is because when pages are created, if they belong in a specific content group, you can include the name of the content group in the page’s META tag information. The SmartSource Data Collector knows to look for this information, and then sends it on to WebTrends for reports or to a web data warehouse.
By using SmartSource Data Collector, you only have to configure a page one time to associate it with a content group. Of course, even if you are using SmartSource Data Collector, the WebTrends engine can still be configured to recognize content groups from the raw URLs.

Figure 5-3 shows a sample Content Groups report. This report identifies the most popular groups of web site pages and how often they were visited.

Figure 5-3. Content Groups report

Product groups

Product groups are a specialized type of content group that helps you to track pages specifically related to products you sell or promote on your site. WebTrends analysis products track product groups separately because products are such a high-profile component of most sites.

Product group example

Let’s say that Zedesco wants to track visits to content about cell phones and cell phone accessories. To do this, they create a product group that includes product pages for cell phones and their accessories. If one directory contains all the cell phone content and no other type of content, they can simply specify that directory. However, if cell phones and their accessories are stored in different directories, and other, non-cell phone content is included as well, they will have to do a little more work to define their product group.
Assume that the site is structured with the following product pages:

products/phones/cordless phones/SBC-2905.htm
products/phones/cordless phones/SBC-7205.htm
products/phones/cordless phones/SBC-3205.htm
products/phones/cell phones/XT2100.htm
products/phones/cell phones/SCH-N300.htm
products/phones/cell phones/N-3285.htm
products/phones/accessories/travel-charger.htm
products/phones/accessories/covers.htm
products/phones/accessories/headset.htm
products/phones/accessories/videogame.htm

Keep in mind that some of these pages represent cordless phones, others represent cell phones, while still others are cell phone accessories (in the accessories directory).

Note: A large, database-driven site that uses dynamic URLs would use the following structure:

products/info.asp?prod=1783&cat=13

where 13 represents cordless phones and 1783 identifies SBC-2905.

If you wanted to capture cell phones and cell phone accessories in a product group, you would capture the following, assuming that the travel chargers, covers, headsets, and video games are cell phone accessories:

products/phones/cell phones/XT2100.htm
products/phones/cell phones/SCH-N300.htm
products/phones/cell phones/N-3285.htm
products/phones/accessories/travel-charger.htm
products/phones/accessories/covers.htm
products/phones/accessories/headset.htm
products/phones/accessories/videogame.htm

However, note that headsets could overlap into a cordless phone accessories product group. It is common for pages to have several places where they might be logically grouped.

To capture cell phones and their accessories, you would tell WebTrends to take all content in the products/phones/cell phones directory and group it with the individual pages for the remaining items. In this case, you would tell WebTrends to group visits to the cell phones directory pages with visits to the following accessory pages: travel-charger.htm, covers.htm, headset.htm, and videogame.htm.
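The grouping rule just described, one whole directory plus a handful of individual pages, can be sketched in Python as follows. The dictionary layout and the in_product_group function are hypothetical and are not WebTrends configuration syntax; the sketch also assumes that logged paths may arrive URL-encoded, with %20 standing for the space in the directory names.

```python
from urllib.parse import unquote

# Hypothetical rule format: one directory prefix plus individual pages.
CELL_PHONE_GROUP = {
    "directories": ["products/phones/cell phones/"],
    "pages": [
        "products/phones/accessories/travel-charger.htm",
        "products/phones/accessories/covers.htm",
        "products/phones/accessories/headset.htm",
        "products/phones/accessories/videogame.htm",
    ],
}


def in_product_group(path, group):
    """Return True if the logged path belongs to the product group."""
    # Decode %20 and drop a leading slash so both log styles compare equal.
    normalized = unquote(path).lstrip("/")
    if any(normalized.startswith(prefix) for prefix in group["directories"]):
        return True
    return normalized in group["pages"]


logged_paths = [
    "/products/phones/cell%20phones/XT2100.htm",
    "/products/phones/accessories/headset.htm",
    "/products/phones/cordless%20phones/SBC-2905.htm",
]
matches = [p for p in logged_paths if in_product_group(p, CELL_PHONE_GROUP)]
print(matches)
```

Because a page such as headset.htm can belong to more than one group, a fuller version would test each path against every group definition rather than stopping at the first match.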
Figure 5-4 shows a sample Product report. It represents the number of visits during which product-related pages were viewed.

Figure 5-4. Product report

Scenario analysis

In the context of defining your site’s structure for WebTrends, you need to know which areas of your site, if any, contain sequences of pages that make up a web-based task you want your visitors to complete. These sequences of pages are called scenarios, and some of the most common examples of scenarios are registering as a user of a web site, making an online purchase, or filling out a survey.

For example, Zedesco has a registration process that requires web visitors to fill out the following pages to complete their registration:

• Start of information request
• Verified information
• Completed registration

These steps constitute a registration scenario. Another common scenario is the shopping cart scenario, in which your visitors proceed through a series of steps to purchase products. Other, less familiar sequences on your site may also be important to track: for example, a sequence of product pages that you want to make sure visitors are viewing, or, if you are a travel web site, a set of pages that your visitors must complete to track prices for their top flight itineraries.

Figure 5-5 shows a Registration Conversion Funnel report. This analysis offers insight into each step along the information request process. Each step shows a drop-off as visitors move through the funnel.

Figure 5-5. Registration Conversion Funnel report

Advertising views

If your company hosts advertisements on its site, it can be very important to show your customers how much traffic the ad you’re hosting for them generates. In addition, the development of pricing schedules may be heavily dependent on where the ad is placed.
You may need to provide numbers to potential customers that show how valuable a particular piece of web real estate is for advertising. Reports on traffic generated by ads placed in various areas of the site can let your customers balance level of exposure versus cost when making their decision about posting their ad.

Advertisements can be broken into two parts:

• Ad View: Visitor views a page containing the ad graphic or link.
• Ad Click: Visitor actually clicks on the ad and opens its content.

Depending on the ad hosting method, both the ad itself and the content it links to may be hosted on your site. However, it is also common to host the ad on your site, yet have the content of that ad hosted by your customer, on their site. In the first method, the Ad View and the Ad Click that results in the ad content display are both logged to your web server log because all activity occurs on your web site. In the second method, the Ad View activity is logged to your web server log, but the display of the ad content is logged to your customer’s web server log, not yours.

You can get around this issue by implementing server-side scripting (for example, CGI, Perl, or ASP) to perform a redirect to the destination URL. A very common Perl script is redir.pl. The redirect request is recorded as a hit in your web server’s data activity file and is recognized as an indicator that the ad was opened. Of course, if you are using a data collection server or client-side tagging method, you can easily collect this information by running a script each time an Ad View or Ad Click occurs.

An ad click indicates greater interest in the ad than an ad view does, because it implies that the user focused directly on the ad and was interested enough to click on it.

Figure 5-6 on page 86 shows an Onsite Ad Impressions report that shows how often specific ads were viewed.

Figure 5-6. Onsite Ad Impressions report

In the Onsite Ad Impressions report, note that the Ad Views Visits column refers to the number of visits by visitors who saw the specified ad. A visit is a series of actions that begins when a visitor views a first page from the server and ends when the visitor leaves the site or remains idle beyond the idle-time limit. The default idle-time limit is thirty minutes. This time limit can be changed by the system administrator. Therefore, a visitor may see an ad more than once during a visit, but the ad will only be counted once in this table and graph.

Other site structure issues

Before you begin to tell WebTrends how your site is structured by the patterns it can find in your URLs, you need to define your home page and how to classify files based on their extensions. These two items affect how WebTrends counts visits to your site's home page and how it interprets the files that visitors request. Unless your home page name changes, typically you will specify these settings just once, whereas the settings you configure in URL classification will likely change according to your analysis needs and according to modifications of your site.

Handling dynamic pages: URL rebuilding

Sites that make heavy use of dynamic pages require a little extra thought. Typically, dynamic sites are driven from a few scripts that use parameters to control the content of each page as it appears to the visitor. Yet the name of the script by itself (for example, default.asp or catalog.php) is not very descriptive of what the visitor sees. These names do not illuminate reports that list the top pages or show paths through a site, because it looks like only a handful of pages are visited.
For example, listing pages only by the URL filename could result in the following:

Page            Visits    Page Views
default.asp     1,431     14,252
catalog.php     231       986

Parameters to those pages control their actual content, and so it is those parameters that need to be included along with the page name. For example, consider the dynamic URL:

default.asp?type=domestic&div=news&article=104&sessionid=155428642

You may find it most informative to know which division and type of articles are being viewed. It makes sense to include those parameters in the page’s URL for reporting. Including the sessionid is, however, not at all desirable, since it makes every page access appear to be different content. WebTrends allows you to “rebuild” the URL, specifying just which parameters to use. In the example above, you may want to include the “div” and “type” parameters only. This could be used to transform the URL above into:

default.asp?div=news&type=domestic

Using the URL rebuilding feature, the Pages report becomes more enlightening.

Page                                        Visits    Page Views
default.asp?div=news&type=domestic          528       6,243
default.asp?div=financial&type=domestic     431       2,511
default.asp?div=ads&type=domestic           366       3,973
default.asp?div=news&type=international     132       674
catalog.php?dept=clothing                   89        694
catalog.php?dept=hardware                   67        185
catalog.php?dept=kitchen                    44        56
catalog.php?dept=advice&type=domestic       42        177
catalog.php?dept=food                       31        51

Note that the parameters are sorted alphabetically. This ensures that two URLs which differ only in the order of parameters are still considered to refer to the same content.

Home Page definition

Counting visits to your home page can help you determine whether the bulk of your visitors come to your site via your home page, or whether they entered your site from somewhere else, perhaps from a bookmarked page, an ad, or some other link. The home page, just like any other page, has a filename.
But typically, especially when visiting a site for the first time, the visitor types in the site’s name. Why? Because most site names are easy to remember. The visitor may even know the division within the site, and so will enter a folder such as “products” or “news” after the site’s name. The file name for the home page, whether it is the top-level home page or the home page for a division within the site, is not so easy to remember. Consider the variety of possible names that you see: default.htm, default.asp, index.htm, and index.asp are all standard names, but there’s nothing that prevents a site designer from making up a completely unusual name.

For this reason, web servers are designed to recognize that the visitor is attempting to visit the home page when they omit the home page’s file name, and serve up that content. If a visitor entered www.zedesco.com, the web server would deliver the home page, www.zedesco.com/default.asp.

The problem is that whatever is requested gets recorded in the web data activity file. You want the entries to be viewed as the site’s home page, not separate pages. To make this happen, you have to tell WebTrends that web data activity file entries that appear as:

GET /

and

GET /default.asp

are actually visits to the same page: the home page. This allows you to obtain an accurate count of home page visits.

File types

As web site development and publishing have become more involved, so have the types of content that can be hosted by your site. In addition to standard HTML documents, sites also host downloadable files for Flash presentations, Microsoft Word documents, Adobe Acrobat .pdf files, compressed files, video files, audio files, executables, and so forth. You need to tell WebTrends how you want it to view various file types based on their file extensions. While at first it may seem obvious which file types are documents and which are downloadable files, consider how you might classify the following Adobe .pdf file.
/club/kb/Nokia C23/owners_manual.pdf

Is it a downloadable file or a document? Really, it depends on how you expect visitors to use it. For ambiguous cases such as these, you must tell WebTrends how to treat each file extension. That way, when your analysis software parses the web data activity file and encounters a record such as the request for the Nokia C23 Owner’s Manual, it knows what to do.

You will want to divide files this way so that you can determine whether or not your visitors look at certain types of files. If you devote a substantial portion of the budget to creating multimedia pieces for your site, you want to know that your investment is paying off. You may also have the same information presented in multiple formats and want to know which format your visitors use the most: static documents or interactive elements.

Summary

Many people who have WebTrends never realize the full potential that lies in the features it provides. Instead, they only venture as far as using the standard reports that ship with WebTrends and track information about the entire site, not specific pages or areas of the site. The real value in web analytics is in identifying and examining specific areas of your site in detail. Typically, these areas are ones that allow web visitors to complete an action, such as making a purchase, researching a product, or solving an issue by reviewing online support materials.

The tools provided with WebTrends allow you to track visitor behavior: visits to content and product groups, the steps in a scenario, clicks on advertisements, and the paths that visitors took through your site. All of these tools can help you focus on your site to find what is working and what needs some improvement.
Finding the Features in WebTrends Products

URL Classification
Click Web Analysis > Report Configuration > URL Parameters
also
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > URL Parameter Analysis

Content Groups (and Product Groups)
Click Web Analysis > Report Configuration > Content Groups
or
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > Content Groups

Scenario Analysis
Click Web Analysis > Report Configuration > Scenario Analysis
or
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > Scenario Analysis

Advertising Views (and Clicks)
Click Web Analysis > Report Configuration > Onsite Advertising
or
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > Onsite Advertising

Home Page Definition
Click Web Analysis > Profiles & Reports > Edit a profile > Analysis > Home

File Types
Click Web Analysis > Options > Analysis > Page File Types

URL Rebuilding
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > URL Rebuilding

Defining Behaviors Worksheet

Use the following worksheet to focus on your web site functionality: what is working and what needs improvement.

Consideration                                                                             Yes   No   Comments
Your web site is organized so that it can be searched according to content groups?        ___   ___  ________
You need to know the visits and hits for each content group?                              ___   ___  ________
Your web site is organized so that it can be searched according to product groups?        ___   ___  ________
You need to know the visits and hits for each product?                                    ___   ___  ________
Your web site has scenarios that you would like to analyze (for example, shopping cart)?  ___   ___  ________
You need to know the number of Ad Views and Ad Clicks on your ads?                        ___   ___  ________
Your home page name changes now and then?                                                 ___   ___  ________

Chapter 6
Filtering and Analyzing Your Data

If you were packing for vacation, you wouldn’t open your dresser drawers and closet and dump the contents directly into your suitcase.
If you did, you would end up with a truckload of suitcases to lug around when you’d only use a fraction of those clothes. Instead, you might put all your clothes out on a bed, examine what you have, and then select and pack what you need. In this situation, you would include only those items you know that you’ll use. Conversely, you might return items you don’t want to the closet or dresser drawers, leaving only the items that you do want to pack. By going through this process, you’ve narrowed down all your clothes to just the clothing that you know you will need. This not only reduces how much clothing you have to store in your suitcase, but it also saves you from having to sift through all your clothing each time you get dressed.

Consider approaching your web server data files in much the same way you would when packing for a vacation. If WebTrends had to sift through all your data, your system would be working harder than it needs to and would be using up storage space unnecessarily. In addition, once analysis is done, you would prefer to review results that have meaning for you, not all possible results.

Filtering is the process of preparing to run a web activity analysis that allows you to select only pertinent data. Filters allow you to determine such things as new versus returning visitors, which visits were initiated by a campaign, and which visitors were internal employees or external visitors.

First, you need to determine if you want to filter all of your activity data or just some parts of it. If you apply filters to all of your activity data, then you must keep in mind that they affect all of the analysis. In most cases, you will want to filter out images (such as JPEGs and GIFs), spiders, robots, and anyone from your company who is testing the site. Keep in mind that global filters select a portion of data for analysis.

Most of your filtering will probably achieve the best results at the custom report level.
This means that the filters will be applied on a per-table basis to generate reports that are specifically tailored to your needs (hence “custom” reports). Through custom reports you can achieve greater visitor segmentation and site segmentation. Site segmentation means that you can examine specific areas of your web site, for example, the directories that deal only with technical support.

After understanding global and local filters, you can consider two types of filters that allow you to specify which data to analyze: include filters and exclude filters.

• Include filters specify the data to use in the analysis.
• Exclude filters specify what not to include in the analysis.

Sometimes it doesn’t matter which filter you use, but at other times, one kind of filter is distinctly more convenient to use than the other. You can easily apply the concepts of including versus excluding data with two different levels of filtering: filtering on hits and filtering on visits.

The remainder of this chapter describes how include and exclude filters work with hit filters and visit filters. By understanding the concepts involved, you will analyze data that pertains to your needs. If you choose to apply no filters to your web-activity files, the analysis software analyzes all the data. However, this may impact performance and analysis time, because your data records will contain information about images and other kinds of data that have no real value for analysis.

Setting Up Your Profile: Initial Filtering

A profile is a group of settings with which you identify the visitor activity data to be collected, filtered, analyzed, and displayed in your WebTrends reports. Typically a profile is created for an individual web site, but often a separate profile will also be created to report on a portion of a site, or to roll multiple sites together.
Through the profile, you define the data source location, any activity you want filtered from the reports, and user rights to the resulting reports. More information about profiles (especially parent-child profiles) is presented in “Parent-child profiles: a structural alternative to custom reports and/or filters” on page 115.

When you first use WebTrends, you will most likely begin by setting up your profiles, which identify the web data to analyze and the reports that you want to create. Then you will run your profiles. Afterwards, for deeper inspection and further analysis, you go back and filter the data in various ways.

Hit and Visit Filters

To understand why you need hit and visit filters, you must first understand the concepts of hits and visits.

Note: The concept of hits and visits was introduced in “Defining Web Activity” on page 57.

Hits

When your web server or data collection server records visitor activity, each line in the record represents a hit to the server. Hits are the individual activities that combine to make up a visit to a single page. Think of the contents of a typical web page. Most consist of some text and one or more graphics. When users request a page, they are actually making requests for each item on the page: perhaps a GIF image of a company logo, some HTML text, and a JPEG image. The server either successfully or unsuccessfully handles each item, and then logs the results of the request for that item, or hit, along with other information about the hit.

One record in the web activity data file equals one hit. With web server data files, this is strictly true for every item requested. However, with client-side tagging, WebTrends SmartSource Data Collector server data files do not typically record hits to graphics images. In the case of a SmartSource Data Collector server log, you will typically only have page hits.
Visits

A visit, or a visitor session, includes all the pages a unique visitor requests during a period of continuous activity on your site. Consequently, it includes all the hits associated with those pages in the visit. Visits are considered closed after the visitor remains inactive for a specified period of time. As a general rule, a visitor session should be closed if the user remains inactive for 30 minutes, although your WebTrends administrator may wish to specify a timeout period that is more in keeping with your web site analysis requirements.

Hit filter criteria

When you filter on hits, you filter each individual hit in or out based on some specified criteria, rather than filtering a group of hits all at once. Hit filters allow filtering at a more granular level than do visit filters. With a visit filter, you are filtering all hits associated with an entire visit session; a hit filter evaluates each hit on its own.

The following subsections discuss several different criteria on which you can filter hits. These criteria may include:

• Requested URL
• HTTP Method
• Cookie
• Multi-homed Domain
• Client Browser
• Return Codes
• IP Address
• File
• Directory
• Ad Views and Clicks
• Day of the Week
• Hour of Day
• Authenticated Username

Requested URL

You may decide that you need to include or exclude certain pages from analysis so that you can focus more directly on specific areas of the site. For example, if you are part of an IT organization, you may wish to determine whether your web visitors are viewing your knowledge base articles, all of which have a prefix of “kb_”. You could either list all of the knowledge base articles you wish to track, or, since WebTrends supports wildcard usage, you could specify that your filter includes all files beginning with “kb_”.
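As a rough illustration of a wildcard include filter on the requested URL, the following Python sketch keeps only hits whose file name begins with kb_. The function name, the sample paths, and the use of shell-style * matching are assumptions made for the example; the wildcard syntax that WebTrends itself accepts may differ.

```python
from fnmatch import fnmatchcase
from posixpath import basename

def matches_include(url_path, pattern="kb_*"):
    """Return True if the file-name part of url_path matches the wildcard."""
    return fnmatchcase(basename(url_path).lower(), pattern)

# Hypothetical requested URLs taken from a web data activity file.
requested = [
    "/support/kb_101.htm",
    "/support/faq.htm",
    "/support/KB_205.htm",
    "/products/phones.htm",
]
kb_hits = [url for url in requested if matches_include(url)]
```

Here kb_hits contains only the two knowledge base pages. Listing every article by name would give the same result, but the wildcard keeps working as new kb_ articles are published.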
If your site uses a content management system, then instead of specifying pages to include or exclude, you may need to specify a page and any URL query parameters that select the content displayed in that page. For example, you may wish to track web activity for knowledge base articles about issues with the P100 cell phone. The excerpt below is a hypothetical web data activity file entry that shows how this could appear:

2001-03-04 00:25:51 proxy1.thegrid.com - W3SVC3 web1 192.168.1.1 GET /support/default.asp product=p100&id=kb_5

The query parameters are product and id, where product=P100 and id=kb_5. You could track activity for P100 articles by specifying that your analysis include all hits with the page default.asp, the product query parameter having a value of P100, and an id value that contains the prefix kb_.

HTTP method

Your web server log may show requests using several different HTTP methods, but most frequently, you will encounter GET requests. These requests, when logged, contain more useful information for analysis purposes than any other method. A GET request returns whatever information is identified by the request URL and associated query parameters. For example, if you are using the Internet, and you click on an image, the actual request for that image might look like this:

GET /picture.jpg HTTP/1.1

In a distant second place is the POST method, which some web sites use to post forms. A couple of other rarely used methods are PUT and HEAD. These methods seldom contain useful information for web analysis, and because they are used infrequently, they may never appear in your web data activity file.

Typically, your web traffic analysis will process GET requests, though if your site has forms that use the POST method, you may wish to track activity on those forms. WebTrends has the capability to exclude records of requests using methods you don’t want to track.
Of course, you could also choose to include only those methods you do want to track and the results would be the same.

Cookie

As mentioned in Chapter 4 (see “Cookies” on page 64), cookies can be a means by which WebTrends can recognize visitors. However, cookies are also used to store various types of information, such as shopping cart contents, time of first visit, and number of visits. By selecting an appropriate cookie, you can investigate the behavior of a specific segment of your visitors. The cookie filter is typically used for this kind of investigation. It can also be useful, for instance, if you know of visitors whose activity is not pertinent to your analysis, and you wish to exclude their activity.

Multi-homed domain

If your site is spread across multiple domains on the Internet, you may want to view the activity of only one domain. You may also wish to exclude the activity of one or more domains. A multi-homed domain filter lets you specify which domain or domains to filter from the analysis.

Let’s say that your company is based in the US, but its site has sub-sites in the US (www.yourcompany.com), in France (www.yourcompany.fr), and in Germany (www.yourcompany.de). If you only wished to view the main US site, you could exclude the French and German sites, or it might be easier to include only data from the US site in the analysis.

For users of SmartSource Data Collection, the multi-homed domain filter can also be used to filter out hits from sites that have copied your pages along with the SDC script included in those pages (recall the discussion of client-side tagging; see “Using client-side tagging” on page 49). Another use of the multi-homed domain filter (by filtering in) is to identify sites that have “stolen” copyrighted material.
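The equivalence between including one domain and excluding the others can be shown with a short Python sketch. Everything here is illustrative: hits are modeled as dictionaries with a hypothetical "host" key, and the two helper functions are not WebTrends features, only a way to make the include/exclude logic concrete.

```python
def include_domains(hits, allowed):
    """Include filter: keep only hits served by one of the allowed domains."""
    allowed = {d.lower() for d in allowed}
    return [h for h in hits if h["host"].lower() in allowed]

def exclude_domains(hits, blocked):
    """Exclude filter: drop hits served by any of the blocked domains."""
    blocked = {d.lower() for d in blocked}
    return [h for h in hits if h["host"].lower() not in blocked]

# Hypothetical hits; the "host" key holds the domain that served the page.
hits = [
    {"host": "www.yourcompany.com", "page": "/default.asp"},
    {"host": "www.yourcompany.fr", "page": "/default.asp"},
    {"host": "www.yourcompany.de", "page": "/produkte.asp"},
    {"host": "www.yourcompany.com", "page": "/products.asp"},
]

us_by_include = include_domains(hits, ["www.yourcompany.com"])
us_by_exclude = exclude_domains(hits, ["www.yourcompany.fr", "www.yourcompany.de"])
```

For this data both filters return the same two US hits. The include form is shorter and stays correct if another country site is added later, which is why it is often the more convenient choice. It also means that hits from an unknown domain (for example, a site that copied a tagged page) are left out automatically.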
Browser

With all the different types of browsers available today, you may want to get a sense of the types of activity carried out from various flavors of browsers: Internet Explorer, Netscape Navigator, and WAP and Palm device browsers. You may even want to know if activity originated from a robot or spider crawling your site.

Your web data activity files typically contain a reference to the browser used to access content. The files also record visits from spiders and robots in the same browser and browser version field. If your business has a portion of its site devoted to WAP devices such as cellular phones, and you wish to examine visitor activity on only those WAP-specific areas, you could tell WebTrends to only analyze requests originating from WAP browsers.

The excerpt below shows a possible web data activity file entry that would be included in analysis if you created an include filter for WAP device browsers.

2001-03-04 08:39:02 208.18.146.75 - SERVER10 WEB1 - GET /wml/products/wireless/phones.wml - 200 0 647 543 0 80 HTTP/1.1 UP.Browser/3.1.03-NK02+UP.Link/4.2.1.7 WEBTRENDS_ID=133.205.252.8-2562687908.34229567 -

A portion of this excerpt refers to the browser and browser version number used by the client making the request:

UP.Browser/3.1.03-NK02+UP.Link/4.2.1.7

You may also wish to compare the types of activity you experience from a specific standard HTML browser such as Netscape or Internet Explorer. Because these browsers handle HTML code slightly differently, comparing the visitor experience on one browser with another can reveal valuable information. For such a comparison, you could create an include filter for each browser of interest and then review analysis results for each browser.
For example, if you find that Netscape Navigator users drop out more frequently in a shopping cart scenario than do Internet Explorer users, this may indicate that the HTML code does not appear as you had intended in Netscape Navigator. Although web designers always try to review their sites in several different browsers and versions, it's easy to miss problems with design when you have numerous pages to review or if testing is not thorough.

Return Codes

Return codes indicate whether or not requested content was successfully delivered, and if not, what the problem may have been. Return codes in the 200s and 300s indicate a successful content delivery, while those in the 400s and 500s indicate a failed delivery. For most web visitors, the most well-known and irritating error is the standard 404 File Not Found error. In the web activity data file, this appears as a server-to-client status entry.

The first data file entry below shows a successful return code of 304 (Success Not Modified), and the second shows a successful return code of 200 (Success OK). In each entry, the return code is the field that immediately follows the URL query field:

2001-03-04 00:03:23 computer.attcanada.ca - W3SVC3 web1 192.168.1.1 GET /club/kb/s32/motors.wmp - 304 0 27000 58 412 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) WEBTRENDS_ID=10.14.211.5292873123.102983222

2001-03-04 00:04:09 computer.quest.com - W3SVC3 web1 192.168.1.1 GET /dealers/default.asp WT.sv=Web%20Server%201&WT.ti=Dealer%20Home&WT.tz=420&WT.ul=en&WT.cd=32&WT.sr=1024x768&WT.jo=Yes&WT.js=Yes&WT.co=Yes 200 0 37211 121 389 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.0b1;+Windows+NT)

Because 400 and 500-level errors indicate potential problems with your site, you may choose to create an include filter that analyzes only the activity on failed requests. You can then determine which pages may have problems that are preventing users from accessing your content and modify those pages to resolve the problem.
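An include filter on failed requests amounts to keeping only hits whose return code is in the 400 or 500 range and then counting them per page. The Python sketch below illustrates that logic under simple assumptions: each hit has already been reduced to a (url, status) pair, and the two failing URLs are made-up examples rather than entries from a real data file.

```python
from collections import Counter

def is_failed(status):
    """True for 400- and 500-level return codes (failed delivery)."""
    return 400 <= status <= 599

def failed_pages(hits):
    """Count failed requests per URL from (url, status) pairs."""
    return Counter(url for url, status in hits if is_failed(status))

sample_hits = [
    ("/club/kb/s32/motors.wmp", 304),   # success, not modified
    ("/dealers/default.asp", 200),      # success
    ("/products/old-page.htm", 404),    # hypothetical missing page
    ("/products/old-page.htm", 404),
    ("/cart/checkout.asp", 500),        # hypothetical server error
]
report = failed_pages(sample_hits)
```

In this sample, report lists /products/old-page.htm with two failures and /cart/checkout.asp with one, while the 200 and 304 entries are left out. Sorting the result with report.most_common() puts the pages that fail most often first.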
IP Address

What if your company just launched its web site after a major site redesign? Your company had a big launch party, and afterwards all the employees decided to look at the redesign on their own. You probably wouldn't want to include their visits in your analysis, so you could simply filter them out based on their IP addresses or your company’s domain name.

Within each web data activity file entry is a field that indicates the computer address of the visitor. Depending on whether or not you instructed WebTrends to resolve IP addresses, this may either be an IP address or a domain name. Filtering on a visitor’s IP address or domain name allows you to include or exclude specific addresses in your analysis.

You might also want to see levels of activity based on regions, country, or domain types. The web data activity file entry below shows a visit from a computer located in Canada, as evidenced by the .ca extension of the visitor address (computer.attcanada.ca):

2001-03-04 00:03:23 computer.attcanada.ca - W3SVC3 web1 192.168.1.1 GET /club/kb/s32/motors.wmp - 304 0 27000 58 412 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) WEBTRENDS_ID=10.14.211.5292873123.102983222

If your web site caters to educational institutions, then you would be most interested in activity originating from educational organizations. You could capture this data for analysis by creating a filter that included all educational sites based on their domain type extension of .edu.

Another use of the IP filter is to filter out monitoring software, such as Keynote, which is used to maintain the health of the web site. Companies and organizations with extensive web sites find it beneficial to have their web sites monitored by special monitoring software. Every time the monitoring software probes a given web site, all of its activity will be counted unless the IP filter has been used to filter out the monitoring software.
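The following Python sketch shows one way an exclude filter on the visitor address could work when the field may hold either an IP address or a resolved domain name. The network range and the domain suffix are placeholders chosen for the example (they are not taken from any real configuration), and is_internal is a hypothetical helper, not a WebTrends function.

```python
import ipaddress

# Assumed values for illustration only: the address range used by the
# company's own computers and the company's own domain suffix.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]
INTERNAL_SUFFIXES = (".zedesco.com",)

def is_internal(client):
    """Return True if the visitor address belongs to the company itself."""
    try:
        addr = ipaddress.ip_address(client)
    except ValueError:
        # Not an IP address, so treat it as a resolved domain name.
        return client.lower().endswith(INTERNAL_SUFFIXES)
    return any(addr in net for net in INTERNAL_NETWORKS)

clients = [
    "192.168.1.1",
    "computer.attcanada.ca",
    "pc12.zedesco.com",
    "208.18.146.75",
]
external = [c for c in clients if not is_internal(c)]
```

After filtering, external holds only computer.attcanada.ca and 208.18.146.75. The same pattern covers the other cases described above: an include filter on the .edu suffix, or an exclude filter on the addresses used by a monitoring service.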
File

Many hits contain requests for images that have very little meaning for you. Besides overloading your system with meaningless data to analyze, you are likely more interested in the actual pages that were opened during a visit than the images your visitors saw. You can use a specific filter to select the file types, such as GIFs, JPEGs, and other image or graphics files, that you wish to exclude from or include in analysis.

Figure 6-1 shows a report that identifies the accessed types of files for your site and the total number of kilobytes of data transferred for each file type. The percentage column (%) reflects the percentage of all kilobytes of data transferred for the specified file type.

Figure 6-1. Accessed File Types report

Directory

If your site is structured in such a way that various directories include specific types of content (the products directory contains products content, the support directory contains all technical support content, and so on), it may be helpful to look at various areas of your site by including or excluding content based on the directory and sub-directories in which that content resides. Tell WebTrends to include the directories that contain content of interest to you, or conversely, to exclude the directories that contain content you do not want to analyze.

Ad Views and Clicks

Many sites sell advertising space as a way of bolstering their income. To be able to track ads more easily, hosted ads typically consist of a graphic on a web page that, when clicked, passes the user through a redirect page. This redirect page then opens the ad’s content. Both for billing purposes and to assure the companies who advertise on your site that advertising on your site works, you need to show them that visitors are viewing the pages on which their ads reside, and that those visitors are then clicking on those ads to view them. Clients will typically only want to see the activity on their ads.
To do this, you need to create an include filter for the ad view and ad click for each client’s ad. The following two sample hits show first an ad view of the ad graphic, specials.gif, which is hosted on the site www.austinbusinesscomputing.com. The second shows an ad click that took the user to a redirect page, yahoo1.htm, which made it possible to track an ad hosted on Yahoo.

1999-02-07 08:12:11 nsts02-1077.sts.embratel.net.br - SERVER10 WEB1 - GET /ads/specials.gif - 200 0 17527 587 4456 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+get2net+update) WEBTRENDS_ID=194.240.147.235-3218603766.52660653 http://www.austinbusinesscomputing.com/ads/networkAd.htm

1999-02-08 08:11:19 nsts02-1077.sts.embratel.net.br - SERVER10 WEB1 - GET /redirect/yahoo1.htm - 302 0 835 436 10 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+get2net+update) -

To report on the specials ad, you would filter in only the ads/specials.gif file. To report on the ad clicks, you would filter in only the redirect/yahoo1.htm file and the return code 302.

Day of the Week and Hour of the Day

Being able to analyze your web activity for only specific days of the week, or for a certain time period during the day, can be useful in a number of circumstances. For example, if your site has a weekly online newspaper that comes out on Wednesday, you might be interested in knowing how many visitors view your content right when it appears on the Web. Or, if you are tracking employee activity on your corporate intranet, you may prefer to only track activity during standard business hours Monday through Friday. Each hit recorded in your web data activity file has a time stamp that can be used to filter in or out specific days of the week, and specific time intervals during the day.

Figure 6-2 shows a report that provides the average activity on all pages (of a web site) for each hour of the day.
The percentage column (%) reflects the percentage of visits to your site that occurred during the specified hour.

Figure 6-2. Hits Trend report

Authenticated username

If your site requires that your users complete an authentication process, you can include or exclude hits from specific visitors based on their user names. This concept is very similar to that mentioned in the cookies section earlier, though cookies can be used to filter more than just specific users. You might use an authenticated username filter if you found that a particular user whom you do not trust is snooping around on your site. You could just as easily use an authenticated username filter to discover if a particular prospect is exploring your site. Refer to “Authenticated Usernames report” on page 69 to get an idea of this kind of report.

Visit filter criteria

You can use visit filters to include or exclude all activity related to an entire visitor session. The following sections describe the types of criteria you can apply to an entire visitor session. These filter criteria include:

• Entry page
• Referring URL or site
• Advertising campaigns

Entry page

The page on which the visitor first enters your site is the entry page. Filtering by entry page lets you include or exclude from analysis visits that started on specific pages. For example, if you have a redirect page you’re using to track an ad, you might choose to only include activity associated with a visitor session that began with a visit through the ad’s redirect page. You might also want to view only activity of visitors who began their visitor sessions somewhere in the middle of your site, because these visitors often have more of a purpose in their visit than do visitors who enter at your home page. To do this, you would create an exclude filter that filtered out all visits that began on your home page.
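The difference between a hit filter and a visit filter can be easier to see in code. The following is a minimal sketch, not WebTrends functionality: it groups hits into visits and then drops whole visits by entry page. The tuple layout, the function names, and the 30-minute idle timeout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Assumed idle timeout; the real value depends on your profile settings.
IDLE_TIMEOUT = timedelta(minutes=30)

def sessionize(hits, idle_timeout=IDLE_TIMEOUT):
    """Group (visitor_id, timestamp, url) hits into visits, one list per visit."""
    visits = []
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    for _, visitor_hits in groupby(ordered, key=lambda h: h[0]):
        current = []
        for hit in visitor_hits:
            # Start a new visit when the gap since the previous hit exceeds the timeout.
            if current and hit[1] - current[-1][1] > idle_timeout:
                visits.append(current)
                current = []
            current.append(hit)
        visits.append(current)
    return visits

def exclude_by_entry_page(visits, entry_pages):
    """Drop every visit whose first hit (the entry page) is in entry_pages."""
    return [v for v in visits if v[0][2] not in entry_pages]

# Hypothetical sample data: visitor-a enters at the home page, visitor-b mid-site.
hits = [
    ("visitor-a", datetime(2004, 7, 1, 9, 0), "/"),
    ("visitor-a", datetime(2004, 7, 1, 9, 2), "/products/"),
    ("visitor-b", datetime(2004, 7, 1, 9, 5), "/support/faq.htm"),
    ("visitor-b", datetime(2004, 7, 1, 9, 6), "/"),
]
visits = sessionize(hits)
mid_site_visits = exclude_by_entry_page(visits, {"/"})
for visit in mid_site_visits:
    print(visit[0][0], "entered at", visit[0][2], "and made", len(visit), "hits")
```

Note that visitor-b's later hit on the home page is kept: the filter tests only the first hit of each visit and then keeps or drops the visit as a whole.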
Figure 6-3 shows a report that identifies the first page viewed when a visitor visits your site, the number of visits to those pages, and the percentage of times this page was the entry page compared with other entry pages. The most common entry page is usually the home page, but other common entry pages include specific URLs that visitors type, pages that have been bookmarked, or pages referred to by other sites.

Figure 6-3. Entry Pages report

Referring URL or site

You might also wish to exclude or include all visitor sessions that were started from a particular referring URL or site. A classic case where you might use this is if your web server log contains many self-referring visits. This happens when the visit session times out due to inactivity, but the visitor is actually still on the site. When they resume viewing content on your site, it will appear as though they have started a new visit session, even though they were already on your site. When this occurs, the referring URL that gets recorded in your web server log is the page on your site that they were last viewing before beginning the “new” visitor session.

Advertising campaigns

If you have advertising campaigns on your site you may wish to track the activity that is occurring on them. To do this you must create a campaign definition before you can filter on a campaign. This definition specifies the referring page or entry page that, when visited, represents a visit to the campaign, or, in the case of SmartSource Data Collector, query parameters. The most common use for filtering by campaigns is to include only the visitor session activity associated with a particular campaign. If you have a reasonable idea of the value that you can associate with specific activity, you may be able to forecast the revenue that can be generated by the campaign.

Figure 6-4 shows a report that provides visitor activity for each campaign.
Figure 6-4. Campaigns report

Important considerations – filtering on visits

Filtering on visits is slightly more restrictive than filtering on hits. With hit filtering, you apply the filter directly to the raw web data activity file. Any hit that matches the criteria is either included or excluded from analysis depending on the type of filter you specified. When you filter on visits, however, the web data activity file has been parsed and processed to sessionize your data. At this point, you are not actually applying the filter to the raw web data activity file; you are applying it to a summary of the hits associated with a visitor session.

Regarding logging, you should keep in mind that because your browser wants to give you information as quickly as possible, it uses a process (called multi-threading) that allows multiple items to be downloaded at the same time. As these items either successfully or unsuccessfully load in your browser, they get logged to the web server or data collection server web data activity file. This means that if an image loads in your browser before the HTML text file that references it, your web data activity file will record the hit for that image first. WebTrends then needs to reorder hits in the proper time sequence so that the visit sessions are accurate.

Handling Multiple Filters

It’s not uncommon to apply multiple filters at once. Therefore, it’s important to understand how WebTrends handles multiple filter situations. Suppose you want to track all visitor activity for a particular campaign that resulted in successful hits. You would create an include visit filter to include the campaign visitor session activity, and then you would take only those hits within that activity that were successful by creating an include filter for successful return codes, or an exclude filter for failed return codes.
Of course, if the filters were applied in the opposite order, the results would still be the same. This is just one way in which multiple filters can be combined. WebTrends explicitly tells you how it handles such cases.

Figure 6-5 shows (in a broad sense) how include and exclude filters can achieve a desired result. This illustration depicts data files in the left column that pass through an include filter and then an exclude filter. The final data set is shown in the column on the far-right side.

Figure 6-5. Multiple filtering process

Data aggregation

Once your web activity data finds its way to either a web server or data collection server log, that data is processed and stored in an aggregated format in a set of summary tables. In addition to the log data in summary tables, you can also add data from external sources, for example, demographic data or customer data. In creating these summary tables, you have summarized the data in defined ways, and then you have discarded the raw data from which you aggregated the summary data.

Summary tables can be used for data visualization with reporting tools such as WebTrends analysis products, Key Business Indicator tables, or external data reporting and visualization tools such as Crystal Reports. In addition, you can use these tables to perform deeper analysis with data mining tools, or you can run online analytical processing (OLAP) to uncover trends in your data that you might never think to consider.

Table filtering

As the data comes through the analysis process, the data will be put into various tables. Table-level filtering allows you to choose which data to include in a particular table, whereas global filtering affects the data that all of the tables are exposed to. With table filtering, you are selecting a portion of the entire analysis data set to be included in a single custom table.
Default custom tables can include different subsets of the analysis data. As with global filtering, table filtering can be based on hit or visit properties. Figure 6-6 shows an overview of the table filtering process.

Figure 6-6. Overview of the table filtering process

Custom Reports

WebTrends analysis products ship with a number of pre-defined reports that cover the information most organizations want, but every organization has its own, unique requirements for the web activity information it needs to see. This is where custom reports are particularly useful. Custom reports allow you to set one or two table dimensions. For example, you might want information about new visitors from a specific geographical region or with a certain income level. With custom reports, any dimension for which you have data, including any external data source you may have tied to your web activity data, can be tied to measures such as the number of page views, the number of visits, or the duration of a visit. If you need to narrow down what you view in the reports, you can apply filters to the report data just as you did when filtering the summary tables.

WebTrends offers numerous dimensions for custom reports. Here are a few examples:

• Most Recent Campaign
• Product Manufacturer
• Search Phrase
• Lifetime Value Range
• Day of Week

WebTrends also offers numerous measures for custom reports. Consider the following examples:

• Active Campaign Revenue
• Daily Buyers
• Daily Visitors
• Order Value
• Visitor Purchase Count

Note: Not every measure-dimension combination makes sense. Some dimensions are very large and should be used wisely. For example, you don’t want to use unique visitor with referrer, because the virtually unlimited number of unique visitors and referrers would overwhelm your tables.
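To make the relationship between dimensions and measures concrete, here is a small sketch that totals measures for each value of one dimension. The record layout and the custom_report function are hypothetical and are not part of any WebTrends interface; the sample also shows why a high-cardinality dimension produces far more rows than a bounded one.

```python
from collections import defaultdict

def custom_report(visits, dimension, measures):
    """Aggregate the named measures for each distinct value of one dimension."""
    table = defaultdict(lambda: dict.fromkeys(measures, 0))
    for visit in visits:
        row = table[visit[dimension]]
        for measure in measures:
            row[measure] += visit[measure]
    return dict(table)

# Hypothetical visit records; field names are illustrative only.
visits = [
    {"day_of_week": "Mon", "visitor_id": "a", "page_views": 4, "order_value": 0.0},
    {"day_of_week": "Mon", "visitor_id": "b", "page_views": 2, "order_value": 25.0},
    {"day_of_week": "Tue", "visitor_id": "c", "page_views": 7, "order_value": 10.0},
]

by_day = custom_report(visits, "day_of_week", ["page_views", "order_value"])
by_visitor = custom_report(visits, "visitor_id", ["page_views", "order_value"])

# Day of week can never exceed seven rows; visitor_id grows with every new visitor.
print(len(by_day), "rows by day of week,", len(by_visitor), "rows by visitor")
```

A dimension such as day of week stays at seven rows however much traffic arrives, while a per-visitor dimension adds a row for every new visitor, which is the situation the note above warns about.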
Custom reports support data look up that translates coded information from your database into more meaningful descriptions. See “Campaign IDs and translation tables” on page 126.

Here are descriptions of several custom reports that may be helpful when you consider the data you might want to analyze:

Buyers versus non-buyers by time period
This report lets you see how many of your web site visitors purchase products from your web site. Compare the number of visitors who make purchases (buyers) to those who do not (non-buyers) by time period.

Content group duration
This report provides insight into which areas of the site are most attractive to your visitors. Analyze the content areas for possible cross-promotions, or analyze over time to interpret content popularity.

Demand Channels
This report shows activity occurring during the report time period segmented according to the demand channel of the last campaign to which a visitor responded.

Geography drilldown
This report provides a drilldown presentation of the geographical information (region, country, state/province, city) relating to the visitor’s IP address. The WebTrends GeoTrends Database is required to get complete information down to the state and city level.

Marketing programs
This report shows the marketing programs for the most recent campaigns that drove traffic to your site during the report time period. For the report time period, all conversions and other activities are tracked and attributed to the last campaign to which visitors responded. Thus, even if the conversion does not happen on the first visit generated by the most recent campaign, the appropriate source is “credited” with the conversion.

Purchase conversion funnel by search phrase (all)
This report helps to understand how the usage of all search engines and phrases correlates to conversion activity on your site.
This report includes both organic (for example, natural search) and paid (for example, pay-per-click) search referrals. The conversion funnel allows you to analyze each step of the purchasing process to determine specifically where users are dropping off and which percentage completes the checkout process.

Sales cycle by product
This report page shows the number of days between a new buyer’s initial visit and first purchase for each product. Figure 6-7 shows the number of days between a new buyer's initial visit and first purchase.

Figure 6-7. Sales Cycle (New Buyers) report

Parent-child profiles: a structural alternative to custom reports and/or filters

Dividing the web traffic rather than filtering it is often an efficient alternative to custom reports. Many companies have one web site and one web server that generates all of their web-activity data files. In particular, large companies with many divisions may require a more complex way of dealing with their data files, because each division may have responsibility for a portion of the web site. Since each division will want reports that are tailored to the needs of that division (but not to the needs of other divisions), you have to generate hundreds of different kinds of reports. However, all of the activity is gathered in one data file, and you don’t want to reprocess that data file hundreds of times to get the reports. You want to read the data file once and generate all of the reports that you will need for each division and a summary report. The reports will be basically the same, except that each report will contain only the specific piece of data that relates to a particular division of the company. The parent, then, is the company at large and the child is each division.
In other words, parent-child profiles/reports are typically used by multi-domain organizations (for example, service providers or large corporations) to simplify administration. A parent profile specifies the global settings that will be applied to any child profiles, and specifies when to create a child profile. In many cases, the presence of a new domain or sub-domain could trigger the creation of a child profile, or in some cases, the presence of a parameter in the URL is used. An example of this would be the creation of a child profile for a major content area of a site, if a complete set of reports is required for that content area. The parent profile automatically creates child profiles based on your criteria, which point to a limited set of your web data. The child profiles then analyze the subsets of your data.

Parent-child profiles/reports can also be applied to content groups. You may be interested in the web activity for a particular content group, and you may have a number of different content groups that you want to examine. Therefore, several divisions of a large company could be interested in the reports relating to a particular content group. The parent, in this case, is the company at large, but the profile/reports on a content group represent the child.

Reducing profiles and increasing productivity

Generally, there are a couple of major reasons why you may want to establish new profiles:

• To see the full set of reports on a subsection of available data
• To apply different filters to divide your web activity data into segments

If you would like to report on and analyze a particular portion of your site, you can create a new profile that only considers that section of your site. But if you look at the depth of analysis you need for this section of your site, creating hundreds of reports all specific to that section of the site may be overkill.
It might be best to instead create a few custom reports that show you the traffic volumes and campaigns that are driving traffic to that section of the site. Likewise, if you need to apply different filters to the same segment of data (for example, one campaign versus a second campaign), you could create a separate profile for each campaign. Again, though, it may be excessive to create the hundreds of reports created by a full analysis profile. It is better to consider custom reports for each campaign instead.

Summary

Filtering allows you to narrow down the volumes of web activity data to just the data you want to examine. Different types of filters can be used to focus on just the types of data you wish to analyze. You can apply filters to each line in the web activity log using hit filters, or you can apply filters to visits using visit filters. Visit filters are applied after the individual hits have been filtered and the visit data has been sessionized. You may also specify which data to include or exclude from your reports. Indirectly, this is a way of reducing, or filtering, the data that you see in reports. The benefits of filtering data include not only reducing the amount of data that you need to store in your tables of aggregated data, but also making the amount of data you do want to examine more manageable.

Finding the Features in WebTrends Products

You will find the topics discussed in this chapter in WebTrends.

Filtering
Click on Web Analysis > Profiles & Reports > Edit a profile > Advanced > Hit Filters or Visitor Filters

Custom Reports
Click on Web Analysis > Profiles & Reports > Report Configuration > Custom Reports

Filtering Worksheet

Use the following worksheet to help understand what kind of filters you need.

Consideration (answer Yes or No, and add Comments)
Do you plan to use image files (such as .jpg, .gif, or .tif files) in your analysis?
Do you plan to include spiders and robots in your analysis?
Do you plan to include hits from people within your own company who look at your web site?
Do you need high-level reports on ad campaigns or more reports on browsers and technical information?
Is visitor segmentation important to your analysis?
Is site segmentation important to your analysis?
Does your company have many divisions requiring parent-child profiles? (Note: Parent-child profiles are only available in WebTrends Professional and WebTrends Enterprise.)

Chapter 7
Acquisition Metrics

Introduction

Nearly every web site shares three fundamental web analytics objectives: acquire more qualified visitors for the lowest cost, convert these visitors into customers, and retain these customers for repeat business.

Acquire more qualified visitors
From online marketing to offline marketing, the first step in winning new customers today is driving new traffic to your web site. But all traffic is not equal. You need to drive the most qualified visitors for the lowest cost. With WebTrends you can get a complete picture of campaign response, campaign conversion and overall return on investment (ROI). As a result, you can pinpoint exactly which campaigns are working and which aren’t. This chapter discusses acquisition in more detail.

Convert more visitors by analyzing click-by-click behavior
Whether your web site goal is for visitors to register, make a purchase, or get technical support, conversion rate is a critical measure of your site’s success. WebTrends provides the most comprehensive navigation analysis in the industry, allowing you to track visitors click-by-click, identify confusing navigation and minimize abandonment. Isolating problem areas in your site and experimenting with improvements can have a big payoff. See Chapter 8, “Conversion Metrics” on page 139 for more information about conversion.
Retain more visitors by segmenting those most likely to return
Once you’ve persuaded visitors to become customers, you need to retain them as loyal, returning customers. It typically costs 5-10 times more to acquire a new customer than to keep an existing one. WebTrends allows you to evaluate the effectiveness of your loyalty campaigns, such as customer newsletters, by how recently and how frequently visitors are coming back and engaging in repeat business. Now you can measure whether or not you are increasing the average lifetime value of your visitors. See Chapter 9, “Retention Metrics” on page 159 for more information about retention.

What the Business Person Wants to See

Business people need to optimize the effectiveness of their marketing expenditures. They need to run campaigns that drive qualified traffic to their web sites. They make decisions regarding spending more money on tactics that work and reducing the amount on less-efficient areas. The decisions regarding the acquisition of visitors are some of the most important decisions that business people make because the process of acquiring visitors (such as creating ad campaigns, using marketing resources, outsourcing some areas) is expensive.

Acquisition data can help you determine if your marketing tactics are successful. With WebTrends you can easily get reports on valuable metrics that reveal how many visitors came to your web site, whether they converted to registering or paying customers and how much value they brought to your organization. Acquisition data can also tell you how comparisons perform over time and which customers have the highest lifetime value.

Entry/Landing page

The first page that a visitor sees on your web site is called the entry or landing page. This is the most important page in your web site, because it provides the initial impression for your visitors and influences whether they will continue to look at other pages of your site.
Entry pages can tell you whether people more often start at your home page or jump to the middle of your site, usually via a bookmark or link. Consider also that at the entry point to your site, visitors have not yet begun to navigate around the pages of your site. This may be an opportune time to guide them in the direction you want them to go. From these pages, you can promote areas of your site that you want them to see by putting noticeable links to those areas. In addition, entry pages usually provide good advertising real estate if you sell ad space on your site or promote your own products or services.

Basic entry page usage

The Entry Pages report is the lowest-level report, because you don’t know exactly how visitors got there. That is, your visitors could have used an ad campaign, a search engine, or some other mechanism to get to your page. Yet this report may help you to pinpoint the pages on your site to improve. Based on this report, you can find your leading entry pages and improve them in order to represent your company in the best way possible and direct visitors to other pertinent pages of your web site.

Figure 7-1 shows a sample report that identifies the first page view of a visitor at a site.

Figure 7-1. Entry Pages report

In this sample report, “Pages” refers to any document, dynamic page, or form. Different types of profiles have different default settings for which file extensions qualify a file as a page. “Visits” refers to the number of visits where the specified page was the entry page. A visit is a series of actions that begins when a visitor views the first page from the server, and ends when the visitor leaves the site or remains idle beyond the idle-time limit.

Also, in this sample Entry Pages report, the home page or “Welcome Information” page is the top entry page. However, many visitors entered first through the products and store pages.
Perhaps many of these visitors entered because of an ad campaign. If so, this ad campaign may deserve more scrutiny, because the company may have spent quite a bit of money on attracting customers via that campaign.

The information in the Entry Pages report can indicate how you might want to optimize the architecture of your web site based on where your visitors are entering. It can also help you determine which external links are most effective. You may want to consider updating META tags and links.

Advanced entry page usage

You can make your entry pages useful by creating specific landing pages for each campaign and making sure that each landing page is not linked to anything except the specified campaign. That is, only the intended campaign should link to the landing page; nothing else on your web site should link to it. Then the landing page redirects the visitor to the page that you want them to view. By themselves, entry pages are not that interesting, but if you can design them into your web site for analysis, then you can determine who came to your web site because of a particular campaign.

Collecting the Right Data

Web sites can employ a variety of mechanisms to drive traffic to a specific web site and track its success. In general, visitors will find your site through:

• Ads (banner, email, traditional media)
• Searches
• Links
• Directly through bookmarks

The following subsections discuss the mechanisms that help visitors find your site. These mechanisms are referrers, ad campaigns, search engines, and email marketing efforts (such as newsletters).

Referrers

Just as a doctor receives new patients from a referring source, such as another doctor or a current patient, a referrer, or referring URL, is the page on another web site that linked visitors to your site. Referring URLs tell you where your visitors came from to get to your site.
You can use this information to determine which external sites are the best ones on which to place links to, or ads for, your site. This information can also convince you to develop or maintain positive relationships with these sites so that they will continue to offer a link to your site.

How do you determine what the referrer is? The record in the data file contains the page that was visited before the page represented by a particular entry. So you can ascertain the referrer for each page from the record in the data file. But more interesting is what initiated a visit to the site. How do you determine the referrer for the visit? This is done by taking the first hit in the visit, looking at that hit’s referrer, and calling that the visit’s referrer. Therefore, each visit’s referring URL comes from the first hit of that visit.

Figure 7-2 shows the domain names of sites that refer visitors to your site.

Figure 7-2. Activity by Referring Site report

From this sample report, you can get basic information. However, if you have several different ad campaigns on Yahoo, this report doesn’t reveal which one is working best. Consequently, the referrer reports provide general, low-level feedback on your efforts. For more specific information, you will need reports on ad campaigns, search engines and email marketing.

Referring site, domain, or URL

Some web sites may have multiple links to your site. If you only want to know what site referred the visitor, not the individual pages on the site that contained a link to your site, you would need to strip out all parts of the URL except the site or domain name. This lets you discover which sites or domains refer visitors to your web site the most rather than diluting visits from the same site just because the links were on different pages of that site.
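As a rough sketch of the two ideas above, the code below takes a visit’s referrer from its first hit and strips the URL down to its host name. It is an illustration only: the hit dictionaries and function names are hypothetical, referrers are assumed to be absolute URLs with a scheme, and a “-” is assumed to mark a missing referrer, as in the sample log lines earlier in this guide.

```python
from collections import Counter
from urllib.parse import urlsplit

def referring_site(referrer_url):
    """Reduce a full referrer URL to its host name; '-' or empty means no referrer."""
    if not referrer_url or referrer_url == "-":
        return "No Referrer"
    # Assumes an absolute URL; hostname is None when the scheme is missing.
    host = urlsplit(referrer_url).hostname
    return host or "No Referrer"

def visit_referrer(visit_hits):
    """Take the visit's referrer from its first hit only."""
    return referring_site(visit_hits[0]["referrer"])

# Hypothetical visits; www.example.com stands in for your own site.
visits = [
    [
        {"url": "/", "referrer": "http://www.referrer.com/links.htm"},
        {"url": "/products/", "referrer": "http://www.example.com/"},
    ],
    [{"url": "/store/", "referrer": "http://www.referrer.com/reviews/page2.htm?id=7"}],
    [{"url": "/", "referrer": "-"}],
]

by_site = Counter(visit_referrer(v) for v in visits)
for site, count in by_site.most_common():
    print(site, count)
```

Both links from www.referrer.com count toward the same row even though they sit on different pages. Grouping several host names into one site would need an additional mapping that is not shown here.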
Here’s the breakdown: A site, such as www.referrer.com, may have several domains, for example search.referrer.com, your-referrer.com, my_referrer.com, ourreferrer.com, gobbeldygook.com, and referrer.com. All of these domains do referring work for the main site www.referrer.com. Each domain name may have several IP addresses. For example, 217.194.141.67, 205.186.88.66, 66.67.2.10, 66.231.3.73, 199.221.98.4, and 65.56.41.37 might all be used by the www.referrer.com domain. So your reports can give you the site name, the domain name, or the URL information. It depends on the level of information that you want.

The self-referrer issue

Using the referring URL to determine how your visitors came to your site has one major drawback: your own site can appear to be the referring URL. This self-referring circumstance occurs when a visitor begins a visit, leaves your site open in the browser window, stays inactive beyond the 30-minute visitor session window, and then becomes active again. Once the 30-minute threshold is crossed, WebTrends considers this to be a new session; however, this new session will register your site as being the referrer. For this reason, instead of using the referrer page or URL, you may be better off using other means of tracking how visitors come to your site. One useful method is tracking visits from ad campaigns.

No referrer - direct traffic

“No Referrer” represents direct traffic to the web site as one of the following: 1) the visitor typed the domain name directly into his/her browser, 2) the visitor bookmarked the site, 3) the visitor has the page set as his/her home page, or 4) the visitor clicked on an email link, shortcut, or other direct link.

Ad campaigns

Advertisements can come in many forms, including ads on other sites, popup ads that are triggered, and links embedded in email campaigns.
Here are some broad definitions of ads that are frequently used:

Web-based ads
These ads include “banner ads” that appear on the web pages of sites that your best prospects are likely to visit. Web-based ads have many forms such as text, moving graphics, a call to action (“Click here to download …”), Flash or streaming banners, pop-ups, and pop-unders.

Newsletter-based ads
These ads are directed at publications that your prospects are most likely to be reading. With newsletter ads, you can often choose from among sponsoring the newsletter, sponsoring a column or feature in the newsletter, or placing an ad that will appear among other ads, usually as a text ad.

Campaign IDs and translation tables

You can manage your campaigns by using campaign IDs and translation tables to convert the campaign ID into meaningful information. For example, you may see a campaign ID in your data files such as campaign=721. WebTrends allows you to have a text file that (at analysis time) can translate all of your campaign IDs into their corresponding campaign names.

Redirect pages

Many ads are designed to initially route the user through a redirect page before they can view the ad content. This redirect page quickly and imperceptibly bounces the visitor to the actual page with the ad content, recording the redirect page as the entry page for the session because it was selected first. (Here, the first hit recognized as an ad campaign in the visitor session is counted.) If each redirect page for each placement is distinct from the others, you can track which version of the ad most often took visitors to the ad’s content.

Let’s say you have two online ads for your product, one on Yahoo, and one on AOL. In addition, you sent an email to potential customers with a link that takes them to the content. If you wish to track them all separately, you would create a separate redirect page for each one.
In this scenario, you might have the following pages:

Yahoo Ad: /redirect/yahoo_ad.htm
AOL Ad: /redirect/aol_ad.htm
Email Ad: /redirect/email_ad.htm

By tracking visits to each of these redirect pages in the top entry pages, you can see which ad placements most effectively bring people to your site.

Figure 7-3 illustrates the redirect process for WebTrends using the web server data collection method. Remember that client-side tagging will not give you this information unless the redirect page has the proper script (see “Drawbacks of client-side tagging” on page 52).

Figure 7-3. Redirect process

Using this illustration, if you look in the web data activity file you will see a two-step process:

The first web data activity file entry:
GET YahooAd.htm - 302 - yahoo.com
This took the visitor from Yahoo.com to the Yahoo redirect page (YahooAd.htm). Status code 302 means that the request was redirected.

The second web data activity file entry:
GET PromoAd.htm - 200 - YahooAd.htm
This took the visitor from the Yahoo redirect page (YahooAd.htm) to the promotion ad (PromoAd.htm). Status code 200 means that the request was successful.

Figure 7-4 shows a sample report of top referring pages.

Figure 7-4. Referring Page report

In this sample report, “Page” refers to any document, dynamic page, or form. Keep in mind that different types of profiles have different default settings for which file extensions qualify a file as a page. Any URL containing a question mark is considered a dynamic page. If “Direct Traffic” is 100% of all your traffic, then your web server is probably not logging the “referrer” field in your data files.

You can use WebTrends to create a campaign profile and track either entry or referring pages. However, some ads have several possible referring pages with long, complicated URLs.
As a result, it can be more difficult to look up and define a referrer when you set up a campaign profile.

Tracking multiple campaigns
Redirect pages are great for handling a handful of campaigns, but if you’re doing hundreds or thousands of campaigns, this method is impractical. Instead you should use a parameter field containing a parameter ID. The ID can be used to identify all of the attributes that make up the campaign, such as site name (for example, MSN, Yahoo), program (3rd quarter ProductName upgrade), offer (25% off), creative type (120x120 GIF banner), creative (race car image), and so forth. You can then use a translation file (via WebTrends script or custom table lookups) to create reports on which attributes are most effective (for example, did the race car image do better than the Flash movie of a tornado, or was the 25% off offer more effective than the free year of support).

Off-line acquisition techniques
Entry and redirect pages are handy for off-line acquisition techniques. For instance, if you place an ad in a newspaper or magazine telling people to go to your site, you might get them to type www.YourCompany.com/UpgradeOffer, but you are unlikely to get them to type in something like www.YourCompany.com?CID=C42-61AF.

Search engines
Search engines play a large role in acquiring visitors. Whenever someone uses a search engine, there is the chance that they will use a keyword that triggers links to your web site. Search engines typically come in two flavors:

Paid Search Engine
You pay a fee for every person who clicks through to your site, and you have to monitor which keyword phrases are bringing you the best visitors. With Paid Search Engines, you need to evaluate the effectiveness of money spent.

Organic Search Engine
You pay nothing for visitors who come to your site. You monitor which keyword phrases are bringing you the best visitors. With Organic Search Engines, you evaluate the effectiveness of time spent.
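To make the campaign ID approach from “Tracking multiple campaigns” concrete, here is a minimal Python sketch of a translation-table lookup. It is not the WebTrends translation file format or script: the CSV layout, the column names, the sample rows, and the campaign query parameter name are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs

# Hypothetical translation table; the columns and values are invented.
TRANSLATION_CSV = """campaign_id,site,program,offer,creative
721,Yahoo,Q3 upgrade,25% off,race car image
722,MSN,Q3 upgrade,free year of support,tornado Flash movie
"""

def load_translation_table(text):
    """Return a dict mapping campaign ID -> dict of campaign attributes."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["campaign_id"]: row for row in reader}

def campaign_attributes(url, table, param="campaign"):
    """Look up the attributes for the campaign ID carried in a URL's query string.

    Returns None when the URL has no campaign parameter or the ID is unknown.
    """
    query = parse_qs(urlparse(url).query)
    ids = query.get(param)
    if not ids:
        return None
    return table.get(ids[0])

table = load_translation_table(TRANSLATION_CSV)
attrs = campaign_attributes("/products/upgrade.htm?campaign=721", table)
print(attrs["site"], "-", attrs["offer"])
```

Because every attribute hangs off a single ID, adding a campaign means adding one row to the table rather than creating another redirect page.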
People use search engines when they don’t know the name of your site or have no other direct link to click or distinct URL to type in their address box. Web site designers go to great pains to figure out how to get recognized by these search engines and appear in the “top 10” list that appears when a search is performed.

Research consistently shows that more than 80% of web visitors use search engines to find what they need. The longer users are online, the more likely they will use search engines and make purchases. Since most web users believe that the sites that show up at the top of the listings are the most important sites, you must take every reasonable measure to make sure your site ranks highly with search engines for the search keywords and phrases that your most valuable prospects use. If you can’t get good rankings by optimizing, you can always try pay-per-click advertising options, which most of the search engines offer.

But search engine technology constantly changes. What you did to get a search engine to effectively recognize your site or page today can have marginal results only a few months later. And each search engine has its own proprietary method of creating a result list based on the search phrases or keywords a web user enters. These lists use the search keywords and phrases to create a list of what they interpret as being the most relevant sites. Search engines also use a host of other factors, including how often visitors click on the link to your site from within their list, and how many of the more popular sites containing related content have hyperlinks to your site. Most search engines also let you register with them, and by paying them to place your site in their index, you can get more exposure than if you’d left it up to chance to get noticed.
The element that you have direct control over in this mix is making sure that the keywords you planned for visitors to use to get to your site actually make your site appear in the search results. By reviewing the top keywords or search phrases entered by visitors, you can find out if those keywords are driving people to your site. If not, you can modify your web page content to promote your site with search engines, based on those keywords. Some common ways to modify that content involve including the keyword or phrase in the description and keyword meta tags, and increasing the frequency with which you use the word or phrase in the HTML title, a headline, and the first few paragraphs of the page. These methods will improve your chances of being found and promoted by a search engine.

Note: Search engine optimization is not the focus of this guide. Please consider other resources for a complete discussion of this constantly changing topic.

You can use WebTrends to find the search engines that are used most often by visitors to arrive at your site. You might want to register with search engines if you find that your site is not being noticed. With WebTrends you can generate reports on organic search engines (non-paid search engines) and paid search engines. Figure 7-5 shows a sample report about most recent search engines.

Figure 7-5. Most Recent Search Engines (All) report

WebTrends allows you to compare this information with information from a report on the most popular phrases for your site. Figure 7-6 shows a Most Recent Search Phrases report.

Figure 7-6. Most Recent Search Phrases report.

Using the information from Figures 7-5 and 7-6, you can compare search engine rankings with the popularity and competitiveness of phrases to get a complete picture of how the web site is performing.
Search engine rankings allow you to understand where your site shows up in the list of search results for certain phrases; for example, if you have a phrase that performs particularly well in terms of conversion, but your search engine ranking is low, you may want to try for more highly qualified traffic by boosting your ranking. WebTrends can also analyze paid and organic search engine usage and generate reports that show the total effectiveness of your search engine marketing and optimization strategies based on activity, depth, and duration of visit. You can receive separate reports on paid search engines, organic search engines, or both.

Email marketing
When you want to reach prospects’ inboxes, but you need to say more than you would in a newsletter ad, you might consider using direct email and your own customer database, as well as renting a marketing list. You can also email your in-house list of registered visitors who have opted in to receive communications. With email marketing, the recipient can click on a link to your web site, and this visit is automatically recorded and catalogued by WebTrends.

You can use WebTrends to track email campaign results via entry/landing pages as a primary or complementary metric to the other measures produced by email solutions. WebTrends can help you to determine how far recipients get into the conversion process, as well as what they do once they’ve completed the process and on subsequent visits. Advanced email solutions will track clickthroughs to the site, campaign conversions, and revenue (and in some cases visitors’ clickstreams/paths), but this is where the overlap with web analysis solutions ends. Unless the visitor’s activities are tied directly to the campaign, meaning the visitor entered your site through the link contained in your email, viewed campaign details/pages, and converted on the campaign offer, most email solutions will not measure it.
You can make your entry pages useful by creating specific landing pages for each email marketing campaign and making sure that each landing page is not linked to anything except the specified campaign. That is, only the intended email marketing campaign should link to the page; nothing else on your web site should link to it. Then the landing page redirects the visitor to the page that you want them to view.

To analyze the detailed interactions your email visitors have with your site beyond summary campaign information such as the number of responses and conversions, you will need a WebTrends solution. If visitors left campaign-centric pages, where did they go? What content groups or products (beyond the one featured) most interested them? Did email recipients purchase products that weren’t featured in the campaign? All of these questions can be answered by using WebTrends. Figure 7-7 shows a report that provides information about all types of campaigns, including e-marketing.

Figure 7-7. Campaigns report

This report lets you compare different kinds of campaign types to see which are the most effective. Of course, the effectiveness is related to how much money you are spending on each campaign.

Tracking multiple email campaigns
You can use redirect pages for handling a handful of email campaigns, but if you’re doing hundreds or thousands of campaigns, this method is impractical. Instead you should use a parameter field containing a parameter ID. The ID can be used to identify all of the attributes that make up the campaign, such as site name (for example, MSN, Yahoo), program (3rd quarter ProductName upgrade), offer (25% off), creative type (120x120 GIF banner), creative (race car image), and so forth.
You can then use a translation file (via WebTrends script or custom table lookups) to create reports on which attributes are most effective (for example, did the race car image do better than the Flash movie of a tornado, or was the 25% off offer more effective than the free year of support).

Summary
Acquisition is the most expensive step in getting visitors to your web site. Monetary expenditures on advertising, search engines, newsletters, and similar campaign efforts often make up a large share of a company’s budget. But without visitors, especially qualified visitors, your web site is meaningless. Once you have customers, you can work on converting and retaining them. Fortunately, conversion and retention are far less expensive.

Finding the Features in WebTrends Products
You will find the features discussed in this chapter at the following locations in WebTrends.

Entry Pages and Referrers
Click on Web Analysis > Report Configuration > Campaigns > New Campaign

Ad Campaigns
Click on Web Analysis > Report Configuration > Campaigns
To create a report about ad campaigns, edit a sample profile and click Visitor History. Make sure that Campaign History is checked.

Search Engines
Click on Web Analysis > Report Configuration > Custom Reports > Reports or Dimensions
To create a report about search engines, edit a sample profile and click Visitor History. Make sure that Search Engine History is checked.

Acquisition Metrics Worksheet
Use the following worksheet to help understand how you want to acquire visitors.

Consideration    Yes    No    Comments
Will you be using ad campaigns to drive traffic to your site?
How many of these campaigns do you need to track?
Do you complete test campaigns before you begin the real ones? Note: Test campaigns can help you understand which campaigns work the best.
Will you use redirect pages?
Do you intend to outsource the creation of your ads and the serving of your ads?
Will you track referrers?
Are you relying on statistics from organic search engines?
Are you using paid search engines?
Will you use an email newsletter campaign?

Chapter 8
Conversion Metrics

Introduction
After you have attracted visitors to your web site, you can measure how often the visitors take an action in line with what you intended. In other words, conversion means getting visitors to do what you want. For commercial web sites, conversion usually means how often visitors convert into paying customers. However, many commercial sites are interested in “lead generation,” in which a sales lead may generate a potential conversion to a paying customer later. In either case, the metrics involved with conversion measure the process by which you persuade visitors to take the actions that you intended for them to take. Your conversion rate is a measure of your ability to persuade your visitors to take those actions.

The following scenarios are examples of conversion:
• Visitors purchasing products
• Prospects registering for more information
• Customers using your self-service section
• Investors downloading your annual report
• Employees using your internal site to schedule vacations
• Visitors registering for the site’s newsletter or to enter contests

The conversion process may involve several steps through your site as visitors navigate their way. Conversion analysis helps you evaluate which types of content successfully support conversion.

First-time visitors vs. repeat visitors
Conversion is not the process of doing; rather, it is the process of a non-doer becoming a doer. Consequently, you may want to filter across visitor segments to see what first-time visitors and first-time buyers do rather than what repeat visitors do. This means running a filter on a profile and doing some custom table filtering. Getting a new visitor to convert is a sign of success.
Figure 8-1 shows a report comparing the number of visits by new and returning visitors to your site.

Figure 8-1. New vs. Returning Visitors report

Monetary considerations
Conversion is the beginning of the rewards for having spent so much time and money on the acquisition step. Retention (discussed in Chapter 9, “Retention Metrics” on page 159) involves the process of how you minimize the ongoing cost. It is much cheaper to keep a customer happy than to get a new one.

Understanding Navigation Measurement
Navigation measurement is one of the most fascinating areas of web analytics. You can theorize about why certain things happen on your site, but to draw any firm conclusions, you need to understand how visitors use your site by the paths they take within it. Knowing how visitors navigate your site can help you determine what types of content interest your visitors. It can also help you identify trouble spots that may have caused visitors to exit your site.

Understanding where visitors go on your site helps you answer questions such as:
• Did your visitors only view the top-level pages, or did they delve a little deeper to see details about a certain topic?
• Where on your site are people running into dead ends or backtracking? You know that the fewer clicks a visitor has to make to get to the information they want, the higher their satisfaction with their overall experience. Consequently, you want to make sure that they aren’t having difficulties locating information.
• When people go to the Contact or Support sections, what seems to be driving them there? Are they tending to come from certain areas of the site that you should examine more closely?
• Where and why are shoppers deviating from the “ideal” straight-through checkout process that you created? Should you change the order of the steps or provide certain information earlier in the process? Is your site design causing visitors to go in circles?
• If people abandon your site before getting the information or doing the transaction that you designed for the site, why are they leaving? Observing their first few clicks into the site (or into each section) can help identify the pages that need to be examined for confusion, inconvenience, lack of information, poor visual appeal, or other obstacles.
• Can you get similar information from looking at the last few pages of aborted visits?
• You have an idea of what constitutes a typical or ideal visit, but are you oversimplifying? Are there really several kinds of visits? What are they? Is your site designed to work well for many kinds of visitor “missions?” Are you ignoring the needs of an important group of visitors?
• You would like people to visit more of your site than they do. Where are the best places to encourage visitors to explore new parts of the site?

Path analysis
Studying where visitors go on your site is called path analysis (also known as clickstream analysis). Path analysis lets you discover whether visitors are navigating your site the way you expected them to, and if not, where they are going instead. Path analysis can also help you track movement between pages, or can take advantage of your content group settings to track movement between groups of related content.

Different approaches to path analysis provide different types of insight into your visitors’ activity. You can take a free-form approach and track the top paths starting with the entry page. This analysis lets you know where visitors began and where they went on your web site. Or you can look at the most popular routes on your site. You can also narrow or focus your approach by examining certain hot spots on your site, examining which paths led visitors to hot spots and which paths followed from the hot spot. WebTrends excels at path analysis, providing comprehensive information about the navigation of visitors on your web pages.
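The free-form approach described above (tracking the top paths that start at the entry page) amounts to counting the leading page sequence of each visit. The following Python sketch illustrates the idea only; it is not how WebTrends computes its path reports, and the page URLs and visit data are invented.

```python
from collections import Counter

def top_paths(visits, depth=3, limit=5):
    """Count the most common leading page sequences across visits.

    visits: list of page lists, one list per visit, in the order viewed.
    depth:  number of pages to keep from the start of each visit.
    limit:  number of paths to return, most frequent first.
    """
    counts = Counter(tuple(pages[:depth]) for pages in visits if pages)
    return counts.most_common(limit)

# Invented sample data: each inner list is one visit session.
visits = [
    ["/home", "/catalog", "/product", "/cart"],
    ["/home", "/catalog", "/product"],
    ["/home", "/search", "/product"],
    ["/promo", "/product", "/cart"],
]

for path, n in top_paths(visits):
    print(n, " > ".join(path))
```

Limiting the depth keeps the number of distinct paths manageable, which is the same concern that motivates the focused approaches discussed next.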
Complete path
A complete path means that you track all the pages that a visitor traverses during a visit session. This is virtually the same as manually examining each hit in your web data activity file or your SDC-generated web data activity file. If you took this approach, you would have so much data to interpret that you would never be able to recognize patterns in that data. Plus, the amount of data your system would have to process would tax your server’s performance considerably. So how can you narrow down the data on all of the paths?

Focused path
Typically, you know the pages that are of particular interest to you in your site: the significant pages. So rather than tracking all visitor paths through your site, just track the paths to and/or from significant pages such as entry pages, exit pages, the home page, search pages, shopping cart, or registration pages. Doing so narrows the scope of the data you’re viewing, providing far more focus than you would get by tracking every page. That is, by considering less data, you have the bandwidth to research deeper. Consequently, you can track to the depth that you want.

On anything other than a simple site, you will still encounter so many paths to or from a given page that meaningful patterns in visitor behavior may still be difficult to discern. It’s also possible that certain paths, though technically different, are content-wise the same. Consider Figure 8-2, in which visitors started at different pages to arrive at the Zedesco Search:Search Results page.

Figure 8-2. Paths, Reverse:Zedesco Search Issues report

In addition, it is not always intuitive to look at the progression of pages along a path and easily understand exactly what that behavior indicates. Perhaps instead of seeing visits to the Wireless phones View page in particular, you want to see the level of interest in visits to all product detail pages.
This is where you use Content Groups to group related product detail pages.

Complete content group path
By grouping together pages that are equivalent indicators of visitor behavior, you can track broader patterns as visitors traverse a complete path through the various content groups you’ve created. In other words, you are applying meaning to a group of hot spots and the directions that visitors take in getting to or leaving the hot spot. But much like tracking the complete path through pages, interpreting your results can be confusing due to the volume of results. Once again, to obtain information that is far easier to handle and interpret, it may be best to focus on specific content group paths.

Focused content group path
A focused content group path is the select list of content groups, in order, that a visitor traverses in arriving at, or departing from, a particular content group. The results you get from this type of tracking offer extremely high levels of insight into how visitors are using your site. Content groups allow you to ignore visits to pages that are of no interest by simply omitting the page from any content group. If you are interested in seeing whether visitors move from the Store Product Page to Accessories to Ordering, or from the Main Catalog Page to Specific Product Information and then to Warranty information just before Ordering, you can ignore side trips to the Glossary page or Investor Relations page.

The ultimate value of the content group method depends on the skill with which the content groups and their member pages are chosen. Part of your success depends on selecting the right groups and the right members for each group. The groups must be comprehensive enough to simplify the picture, but not so comprehensive that they contain within themselves patterns that should be exposed. Focused content group path analysis is an excellent way to classify visits, which can be the basis for a sophisticated redesign.
Because most or all of a visit can be captured in a good content group path analysis, it is possible to see if the different functional parts of your site (defined by the content groups) tend to appear together. For example, if the Technical Information section of a site is visited far more often by people who visit a particular product section, and not by other visitors, it may make sense to add better links between these two sections or to beef up the technical content of the product information. Figure 8-3 shows a sample Product Content Group Paths report.

Figure 8-3. Product Content Group Paths report

Tracking the road most traveled
Planning and designing a site for web traffic is a lot like planning for road traffic. Road planners track how often people exit from one road to another to determine if they need to make a road or exit more accessible. Some obstacles, such as potholes, multiple stoplights, or an area that has high crime, will cause drivers to avoid taking the most logical route. Conversely, people take some roads more frequently than others because they have a good surface, no stoplights, or lead to a popular destination.

In much the same way, you will want to understand where a visitor is most likely to go after viewing a specific page or content group. You’ll also want to know what page or content group most often preceded a visit to a specific page or content group. This is called single jump analysis. This type of analysis shows you if your visitors are going where you expect them to go. If they aren’t, you would want to look for obstacles that might be preventing them from following the path you want them to follow. By ensuring that people visit specific areas of a site, you can be sure that these areas have the opportunity to succeed. Single jump analysis can also provide insight into areas other than web site structure and design.
If your customer support line experienced multiple calls about a specific product line, you might suspect that these products have problems. Similarly, if a single jump path analysis revealed that the content group most visited prior to the Technical Support content group was for a particular product line, you might quickly conclude that web visitors have concerns or issues about these products. Figure 8-4 shows a sample report of the most popular routes taken from a specific page (the Zedesco Homepage) on a web site. From that page, you can find the next most popular pages to which visitors navigated.

Figure 8-4. Path Analysis: Zedesco Homepage report

Scenario analysis
A more specialized case of path analysis is scenario analysis. This type of analysis helps you discover if people are visiting all the pages in a scenario that you intended for them to visit. You typically have an interest in seeing them complete the steps in the scenario because completion of the scenario often translates into revenue. By telling WebTrends the pages that make up a scenario, you can track how many people started the process and where along the way they dropped out. If dropout rates are significantly high on specific pages, you may consider factors such as poor site design or insufficient information on those pages.

Scenario analysis also allows you to exclude from analysis any irrelevant pages that the visitor visits while completing the scenario. This is something that would not be possible if you were simply tracking a specified path through the site. The following is an example of one of the most commonly used web site scenarios: an online purchasing scenario, commonly called a shopping cart. The typical shopping cart scenario might include the following steps:
1. Open the shopping cart.
2. Add products to the shopping cart.
3. Start the checkout process.
4. Complete the order.
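As an illustration of how the four steps above can be counted while ignoring irrelevant pages, here is a minimal Python sketch. It assumes a strictly ordered scenario in which each step is identified by one page; the URLs and visit data are invented, and this is not the WebTrends scenario engine.

```python
def furthest_step(pages, steps):
    """Return how many scenario steps a visit completed, in order.

    Pages that are not the next expected step are ignored, so side trips
    to unrelated pages do not break the scenario.
    """
    reached = 0
    for page in pages:
        if reached < len(steps) and page == steps[reached]:
            reached += 1
    return reached

def funnel_counts(visits, steps):
    """Count the visits that reached each step of the scenario."""
    counts = [0] * len(steps)
    for pages in visits:
        for i in range(furthest_step(pages, steps)):
            counts[i] += 1
    return counts

# Hypothetical URLs standing in for the four shopping cart steps.
STEPS = ["/cart", "/cart/add", "/checkout", "/order-complete"]

# Invented visits; "/glossary" and "/investors" are irrelevant side trips.
visits = [
    ["/home", "/cart", "/cart/add", "/glossary", "/checkout", "/order-complete"],
    ["/cart", "/cart/add", "/home"],
    ["/cart", "/investors"],
    ["/home", "/about"],
]

print(funnel_counts(visits, STEPS))
```

The drop between adjacent counts shows where visits left the scenario, which is the information the funnel report presents graphically.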
The scenario analysis technique tells you what percentage of visitors who complete one step in the sequence also complete the next step. An obvious example is shopping cart completion, but the technique can be applied to a variety of other scenarios, including applications for services, storefinders, feedback forms, personalization processes, and some kinds of on-site searches.

Figure 8-5 shows a Purchase Conversion Funnel report with entry and exit pages. This view shows where people entered the scenario from, and where they went when they exited the scenario at that step or abandoned the scenario. For instance, when a visitor leaves a step, visits another page (page X), then leaves the site, page X is shown as the exit page from the last scenario step. Note that in this report:
• On the left-hand side, you will find the entry pages that lead to one step in the funnel. For more information about entry pages, see “Entry/Landing page” on page 120.
• On the right-hand side, you will find the exit pages that show where your visitors went when they left that step in the funnel. For more information about exit pages, see “Exit Page and Exit Ratio Analysis” on page 152.

Figure 8-5. Purchase Conversion Funnel report with scenario entry and exit pages

In this example, the largest number of customers dropped out of the process after opening the shopping cart. Only just over 40% of people who started a shopping cart actually added an item to the cart. Interpreting these results depends on many variables. Whether or not a visitor starts a process, such as a purchase, is often more dependent on merchandising issues and perceived value than on site design. In contrast, whether or not a visitor finishes a process once they have started it usually depends on variables such as clarity or convenience. These variables are well within the control of the site designer.
For this reason, scenario analysis of individual processes is an excellent tool for evaluating the effects of changes in the design of a process. After you configure WebTrends, analysis can be done on a before and after basis.

Note that in the table that accompanies the funnel graph, the “Scenario Analysis Step” column lists the names of the steps in the defined scenario. Each step marks progress on the path that is being monitored. The Step Conversion Rate is the percentage of visits converted from the previous step in the scenario. Scenario Conversion Rate indicates the percentage of visits converted from the first step in the scenario.

Sometimes the nature of scenarios is non-linear, meaning visitors may enter a step out of sequence. For instance, with a “Quick Checkout” process, a visitor may be able to jump from step 1 directly to step 4, and would never be counted in steps 2 or 3. Also, if a visitor leaves the site at step 2 and returns later at that same step, the number of step 2 visitors may be greater than the number of step 1 visitors. WebTrends allows you to view these “Step Transitions.” This view focuses on how visitors proceeded from one step to the next, or through the scenario. If a visitor proceeded directly from Step 1 to Step 3, Step 3 will appear among the pages listed to the right of Step 1. Figure 8-6 shows the Step Transitions in the Purchase Conversion Funnel report.

Figure 8-6. Purchase Conversion Funnel report with step transitions

You should be careful about which pages you select for your scenarios, so that you can determine problems. It pays to think through possible problem areas and to try using those pages as steps in the scenario you want to analyze. For example, you might find that visitors are abandoning your site at the page in which they are asked to state their address. Or they might be dropping out at the page that requests their financial information.
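The two rates defined for the funnel table can be worked through with a short Python sketch. The step names and visit counts below are invented for illustration (chosen so that the second step converts at just over 40%, echoing the example report); they are not figures from a WebTrends report.

```python
def conversion_rates(step_visits):
    """Return (name, visits, step_rate, scenario_rate) for each step.

    step_visits:   list of (step name, visits) in scenario order.
    step_rate:     visits as a percentage of the previous step's visits
                   (None for the first step or when the previous step had none).
    scenario_rate: visits as a percentage of the first step's visits.
    """
    rows = []
    first = step_visits[0][1]
    previous = None
    for name, visits in step_visits:
        step_rate = 100.0 * visits / previous if previous else None
        scenario_rate = 100.0 * visits / first if first else None
        rows.append((name, visits, step_rate, scenario_rate))
        previous = visits
    return rows

def pct(value):
    """Format a rate for display, using '-' when it is undefined."""
    return "-" if value is None else f"{value:.1f}%"

# Invented visit counts per scenario step.
funnel = [
    ("Cart opened", 1000),
    ("Item added", 420),
    ("Checkout started", 210),
    ("Order complete", 105),
]

rows = conversion_rates(funnel)
for name, visits, step_rate, scenario_rate in rows:
    print(f"{name:<18}{visits:>6}  {pct(step_rate):>7}  {pct(scenario_rate):>7}")
```

Note that in a non-linear scenario a step can receive more visits than the step before it, in which case the step rate computed this way exceeds 100%.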
Internal Search
Another part of the conversion process takes place after visitors have found their way to a page containing an internal search feature. Visitors can use this search mechanism to find items on your site. Consider stores such as Powell’s, Amazon, or Barnes & Noble that have an internal search for books (and other items). By examining the keywords and phrases that visitors were searching for, you will learn what your visitors’ interests are. This information reveals explicit, rather than inferred or implied, interest. You now know the words that your visitors are using to describe your content. This information can help you better organize your site, and it can help you to optimize your use of external search engines.

Exit Page and Exit Ratio Analysis
So now you understand various ways that people arrive at your site and some of the conclusions you can draw based on how they got there. But what can you learn by knowing the exit page, the last page visited in a visit session? Leaving your site can be viewed as a failure of site design if the top exit pages were not where you expected your visitors to exit. Determining the positive versus the negative value of leaving via a specific page is relatively subjective, but it can suggest what on your site works, and what doesn’t. Figure 8-7 shows a sample report of last pages that visitors viewed before leaving a site.

Figure 8-7. Exit Pages report

Visit-to-exit ratio
The visit-to-exit ratio compares the number of exits from a given page to the number of visits to that same page. It is important to know what percentage of visitors to a page leave directly from that page, because pages that receive the most exits are almost always the most visited pages. To create this ratio for all of your site’s pages, simply start with the most important areas on your site.
After you have calculated the ratios, you can review the pages with the highest percentage of exits per page view to prioritize the exit pages. This kind of information can often reveal a key page with a high visit-to-exit ratio that does not appear among the top exit pages.

Dead-End Paths
A dead-end path is a path in which the visitor goes from one page to another, then returns to the original page. Dead-end paths can be both good and bad. In some cases, it can mean that visitors were looking for specific information, assumed that a given link would take them to that information, but upon arrival at the new page, realized that they had not found what they were looking for. This activity means that they are having trouble finding information.

A dead-end visit can just as easily mean that the visitor followed a path out to its natural conclusion, and then came back to the previous page to continue looking for other information. A simple example of a good dead-end path can be seen with an online news site. The person opens the main page, clicks on the International News section, and then clicks on a specific article. After reading the article, they return to the International News section to select another story. This is exactly how you would expect these pages to be used.

Gleaning Demographic Information Through Registration Forms
Many sites require users to fill out a registration form when they reach a point at which they need to download some content or access more in-depth information on the site. These sites typically request varying levels of personal information too, depending on how much their audience is willing to reveal. Often, there is a delicate balance between collecting valuable information and alienating your visitor. Some web sites request information regarding gender, age, income, and a zip code. This allows the visitor to remain anonymous, yet still provides the web site owner with valuable demographic information.
However, many sites do request more detailed information about the visitor. It just depends on what the site owner is trying to achieve by collecting visitor data.

But how does visitor information get tied to an individual hit if there’s no authenticated user field to tie together hits by the same visitor? And where does the visitor information entered in the forms go? Just as you did in sessionization, you can identify the visitor by using a cookie ID, the authuser field, or the IP address. Now let’s explore where the visitor information goes.

Most online registration forms use the GET method of requesting content. With this method, information entered in the form is attached to the request as query parameters, which are recorded in the web data activity file. There are two ways that these query parameters can then be used to capture visitor information, and they depend on the type of system you have set up to process your web activity data files: a web analysis program or a web data warehouse.

Note: The GET method has a practical limit of about 2000 characters. The POST method can also be used, but the content can’t be seen in the data activity files. Therefore, the GET method is preferred.

In one method, WebTrends parses the hit (in the web activity data file) for the visitor information parameters that you specified. WebTrends then takes that information and enters it into a database. With each new hit, the software checks the visitor identifier against visitors already in the database. If the visitor identifier is new, it adds a new row and adds visitor information to that row. If the visitor already exists in the database, the program attaches the hit information to that visitor record.

The other method involves the use of a web data warehouse, a database that is designed to hold visitor information.
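Before turning to the warehouse, the first method above (parse each hit for the visitor parameters, then match the visitor identifier against existing records) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not WebTrends code: the parameter names in `VISITOR_FIELDS`, the cookie value, the in-memory dictionary standing in for the database, and the function name `record_hit` are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical names of the form fields that carry visitor information.
VISITOR_FIELDS = ("gender", "age", "zip")

def record_hit(visitors, visitor_id, url):
    """Attach one hit to a visitor record, creating the record if needed.

    `visitors` is a plain dict standing in for the visitor database,
    keyed by the visitor identifier (cookie ID, authuser, or IP address).
    """
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # New identifier: add a row. Known identifier: reuse the existing row.
    record = visitors.setdefault(visitor_id, {"info": {}, "hits": []})
    for field in VISITOR_FIELDS:
        if field in params:
            record["info"][field] = params[field][0]
    record["hits"].append(parts.path)
    return record

visitors = {}
record_hit(visitors, "cookie-1234", "/register?gender=F&age=34&zip=97204")
record_hit(visitors, "cookie-1234", "/downloads/whitepaper.pdf")
print(visitors["cookie-1234"])
```

The second call carries no visitor parameters, so it only adds behavior (the requested path) to the record that the first call created.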
You tell the warehouse which parameters hold specific web visitor information, and the warehouse parses the web data activity file, captures the visitor information, and stores it in a visitor database table within the warehouse. All behavioral information associated with that hit is also tied to the visitor via the visitor ID. Subsequent hits go through the same process. If the ID in the hit matches a visitor that has already been identified, only the behavior information for that visitor is updated. If the visitor has not yet been identified, then a row is added to the visitor table, and all the behavioral information from that hit is associated with that visitor.

Note: For more information about warehouses, refer to Chapter 10, “Data Integration and Exploration” on page 171.

Keep in mind that any issues you would encounter using cookie IDs or IP addresses to identify the visitor in visit sessionization will also occur when using those same items to identify visitors.

Evaluating Visitor Behavior by Browsing Your Site

WebTrends SmartView displays a page from your web site fully rendered (as it appears to visitors) and annotates this page with results and metrics from analysis. In a companion window, SmartView displays the page’s metrics with reports for Page, Paths, Scenarios, or the Entire Site. This display makes it easy to evaluate the popularity of each individual page link with click-through, path, and scenario metrics superimposed on the page you are viewing. You can use SmartView to analyze page performance, providing insight into page conversion, path analysis, and overall web page statistics such as unique visitor counts. Figure 8-8 shows a typical SmartView page of the Zedesco web site.

Figure 8-8. Sample SmartView page

With SmartView you can get a sense of where your visitors are going and relate the traffic to the actual visual appearance of the page.
Consequently, you can see relationships quickly, even ones you did not anticipate. This may lead you to rethink the page’s design or direct you toward new territory for further analysis. You might also want to use SmartView to double-check a hunch or an assumption. Since SmartView presents a higher-level and immediate view of the data, you probably will not use SmartView to publish reports on a weekly basis.

Summary

Once you’ve told WebTrends how to identify visitors so that you can associate visitors with their behavior on your site, you can track the paths that those visitors take through your site. In fact, you can track the distinct pages they traverse through your site, and you can use your content group settings to track how they navigate through your site in terms of the types of content they viewed. Tracking pages can be useful in some cases, but typically you are more interested in getting a bigger picture of how visitors use your site. For this reason, you may prefer tracking paths through content groups rather than through pages.

Finding the Features in WebTrends Products

You will find the topics discussed in this chapter in the following locations in WebTrends.

Path Analysis
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > Path Analysis, or Web Analysis > Report Configuration > Path Analysis.

Scenario Analysis
Click Web Analysis > Profiles & Reports > Edit a profile > Advanced > Scenario Analysis, or Web Analysis > Report Configuration > Scenario Analysis.

Shopping Carts
Click Web Analysis > Report Configuration > Scenario Analysis. To create a report using shopping carts, edit a sample profile and click Visitor History. Make sure that Purchase History is checked.

Search Engines
Click Web Analysis > Report Configuration > Custom Reports > Dimensions. To create a report about search engines, edit a sample profile and click Visitor History. Make sure that Search Engine History is checked.
Conversion Worksheet

Use the following worksheet to understand how well visitors are converted on your site.

Consideration                                                    Comments
Identify the top 5 key pages in your site that you want to see traffic moving to. What are the paths moving to and from those pages?
Identify the scenarios (especially any registration or checkout pages) in your site.
If you have an internal search feature, do the most popular keywords and phrases really fit your product? Are there other words that visitors should use? Should keywords be listed on a search page or other pages to help visitors make the associations you want them to make?
Identify your dead-end pages. What is the meaning of each dead-end page?
What kind of program can you set up to periodically measure the conversion rate to see if improvement has occurred?

Chapter 9
Retention Metrics

Introduction

The vast majority of web sites need to retain their visitors. You’ve gone through a lot of hard work and expense to attract visitors and convert them into buyers or registered users. Now it’s time to keep those visitors. From a monetary perspective, retention is how you minimize your ongoing cost: it is much cheaper to keep a customer happy than to get a new one. Customers who return again and again have the highest value, which translates into profits for commercial businesses.

To make retention work for you, you must find out more about your visitors and their behavior. Understanding your visitors and their behavior will help to answer the following questions:
• On which visitors should you spend marketing dollars? When?
• What can you expect in future sales from your existing visitors?
• How do you predict which ads and products generate the best visitors?
• What kind of incentives should you provide to get a visitor to do something you want them to?
• Can you predict which visitors will be responsive to your program?
• Should some visitors be contacted more often than others?
• How can you put a value on your visitors and business as a whole, and project this value into the future?

Visitor retention activities are an investment, with the expectation that the value of the investment will rise. But initially you’ve got to know more about your visitors and their behavior.

Visitor Segmentation and Behavior Segmentation

By grouping, or segmenting, visitors along lines such as gender, age, income, or location, and then comparing web activity between these population segments, you can learn a lot about whether you’re reaching your intended web audience. This is where visitor information gets correlated with behavioral information in visitor segmentation. That is, the who (visitors) becomes correlated with the what (their behavior). Behavior reflects what the visitors did. Which content groups and directories did they look at? What kinds of searches did they do?

Who your visitors are and information related to them (demographics, referrers, entry point, browser, time of visit) is called visitor space. What your visitors do is called behavior space. Any slice of information relating to visitors is called visitor segmentation. Any slice of information relating to visitor behavior is called behavior segmentation. Figure 9-1 shows the relationship of visitor space and behavior space.

Figure 9-1. Visitor space and behavior space

Once you’ve identified the behavior of specific population segments on your web site, what now? This level of insight into your web visitor allows you to take action, if needed, to better capture the audience you want to attract. This is the information that lets you implement a continuous improvement cycle: you measure the activity for a given offer or ad campaign, make a decision based on that measure, take some action based on the decision, then you remeasure to see what effect the action had.
Let’s consider what might happen with a scenario in which a wireless phone company uses a cellular phone package to target 18 to 25-year-olds. The company might run an advertisement that web visitors access via promotions on ten different sites. These ten web sites were chosen because they are sites geared toward a younger crowd. When visitors link to the ad, before learning more about the package, they are prompted to fill out a survey that requests information on their age, sex, zip code (if applicable), and current occupation. After one week, the cellular phone company reviews which referring sites tended to send the greatest number of 18 to 25-year-olds, the target audience. At that point, the company continues paying for the promotion on sites that referred the most targeted visitors, but discontinues the ad on those sites that failed to do so. By tying web behavior to their web visitor, the cell phone company was able to quickly identify where their marketing dollars were effectively being spent, and where they were wasting their money.

Even if you only learn about the behavior of visitors, you can move ahead. For example, you can compare the repeat rate of visitors generated by different banner ads or keyword phrases.

Recency
Number of days since the most recent visit of a visitor. Note that zero recency means that the visitor visited within the last 24 hours. Most businesses find recent customers to be more valuable than customers whose activity has been dormant for a long time.

Frequency
Number of visits since the visitor was first tracked. There’s a great deal of difference in value between a 100-time repeat visitor and a 2-time visitor.

Latency
Number of days between visits for visitors. Note that zero latency means that the visitor visited every day. Latency can be especially helpful for businesses where orders and contacts have a defined cycle (for example, a subscription-based business and businesses selling durable goods or high-ticket items).
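To make the three definitions above concrete, here is a minimal Python sketch that derives them from one visitor's visit timestamps. It is an illustration only: how WebTrends rounds partial days or averages gaps is not stated here, so the sketch assumes whole elapsed days (a visit within the last 24 hours gives a recency of zero) and reports latency as the mean gap between consecutive visits. The function name `visit_metrics` and the dates are hypothetical.

```python
from datetime import datetime

def visit_metrics(visit_times, now):
    """Return recency, frequency, and latency for one visitor.

    recency   - whole days elapsed since the most recent visit
    frequency - number of visits since tracking began
    latency   - mean whole days between consecutive visits
                (assumption: averaged; None with fewer than two visits)
    """
    times = sorted(visit_times)
    recency = (now - times[-1]).days
    frequency = len(times)
    gaps = [(later - earlier).days for earlier, later in zip(times, times[1:])]
    latency = sum(gaps) / len(gaps) if gaps else None
    return {"recency": recency, "frequency": frequency, "latency": latency}

# Hypothetical visit history for a single visitor.
visits = [
    datetime(2004, 6, 1, 9, 0),
    datetime(2004, 6, 8, 9, 30),
    datetime(2004, 6, 22, 10, 0),
]
metrics = visit_metrics(visits, now=datetime(2004, 7, 1, 8, 0))
print(metrics)
```

Tracking how the latency value changes between reporting periods is what reveals whether a visitor's visits are becoming more or less frequent.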
All three measurements can be used to determine the potential value of your visitors.

Lifetime Value

Lifetime value is a concept that applies to commercial web sites, because these sites need a long-term gauge for their repeat customers. Lifetime value represents the total sales generated since tracking a specific visitor began. Figure 9-2 shows the lifetime value of visitors to the Zedesco web site.

Figure 9-2. Lifetime Value report

Reports that reveal lifetime value have a great influence on the types of offers you might present your visitors. For example, the report in Figure 9-3 shows the lifetime value of buyers for the most recent campaign they responded to, and displays it in a drilldown. A drilldown enables users to examine this information at a highly summarized level, and navigate to successively more detailed levels of campaign data; for example, viewing lifetime value of buyers by demand channels, partners, marketing programs, marketing activities, campaign IDs, campaign descriptions and more.

Figure 9-3. Campaigns by Lifetime Value report

If you run this report again a few months later and find that the average latency for most of your customers is increasing, then you will want to take action to correct this behavior.

Visitor History

WebTrends allows you to collect the behavior of individual visitors over a period of time. This is called visitor history, and it is primarily used to track visitors’ purchasing behavior, such as how well visitors have responded to advertisements, how much money they spent, how many times they bought something, and how many items they bought.

General information about visitor history

WebTrends stores a record of information per visitor. So, for every visitor, there’s a set of information recorded each time the visitor views a page.
Each time the visitor returns to that page, WebTrends can compare the current activity with past activity and measure various attributes for that visitor such as:

Purchase count
Lifetime count of purchases from shopping cart

Most recent purchase value
The value of the most recent purchase

Days before first purchase
The number of days between a visitor’s first visit and first purchase

Days since first purchase
The number of days since a visitor’s first purchase

Days since most recent purchase
The number of days since a visitor has purchased an item

In other words, visitor history allows you to measure visitor activity according to recency, frequency, latency, and lifetime value.

Visitor history can help you to find out which customers you might lose. For example, the information you get from visitor history might cause your marketing department to send special offers to customers who haven’t been active for a while. In general, visitor history can help you to convert one-time users into frequent users.

The visitor history records are stored in the visitor history database, which is “under the hood” of WebTrends. That is, you don’t see it or have to worry about it. The only thing you have to do is make sure that you activate the visitor history checkbox in the UI if you need visitor history for some analysis. The procedure is detailed in “Finding the Features in WebTrends Products” on page 168.

Specific information about visitor history

Visitor history is all about storing a set of attributes on a per visitor basis. Then after a visitor generates new activity, WebTrends analyzes the attributes, comparing new information with older information.
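As an illustration of the purchase attributes described earlier in this section, the following Python sketch derives them from a visitor's first visit date and purchase list. It is a sketch under stated assumptions, not the visitor history database itself: the record layout, the function name `purchase_metrics`, and the sample values are hypothetical.

```python
from datetime import date

def purchase_metrics(first_visit, purchases, today):
    """Derive purchase attributes for one visitor.

    `purchases` is a list of (purchase_date, value) pairs in any order.
    """
    if not purchases:
        return {"purchase_count": 0, "lifetime_value": 0.0}
    ordered = sorted(purchases)              # oldest purchase first
    first_date = ordered[0][0]
    last_date, last_value = ordered[-1]
    return {
        "purchase_count": len(ordered),
        "lifetime_value": sum(value for _, value in ordered),
        "most_recent_purchase_value": last_value,
        "days_before_first_purchase": (first_date - first_visit).days,
        "days_since_first_purchase": (today - first_date).days,
        "days_since_most_recent_purchase": (today - last_date).days,
    }

# Hypothetical visitor: first seen on 1 March, two purchases since.
history = purchase_metrics(
    first_visit=date(2004, 3, 1),
    purchases=[(date(2004, 5, 1), 25.5), (date(2004, 3, 15), 40.0)],
    today=date(2004, 7, 1),
)
print(history)
```

Because each value is recomputed from the stored dates and amounts, rerunning the function after new activity gives the comparison of new information with older information that the text describes.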
Here’s a complete list of attributes that are stored per visitor in the visitor history database:
• Number of hits
• Number of visits
• Time of first visit
• Time of last visit
• Total number of seconds of visit time, added up from all visits of that visitor
• Entry URL from a visitor’s first visit
• Referring URL from visitor’s first visit
• Referring URL from visitor’s first visit in which he/she bought something
• Most recent referrer for a buying visit
• The first ad campaign that brought the visitor to the web site
• The most recent ad campaign that brought the visitor to the web site
• The total of all the money that visitor spent on your web site over a lifetime
• The total number of purchases made by a visitor
• The time that the visitor made his/her first purchase
• The time that the visitor made his/her most recent purchase
• The search engines used by the visitor to get to your site
• The search words/phrases used by the visitor to get to your site

WebTrends stores aggregated information about purchases. This aggregation is sophisticated enough to make fine distinctions such as invoice rejection. For example, if a visitor goes to a shopping cart site and accidentally submits twice on a purchase page, WebTrends can detect the unintended action and make sure that it will be counted once instead of twice. WebTrends can also detect an accidental bookmark to a purchase page and count that visit properly.

Example usage of visitor history

There are many ways to use visitor history to help retain your customers. Here are some examples.

1) Products and visitors with highest lifetime value
Compare which products are being viewed by visitors with the highest lifetime value. To retain your most valuable visitors, you could send them special offers that are associated with the products they are most likely to purchase again.
2) Recency and lifetime value
Compare recency with lifetime value and determine if some of your most recent buyers are ones with the highest lifetime values. If over a period of time you see that some of your most valuable customers are dropping off in their purchases, then you might make them a special offer.

3) Amount of time between first visit and first purchase
Run a report to find the time of the first visit of some customers and then compare that with the time of their first purchases. You will probably want to shorten the amount of time between that first visit and the first purchase.

4) Referring URL (or ad campaign) and lifetime value
Run a report to list your top referring URLs (or ad campaigns) in relation to lifetime values of visitors they bring to your site. You might consider identifying the top three referring URLs (or ad campaigns) and working with the organizations that own them to increase your referrer rate.

5) Demographics and lifetime value
Compare demographics and lifetime value to see what kinds of people have the greatest lifetime value. Such factors as age, sex, income level, and geographic location may indicate whether you should increase marketing efforts to one group or another.

6) At-risk visitors/customers
To find out about past visitors who have not been to a site in a number of days, you can use the recency metric and then decide if you would like to appeal to them (perhaps based on previous loyalty) with special offers.

Unique Visitors, Unique Buyers

People matter. The purpose of your web site is to present information to people and, usually, to encourage them to take some action such as purchasing. Hits and visits provide measures of what and when those people are viewing, but your real target is the people behind those actions. If you know how many individuals of each type come to your site, you can develop a strategy for changing the visitor’s behavior or for changing what you might offer them.
It is at this point that identifying and counting unique visitors comes into play. In order to track unique visitors, you first need a means of unambiguously identifying each visitor. As discussed in Chapter 4, “Visitor Identification” on page 57, cookies and authenticated user names are the best solutions to this problem.

Although unique visitors and unique buyers refer to the individual visitors to your web site, keep in mind that one unique visitor may view any number of pages on your site within the framework of a visitor session. Therefore, 1,000 unique visitors can generate 50,000 page views. WebTrends counts uniqueness by keeping track of daily unique visitors, weekly unique visitors, etc. by using a cookie. Figure 9-4 shows a tabulation of unique visitors over a 24-hour period.

Figure 9-4. Visitor Summary from the Visitors Dashboard

After you have defined your unique visitors, you may be interested in certain groups of these visitors, such as those who have a lifetime value of at least $500. Or you could look at unique visitors who have a recency of once a day or once a week, and compare their lifetime values. In any case, by tracking the activity of these groups of unique visitors, you can adjust your marketing efforts and make special offers based on the information you find.

However, if you have a web site with heavy traffic, there is no way you can keep a complete list of every visitor who has touched every page, every content group, etc., because the record keeping quickly grows into unmanageable lists. The issue is “counting uniqueness.” This means that you have to have a record for everybody who did something. Counting uniqueness translates into maintaining a complete list of visitors who performed a specific action, then maintaining another list for another page. The numbers for each page get very large very quickly.
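The bookkeeping problem can be seen in a minimal Python sketch of exact unique counting. This only illustrates why the lists grow and is not how WebTrends stores its data; the hit tuples, week labels, and the function name `unique_visitors` are hypothetical.

```python
from collections import defaultdict

def unique_visitors(hits):
    """Group visitor IDs by (week, page) so each visitor counts once.

    `hits` is an iterable of (week, page, visitor_id) tuples. Exact
    counting needs one stored ID per visitor for every (week, page)
    combination that the visitor touched.
    """
    seen = defaultdict(set)
    for week, page, visitor_id in hits:
        seen[(week, page)].add(visitor_id)
    return seen

# Hypothetical hits: v1 views /home twice in the same week.
hits = [
    ("2004-W27", "/home", "v1"),
    ("2004-W27", "/home", "v1"),
    ("2004-W27", "/home", "v2"),
    ("2004-W27", "/cart", "v1"),
    ("2004-W28", "/home", "v1"),
]
seen = unique_visitors(hits)
counts = {key: len(ids) for key, ids in seen.items()}
stored_ids = sum(len(ids) for ids in seen.values())
print(counts)
print("visitor IDs that must be kept:", stored_ids)
```

Even in this tiny sample, two visitors and two pages over two weeks require four stored IDs. Every additional page, content group, or time period adds another set of lists to maintain.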
For example, a web site with a million visitors and ten thousand pages has ten billion combinations to contend with. And that’s just for pages! The enormity of the problem of counting uniqueness affects web sites with fewer pages and visitors, too, because many of these sites want to know how many visitors touched their pages during a particular week or a particular month. That involves a time dimension. The number of records needed to keep track of this activity has just skyrocketed.

Fortunately, with WebTrends, you can track visitor uniqueness over a period of time (daily, weekly, monthly, etc.) and begin to interact with your customers on a more individual basis.

Finding the Features in WebTrends Products

Retention metrics are enabled by activating visitor history. To find the Visitor History tab, edit a profile and select the Visitor History tab.

Retention Worksheet

Use the following worksheet to understand how well the retention of visitors is going on your site.

Consideration                                                    Comments
On which visitors should you spend marketing dollars? When? How often?
When launching ads, do you target specific visitors or send out general information to all visitors?
Which visitors will be responsive to your programs?
Which visitors should be contacted more often than others?
How can you put a value on your visitors and business as a whole, and project this value into the future?

Chapter 10
Data Integration and Exploration

So far, this book has discussed what is often called data farming. That is, you figure out what you want to examine, and then you set up WebTrends to review those specific areas of interest. Just like a crop, you harvest these same pieces of information over and over again on a schedule.
This lets you compare activity from one reporting period to another to get a sense of changes in visitor activity based on variables such as changes you’ve made to your site.

But what if you have existing customer data that you would like to correlate to their web behavior? Or what if you just have a feeling that one dimension relates to another, or that several dimensions correlate significantly with each other, and you want to discover if your intuition is correct? At this point, you need the help of a web data warehouse and a tool that lets you report from the web data warehouse.

A web data warehouse integrates the data that you want to explore. A warehouse also lets you 1) connect external data to your web behavior, and/or 2) export your web behavior to external data. External data, for example, may be information from a Customer Relationship Management (CRM) system or a customer database (with customer demographics).

Using a web data warehouse is all about flexibility in analysis and reporting. WebTrends lets you look at your data in a number of different dimensions simultaneously. To view reports from the web data warehouse, you can use Microsoft Excel or another reporting tool. With Excel, you can make use of its PivotTable function to view and compare data in two dimensions (2D). You can also make graphs based on two measures as the X and Y axis. For more information on the Excel reporting solution, see “Deeper Reporting and Exploration Using Excel” on page 176.

In general, a web data warehouse and an associated reporting tool (such as Excel) require more manpower, resources, and knowledge-power. Such work is for explorers and discoverers.

Note: Using a web data warehouse may negatively affect performance with regard to log files.
Data Integration and a Web Data Warehouse

It’s important to understand the difference between using a web data warehouse with WebTrends for analysis and reporting and relying only on WebTrends. A web data warehouse contains a database specifically designed to store web activity and web visitor data. Unlike the summary tables used by reporting tools in WebTrends, a web data warehouse actually holds onto detailed data rather than accumulating and summarizing it into daily, weekly, monthly, quarterly or yearly tables, and then throwing away the raw hit data.

A web data warehouse uses a series of tables to capture and store web activity data. The warehouse has a hit table with IDs that allow it to tie in to other tables containing hit data from processed web data activity files. Hit data is analyzed to create a visit table and some other tables with visit-specific information available in the hit such as the referrer for the visit, an ad campaign, or a content group. Each visit table record has a visit ID along with several IDs that allow it to match a given visit to the appropriate records in those related tables. The visit tables associate the web activity in a hit with a specific visit session.

A web data warehouse also provides tables that hold visitor information (first name, last name, gender, age, email address, phone number, zip code, customer number): any information you ask your web visitors to provide about themselves that they’re willing to enter. These tables contain visitor IDs that are associated with visit information. Now you can perform queries on the database to correlate specific visitor attributes. Perhaps you might correlate age and/or gender with a particular web behavior, such as a visit to a particular ad. Consider the example of the ad for the cellular phone package discussed earlier (page 161).
You could examine visits to the ad that originated from a given referring site made by visitors aged 18 to 25.

Tying your data to external databases

You can further enrich the data you have about your visitors by tying the data in your web data warehouse to external data sources such as demographic data. The key is that your web-related data and the external data sources must have some variable in common so that you can match records from your web data to your external data. Because the web data warehouse is in a database form, it is fairly straightforward to join to an external database. Some warehouses have a mechanism by which you can join your web analysis results to an external source and then present that data in a custom report.

Demographic data

Perhaps you have the state associated with each web visitor record, and you want to tie that activity into a database that describes demographics by state. Numerous databases exist that can help you segment your visitor population. For example, WebTrends GeoTrends provides demographic information.

Let’s consider a straightforward scenario: Zedesco’s budget limits them to airing a TV commercial in only one state. If they are using their web site as a basis for deciding in which state to air the commercial, what information might they need? One of the most basic pieces of data they could look at is which states show the most web viewing activity, such as the most page views or the most visits. If two states show similar activity levels, the next step might be to see which state has the most buying power. To do this, they could tie into a demographic database that contains information on average income level by state. If they find that between the two states showing the most activity one has a lower average annual income, then assuming all other variables are equal, they’d air the advertisement in the wealthier state.
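The Zedesco decision above amounts to joining two data sets on a shared variable (the state) and then comparing them. Here is a minimal Python sketch of that reasoning. All figures are made up for illustration, and the function name `pick_state` and the 5 percent threshold for "similar activity levels" are assumptions, not anything defined by WebTrends.

```python
# Made-up figures for illustration only.
visits_by_state = {"OR": 12400, "WA": 12150, "ID": 3100}
avg_income_by_state = {"OR": 42000, "WA": 46500, "ID": 38000}

def pick_state(visits, income, tolerance=0.05):
    """Pick the state in which to air the commercial.

    Joins web activity to demographic data on the shared state key,
    keeps the states whose visits are within `tolerance` of the
    busiest state, and returns the one with the highest income.
    """
    # Join on the shared key: only states present in both sources.
    joined = {s: (visits[s], income[s]) for s in visits.keys() & income.keys()}
    top_visits = max(v for v, _ in joined.values())
    similar = [s for s, (v, _) in joined.items() if v >= top_visits * (1 - tolerance)]
    return max(similar, key=lambda s: joined[s][1])

choice = pick_state(visits_by_state, avg_income_by_state)
print(choice)
```

With a tolerance of zero, only the busiest state qualifies and income plays no part, so the threshold is the judgment call that decides when two activity levels count as similar.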
Customer databases

Joining web visitor information to web visitor activity is useful for marketing professionals as they try to more accurately target their marketing using the web. But you can also use your web activity and web visitor data for account management. You do this by joining the web activity of individual web visitors with their account contact data in Customer Relationship Management (CRM) systems such as Siebel Call Center or PeopleSoft.

CRM systems are database-driven applications that are generally used to manage the information about an organization’s prospects and customers. These systems often contain information about customers or customer prospects, such as:
• Correspondence
• Contact information
• Previous transaction information
• Communication via email, phone, or regular mail

Joining web visitor and web activity data to complex databases such as those used by CRM systems requires the structure of a web data warehouse. To join the two sets of data, you need one or more shared keys, or IDs, to match the records in one database with records in the other. Typically, this will be some visitor ID in the web activity database, and a customer ID in the call center database. Other possible shared keys between the two databases could be combinations of first and last names or email addresses. Figure 10-1 illustrates the shared keys between two databases.

Figure 10-1. Shared key between two databases

Joining web activity with visitor information lets salespeople understand their visitors’ interests with information such as:
• Which web pages they visited
• How many times they visited those pages
• How long they stayed
• Which products or topics they researched
• How much information and interest they have about specific products as evidenced by the white papers, demos, or other marketing and technical materials they downloaded from the web site

Service professionals can also use this combination of information to review a customer’s web activity to prepare them for handling the customer’s issue. Useful information includes troubleshooting topics, frequently asked questions, or technical white papers that the customer has already examined. In addition, by reviewing how often specific troubleshooting topics or frequently asked questions are accessed, support organizations can determine if products or documentation have weaknesses or other issues that need to be addressed.

Figure 10-2 shows an environment that is running machines that use web analysis and warehouse data. In this illustration, the client machine is able to view reports on the warehouse using a reporting application such as Crystal Reports. The warehouse can communicate with other sources of data, such as CRM or Enterprise Resource Planning (ERP), and wed that information with the warehouse data.

Figure 10-2. Web analysis and warehouse environment

Reporting from a web data warehouse

While a web data warehouse provides an effective vehicle for organizing and storing your web data, it often doesn’t provide a means of reporting on that data. To view your web activity data from a warehouse, you need to use a reporting tool, such as Microsoft Excel. Here are the steps to use Excel to report from a web data warehouse.
1. Export the WebTrends data to Excel.
2. Export data out of the web data warehouse to CSV format.
3. Import CSV-formatted data to Excel.

It is important to note that the imported data must be in the CSV file format defined by WebTrends. Also, Excel has a limit of 65,000 rows of data.

Deeper Reporting and Exploration Using Excel

WebTrends allows you to move beyond standard reporting to dig deeper into your analysis and compare several different variables with each other. If you find that your reports do not fully cover what you’re interested in examining, or do not view the data from the perspective you wish to view it, you can create new reports interactively, on the fly. You can do this kind of exploration by exporting WebTrends reports to Excel spreadsheets, called SmartReports, and then working with Excel’s PivotTables. Through SmartReports, you can also develop graphs and charts that correspond to the tables of data (using trend data).

A typical use for SmartReports for Excel is to verify whether a correlation between several variables exists so that you can then structure your web analysis to generate periodic reports on those variables and track them over time. Another useful application of SmartReports for Excel is to combine web analytics data with external data, such as marketing cost or product cost, to calculate GMROI. After you have used Excel to reveal specific gross margin trends, you can track your variables over time and chart them in SmartReports for further insight. For example, you can calculate gross margin trends and chart the sum of gross margin revenue by campaign for insight into which campaigns are most successful for you.

To export your WebTrends report to SmartReports, you can click the Export to Excel icon, which is shown in Figure 10-3.

Figure 10-3. Exporting to Excel

An Excel Wizard takes you through several easy-to-use steps before generating the report.
It’s important to be aware that the more dimensions and the longer the time period you specify and export into Excel, the more calculations that must be performed and the harder your system has to work.

Important: Excel is limited to 65,000 rows of data.

Drill Down capability

With Excel, you can drill down in the report to discover more critical pieces of information. This capability can be especially useful when you are dealing with a hierarchy within the dimensions you’re analyzing. For example, if you had an outdoor gear store, each product category might have a subcategory, and within that subcategory, you might have a further division. The following table shows how this might look:

Table 10-1. Categories and Subcategories

Product Category   Subcategory Level 1   Subcategory Level 2
Camping            Tents                 3-season, 4-season
                   Camp Stoves           Backpacking, Car Camping
Hiking             Boots                 Men’s, Women’s
                   Clothing              Men’s, Women’s
                   Backpacks             Internal Frame, External Frame
Boating            Kayaks                Inflatable, Non-inflatable
                   Canoes                Inflatable, Non-inflatable

Within WebTrends reports, you can interactively click on a given dimension and drill down to the next level. For example, if instead of examining all product categories (Camping, Hiking, and Boating) you only wanted to view information about the Hiking category, you could simply click on the Hiking Product category, and view information about Boots, Clothing, and Backpacks.

Within Excel, you can drill as far as you have specified in WebTrends drilldowns. For instance, using the example above, within the Hiking product category, you could drill down three levels, and examine visits to pages in the Internal Frame subcategory of the Backpacks subcategory. Figure 10-4 shows an Excel spreadsheet with categories and subcategories.

Figure 10-4. Example of categories: Campaign Drilldown

Working with dimensions and measures

By exporting to Excel you can add as many dimensions as you like.
The measures allow you to group dimensions to get a less fragmented view of the data, but you cannot drill down further than the data that you have captured. For example, as is shown in Figure 10-5, you can capture traffic and revenue information by product SKU (in this case, the model number), and then you can use translation and augmentation (either in WebTrends or in Excel) to group these SKUs into class, subclass, department, family, or other categories. You can calculate actual gross margin by product by importing web analytics data containing revenue by product into Excel and then augmenting that data with external product costs. By determining these patterns, you can target the placement of products on your site for better impact.

Figure 10-5. Excel with dimensions and measures

Data exploration

With Excel’s tools, you can choose the exact dimensions and measures you want to compare, and you can discover significant correlations between dimensions. These tools use automated machine learning and statistics to uncover trends, which Excel can present in a variety of graphs, tables, and charts.

Data exploration is an iterative process. You will need someone who is adept at statistics and is willing to look at the same data again and again in order to find the nuggets in the data.

Figure 10-6 shows an Excel chart with trend data mapping campaigns by sum of gross revenue for December 2003. This is an example of charting data that is calculated in Excel and shown in a graphical format.

Figure 10-6. Gross revenue by campaign

Figure 10-7 presents another Excel chart of trend data mapping. Note that you can use PivotTable reports to filter the data by group, department, and so on, and that this filtering can change the visual representation in the graph.

Figure 10-7. Sum of gross margin by product

Figure 10-8 shows the calculation of Gross Margin Return on Investment (GMROI) for various demand channels. External data such as Marketing Cost Per Click and actual product costs were added to the original WebTrends data and then used to calculate the GMROI.

Figure 10-8. Products by demand channel

Another data exploration exercise might involve examining relationships between visitor attribute data (income level, zip code, gender) and the content groups and ad campaigns visited. To do this, you would have Excel compare each visitor attribute and combination of visitor attributes against content groups, against the combination of content groups and ad campaigns, and then against ad campaigns.

But practically speaking, what are the benefits of data exploration? Data exploration can be used to reveal significant trends in customer behavior. For example, on an online travel site, you might find that women from zip code 97215 with an annual income of $70K visit the last minute deals pages and respond to e-mail ad campaigns more than any other visitor population segment. Knowing this, you might choose to send out a targeted e-mail for a last minute deal, and then use standard web analysis reporting to see if that e-mail campaign is effective.

Overhead and monetary costs

Data exploration is much more resource intensive than looking at web analysis data in the standard way. Getting the most results from data exploration requires personnel who can look at all of the possible information that they can mine from your data and understand which correlated segments are worth pursuing. They must thoroughly understand statistics and data interpretation to make the most of your investment.

Another major cost of data exploration involves computing power. Data exploration can exhaust computing power very quickly, because you have to do all sorts of cross tabulations of various dimensions to find which ones correlate.
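To see why cross tabulation consumes computing power so quickly, consider the following minimal Python sketch. It is an illustration only, not how Excel or WebTrends performs the work: the visit records, the attribute names, and the cross_tabulate function are all hypothetical. It builds one count table for every non-empty combination of visitor attributes, so the number of tables grows as 2^n - 1 for n attributes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit records: visitor attributes plus the content group viewed.
visits = [
    {"gender": "F", "zip": "97215", "income": "70K", "content_group": "Last Minute Deals"},
    {"gender": "F", "zip": "97215", "income": "70K", "content_group": "Last Minute Deals"},
    {"gender": "M", "zip": "97204", "income": "50K", "content_group": "Cruises"},
]
ATTRIBUTES = ("gender", "zip", "income")


def cross_tabulate(records, attributes, target):
    """Count target values for every non-empty combination of attributes."""
    tables = {}
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            counts = Counter()
            for record in records:
                segment = tuple(record[name] for name in combo)
                counts[(segment, record[target])] += 1
            tables[combo] = counts
    return tables


tables = cross_tabulate(visits, ATTRIBUTES, "content_group")
print(len(tables))  # 7 tables for 3 attributes (2**3 - 1)
```

Each added attribute roughly doubles the number of tables, and every table requires a full pass over the visit data, which is why the hardware cost rises faster than the number of dimensions.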
Your web site does not have to register a million hits to make data exploration cost effective. It’s more about the money attached to your traffic than the total amount of traffic. Data exploration can be a cost effective solution for web sites with a lot of money riding on a small amount of traffic.

Data exploration will give you a lot more insight at a higher (and deeper) level, but the exploration involved can be expensive. You may be exploring many avenues before you reach the right one or ones (for example, by using A/B testing), so you’ll need some intelligence to figure out which way to go. Since data exploration is very open-ended, you need to narrow down the many possibilities to achieve meaningful results.

Consequently, a data exploration solution for your company doesn’t mean that you merely purchase more software, plug it in, and watch your income grow. You will have to look hard at adding the right kind of personnel who will work hard to interpret the data.

Using reports for continuous improvement

The purpose of reporting on your web site activity is to have easily interpreted information that allows you to make improvements to your site, marketing campaigns, or other aspects of your business that are tied to your web site traffic. Just as in any continuous improvement cycle, you need to determine your objectives for your site, plan how to implement those objectives, execute that plan, then generate reports that allow you to assess the success of that plan. As you discover what works and what doesn’t, you make small, incremental changes. To complete the cycle, you measure the impact of those changes with other comparative reports.

Data Integration and Exploration Worksheet

Use the following worksheet to help understand more about data integration and exploration.

Consideration                                                                   Yes   No   Comments
Do you have external data that you want connected to web behavior?
Can you afford a web data warehouse in terms of costs relating to people, software, hardware, and planning?
Will there be compatibility issues if you bring any pre-existing external data into the warehouse?
Do you have data that you need to investigate in Excel?
Do you have Excel experts who know how to work with PivotTables?

Chapter 11
Optimizing Your Analysis Environment

Web analysis with WebTrends can be very resource intensive. Besides the resource requirements of analysis processing itself, you have storage issues for web data activity files, summary tables, report tables, perhaps a web data warehouse, external databases, IP addresses, and page titles. This chapter discusses the areas of the web analysis process in which you can adjust the limits on your computing resources. It also discusses the trade-offs you make when you limit those resources.

At the end of each section, where relevant, recommendations are made for how to handle each analysis environment variable. These are purely recommendations, based on the average web site’s requirements. Each web site has its own unique characteristics, and for this reason, you need to use your own judgment and experience to adjust these recommendations accordingly.

Physical Data Storage Issues

Log file rotation/rollover

With web analysis that relies on web server logs, the first consideration you must make is how long to hold onto the raw, unaggregated web data activity files. You may need to access old web data activity files to reanalyze them. For example, you might want to reanalyze raw data based on new configuration settings. Or you might need to reanalyze the web data activity file from a server belonging to a cluster that was not available at the original time of analysis and then add that reanalysis into an entire day’s worth of logs.
In a web data activity file, a typical hit might range roughly from 250 to 750 bytes in size. Given that number, consider what happens if your site experiences an average of 10,000 hits per day. This means that your web data activity file can be anywhere from 2.5 MB to 7.5 MB in size. If your site experiences up to 5,000,000 hits per day (an amount of web traffic that is not unusual for enterprise-level organizations) your web data activity file size can easily be several gigabytes in size. Evidence shows that for large organizations with extremely active web sites, generating terabytes of data per year is common.

Because data activity file sizes for even a daily web data activity file can require gigabytes of storage space, most organizations implement a log file rotation scheme that keeps computing resources available for processing tasks. Depending on the volume of web traffic that your site experiences, you may wish to rotate/rollover web data activity files daily, weekly, or monthly.

Note: When IIS servers rollover on a daily basis, they close out one log file and start another at 12:00 am GMT, not at midnight local time.

Note: You can review the process of log file rotation/rollover in “Log file rotation/rollover” on page 45.

Figure 11-1 shows a basic overview of log file rotation, rollover, and archiving.

Figure 11-1. Log file rotation/rollover/archiving

Rotation schedules can also depend on how you access your web data activity files, and how often you intend to report on those web data activity files. If you use FTP to access your web data activity files and you generate reports hourly, then you must rotate your web data activity files hourly. Hourly rotation is necessary because in order to run reports, the web data activity file must first be transferred to the local, analysis machine.
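The sizing arithmetic earlier in this section can be sketched in a few lines of Python. This is a rough estimate under the stated assumption of 250 to 750 bytes per hit; the function names are hypothetical and megabytes are treated as decimal (1,000,000 bytes) to match the round figures in the text.

```python
BYTES_PER_MB = 1_000_000  # decimal megabytes, matching the round figures in the text


def daily_size_range_mb(hits_per_day, min_bytes_per_hit=250, max_bytes_per_hit=750):
    """Return the (low, high) estimated size in MB of one day's activity file."""
    low = hits_per_day * min_bytes_per_hit / BYTES_PER_MB
    high = hits_per_day * max_bytes_per_hit / BYTES_PER_MB
    return low, high


def archive_size_range_mb(hits_per_day, days_retained):
    """Return the (low, high) estimated size in MB of an uncompressed archive."""
    low, high = daily_size_range_mb(hits_per_day)
    return low * days_retained, high * days_retained


print(daily_size_range_mb(10_000))            # (2.5, 7.5)
print(daily_size_range_mb(5_000_000))         # (1250.0, 3750.0)
print(archive_size_range_mb(5_000_000, 365))  # (456250.0, 1368750.0)
```

At 5,000,000 hits per day, a year of uncompressed files comes to roughly 0.46 to 1.37 terabytes, which is consistent with the observation that very active sites generate terabytes of data per year.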
With a mapped drive, the transfer is not required because to your system, the drive already appears to be local. Therefore, whenever reports are scheduled to run, WebTrends does not need to transfer an entire file, because the file, for all intents and purposes, is local.

Typically, organizations rotate their web data activity files daily. Unless you need to generate reports hourly or more frequently, daily rotation is usually a good rule of thumb. But once you’ve rotated the files out and analyzed them, you need to determine how long to archive them. The length of archival depends on your reasons for holding onto the data. Some organizations don’t intend to ever re-analyze their data, and consequently throw out the data shortly after the analysis. Other organizations hold onto their data forever. For most organizations, a basic rule of thumb is to archive data for between one quarter and one year.

Recommendations
• Rotate web data activity files daily. Consider hourly rotation if you access your web data activity files via FTP and your site experiences a considerable amount of traffic.
• Archive analyzed web data activity files for one year.

Storage and performance issues

Archiving

Occasionally, after analyzing your data, you may need to go back to a point at which you knew the analysis results were in line with what you wanted. Consider this situation: You recently added a new content group to track on your site. This content group contains a group of new pages that relate to a new product. A week later, when reviewing your weekly report, you are dismayed to find that the content group did not make it into your reports. A little sleuthing reveals that improper syntax was used to define the pages of the content group. As a result, all hits to those pages were missed.

So what do you do? Hopefully you either configured your software or used some other custom means to periodically create backup copies of your summary tables database along the way.
WebTrends software offers the ability to take a snapshot of the database. Depending on what the analysis software is configured to create, the snapshot may include a copy of the daily, weekly, monthly, quarterly, and/or yearly summary tables at a point in time. You can restore that copy in the event that you run into problems with your analysis later on. Once you have reloaded the data up to the last known good copy of it, you will need to fill in the data that was not contained in that backup. This requires you to reload and re-analyze the raw web data activity files for the data from the time of the backup to the most current web data activity file.

Let’s go back to the earlier example in which the content group was incorrectly set up. If your web site experiences a significant amount of traffic, and for that reason each daily web data activity file analysis requires around 10 minutes to run, you might determine that you could afford the time it would take to re-analyze up to 28 days of data at any given time. You also feel that 28 days is enough time to discover any issues, considering that you review reports once a week. Your storage capabilities allow you to have four backups of the data. This means that when a fifth backup is created, it replaces the oldest backup.

With this situation, a sensible solution could be to back the data up every seven days, and maintain four backups. This allows you to make the most of the storage space you have, yet ensures that you will catch any problems with the data long before your oldest archive is overwritten. Consider the following situation (shown in Figure 11-2):
• Archive 1 is created.
• Archive 2 is created.
• A new content group with a syntax problem is added one day after Archive 2 was created.
• Archive 3 is created.
• The syntax problem is discovered three days after Archive 3 was created.

Figure 11-2. Sample archival scenario

You have two options:
1. Correct the syntax for the new content group, and then go back and re-import and re-analyze all the raw web activity data from day one (assuming you still have those web data activity files).
2. Go back to the last known good set of summary tables and then re-analyze the data from that day up to the current day. In this case, you would restore Archive 2, the last archive that contained data without the syntax problem, correct the syntax for the new content group, and then re-analyze the raw web data activity file data up to the current day.

As you can imagine, creating and maintaining multiple backup copies of an entire database can require substantial storage space on your computer. It’s important to consider the trade-off between the storage space you have available and how many backup copies you can afford to keep around at any given time. This trade-off is also affected by how long it would take to restore lost data, which in turn is impacted by how much traffic your site experiences, which summary tables you choose to create, and how powerful your system is.

How often you may need to back up data also depends on how closely you monitor the results of your data. If you review results once a day, then creating daily backups, or a backup every couple of days, might be fine because you will probably catch any issues within a few days.

Recommendations
• Check how much disk storage space you have to save the backups versus the average size of a backup.
• Determine how long it takes to restore data by analyzing it from the raw web data activity file. This is affected by how much traffic your site generates, which summary tables you choose to create (daily, weekly, monthly, etc.), and how fast your system can process the data.
• Figure out how soon you are likely to catch issues that may necessitate restoring a backup by how closely and frequently you monitor your analysis results.
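The restore decision described in this section can be sketched as follows. This is a minimal illustration, not a WebTrends feature: the function names and the calendar dates are hypothetical, and the 10-minute figure per daily file comes from the example above. The sketch picks the most recent backup taken before the problem was introduced and estimates the re-analysis time from that backup to the current day.

```python
from datetime import date


def last_good_backup(backup_dates, problem_date):
    """Return the latest backup taken before the problem was introduced, or None."""
    good = [d for d in backup_dates if d < problem_date]
    return max(good) if good else None


def reanalysis_minutes(restored_backup, today, minutes_per_daily_file):
    """Estimate the time needed to re-analyze every daily file since the restored backup."""
    days_to_reanalyze = (today - restored_backup).days
    return days_to_reanalyze * minutes_per_daily_file


# Hypothetical dates that mirror Figure 11-2: weekly backups, a problem introduced
# one day after Archive 2, and discovery three days after Archive 3.
archives = [date(2004, 6, 1), date(2004, 6, 8), date(2004, 6, 15)]
problem_introduced = date(2004, 6, 9)
problem_discovered = date(2004, 6, 18)

restore_point = last_good_backup(archives, problem_introduced)
print(restore_point)                                              # 2004-06-08
print(reanalysis_minutes(restore_point, problem_discovered, 10))  # 100
```

With weekly backups, the worst case in this scheme is bounded by how many backups you keep, which is the reasoning behind pairing a seven-day interval with four retained copies.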
Caching uncompressed web data activity files

When a web server completes a data activity file, depending on whether it is created on a mapped network drive, or whether you access the file via FTP, that file may or may not be compressed. Typically, you compress the file before transferring it to a new location. Compressing the web data activity file reduces the amount of storage space required and speeds up the transfer of that web data activity file when it is moved because there is less data to transfer. Compression can save a significant amount of transfer time, because web data activity files, being text file entries with many repeated strings (for example, the date, the file extension, URLs, and browser information), are ideal candidates for compression. In many cases, a compressed web data activity file may be less than a tenth of its original size.

However, when it’s time to analyze a compressed web data activity file, the file must be uncompressed and placed in a temporary storage location, or cache, that is located on the analysis machine or at least on a drive that is mapped to the local machine so that it appears to be local to the machine. The web data activity files are accessed from this cache during analysis, but at the end of analysis, you need to decide what to do with the uncompressed files. If you suspect that you will run many analyses on the uncompressed files, it makes sense to hold them, in uncompressed form, in the cache. This saves the time required to transfer them to the cache and unzip them. For web data activity files of any significant size, this time savings can add up. On the other hand, if you are fairly certain that you will not use the file again, you don’t need to use space on your machine to save those files.

Depending on how your WebTrends software approaches this cache situation, you may have the choice to:
• Delete the file from cache upon completion of the analysis.
• Keep the file in cache for a specified number of days.
• Keep the file in cache until the cache reaches a maximum size, at which point the oldest files in the cache will be replaced by new, incoming files.
• Keep the file in the cache, but delete it if it is not accessed within a specified period of days.

Recommendations
• If you do not plan to re-analyze a web data activity file, you can save space on your local machine by choosing to delete it immediately upon completion of analysis.
• If you suspect that you will re-analyze your web data activity files, configure your software to maintain the uncompressed version of your files in a local cache for a specified period of time or until the cache reaches a maximum size.

Caching files transferred from an FTP server

If you are analyzing a web data activity file that you must access using FTP, you will need to physically transfer that web data activity file to a local drive. You can either use your WebTrends software to take care of the web data activity file transfer, or you may set up your own procedure to bring the web data activity files over prior to running the analysis. Once the web data activity file is stored locally, you again have the choice to either use the WebTrends software to unzip the compressed file, or you may set up your own process to take care of this.

Either way, once you have a web data activity file on your local drive, you need to decide how long to keep that file there. The same reasoning that you used to make your decision on maintaining cached copies of uncompressed web data activity files can be applied in this situation. It all depends on how often you expect to re-analyze the web data activity file, how much data that web data activity file contains (which affects how long it takes to transfer the web data activity file using FTP), and how much local storage space you can afford to designate for storing web data activity files.
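The retention choices listed earlier for cached files can be expressed as a small policy function. The following Python sketch is only an illustration of the reasoning: the CachedFile record, the files_to_delete function, and its parameter names are hypothetical and do not correspond to actual WebTrends settings.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_mb: float
    days_since_added: int
    days_since_access: int


def files_to_delete(cache, max_age_days=None, max_idle_days=None, max_total_mb=None):
    """Return the names of cached files that the given retention limits would remove."""
    doomed, kept = [], []
    for entry in cache:
        if max_age_days is not None and entry.days_since_added > max_age_days:
            doomed.append(entry)   # kept longer than the allowed number of days
        elif max_idle_days is not None and entry.days_since_access > max_idle_days:
            doomed.append(entry)   # not accessed within the allowed period
        else:
            kept.append(entry)
    if max_total_mb is not None:
        # Evict the oldest remaining files until the cache fits within its maximum size.
        kept.sort(key=lambda entry: entry.days_since_added, reverse=True)
        total = sum(entry.size_mb for entry in kept)
        while kept and total > max_total_mb:
            oldest = kept.pop(0)
            total -= oldest.size_mb
            doomed.append(oldest)
    return [entry.name for entry in doomed]


cache = [
    CachedFile("monday.log", 500, days_since_added=10, days_since_access=1),
    CachedFile("tuesday.log", 500, days_since_added=3, days_since_access=3),
    CachedFile("wednesday.log", 500, days_since_added=1, days_since_access=0),
]
print(files_to_delete(cache, max_total_mb=1000))  # ['monday.log']
print(files_to_delete(cache, max_idle_days=2))    # ['tuesday.log']
```

Deleting a file immediately after analysis corresponds to not caching it at all, so that choice needs no policy logic.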
And you will likely have the same choices you had when deciding how to handle uncompressed web data activity files. Namely:
• Delete the file from cache upon completion of the analysis.
• Keep the file in cache for a specified number of days.
• Keep the file in cache until the cache reaches a maximum size, at which point the oldest files in the cache will be replaced by new, incoming files.
• Keep the file in the cache, but delete the file if it is not accessed within a specified period of days.

Internet resolution

When your web server generates a web data activity file, it can either be configured to look up the domain name for the client machine’s IP address as it creates the web data activity file, in a process known as reverse DNS, or it can leave the IP address unresolved. The more efficient approach is to look up the IP address during web data activity file creation; however, because this process (known as Internet resolution) takes some of the server’s resources, web site content delivery may be negatively affected. For this reason, many web servers are not configured to perform a lookup.

The reality is that when reviewing reports about your web visitors, just receiving the IP address of your visitor does not give you much insight. An IP address can’t let you easily see that many of your visitors come from the competition, or that many of your visitors come from a company with whom you are trying to establish more business. IP addresses also affect visitor counts, because multiple IP addresses can resolve to the same domain name.

WebTrends software gives you the option to look up IP addresses from DNS servers. Once looked up, these IP addresses are stored in a cache so that future analyses can grab that information locally, rather than having to go through DNS servers to locate the information.
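The lookup-and-cache idea described above can be sketched in Python. This is a simplified illustration, not the WebTrends implementation: the ResolutionCache class is hypothetical, and it assumes a bounded cache that drops its oldest entry when full. The real lookup uses the standard library call socket.gethostbyaddr; the usage example substitutes a fake resolver so that it runs without network access.

```python
import socket
from collections import OrderedDict


class ResolutionCache:
    """Cache reverse DNS results so repeated analyses avoid repeated lookups."""

    def __init__(self, max_entries=10_000, resolver=None):
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self._resolver = resolver or self._reverse_dns

    @staticmethod
    def _reverse_dns(ip_address):
        try:
            return socket.gethostbyaddr(ip_address)[0]
        except OSError:
            return ip_address  # leave the address unresolved

    def lookup(self, ip_address):
        if ip_address in self._entries:
            return self._entries[ip_address]   # served locally, no DNS traffic
        name = self._resolver(ip_address)
        self._entries[ip_address] = name
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
        return name


# Usage with a fake resolver (hypothetical names) that records each real lookup.
lookups = []


def fake_resolver(ip_address):
    lookups.append(ip_address)
    return "host-" + ip_address


dns_cache = ResolutionCache(max_entries=2, resolver=fake_resolver)
dns_cache.lookup("10.0.0.1")
dns_cache.lookup("10.0.0.1")
print(len(lookups))  # 1, because the second request was answered from the cache
```

The cache trades a modest amount of disk or memory for fewer slow network round trips, which is the same trade-off the analysis software asks you to weigh.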
You need to determine the value of having IP addresses translated into meaningful names versus the loss of disk space that the cache of resolved addresses occupies. Typically, the cache of resolved addresses has a maximum size, and when that limit is reached, the oldest entries are deleted to make room for the most recent. In addition, you need to weigh the impact on performance that looking up IP addresses will have on your analysis system.

Recommendations
• Determine how important it is to have the looked up values of IP addresses in your reports. The space required by these looked up values can be fairly minimal, but the performance slowdown can be noticeable. Most people tend to have the lookup performed if the web server did not already do this.
• Note that a company may use many IP addresses that are assigned to it but only register a few of these addresses as domains. For example, a company may have many proxy servers with addresses that connect to the Internet, yet since the company doesn’t expect anyone to connect to the proxy, it hasn’t assigned a domain to the proxy. Consider using WebTrends GeoTrends, which will resolve IP addresses more accurately than DNS. That is, GeoTrends identifies the companies that registered the IP addresses. GeoTrends also provides pertinent geographic and demographic information for your web analysis.

HTML page title lookups

In the web data activity file, requested content is recorded as the URL for that item. The URL could be for a GIF or a JPEG image, it could be for a downloaded file, or it could be for a page. WebTrends software can look up the actual page titles that are recorded in the Title tags in each HTML page. However, if you choose this option, you will have to dedicate some space on your hard drive for the results of the page title lookups.
Just like the resolved IP address cache, the cache for HTML page title lookups is managed by setting the maximum number of entries allowed in the cache at any given time, the maximum number of days that a page title can remain in the cache, or both. Again, you have to balance the usefulness of the looked up titles against the cache space they require and the performance hit your system takes during the initial lookup.

Recommendation
Determine how important it is to have the HTML page titles of the URLs in your reports. The space required by these looked-up values can be fairly minimal, but the performance slowdown can be noticeable. Most people tend to perform the lookup to make reports more meaningful.

Note: Web site security can impede or prevent HTML title lookups. You may need to configure a username and password to get the data.

Table limiting

Your system only has so much physical memory (called random access memory or RAM) in which to store the results of analysis. When data requirements exceed that memory, it has to use virtual memory, exchanging data as needed from RAM to the hard disk and back to RAM. This can create a low performance situation known as thrashing, in which a lot of activity is going on (swapping pages of data in and out of RAM), but little is being accomplished. Unfortunately, there is no perfect solution to the issue of overwhelming your memory with data. However, there are measures you can take to reduce how often your system has to swap data out to the disk.

You can add more RAM, which up to a point will increase performance. However, beyond 2 GB of RAM, a single process typically gains no additional benefit.

Note: Most “normal” computers these days (that is, those with 32-bit processors) can address only 4 GB of memory (that is virtual address space, regardless of how much physical RAM you might have), and they usually divide that space evenly: half for user processes and half for the operating system. So, 2 GB is a per-process limit. You could put 4 GB (or more) in a machine, and two user processes (that is, two programs running simultaneously) can each use 2 GB of physical RAM simultaneously. Some of the Windows versions (for example, the higher-end ones, such as Windows 2000 Advanced Server) can be configured to provide 3 GB of memory for user processes and 1 GB for the OS. WebTrends can use 3 GB if available.

A second approach that may be used by WebTrends software is to make smarter decisions about the data to swap out of RAM. By swapping out those items that most likely will not be needed in the future, the amount of time your system needs to access the hard disk is reduced.

Another approach is to limit the amount of data that you store in your summary database tables. The trade-off with this approach is that by limiting the number of entries in a summary table, you only collect records up to the point that you reach that limit. For example, if you limit the top pages table to 10,000 pages, then data will only be aggregated for the first 10,000 pages entered in the table. Any new pages encountered in the web data activity file after that will not be entered in the table. This means that if your site experiences a great deal of traffic and has 200,000 or 300,000 pages, then limiting it to the top 10,000 will significantly reduce the accuracy of your reports. However, if you were to limit it to the top 50,000, you might expect to get a reasonably accurate representation of the top pages in your reports.

In addition to requiring less storage space in RAM, limiting tables also reduces the time spent inserting data into the database. This time savings is fairly minimal in comparison to the time savings achieved by avoiding swapping data out to the hard disk.
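The accuracy trade-off of table limiting is easy to see in a small Python sketch. This is an illustration of the behavior described above, not the WebTrends algorithm: the aggregate_with_limit function and the sample URLs are hypothetical. Once the table holds its maximum number of rows, hits to pages that are not yet in the table are dropped, even if those pages later turn out to be popular.

```python
def aggregate_with_limit(urls, max_rows):
    """Count hits per URL, but stop adding new URLs once the table is full."""
    table = {}
    dropped_hits = 0
    for url in urls:
        if url in table:
            table[url] += 1        # existing rows keep aggregating
        elif len(table) < max_rows:
            table[url] = 1         # room for a new row
        else:
            dropped_hits += 1      # table is full, so this page is never counted
    return table, dropped_hits


hits = ["/home", "/about", "/home", "/sale", "/sale", "/sale"]
table, dropped_hits = aggregate_with_limit(hits, max_rows=2)
print(table)         # {'/home': 2, '/about': 1}
print(dropped_hits)  # 3
```

In this example the most requested page, /sale, never appears in the report because it was first seen after the table filled up, which is why a limit set too low relative to the number of pages on your site reduces report accuracy.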
Whether you have to limit table sizes depends on three factors:
• System processing speed
• Amount of RAM
• Tables being created (daily, weekly, monthly, quarterly, and/or yearly)

System processing speed impacts how long the instructions and data must stay in main memory, while the amount of RAM affects how much data can be kept in main memory at any given time. And finally, the periods for which you have chosen to generate reports determine which tables exist and have data aggregated in them. If you have selected to aggregate data in yearly tables, toward the end of a year, you would be maintaining almost an entire year’s worth of data. Because the summary tables have to be loaded in RAM to aggregate the data, the larger the amount of data, the more likely that you may have to swap out to hard disk.

Recommendations
• If you trade accuracy for speed, you need to be certain that you really need that report.
• Use WebTrends software to limit the number of elements that are fed into the tables. Also, you can limit tables for your custom reports.

Performance issues

Simultaneous analysis

Many web analysis applications are multi-threaded, meaning that they can run multiple tasks simultaneously. Depending on the number and speed of the processors and memory in your analysis system, you may increase performance by running more than one analysis at a time.

Recommendations
• Have no more than one simultaneous analysis for each processor in the analysis system.
• Each processor should have at least 2 GB of RAM.

Scheduling reports and storing reports

There are several decisions you have to make about reports.
• Which reports to generate: daily, weekly, monthly, or yearly?
• How frequently to run an analysis: every five minutes, every ten minutes, or once a day?
• How long to keep a given report: do you hold onto each daily report for one month, two months, or longer?
• How many elements to store in a report: 100, 2000, or 20,000?

Reporting is one of the key elements to consider when deciding how to allocate resources, because the report rendering process itself demands a lot from your system’s performance and after you’ve created those reports, each one requires a fair amount of storage space.

Rendering reports is a fairly processing-intensive task. The report engine must first look up all the information requested by the report templates. It must then create tables and graphs that are populated with all the requested information. Depending on the report periods requested (such as daily, monthly, and yearly) your report engine may have one or more different reports to render for each report type.

Keep in mind that each stored report can occupy a fair amount of storage space: up to 1 MB, for example, for a basic report that comes packaged with WebTrends software. Therefore, always consider the amount of time and resources involved in generating reports. For example, if it takes an hour to generate a complete day’s report and you scheduled that report every hour, each run would take more than an hour because of the overhead involved in shutting down and starting up processes. Your system might also experience thrashing if you generated reports too frequently.

Recommendation
Many IT departments prune reports to contain only the tables and charts that may be of interest to the particular audience. Culling the reports makes them less daunting and more accessible, and reduces processing time and storage needs. You should track which reports are viewed by business users and then remove those that are never accessed.

Maintenance and storage of reports

By default, WebTrends copies the top-most elements from the analysis tables to the report database (called On Demand Database). You can increase the number of elements that can be copied, but as you increase this number the performance of the On Demand Database decreases.
In general, keep the On Demand Database “trimmed” so that you get your reports in a timely manner. WebTrends also allows you to control the number of reports kept over a period of time. You could, for example:
• Delete all daily reports that are more than 90 days old.
• Keep weekly reports only for the last 52 weeks.
• Keep only eight quarterly reports and two yearly reports.

By limiting the number of reports kept in the On Demand Database, you reduce the storage space required. There is a trade-off between keeping massive amounts of data and maintaining a robust database that generates reports efficiently. Some organizations may find great value in keeping a lot of historical data, no matter what the cost is. Other organizations may find that maintaining daily reports from the previous year is of little value. It is a matter of what your organization needs and can afford.

Finding the Features in WebTrends Products
You will find the topics discussed in this chapter in the following locations in WebTrends.

Archiving
Click on Administration > System Management > Backup/Restore > Restore Backup

Internet resolution
Click on Web Analysis > Options > Analysis > Internet Resolution

HTML page title lookups
Click on Web Analysis > Options > Analysis > General. You will see Retrieve HTML page title.

Table limiting
Click on Web Analysis > Options > Analysis > Table Limiting

Report database
Click on Administration > System Management > Data Retention > Report Database

Elements in report tables in standard tables
Click on Web Analysis > Report Designer > Options > Reports

Elements in custom reports
Click on Web Analysis > Report Configuration > Custom Reports > Reports > Dimensions

Optimizing Worksheet
Use the following worksheet to help optimize your analysis environment.

Consideration (Yes / No / Comments)
Do you plan to archive your web data activity files and do you know how long you will keep them archived?
Do you have adequate storage space for the archived files?
Do you plan to back up analysis data, including summary tables?
Do you have adequate storage space for backup data?
Do you plan to cache uncompressed web data activity files for re-analysis?
Do you plan to use IP address lookup (also known as reverse DNS)?
Can you improve your system performance if it slows down because of IP address lookup?
Do you plan to look up HTML page titles?
Can you improve your system performance if it slows down because of HTML page title lookups?
Have you maximized the size of your RAM?
Can you limit the size of your summary tables?
Can you limit the size of your reports?

Glossary

Abandonment Rate
For a scenario or multi-step process, the percentage of initiated scenarios that were not completed during the visit. Scenarios can be defined in many ways: for example, the entire shopping process, a finite checkout process at an ecommerce site, a registration process at a lead generation site, or a search process at an information site.

Acknowledgement Page
A page that is displayed after a visitor completes an action or transaction: for example, a Thank-you or Receipt Page. An Acknowledgement Page is often important in Scenario Analysis, where it is an indicator of a completed scenario.

Acquisition
The process of attracting a visitor to your web site.

Activity
A general term referring to nearly anything measurable on a site, including visits, hits, visitors, and viewing time.

Ad
A link, usually commercial in nature, consisting of a graphic or text that takes a visitor to a web site when clicked on. An abbreviation for “advertisement.”

Ad Campaign
A specific effort to attract visitors to your site through ads. It may be one individual ad or a coordinated set of ads treated as one entity for reporting purposes.
On the web, ad campaigns usually consist of e-mails, graphics on other sites or on a wireless interactive appliance, and traditional media such as direct mail, print, broadcast, outdoor advertising, etc. In WebTrends, ad campaigns are set up by the reporting administrator with a unique URL/landing page, a starting date, an ending date, and a cost. Same as Campaign and Marketing Campaign.

Ad Click
A click on an ad resulting in a jump to the site being advertised.

Ad View
A display of an ad on a page that is viewed during a visit. There may be more than one ad view on a page.

Address
An Internet term loosely referring to the location of a web site or web page on the Internet or the Web. More specifically, an identifier for a specific computer that is connected to the Internet.

Aggregate
Combining data of two or more dimensions in a report. For example, adding up all Departments to get Total Division data. While such combinations are normally sums, any type of formula might be used.

Authenticated User
A visitor who used a username-password login process to get access to all or part of a web site. The username (but not the password) is captured in a specific field in web site log files or through client-side data collection tags. Since it is possible for many different unique visitors to have the same IP address, the authenticated username is perhaps the most accurate way to count unique visitors. You may find more authenticated user names than total visitors because several people may be using the same IP address; this is particularly common on corporate intranets, where a large number of visitors share a smaller pool of IP addresses.

Authentication
A technique that limits access to Internet or intranet resources to visitors who identify themselves by entering a user name and password.

Average
A statistical term referring to the sum of a measure divided by the number of items measured.
For example, for a series of 11 visits consisting of 3, 7, 7, 7, 8, 10, 15, 22, 25, 25, and 35 page views each, the average number of page views is 14.9 (total 164 divided by 11), the median is 10 (the 6th in the series of 11), and the mode is 7. In statistics, the average is also called the mean.

Average Frequency
The average of the frequencies of all the visitors during the reporting period, where each visitor’s frequency is the number of times they have visited the site since WebTrends visitor tracking began.

Average Latency
The average of the latencies of all the visitors during the reporting period, where each visitor’s latency is the average elapsed time, in days, between all their visits since WebTrends visitor tracking began.

Average Lifetime Value
The average of the lifetime values of all the visitors during the reporting period, where each visitor’s lifetime value is the total monetary value of a visitor’s past orders since WebTrends visitor tracking began.

Average Recency
The average of the recency values of all the visitors during the reporting period, where each visitor’s recency is the elapsed time, in days, since their last visit.

Banner, Banner Ad
An online advertisement, usually a graphic, which can be anywhere on a web page but typically refers to a horizontally elongated graphic of significant size located at the top or bottom of a web page.

Bookmark
In a browser, a shortcut to a web site page that is created by the visitor to allow a quick one-click return to the page in the future. Bookmarks are called “Favorites” in some browsers. Visitors arriving at a site by clicking on a bookmark appear as a “Direct Traffic” entry in Referrers reports.

Browser
A program (such as Microsoft Internet Explorer or Netscape) used to locate and view web pages as well as to follow hyperlinks.
The Browser is identified in the “Agent” or “User Agent” field of a web site log or through standard client-side data collection tags.

Campaign
A specific advertising effort to attract visitors to your site. A campaign may be one individual ad or a coordinated set of ads treated as one entity for reporting purposes. For online channels, campaigns usually consist of e-mails, graphics on another site or on a wireless interactive appliance, and traditional media such as direct mail, print, broadcast, outdoor advertising, etc. In WebTrends, campaigns are set up by the reporting administrator with a unique URL/landing page, a starting date, an ending date, and a cost. Same as Ad Campaign and Marketing Campaign.

Campaign Creative
A “creative” describes the characteristics of a marketing activity, such as color, size, and messaging; for example, a “Buy Now” graphic. These creative elements are used to encourage clickthrough to the web site. Campaign Creative is a level within the drilldown categorization scheme set up by the WebTrends administrator, which allows for reporting on groups of campaigns in a way that is meaningful to the report users.

Campaign Drilldown
In certain WebTrends reports, a drill-down feature allows the user to navigate from a highly summarized level of data to successively more detailed levels of data, organized along a concept hierarchy. With Campaign Drilldown, users can examine visits, page views, revenue, average order size, and more, by Campaign Partner, Demand Channel, Marketing Program, Marketing Activity, Campaign Name, Campaign Creative, Campaign Offer, and other campaign attributes.

Campaign ID
A unique campaign identifier used to calculate campaign success, cost, etc., which may involve several different marketing activities, or a single effort.
Campaign ID is a level within the drilldown categorization scheme set up by the WebTrends administrator, which allows for reporting on groups of campaigns in a way that is meaningful to the report users.

Campaign Type
A user-defined category, which might include online banner ads, emarketing newsletters, and direct mail campaigns. Campaign Type is a level within the drilldown categorization scheme set up by the WebTrends administrator, which allows for reporting on groups of campaigns in a way that is meaningful to the report users.

Checkout Page
The page or series of pages viewed when a visitor goes through the process of buying something online.

Child Profile
WebTrends can use Child Profiles to report on a web site that shares a log file with other unrelated sites due to a constraint or choice by a hosting provider. Child profiles can be helpful if an ISP or web hosting service hosts multiple customer sites on its web servers. To a web site visitor, a customer’s site can appear as a distinct, stand-alone domain, but often the web activity data for each customer site is recorded and lumped together in the service provider’s main web server log file. If service providers want to offer their customers a set of basic web activity reports with data specific to each customer’s site, they need a means of breaking out data by customer. Because service providers also want to reduce management and maintenance of this data splitting process, they want WebTrends to auto-discover and split out these data subsets while parsing the log file. Parent-Child profiles provide this auto-discovery functionality and also create profiles, called Child profiles, for these data subsets.

Click
The act of activating a hyperlink, usually by physically pressing down (clicking) on a mouse button when the cursor is over a link on a page.
In Web advertising, a click is an instance of a user activating an advertising link to go to an advertiser’s web site or page.

Click-through-Rate
The number of clicks on an ad as a percentage of the total views of the ad during the reporting period.

Client
A computer (or software on a computer) that accesses resources provided by another computer, called a server.

Client Errors
Errors that occur due to an invalid request by the visitor's browser. Client errors are in the 400 range (see Status Code on page 227 for a list).

Client-side Data Collection
An alternative to traditional web server log file analysis that involves collecting data directly from the visitor's browser (the client) rather than from server log files, improving data accuracy. A special script in a page’s source code is used to transmit page-level data, not “hit-level” data, to a data collection server, dramatically reducing data volume and decreasing processing time. Client-side data collection obtains more accurate information than log files do in two ways: it accurately tracks visitor activity that is normally hidden by the browser’s local cache and by proxy and caching servers like those used with an AOL account, and it collects extra, customized data not included in normal web server log files. Accuracy is also improved because spiders do not trigger client-side tags; with log files, spiders can appear to be “real” visitors unless their activity is filtered out. However, client-side methods provide no information on server technical performance or bandwidth use. WebTrends’ proprietary client-side data collection technology is called SmartSource.

Combined Log File Format
A basic (“common”) log file with two additional fields, the Referrer and User Agent fields. Also referred to as Extended Log File Format.

Content Group
An administrator-defined group of one or more web pages that is treated as one entity in certain reports such as Content Groups and Content Paths.
Content Groups are created by a WebTrends administrator to group pages according to similarities that are meaningful in the context of your web site.

Content Path
A consecutive sequence of two or more Content Groups viewed during a visit.

Conversion, Conversion Rate
The percent of a group (of visits or visitors) that took a specific action of interest. The term Conversion can apply to any type of action a web site wants its visitors to perform, and any type of goal or mission a visitor wants to complete on the site. Conversion can encompass the entire visit population, such as the percent of all visits that involved a completed registration. Conversion can also refer to a very small and precise action, such as the percent of people at step 3 of a scenario who continued to step 4; or it can apply to a subpopulation, such as the percent of knowledgebase searches that result in issue resolution.

Cookie
When a user’s browser requests a page from a web site server, the server often returns a cookie, a small text file sent to a browser by a web site to be stored locally. In its simplest form, this text file usually contains a long unique string of characters that helps the web site recognize that visitor when he or she makes subsequent page requests. One purpose of a cookie is to let the server keep track of important information through the course of a visit, such as the items added to a shopping cart by a visitor. Without a cookie, many online transactions would not be possible because the web site would not be able to associate, for example, information entered on the shipping address page with information entered on the payment page. The browser user controls whether a browser accepts cookies or not. If the browser is set to accept cookies, WebTrends uses the cookie character string to divide the mass of page views into individual visits.
If a cookie is the persistent type that is stored on the client’s hard disk, WebTrends also uses the cookie to define a visitor as either first-time or returning. WebTrends can also use the cookie to associate previous visits with a particular visitor in order to report on past purchases, lifetime value, or past responses to campaigns.

Custom Filter
A hit or visit filter created in the Custom Reports section of the WebTrends Admin Console. Custom filters can be a variation of a filter already in use or can be completely new, based on a variety of hit or visit characteristics. Visit-related custom filters are especially powerful, allowing the inclusion or exclusion of entire visits depending on whether a specific page was viewed at any point in the visit.

Dashboard
A customizable WebTrends report consisting of summary information (usually graphs) from individual WebTrends reports in a profile, all grouped on one page. Dashboards provide a quick overview of key information for individuals, departments, and specific roles.

Data Source Splitter (DSS)
A WebTrends feature allowing several profiles to use the same set of log files more efficiently rather than having to create separate profiles in the standard WebTrends manner. An organization with several virtual domains all served by the same set of web servers, and all logging to the same set of log files, would be a candidate for using DSS. Another would be a hosting provider with several different domains logging to the same log files on the same servers. DSS allows an administrator to create profiles for each of the virtual domains, which splits the log files into smaller logs based on the domain names, so that domain-specific profiles can be run on the smaller logs.

Destination Page
An administrator-specified page used in Destination Paths reports as the page to which all the analyzed paths lead.

Dimension
Elements or categories being reported on in a WebTrends report.
A dimension usually does not have a numerical value; examples are Pages and Content Groups. Dimensions are statistically described using Measures, which do have a numeric value, such as visits, views, view time, etc. In WebTrends reports, the dimension is the first column, or the first two columns if both a Primary and Secondary dimension are used. Dimensions are also presented in drill-down format in some WebTrends reports.

Directory
A web site is made of files that are usually grouped in buckets of similar files, such as all product pages, or all Human Resources pages. In a complex web site, buckets can contain smaller buckets, such as Human Resources procedures pages and Human Resources job listings, and the levels of buckets can go quite deep. The buckets, which may or may not have names that clearly indicate their contents, are called Directories. The smaller buckets within a bucket are called SubDirectories. This categorization is often reflected in the “address” of a web page, which includes not only the name of the page (joblistings.html), but also the series of buckets it belongs in, separated by slashes (/international-company-info/USA-company-info/USA-human-resources/). WebTrends uses the Directories concept in two ways. First, it is possible to filter page views by specifying directories to include or exclude. Second, a Directories report tallies the activity in individual directories.

DNS Lookup (Domain Name Service Lookup)
The process of converting a numeric IP address into a text domain name (a reverse DNS lookup). For example, DNS Lookup will convert the IP address 255.255.255.255 to the domain name YourDomain.com. DNS Lookup can be turned on and off by the WebTrends administrator. “DNS” stands for Domain Name System; the expansions Domain Name Service and Domain Name Server are also in common use. DNS Lookup is also called IP Resolution and Domain Name Lookup.

Documents
A legacy term referring to pages that were defined as “documents” by the system administrator.
Traditionally, a page is a document if the content is static, such as an HTML page.

Domain Name
The text name corresponding to the IP address of a computer on the Internet. For example, netiq.com is a domain name. A domain can be associated with many IP addresses but an IP address can have only one domain.

Domain Type
A broad categorization of domain names identified by the suffix, such as .edu (for domains related to educational institutions), .com (for domains related to commercial web sites), .org (for domains related to non-profit organizations), .gov (for domains related to governments), and many others. The domain type does not necessarily reflect the true nature of the web site, as domain suffixes are only loosely regulated, if at all.

Drill Down
In certain WebTrends reports, the drill-down feature allows the user to navigate from a highly summarized level of data to more detailed levels of data, organized along a concept hierarchy. On a web site, “drilling down” is the act of going further down a branch of the site in search of more detailed information. Often, drilling down results in seeing a series of different navigation bars, each appropriate to its own level.

DSS
See Data Source Splitter on page 207.

Dynamic Page
A page that is created by the web server from a template, or a general page structure, which is filled in with content pulled from a database. Servers “build” dynamic pages from particular components according to requests they receive from browsers. The URLs of dynamic pages typically consist of the template name, followed by a question mark, followed by the content for the displayed page as a series of text strings separated by ampersands in the format “parameter=parametervalue”.
For example, a page showing a blue Empire couch might be “/product.asp?item=couch&type=Empire&color=blue”. The parameters can be of great interest in web analytics when shown as tabulated summaries of views of couches, Empire items, and blue items, or combinations of these.

Entry
The first page, file, or content group in a visit.

Entry File
The first file requested in a visit. A visit has one and only one entry file. Files may be of any type, including a page file.

Entry Page
The first page requested in a visit. A visit has one and only one entry page. Note that a visit will have no pages if it doesn’t include a page file.

Entry-Exit Page
A page view that is both the entry and the exit page; the only page in a Single-Page Visit.

Exit Page
The last page viewed in a visit.

File
A collection of information stored under a unique name, often in the form “name.extension”, where the extension identifies the type of file and usually implies what kind of program can open or view it. On the Web, common types of files are: page files (.htm, .asp, .jsp, .cfm, etc.), image files (.gif, .jpg, .png, etc.), applet files (.js, among others), non-page document files (.doc, .txt, .pdf, etc.), and style files (.css, among others). While a page file is technically different from a page (see Page on page 217), a page always includes a page file.

File Type
Corresponds to a file’s extension. For example, a file named graphic.gif is identified as type “gif.”

Filter
A setting in WebTrends that instructs the program to exclude or include (to the exclusion of all else) certain visits or hits from the analysis. In WebTrends, filters can be used individually or in groups, and individual filters can be combinations of different subparts.

First-Time Buyer
A visitor who has made his or her first purchase. Also called New Buyer.

Forms
Scripted pages that pass variables back to the server.
These pages are used to submit information entered by visitors in the form’s fields.

Frequency
The number of times a visitor has visited a site since tracking with persistent cookies and Visitor History began. Average Frequency is the average of the frequencies of all the visitors during the reporting period. Frequency is a retention metric and is part of RFM (recency, frequency, monetary) analysis. If visitors did not visit the site during the report time period, their frequency is not included.

FTP
File Transfer Protocol. A standard method of sending files from one computer to another over the Internet.

Funnel
A profile of increasing attrition that happens as site visitors go through a scenario, or a series of defined steps such as a purchase, an information hunt, or a registration on a web site. Because the number of people participating in each step is usually smaller than the step before, a graph of the declining participation, when mirrored, resembles a funnel.

Geography Drilldown
In certain WebTrends reports, a drill-down feature allows the user to navigate from a highly summarized level of data to successively more detailed levels of data, organized along a concept hierarchy. With geography drilldown, users can examine activity by areas of visitor origination, for example, viewing visits, page views, revenue, or average order size, or viewing by Region, Country, State/Province, or City.

GeoTrends Database
The optional GeoTrends Database resolves IP addresses of visitors into more meaningful data such as the region, country, state/province, city, area code, designated marketing area, metropolitan statistical area, and time zone data corresponding to the location of the owner of a specific domain name. In the specific case of AOL IPs, location is resolved to geographic regions served by AOL as opposed to the location of AOL in the state of Virginia. GeoTrends Database replaces the older WebTrends’ Company Database.
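To make the Funnel and Abandonment Rate entries above more concrete, the following Python sketch computes, for each step of a scenario, the share of visits that continued from the previous step, along with the overall abandonment rate. It is only an illustration of the arithmetic: the function name `funnel_summary`, the step names, and the visit counts are all hypothetical and are not taken from WebTrends.

```python
def funnel_summary(steps):
    """Summarize attrition through a scenario.

    `steps` is a list of (step_name, visits) pairs ordered from the first
    scenario step to the last. Names and counts are hypothetical examples.
    Returns (rows, abandonment), where each row is
    (step_name, visits, percent continued from the previous step).
    """
    if not steps or steps[0][1] <= 0:
        raise ValueError("the first step must have at least one visit")

    rows = []
    previous = None
    for name, visits in steps:
        if previous is None:
            continued = 100.0          # every scenario starts at step one
        elif previous == 0:
            continued = 0.0            # nobody reached the previous step
        else:
            continued = 100.0 * visits / previous
        rows.append((name, visits, round(continued, 1)))
        previous = visits

    initiated = steps[0][1]
    completed = steps[-1][1]
    # Abandonment rate: initiated scenarios that were not completed.
    abandonment = round(100.0 * (initiated - completed) / initiated, 1)
    return rows, abandonment


# Hypothetical checkout scenario
example = [("Cart", 1000), ("Shipping", 400), ("Payment", 300), ("Receipt", 240)]
rows, abandonment = funnel_summary(example)
for name, visits, continued in rows:
    print(f"{name:<10}{visits:>6}{continued:>8}%")
print(f"Abandonment rate: {abandonment}%")
```

With these made-up counts, 240 of the 1000 initiated scenarios reach the Receipt step, so the abandonment rate is 76 percent. Plotting the visit counts per step as a mirrored bar chart would give the funnel shape the entry describes.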
GIF
A graphics file format and file extension (*.gif) commonly used on web pages, referring to Graphics Interchange Format.

Hit
A request for a file by a browser. Since “file” refers to images, styles, and many other elements besides .html pages, a single web site page view can involve dozens of hits. Because the number of hits is so heavily influenced by the complexity of a page, hits are a far less helpful measure of site traffic than visits or visitors. The hits statistic is somewhat useful in assessing the load experienced by a web server. WebTrends SmartSource Tags do not capture hit-level data.

Homepage
The main or introductory page of a web site, usually designed with the expectation that it is the first page a visitor sees. It is also the default page that is sent in response to a request containing only the domain name.

Homepage URL
The URL for the homepage of the site analyzed in the report. The homepage URL is specified during WebTrends setup in order to help WebTrends consolidate hits to several versions of the homepage, for example, Flash and non-Flash versions or framed and frameless versions.

HTML
The abbreviation for Hypertext Markup Language, which is used to format text files so that web browsers can display text with appropriate hyperlinks, font sizes, and other text formatting.

HTTP
The abbreviation for Hypertext Transfer Protocol, a standard method of transferring data between a web server and a web browser. The string “http” appears at the beginning of web addresses and informs a browser that the request is for a web page as opposed to an FTP site or another type of browser destination.

Instrumented Web Page
A web page that contains a WebTrends SmartSource Tag. The SmartSource Tag does two things. First, it transmits traffic data (similar to that in a standard IIS or Solaris log) to the WebTrends SmartSource Data Collector for processing into reports.
Second, if set up to do so, it also collects and transmits a wide variety of optional extra data to the same Data Collector.

IP Address
A numeric identifier for a computer connected to the Internet. IP addresses consist of four one-to-three-digit numbers separated by periods, for example, 212.6.125.76. WebTrends allows filtering activity coming from a specific IP address or range of addresses.

JavaScript Tag
A script (JavaScript or sometimes VBScript) that can be added to the code of a web page to capture information about a visit to that web page (for example, IP of visitor, time of day, name of page, parameters, etc.) and send it to a data collection server such as WebTrends’ SmartSource Data Collector.

JPEG
An abbreviation for Joint Photographic Experts Group, referring to a compressed graphics format common on the Internet. Also called JPG.

Jump
Navigation or moving from one page to another using a link.

Landing Page
A page on a web site, which may or may not be the home page, where the visitor arrives. For example, in an email campaign, you would use a landing page as the page to which the email directs the prospect via a link.

Latency
The average number of days between visits for a given visitor since tracking with persistent cookies and Visitor History began; for example, visitors who return on average every 7 days have a latency of 7 days. For a given visitor, a lapse of 12 days between the first and second visit, and a lapse of 24 days between the second and third visit, equals a latency of 18 days. Note that a zero latency means the average time between visits is less than 24 hours. If visitors did not visit the site during the report time period, their latency is not included.

Lifetime Value
The total monetary value of a visitor’s past orders since tracking with persistent cookies and Visitor History began. Average Lifetime Value is the average of all the Lifetime Values of the visitors who visit the site during a reporting period.
If visitors did not visit the site during the report time period, their Lifetime Value is not included.

Link
On a web page, text or an image that has been coded to take a browser from one page to another, or from one site to another.

Log File
A file on a web server that contains records of activity related to requests for site content from browsers, spiders, and other outside entities.

Log File URL
The full address, including network ID, drive, and directories, of the web server log files that are to be analyzed in a profile.

Loyal Visitor
A visitor who visits a site relatively frequently.

LTV
Same as Lifetime Value; see page 213.

Marketing Campaign
A specific effort to attract visitors to your site. It may be one individual ad or a coordinated set of ads treated as one entity for reporting purposes. In the web world, marketing campaigns usually consist of e-mails, graphics on another site or on a wireless interactive appliance, and traditional media such as direct mail, print, broadcast, outdoor advertising, etc. In WebTrends, campaigns are set up by the reporting administrator with a unique URL/landing page, a starting date, an ending date, and a cost. Same as Campaign and Ad Campaign.

Mean
A statistical term referring to the sum of a measure divided by the number of items measured. Also called the average. For example, for a series of 11 visits consisting of 3, 7, 7, 7, 8, 10, 15, 22, 25, 25, and 35 page views each, the mean number of page views is 14.9 (total 164 divided by 11), the median is 10 (the 6th in the series of 11), and the mode is 7.

Measures
Quantities being reported on in a WebTrends report. Measures are quantitative in nature and appear in WebTrends reports as columns to the right of the Dimension column(s), statistically describing them. In Custom Reports, the WebTrends administrator can define and use a wide variety of Measures.

Median
A statistic used as an alternative to Average. In a collection of numbers that have been ordered by size, the Median is the middle value: half of the other numbers are no larger than it and half are no smaller. The Median is less distorted by extreme numbers than is the Average. For example, for a series of 11 visits consisting of 3, 7, 7, 7, 8, 10, 15, 22, 25, 25, and 35 page views each, the median is 10 (the 6th in the series of 11). The average is 14.9 and the mode is 7. For an even-numbered series, such as 12 visits, the median is the average of the middle two numbers.

Mode
A statistic used as an alternative to Average. In a collection of numbers, it is the number that appears most often. For example, for a series of 11 visits consisting of 3, 7, 7, 7, 8, 10, 15, 22, 25, 25, and 35 page views each, the mode is 7. The median is 10 in this series (the 6th in the series of 11), and the average is 14.9.

Monetary Value
The total value of a visitor’s past orders or transactions since tracking with persistent cookies and Visitor History began. Same as Lifetime Value. Average Monetary Value is the average of all the Lifetime Values of the visitors during a reporting period. If visitors did not visit the site during the report time period, their Monetary Value is not included.

Most Recent Campaign
The last campaign that a visitor responded to since tracking with persistent cookies and Visitor History began. For the report time period selected, all conversions and other activity are tracked and attributed to visitors’ most recent campaigns. Only those most recent campaigns whose durations have not expired are included, and the report administrator sets this expiration. Thus, even if the conversion does not happen on the first visit generated by the most recent campaign, the appropriate source is “credited” with the conversion. If visitors do not visit the site during the report time period, their most recent campaign is not included.
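The figures quoted in the Average, Mean, Median, and Mode entries above can be checked with a few lines of Python. This is only a verification sketch that uses the standard library `statistics` module. The variable names are chosen for this example, and the twelfth visit of 40 page views is a hypothetical value added here to illustrate the even-numbered case; it does not appear in the glossary.

```python
import statistics

# The 11-visit series used in the Average, Mean, Median, and Mode entries
page_views = [3, 7, 7, 7, 8, 10, 15, 22, 25, 25, 35]

total = sum(page_views)                   # sum of all page views
mean = statistics.mean(page_views)        # total divided by the 11 visits
median = statistics.median(page_views)    # middle (6th) value of the sorted series
mode = statistics.mode(page_views)        # value that appears most often

print(total, round(mean, 1), median, mode)   # prints: 164 14.9 10 7

# Hypothetical 12th visit (40 page views) to show the even-numbered case:
# the median becomes the average of the two middle values, 10 and 15.
even_series = page_views + [40]
even_median = statistics.median(even_series)
print(even_median)                            # prints: 12.5
```

Note how the single visit with 35 page views pulls the mean (14.9) well above the median (10), which is the distortion by extreme numbers that the Median entry mentions.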
Multi-Homed Domain
The domain name or IP address of one of the sites in a multi-homed log file. You can report on a single domain using the Multi-Homed Domain Filter.

Multi-Homed Log File
A single log file that contains the access information for multiple web sites. To specify which domains are analyzed in this type of file, use the Multi-Homed Domain Filter.

Multi-homed Web Server
A single server that hosts more than one web site.

Multi-Page Visit
A visit in which more than one page was viewed. In other words, any visit that is not a single-page visit.

Navigation
The act of moving from location to location within a web site, or between web sites, accomplished by clicking on links. Navigation also can refer to the overall structure of the links on the site, comprising the paths available to the visitor.

New Visitor
A visitor who has never been to the site since tracking with WebTrends and persistent cookies began. New visitors are identifiable only on sites that give out persistent cookies. WebTrends identifies visitors as new visitors if they have no site cookie when they arrive, and they are able to accept a cookie for their subsequent page views. If they already have a site cookie when they arrive, they must have been to the site before. In a log file, a new visitor’s first page view has no cookie, but all other page views do. It’s important to realize that “never been to the site before” can be evaluated only for the time period during which the persistent cookie has been given out. In fact, when a persistent cookie is first implemented, all visitors appear to be first-time visitors. Visitors whose browsers do not accept cookies appear as “unknown” in reports that display new and returning visitors.

No Referrer
A line item in the Referrers reports that pertains to visits that have no known referring site, domain, or URL.
Usually, this means that visitors arrived at your site by typing the URL of your site into their browser address window, by using a bookmark, or by clicking on a link in an e-mail. If “No Referrer” is the only line in a Referrers report, this usually means the Referrer field is not used in your traffic logging.

Order
A purchase consisting of one or more items.

Order Count
The number of completed purchases.

Order Quantity
The number of items purchased in an individual order.

Order Value
The monetary amount of an order.

Organic Search Phrase
A search phrase for which your site shows up on result pages because of the search engine’s method of ranking pages, as opposed to paid placement.

Other
A term appearing at the bottom of WebTrends report tables for any table that spans several pages. In these situations, “other” refers to table line items that appear on the other pages of the table, whether before or after the portion of the table being viewed. WebTrends uses the “other” quantity to indicate the proportion of the total picture that is the viewable part of the list.

Paid Search Phrase
A search phrase for which your site shows up on result pages due to paid placement with the search engine, as opposed to its method of ranking pages (Organic).

Page
Same as “web page.” In terms of a web site visitor’s experience, a page is a unit of site content, often resembling a paper page of indefinite length and width, that has a single URL address. What the visitor sees as a “page” is usually a collection of files, always including one page file (.htm, .jsp, .asp, .cfm, etc.), plus, depending on the page, image files (.gif, .jpg, .png, etc.), style files (.css, among others), script files (.js, among others), and a variety of other types of files. In WebTrends default settings, a page is technically defined as a file with the following extensions: .htm, .asp, .jsp, .cfm, etc.
This technical definition can be modified by the administrator to include or exclude any file extension.

Page View
Technically, a page that is displayed by a browser. This term is often used loosely to also include page files that are delivered to a browser, whether or not they are displayed on the screen. An example of a Page View that is not actually displayed is a Redirect Page.

Palm Browser
A program used on a Palm device to display site content, similar to Netscape or Internet Explorer on PCs.

Palm Device
A portable personal computer small enough to fit in the palm of a person’s hand, specifically those made by the company Palm and using the Palm operating system.

Parameter
Parameters appear in a URL after the question mark. Each parameter consists of a name followed by an equal sign and a value, known as a name=value pair. For example, the URL /products/furniture.asp?cart_id=445&product=couch contains two parameters: cart_id is a name with the value 445, and product is a name with the value couch. When a URL contains more than one parameter, the name=value pairs are separated by the “&” symbol.

Parent-Child Profiles
A specialized way of setting up profiles for different web sites that share servers and log files. Setting up a Parent-Child arrangement automates the creation of profiles and reports on a number of domains or subdomains from a single log file. New domains or subdomains automatically generate new profiles.

Path
The sequence of all pages viewed during a visit, or any portion of that sequence. In WebTrends reports, paths either have a designated starting point (the visit entry page or a designated path start page) or a designated end point (“destination page”); or, paths are Top Paths, which, regardless of specific start page or end point, are common routes through the site. Technically, any visit contains many paths, each consisting of two or more sequential page views.
Paths can also refer to content group paths instead of paths consisting of individual pages. The length of paths tracked is either determined by the number of pages viewed, or by the path analysis length limit if the number of pages viewed is greater than the limit.

Path Analysis
A report displaying and quantifying paths that fit the criteria set up by the WebTrends administrator, including a starting point or an ending point (destination), and a path analysis length limit.

Path of Interest
Describes a concept and practice of focusing path analyses on a particular area of interest. With WebTrends this is typically done with Destination Paths and Paths From Starting Page reports, though technically Top Paths and Paths From Entry are also paths of interest.

Percent Change
In a comparative date range display, a positive or negative percentage that indicates the size of the increase or decrease between the first and second date range. A value of 100% indicates that the second date range’s value is twice that of the first date range’s value; that is, 100% more than the first value. Percent change is calculated by subtracting the first date range’s value from the second date range’s value and dividing the result by the value of the first.

Persistent Cookie
A cookie that lasts longer than the duration of a visit and is saved in the Cookie folder of a browser’s computer. It is used by WebTrends to distinguish new from returning visitors, among other things.

Platform
The operating system, such as Linux or Windows, used by the visitor’s computer.

Product
A specific good or service that is sold or displayed on a web site.

Product Group
The highest-level categorization of products used in product drilldowns, for example Electronics. The WebTrends administrator defines levels used in the categorization scheme to allow reporting on groups of products in a way that is meaningful to the report users.
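The Percent Change entry describes a calculation: subtract the first date range's value from the second and divide by the first. A minimal sketch of that arithmetic follows. The function name `percent_change` is illustrative rather than anything defined by WebTrends, and multiplying by 100 to express the ratio as a percentage, as well as rejecting a first value of zero, are assumptions of this sketch.

```python
def percent_change(first: float, second: float) -> float:
    """Percent change from the first date range's value to the second's."""
    if first == 0:
        # The glossary does not say how a zero baseline is reported;
        # this sketch simply refuses to divide by zero.
        raise ValueError("percent change is undefined when the first value is 0")
    return (second - first) / first * 100.0


print(percent_change(200, 400))  # 100.0: the second value is twice the first
print(percent_change(400, 300))  # -25.0: a decrease
```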
Profile
A collection of WebTrends report settings and definitions used to generate, analyze and distribute the set of reports. It is integral to producing WebTrends reports. The characteristics of a Profile include the location of the log files and specific information about their content that will be used in analysis, such as which page URLs are to be assigned to Content Groups and which page URLs are to be starting pages for path analysis. When specified in conjunction with a Template, the Profile determines a complete report configuration that can be analyzed. A Profile can have several templates, just as a template can be applied to many Profiles. A web site can have one or many Profiles and templates.

Protocol
An established method of exchanging data over the Internet.

Psychographics
Used to build customer segments based on attitudes, values, beliefs and opinions as opposed to the “factual” characteristics of demographics. Political views, learning patterns or music tastes would qualify for psychographic segmentation. Marketing research usually combines demographic and psychographic information to build a more comprehensive understanding of customers. Because the Internet is still a relatively new and evolving medium, one which the mass market is still getting used to and whose usage patterns are determined both by levels of Web experience and type of person, psychographics are of great interest for the Web. The ability of an online broker to convert browsers to online traders, for example, will depend to a large degree on the type of person using the site: are they confident people who like to ‘give things a go’ or are they risk-averse followers of the masses? Psychographic segments built on attitudinal and behavioral characteristics will often be good indicators of how customers will use and react to a web site.

Purchase
A completed transaction involving an exchange of money for a product, service, privilege, or other item.
Purchase Conversion Funnel
A specific kind of scenario analysis consisting of steps leading to online purchases. The steps of the scenario are designated by the WebTrends administrator.

Query Parameter
An individual piece of a query string consisting of a parameter name and a value for the parameter.

Query String
The part of a URL that contains information about the content of a dynamically generated page. Web servers use this information to retrieve the specified content from a database and combine it with a template to display a page. A Query String can also contain information that is not directly used to construct a page, but which is intended for use in reporting or other functions. WebTrends’ SmartSource SDC tagging is often used to insert valuable reporting information into the query string. In many dynamic URLs, the Query String is the part of the URL that follows a question mark.

Recency
The number of days since a visitor’s most recent visit since tracking with persistent cookies and Visitor History began. Zero recency refers to a visit in the preceding 24 hours. Average Recency is the average of the recency of all visitors during the reporting period. If visitors did not visit the site during the report time period, their Recency is not included.

Redirect Page
A web page that is coded to take the visitor’s browser to another page automatically and usually immediately. Many redirects are instantaneous and the visitor does not see the redirect page. Some have time delays and allow the visitor to see the redirect page for a certain number of seconds. Redirects are used to help track clicks that go off site, or to an executable, downloadable, or other file that cannot normally be logged.

Referrer
A web domain, site, or page that contains a link to one of your site pages that was used by a visitor to get to your site.
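The Parameter, Query Parameter, and Query String entries describe how a dynamic URL divides into a stem, a query string, and name=value pairs. As a rough illustration, the example URL from the Parameter entry can be taken apart with Python's standard `urllib.parse` module. This is a generic sketch, not a description of how WebTrends itself parses URLs, and the variable names are illustrative.

```python
from urllib.parse import parse_qsl, urlsplit

# Example URL from the glossary's Parameter entry.
url = "/products/furniture.asp?cart_id=445&product=couch"

parts = urlsplit(url)
stem = parts.path            # the part before the question mark
query_string = parts.query   # the part after the question mark

# Split the query string on "&" into (name, value) pairs.
parameters = dict(parse_qsl(query_string))

print(stem)          # /products/furniture.asp
print(query_string)  # cart_id=445&product=couch
print(parameters)    # {'cart_id': '445', 'product': 'couch'}
```

Converting the pairs to a `dict` keeps only the last value when a name is repeated. Use the list returned by `parse_qsl` directly if repeated parameter names matter.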
Referring Domain
A web domain that contains a link to one of your site pages, used by a visitor to get to your site. For example, yahoo.com.

Referring URL
The URL of a specific page on a site that contains a link to one of your site pages that was used by a visitor to get to your site.

Registration Conversion Funnel
A specific kind of scenario analysis consisting of steps leading to online registration. The word “funnel” refers to the typical attrition of visitors from one step to the next. The steps of the scenario are designated by the WebTrends administrator.

Repeat Buyers
Visitors who bought something during the reporting period and are known to have bought something previously as well. Use persistent cookies to track Repeat Buyers. If buyers have cookie parameters for purchases from your site dating from before their purchases during the reporting period, they are repeat buyers. Visitors whose browsers do not accept cookies appear as “unknown” in reports that display first-time vs. repeat buyers.

Returning Visitors
Visitors who have been to your site before. Returning visitors are identifiable only on sites that give out persistent cookies. WebTrends identifies visitors as repeat visitors if they have a cookie from your site dating from before their first visit during the reporting period. Visitors whose browsers do not accept cookies appear as “unknown” in reports that display new and repeat visitors.

Report
A term loosely applied to graphs and a table associated with an individual analysis, or the collection of all such reports resulting from the analysis of a given profile and template.

Report Period, Reporting Period
The dates covered by the data displayed in a report. WebTrends users may select a report period of any day, week, month, quarter, or year, or a custom date range, and can switch between date ranges as desired.
Report Templates
A set of report characteristics consisting of content, the content’s order of appearance, graphic type specification, style, format, language, and other settings which determine the form and content of a finished report. A given profile can have many templates assigned to it, and the report user can view different templates depending on permissions in place. Likewise, a given template can be assigned to many different profiles.

Request
A signal from a browser to a server that asks the server to send a specific file to the browser. The request, plus some details about the server’s response to the request, is recorded as a line in a log file. Although “GET” in a log file is usually thought of as a “request,” both “POST” and “GET” methods are requests.

Resolve
With respect to IP addresses, indicates success in identifying and displaying a text domain name for a numeric IP address.

Retention
How well a site draws visitors back for more visits. Alternatively, a measure of the effectiveness of a source of visitors (a campaign, a search engine, individual keywords on a search engine, an affiliate site, etc.) measured in terms of Recency and Frequency of visitors who were originally introduced to the site by that source.

Return Code
A code in the “status” field of a log file that identifies the success, failure, and other characteristics of a transfer of data from a server to a browser. Also called Status Code. See the Status Code entry for a full list of codes.

Returning Visitors
Visitors who have been to your site before. Returning visitors are identifiable only on sites that give out persistent cookies. WebTrends identifies visitors as returning visitors if they have a cookie from your site dating from before their first visit during the reporting period. Visitors whose browsers do not accept cookies appear as “unknown” in reports that display new and returning visitors.
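The Return Code entry refers to the code found in the “status” field of a log file. The sketch below groups such codes into coarse classes, under stated assumptions. The “Success” and “Failed” labels follow this glossary's Status Code list, which marks every 1xx, 2xx, and 3xx code as a success. HTTP itself distinguishes informational (1xx) and redirection (3xx) responses from 2xx success. The function name `status_class` and the sample codes are hypothetical.

```python
from collections import Counter


def status_class(code: int) -> str:
    """Return a coarse label for a status (return) code from a log file."""
    if 100 <= code <= 399:
        return "Success"
    if 400 <= code <= 499:
        return "Failed (client error)"
    if 500 <= code <= 599:
        return "Failed (server error)"
    raise ValueError(f"not a recognized status code: {code}")


# Hypothetical status values pulled from the status field of a log file.
sample_codes = [200, 200, 304, 404, 500, 302, 403]
counts = Counter(status_class(code) for code in sample_codes)

for label, count in counts.items():
    print(label, count)
```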
Reverse Path
A path that ends at a designated page, called the destination page in WebTrends reports. Reverse indicates “backing up” from a certain page to examine how visitors arrived there.

RFM
A group of measures, made up of Recency, Frequency, and Monetary Value, which are useful for segmenting customers for marketing purposes. RFM analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary value). Requires use of persistent cookies and Visitor History. If visitors did not visit the site during the report time period, their RFM is not included.

Scenario
A series of two or more pages on a web site that can be treated as a kind of process or logical sequence, such as the process of making a purchase (the checkout process), the process of signing up for a newsletter (the signup or registration process), the process of using a gift finder, and so on. While a scenario by definition has a series of ordered steps, it is possible for visitors to start processes mid-scenario, such as a campaign that directs visitors to step 2 of the scenario. New scenario visualization capabilities show visitor progress through scenarios, as well as the origin of visits entering scenarios midway and where visitors went after leaving the scenario. Scenarios are defined by the WebTrends administrator.

Scenario Analysis
A report showing the amount of activity at each step of a defined scenario, plus conversion rates for each transition from step to step as well as for the whole process. Examples of scenarios are check-out, registration, or application sequences. New scenario visualization capabilities show visitor progress through scenarios, as well as the origin of visits entering scenarios midway and where visitors went after leaving the scenario.
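The Scenario Analysis entry mentions conversion rates for each step-to-step transition and for the whole process. A minimal sketch of that arithmetic is shown here. The step names and visit counts are invented for illustration. The sketch also assumes every visit enters at the first step, which the Scenario entry notes is not always true, and it is not a description of how WebTrends computes its reports.

```python
# Hypothetical checkout scenario: (step name, visits that reached the step).
steps = [("Cart", 1000), ("Shipping", 400), ("Payment", 300), ("Confirmation", 150)]


def conversion_rates(steps):
    """Return (from_step, to_step, percent) for each consecutive pair of steps."""
    rates = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        rate = count_b * 100 / count_a if count_a else 0.0
        rates.append((name_a, name_b, rate))
    return rates


for from_step, to_step, rate in conversion_rates(steps):
    print(f"{from_step} -> {to_step}: {rate:.1f}%")

# Conversion for the whole process: last step relative to the first step.
overall = steps[-1][1] * 100 / steps[0][1]
print(f"Overall: {overall:.1f}%")  # Overall: 15.0%
```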
Scenario Conversion Rate
The percentage of scenarios completed in relation to those that were started.

Script
A simple programming language used to execute tasks. Scripts are often used for pages on the Internet to serve dynamic content and to tailor pages for individual visitors.

Search Engine Keywords
A single word within a search phrase, or a search word used by itself. In the phrase “cordless phone” the individual keywords are “cordless” and “phone.” Also called “search keyword.”

Search Engine Phrase
All the words used in a search. In the search “cordless phone” the phrase is “cordless phone,” and in the search “phone” the phrase is “phone.” Also called “search phrase.”

Search Engine
A web site that enables users to search for web pages throughout the Internet by entering keywords.

Search Engine Marketing
The art and science of increasing a web site’s visibility and traffic by being listed favorably on search engines for a defined set of keywords and phrases through paid and optimization tactics.

Search Engine Optimization
The art and science of optimizing your web site to improve the “natural” listing or ranking your site receives from search engines for certain keywords and phrases. Often referred to as SEO.

Server
A computer that stores a web site and interacts with browsers to send (“serve”) web pages and other files associated with the web site.

Server Errors
A server error occurs at the web server and receives an error code in the 500 range. Below are examples of some of the most commonly experienced server errors:
• 500 – Internal Server Error
• 501 – Not Implemented
• 502 – Bad Gateway
• 503 – Service Unavailable
• 504 – Gateway Time-out
• 505 – HTTP Version Not Supported

Session, Sessionize, Sessionization
The process of dividing and ordering a list of page views and events in a site’s log into visits or sessions, where each visit includes the sequence of pages viewed by a visitor during a specified time period.
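Sessionization, as described above, divides an ordered list of page views into visits. A simplified sketch of the idea follows. It closes a visit after 30 minutes of inactivity, the value the Visitor Session entry gives as definable within WebTrends. The hit format of visitor ID, timestamp, and page, the function name `sessionize`, and the sample data are all assumptions made for illustration and do not reflect the actual WebTrends implementation.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=30)  # idle-time limit; assumed configurable


def sessionize(hits, idle_limit=IDLE_LIMIT):
    """Group (visitor_id, timestamp, page) hits into visits.

    A new visit starts when a visitor has no open visit, or when the gap
    since that visitor's previous page view reaches the idle limit.
    """
    visits = []       # list of (visitor_id, [(timestamp, page), ...])
    open_visit = {}   # visitor_id -> page-view list of the visit still open
    for visitor_id, ts, page in sorted(hits, key=lambda hit: hit[1]):
        views = open_visit.get(visitor_id)
        if views is None or ts - views[-1][0] >= idle_limit:
            views = []
            visits.append((visitor_id, views))
            open_visit[visitor_id] = views
        views.append((ts, page))
    return visits


# Hypothetical page views from two visitors.
hits = [
    ("a", datetime(2004, 7, 1, 9, 0), "/"),
    ("a", datetime(2004, 7, 1, 9, 10), "/products"),
    ("b", datetime(2004, 7, 1, 9, 12), "/"),
    ("a", datetime(2004, 7, 1, 10, 0), "/"),  # 50 minutes idle: a new visit
]

for visitor_id, views in sessionize(hits):
    print(visitor_id, [page for _, page in views])
```

Visitor "a" produces two visits here because the gap between 9:10 and 10:00 exceeds the idle limit.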
Shopping Cart
A part of a shopping web site where visitors can park items they have selected, presumably for eventual purchase.

Single Access Page
In WebTrends 6.x and before, a visit that consists of only one page view. In WebTrends 7.x and after, these are called “Single-page Visits.”

Single-page Visit
A visit that consists of only one page view. In Single-page Visits, the page viewed is counted in at least three WebTrends reports: Single-page Visits, Entry Pages, and Exit Pages.

SmartSource
A trademarked technology from WebTrends. SmartSource Data Management offers an alternative to traditional web server log file analysis, collecting information directly from the visitor’s browser (the client) rather than from server log files, improving data accuracy. Special script in a page’s source code is used to transmit page-level data, not “hit-level” data, to a data collection server, dramatically reducing data volume and decreasing processing time. Advantages of using SmartSource include capturing page views resulting from back button use, views of cached pages, and the opportunity to collect extra, customized data not included in normal web server log files.

SmartSource Data Collector (SDC)
A specialized web server application, proprietary to WebTrends, that acts as the recipient and organizer of data transmitted from web pages by WebTrends SmartSource Tags. The SmartSource Data Collector also validates and generates cookies and delivers a .gif file as part of the data collection process.

SmartSource Parameter
WebTrends’ SmartSource SDC tagging is often used to insert valuable reporting information into the query string of URLs. This is done through SmartSource Parameters, which consist of name=value pairs.

SmartSource Tags
A WebTrends script (JavaScript or VBScript) that can be added to the code of a web page to capture information about a visit to that web page (for example, IP of visitor, time of day, name of page, parameters, etc.)
and send it to a data collection server such as WebTrends’ SmartSource Data Collector. The code is executed when the page is loaded into a browser.

Spider
An automated program that crawls widely through the Internet and collects and indexes information, usually on behalf of a search engine or a monitoring company. A spider can often be identified through the User Agent field of a log file, or through its IP address.

Status Code
A code in the “status” field of a log file that identifies the success, failure, and other characteristics of a transfer of data from a server to a browser. Also called Return Code.
• 100 = Success: Continue
• 101 = Success: Switching Protocols
• 200 = Success: OK
• 201 = Success: Created
• 202 = Success: Accepted
• 203 = Success: Non-Authoritative Information
• 204 = Success: No Content
• 205 = Success: Reset Content
• 206 = Success: Partial Content
• 300 = Success: Multiple Choices
• 301 = Success: Moved Permanently
• 302 = Success: Found
• 303 = Success: See Other
• 304 = Success: Not Modified
• 305 = Success: Use Proxy
• 307 = Success: Temporary Redirect
• 400 = Failed: Bad Request
• 401 = Failed: Unauthorized
• 402 = Failed: Payment Required
• 403 = Failed: Forbidden
• 404 = Failed: Not Found
• 405 = Failed: Method Not Allowed
• 406 = Failed: Not Acceptable
• 407 = Failed: Proxy Authentication Required
• 408 = Failed: Request Time-out
• 409 = Failed: Conflict
• 410 = Failed: Gone
• 411 = Failed: Length Required
• 412 = Failed: Precondition Failed
• 413 = Failed: Request Entity Too Large
• 414 = Failed: Request-URI Too Large
• 415 = Failed: Unsupported Media Type
• 416 = Failed: Requested range not satisfiable
• 417 = Failed: Expectation Failed
• 500 = Failed: Internal Server Error
• 501 = Failed: Not Implemented
• 502 = Failed: Bad Gateway
• 503 = Failed: Service Unavailable
• 504 = Failed: Gateway Time-out
• 505 = Failed: HTTP Version Not Supported

Stem
The part of a dynamic URL that is the template. It is usually the part of the URL before the question mark that separates the template from the parameters. Same as URL Stem Field.

Step
In path analysis, each page view in the path is a step. In Scenario Analysis, each page in the Scenario is a step.

Subtotal
In WebTrends report tables, this usually refers to the total for just the line items appearing in the part of the table on one report page, i.e., that can be seen by scrolling but not by clicking on a “forward” or “back” button. If a table spans several pages, each page’s portion of the table will have its own subtotal. Statistics for parts of the table not shown on the current page will appear as “Other.”

Suffix (Domain Name)
The three-letter suffix of a domain name can be used to identify the type of organization to which the web site belongs. For example, the suffix .edu implies that the organization associated with the site is an educational organization.

Table
In WebTrends, a matrix or tabular array of results. Each report usually contains one or more graphs and a table. A table may be broken up to span several pages, or it may fit on one page.

Tag
A script (JavaScript or VBScript) that can be added to the code of a web page to capture information about a visit to that web page (for example, IP of visitor, time of day, name of page, parameters, etc.) and send it to a data collection server such as WebTrends’ SmartSource Data Collector. WebTrends’ proprietary tag is called the SmartSource Tag.

Target Page
When a redirect page is used, the target page is the page to which the visitor’s browser is sent. The term can also refer to the web page that is the destination of a hyperlink.

Template
A collection of WebTrends settings that has a unique name and defines the content and appearance (language, style) of reports to which it is applied. When specified in conjunction with a profile, it determines a complete report configuration that can then be analyzed.
In many cases, a given template can be applied to any profile, and a given profile can have many templates. A template allows you to automate and easily customize the content on the WebTrends Desktop for a specific business function or user. Templates give administrators and users the ability to customize their views, as well as assign dashboards, reports and language preferences to a given template.

Time to Serve
The time it takes to serve up a web page to a visitor, measured in milliseconds.

Top
The pages from which most users enter the site or leave the site. Can be distorted by non-human traffic (for example, spiders and robots). Useful to see if lots of people are following a particular link out of the site or whether visitors appear to have a bookmarked page other than the homepage.

Top-Level Domain
The suffix of a domain name. A top-level domain can be based on the type of organization (.com, .edu, .gov, .name, etc.) or it can be a country code (.uk, .de, .jp, .us, etc.). The top-level domain can be used to identify the type of web site.

Traffic
In general terms, the number of visits, visitors, or activity on a web site.

Translation Files
Comma separated value files (.csv) used to convert analysis information into more helpful report data. Their uses include creating more readable reports and providing drilldown analysis for campaigns and products. They can translate a captured value into another single value or, when using drilldown capabilities, into multiple values that all pertain to the original value.

Unique Visitors
Number of unique individuals who visited your site during the report period, as identified by a persistent cookie. If someone visits more than once during the report period, they are counted only as one unique visitor.
Unique visitors may not perfectly match the number of unique individuals visiting the site, because someone may visit a site from more than one computer and have a different cookie at each computer, or people may share the same computer to access the same web site.

Unknown
“Unknown” is a possible line item in several WebTrends reports. In geography-related and organization-related reports, “unknown origin” means WebTrends was unsuccessful in looking up an IP address or domain name. In first-time versus repeat visitor and buyer reports, it refers to visitors whose browsers did not accept cookies. If all visitors appear as unknown in repeat visitor reports, the site does not issue persistent cookies.

URL
Uniform Resource Locator. It is a means of identifying an exact location on the Internet. For example, http://www.webtrends.com/html/info/default.htm is the URL which defines the location of the page default.htm in the /html/info/ directory on the NetIQ Corporation web site. As the previous example shows, a URL consists of four parts: Protocol Type (HTTP), Machine Name (webtrends.com), Directory Path (/html/info/), and File Name (default.htm).

URL Query String
The portion of the URL that contains query parameters.

URL Stem Field
The part of a dynamic URL that is the template. It is usually the part of the URL before the question mark that separates the template from the parameters. Same as Stem.

User Agent
Portion of a log file that identifies the browser and platform used by a visitor. Also identified through Tags.

VBScript Tag
A script (VBScript or sometimes JavaScript) that can be added to the code of a web page to capture information about a visit to that web page (such as IP of visitor, time of day, parameters) and send it to a data collection server such as WebTrends’ SmartSource Data Collector.

Visit
All the activity of one visitor’s browser on a web site within certain time constraints. A visit is a series of page views, beginning when a visitor’s browser requests the first page from the server, and ending when the visitor leaves the site or remains idle beyond the idle-time limit.

Visitor
A person at a computer using a browser to visit a web site. A visitor may make more than one visit during a given time period. Note the combination of person, computer, and browser. Since a person may use different computers or even use different browsers on the same computer, it is possible for him/her to appear as more than one visitor because the chief means of distinguishing a visitor is through a persistent cookie or, less desirably, the combination of IP address and platform/browser details.

Visitor History
Visitor History is a feature in WebTrends which, when activated, records specific information about the history of your visitors, including how often they have visited your site (frequency), how recently they’ve visited (recency), the number of days between their visits (latency), the value of all their purchases (lifetime value), the campaign that generated their first visit to your site, the search engine phrase used most recently to visit your site, and much more. Many reports depend on Visitor History being activated, such as any of the Buyers by reports. The Visitor History table captures four categories of information, each of which offers a variety of different measurements and possible report combinations that allow visitor segmentation: Visit Attributes, Campaign Attributes, Purchase History, and Visitor “Firsts.” Also, Purchase History can measure any form of conversion the WebTrends administrator defines, not just sales. Persistent cookies are used to recognize unique visitors and to record Visitor History events, which are associated only with this unique ID, not with specific, known individuals.
With all Visitor History measures and reports, a visitor must have visited the site during the report time period in order for their Visitor History data (data which may be outside the report time period) to be included in the report.

Visitor Session
The full time period a visitor spends at a particular site. As soon as there are 30 minutes of inactivity (the limit is definable within WebTrends), the session is closed.

WAP
Wireless Application Protocol.

WAP Browser
A program used on a WAP device to display site content, similar to Netscape or Internet Explorer on PCs.

WAP Carrier
A server that acts as an intermediary and relays requests from visitors with WAP devices to your site.

WAP Device
A wireless device using Wireless Application Protocol (WAP), such as a cellular telephone or radio transceiver, that can be used to access the Internet. WebTrends software reports only include WAP devices if the web data activity file shows the device used a WAP browser.

WebTrends Data Warehouse
The WebTrends Data Warehouse (formerly called the Webhouse Builder) transforms raw web data activity files into a normalized format which can later be used by web traffic analysis profiles for analysis and reporting. Without the WebTrends Data Warehouse, large log files must typically be stored on a separate machine accessed through a mapped drive, which makes the speed of the analysis dependent on the speed of the network connection. Additionally, raw web data activity files are just that: unprocessed and in their original state. Web data activity files that have been imported and stored using the WebTrends Data Warehouse have already been parsed, normalized, processed, and possibly even filtered, making reporting time for large logs significantly shorter.

Well-known Parameter
Specially named URL parameters that work specifically with the WebTrends Auto-configuration feature.
These parameters are created and transmitted by SmartSource Tags or by WebTrends Script, and are recognized by WebTrends to allow automatic generation of reports based on those parameters, without configuration by the WebTrends administrator. For example, parameters can be used to assign a page to certain Content Groups or Scenarios, or to insert data into Visitor History tables as “first campaign” or other attributes.

WTLS
Acronym for Wireless Transport Layer Security protocol, which is the security layer endorsed by the WAP Forum (www.wapforum.org). Its primary goal is to provide privacy, data integrity, and authentication for WAP applications.

Zero-page Visit
A visit that included no page views. This is possible if a visit consisted of at least one request for a non-page file (such as a graphic), but no requests for page files (such as .htm, .asp, .jsp, or .cfm).
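The Visitor Session and Zero-page Visit glossary entries can be sketched together in Python: hits from one visitor are split into visits wherever the gap between consecutive hits exceeds the idle-time limit, and a visit that contains no page files is flagged as a zero-page visit. The 30-minute limit and the page extensions come from the glossary text; the function names and the (time stamp, URL path) hit layout are hypothetical, and the extension list is only the glossary's example set, so this is a sketch under those assumptions rather than a description of how WebTrends sessionizes data.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Hit = Tuple[datetime, str]  # (time stamp, requested URL path); hypothetical layout

IDLE_LIMIT = timedelta(minutes=30)
# Example page extensions from the glossary; a real site would likely need more.
PAGE_EXTENSIONS = (".htm", ".asp", ".jsp", ".cfm")


def sessionize(hits: List[Hit], idle_limit: timedelta = IDLE_LIMIT) -> List[List[Hit]]:
    """Split one visitor's hits into visits at gaps longer than idle_limit."""
    visits: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: h[0]):
        # Continue the current visit only if the gap since its last hit is within the limit.
        if visits and hit[0] - visits[-1][-1][0] <= idle_limit:
            visits[-1].append(hit)
        else:
            visits.append([hit])
    return visits


def is_page(path: str) -> bool:
    """Treat a request as a page view if its path (without query string) has a page extension."""
    return path.split("?", 1)[0].lower().endswith(PAGE_EXTENSIONS)


def is_zero_page_visit(visit: List[Hit]) -> bool:
    """A visit with hits but no page views."""
    return not any(is_page(path) for _, path in visit)


hits = [
    (datetime(2004, 7, 1, 9, 0), "/index.htm"),
    (datetime(2004, 7, 1, 9, 10), "/images/logo.gif"),
    (datetime(2004, 7, 1, 11, 0), "/images/banner.gif"),
]
visits = sessionize(hits)
flags = [is_zero_page_visit(v) for v in visits]
print(len(visits), flags)
```

In this sample the first two hits fall within the idle limit and form one visit with a page view, while the third hit arrives after a longer gap and forms a second visit that requested only a graphic.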