Affymetrix Expression Console Software v1.0
manual.book Page i Thursday, October 5, 2006 12:21 PM

Affymetrix Expression Console™ Software Version 1.0
User Guide
P/N 702387 Rev. 1.0

For research use only. Not for use in diagnostic procedures.

Trademarks
GeneChip®, HuSNP®, GenFlex®, Flying Objective™, Affymetrix®, CustomExpress®, CustomSeq®, NetAffx™, Tools To Take You As Far As Your Vision®, The Way Ahead™, Powered by Affymetrix™, GeneChip-compatible™, and Command Console™ are trademarks of Affymetrix, Inc. All other trademarks are the property of their respective owners.

Limited License Notice
Limited License. Subject to the Affymetrix terms and conditions that govern your use of Affymetrix products, Affymetrix grants you a non-exclusive, non-transferable, non-sublicensable license to use this Affymetrix product only in accordance with the manual and written instructions provided by Affymetrix. You understand and agree that except as expressly set forth in the Affymetrix terms and conditions, that no right or license to any patent or other intellectual property owned or licensable by Affymetrix is conveyed or implied by this Affymetrix product. In particular, no right or license is conveyed or implied to use this Affymetrix product in combination with a product not provided, licensed or specifically recommended by Affymetrix for such use.

Patents
Software products may be covered by one or more of the following patents: U.S. Patent Nos. 5,733,729; 5,795,716; 5,974,164; 6,066,454; 6,090,555; 6,185,561; 6,188,783; 6,223,127; 6,228,593; 6,229,911; 6,242,180; 6,308,170; 6,361,937; 6,420,108; 6,484,183; 6,505,125; 6,510,391; 6,532,462; 6,546,340; 6,687,692; 6,607,887; and other U.S. or foreign patents.

Copyright
©2006 Affymetrix, Inc. All rights reserved.
Table of Contents

Chapter 1 Welcome
  INTRODUCTION
  WORKFLOW DIAGRAM
  ABOUT THIS MANUAL
  FAQS
  CONVENTIONS USED IN THIS GUIDE
  TECHNICAL SUPPORT
Chapter 2 Installation and Setup
  SOFTWARE REQUIREMENTS
  MINIMUM HARDWARE RECOMMENDATIONS
  INSTALLATION INSTRUCTIONS
  SETUP PROFILE
  LIBRARY FILES
Chapter 3 Creating a Study
  GETTING STARTED
  ANALYZING FILES
Chapter 4 3' Expression Array Analysis
  ANALYSIS CONTROLS
  ANALYSIS ALGORITHMS
  ANALYZE DATA
  RESULTS
Chapter 5 Exon Array Analysis
  SPECIFY CONTROLS
Chapter 6 QC Tables and Graphs
  REPORTS
  GRAPHS
Chapter 7 Exporting Data
  EXPORTING
  SAVING IMAGES
Chapter 8 Controls and Thresholds
  REPORT CONTROLS
  REPORT THRESHOLDS
Chapter 9 Advanced Analysis
  3' EXPRESSION ARRAY CONFIGURATIONS
  MASK FILES
  MAS 5.0 CHP FILES IN THE COMMAND CONSOLE FORMAT
  MAS 5.0 CHP FILES IN THE GCOS FORMAT
  ADVANCED CONFIGURATION EXON ANALYSIS
Chapter 10 Analysis Scripts
  ANALYSIS SCRIPT CREATION
  ANALYSIS SCRIPT DELETION
  ANALYSIS SCRIPT EXECUTION
Appendix A Algorithms
  MAS 5.0 ALGORITHM
  RMA ALGORITHM
  PLIER ALGORITHM
  COMPARISON OF ALGORITHMS
Appendix B Algorithm Parameters and Outputs
  MAS 5.0 COLUMN HEADINGS
  COLUMN HEADINGS FOR RMA AND PLIER
  PROBE SET SUFFIXES
Index

Chapter 1
Welcome

Welcome to the Affymetrix® Expression Console™ software (v.1.0) User Guide.
The Expression Console application provides signal estimation and QC functionality for the GeneChip® Expression Arrays (3' Expression Arrays and Exon Arrays). The Expression Console software allows users to:
• Generate probe set summarization (CHP) files from feature intensity (CEL) files for both 3' Expression Arrays and Exon Arrays
• Capture a standard set of metrics for evaluating the success of the individual hybridizations for both 3' Expression Arrays and Exon Arrays
• Identify outlier samples in the data set

The Expression Console software is intended for research personnel (such as laboratory technicians, research associates, and scientists) analyzing Affymetrix GeneChip® data.

Introduction

The Affymetrix® Expression Console™ software provides an easy way to create summarized expression values (CHP files) from individual 3' Expression Array and Exon Array feature intensity (CEL) files or from collections of them. In addition to writing CHP files, the Expression Console application produces a collection of QC metrics for evaluating the success of hybridizations. The user defines thresholds for these metrics, and the software highlights the metrics that do not meet the defined thresholds. CHP files that have any metric outside the defined thresholds are also highlighted in the study table. Individual QC metrics for each labeling technique are discussed in Chapter 6, QC Tables and Graphs.

The Expression Console application contains graphic capabilities for visual inspection of the hybridization results.
To identify outliers, the application is designed to display:
• Line graphs for individual or collections of metrics or probe sets
• Box plots for signal distributions before or after normalization
• MvA plots for signal distributions
• Heat maps for correlation matrices

The Expression Console application is not a secondary analysis package. However, it does create the CHP files required for secondary analysis packages from the Affymetrix GeneChip® Compatible Program.

Workflow Diagram

Figure 1.1 Affymetrix® Expression Console™ software workflow

About this Manual

This manual presents information about the Affymetrix Expression Console™ software in the following chapters and appendices:
• Chapter 2, Installation and Setup: Describes how to install and configure the Affymetrix® Expression Console™ v.1.0 software.
• Chapter 3, Creating a Study: Describes how to create a study to analyze the array data.
• Chapter 4, 3' Expression Array Analysis: Describes how to create CHP files from CEL files for 3' Expression Arrays using either the MAS5, RMA, or PLIER algorithm.
• Chapter 5, Exon Array Analysis: Describes how to create CHP files from CEL files for Exon Arrays by applying either RMA or PLIER algorithms.
• Chapter 6, QC Tables and Graphs: Describes how to run reports and apply graphs for data interpretation.
• Chapter 7, Exporting Data: Describes how to export data using PDF, TXT, and PNG file options.
• Chapter 8, Controls and Thresholds: Describes how to identify, define, modify and/or remove controls and thresholds.
• Chapter 9, Advanced Analysis: Describes how to modify the default algorithm parameters for either the 3' Expression Array or Exon Array.
• Chapter 10, Analysis Scripts: Describes how to create a script that automates the process of running the selected analysis algorithm and creating a standard set of QC graphs and tables.
• Appendix A, Algorithms: Briefly describes the algorithms offered in the Expression Console™ software with links to reference material for further reading.
• Appendix B, Algorithm Parameters and Outputs: Gives definitions for report column headings, which represent MAS5, RMA, and PLIER output data.

FAQS

A list of Frequently Asked Questions (FAQS) about the Affymetrix® Expression Console™ software can be found on the Affymetrix website at www.affymetrix.com; then go to /Support/Product/Software/Expression Console Software.

Conventions Used in This Guide

This manual provides a detailed outline for all tasks associated with Affymetrix® Expression Console™ software. Various conventions are used throughout the manual to help illustrate the procedures described. Explanations of these conventions are provided below.

STEPS
Instructions for procedures are written in a numbered step format. Immediately following the step number is the action to be performed. Following the action, the software's response and any additional information pertaining to the step are presented in paragraph format. For example:
1. Click Yes to continue.
The Delete task proceeds. In the lower right pane the status is displayed.

FONT STYLES
Bold fonts indicate names of commands, buttons, options or titles within a dialog box. When asked to enter specific information, the input is displayed in italics within the procedure being outlined. For example:
1. Click the Find toolbar button; or select Edit → Find from the menu bar.
The Find dialog box opens.
2.
Enter AFFX-BioB-5_at in the Find what box, then click Find Next to view the first search result.
3. Continue to click Find Next to view each successive search result.

SCREEN CAPTURES
The steps outlining procedures are frequently supplemented with screen captures to further illustrate the instructions given. The screen captures depicted in this manual may not exactly match the windows displayed on your screen.

ADDITIONAL COMMENTS
Throughout the manual, text and procedures are occasionally accompanied by special notes. These additional comments and their meanings are described as follows:
• Tips provide helpful advice or shortcuts for completing a task.
• The Note format presents supplemental information pertaining to the text or procedure being outlined.
• The Important format presents important information that may affect the accuracy of your results.
• Caution notes advise you that the consequence(s) of an action may be irreversible and/or result in lost data.
• Warnings alert you to situations where physical harm to person or damage to hardware is possible.

Technical Support

Affymetrix provides technical support to all licensed users via phone or e-mail. To contact Affymetrix® Technical Support:

AFFYMETRIX, INC.
3420 Central Expressway
Santa Clara, CA 95051 USA
Tel: 1-888-362-2447 (1-888-DNA-CHIP)
Fax: 1-408-731-5441
sales@affymetrix.com
support@affymetrix.com

AFFYMETRIX UK Ltd.
Voyager, Mercury Park, Wycombe Lane, Wooburn Green, High Wycombe HP10 0HH
United Kingdom
UK and Others Tel: +44 (0) 1628 552550
France Tel: 0800919505
Germany Tel: 01803001334
Fax: +44 (0) 1628 552585
saleseurope@affymetrix.com
supporteurope@affymetrix.com

Affymetrix Japan K.K.
Mita NN Bldg. 16F
4-1-23 Shiba
Minato-ku, Tokyo 108-0014 Japan
Tel.
03-5730-8200
Fax: 03-5730-8201
salesjapan@affymetrix.com
supportjapan@affymetrix.com

www.affymetrix.com

Chapter 2
Installation and Setup

The Affymetrix® Expression Console™ software is a stand-alone application. It can be installed on computers that have GeneChip® Operating System (GCOS) software, Affymetrix GeneChip® Command Console™ (AGCC) software, or neither.

If you are using GCOS CEL files, Affymetrix recommends using the Data Transfer Tool (DTT) provided by Affymetrix to move the CEL files out of the GCOS directory.

Software Requirements

The Expression Console software can be installed with the following operating systems:
• Microsoft Windows 2000 Professional with service pack 4.0 or higher
• Microsoft Windows XP with service pack 2.0 or higher

Minimum Hardware Recommendations

The minimum hardware recommendations are:
• Memory (RAM): 1 GB*
• Hard drive: 20 GB** (sufficient space should be available to meet user data requirements)
• Processor: 2.0 GHz Intel Pentium or higher

* Due to the large number of data points produced by the Exon Arrays, an additional 1 GB of RAM is highly recommended.
** The larger file sizes associated with Exon Array data should be taken into account when calculating the necessary free space requirement.

Installation Instructions

To install the Expression Console software:
1. Go to www.affymetrix.com and download the software from the following location: Support / By Product / Software / Affymetrix® Expression Console™ Software
2. Unzip the downloaded software package.
3. Double-click setup.exe to install the software.
4. Follow the directions provided by the installer.
5. The setup process installs the required Microsoft components, which include the .NET 2.0 framework.

If a previous version of the Expression Console software is installed, the installation software prompts the user to remove it before installing the new version.

Setup Profile

Once installed, the Affymetrix® Expression Console™ software is ready for configuration. Follow the steps below to configure the software for sample analysis.

Create a Profile

A profile groups options and parameters so that they can be used again. Once the software is started, the profile can be changed only when no study is open, by clicking the toolbar icon, selecting Edit → Change User Profile from the drop-down menu, or going to Toolbox → Configuration → Specify User Profile.

To open the software and set up a profile, perform the following steps:
1. Open the Expression Console™ application by selecting Start → Programs → Affymetrix → Expression Console.
The Expression Console software window opens with the Profile Information dialog box displayed.

Affymetrix® Expression Console™ software - Profile Information Window

2. Type in a name for your profile and click OK.
By entering a profile name, you retain the specific analysis settings and quality control thresholds for each profile entered. A list of previously created profiles, if any, is found in the drop-down menu on the Profile Information dialog box.

You can select a different profile without terminating the program, but the current study must be closed to open another profile.

Delete a Profile

The list of previously created profiles is found in the drop-down menu on the Profile Information dialog box. To remove profiles no longer needed:
1. Close the study window, if open.
2. Select File → Utilities.
3. Select the User Profile Management tab (Figure 2.1).
4. Highlight the profile to be removed.
5. Click Delete.

Figure 2.1 User Profile Management Tab

Library Files

Library Files – Download Option

The Expression Console software requires information stored in library files (array types) to analyze the CEL files generated by GCOS or Affymetrix GeneChip® Command Console™ (AGCC) software. These files are available from Affymetrix and can be downloaded within the Expression Console application.

When you click OK in the Profile Information window the first time (Figure 2.1), a dialog box opens asking you to direct the software to your GeneChip library files folder (Figure 2.2).

You can select any location for the library files folder. However, once you direct the software to the folder location, do not place any library files in a subfolder. The Expression Console application cannot find library files in a subfolder!

If the Affymetrix GeneChip® Command Console™ (AGCC) software is installed on your system, the Expression Console™ application defaults to the files in C:\Command_Console\Library.

If the Affymetrix GeneChip® Operating System software (GCOS) is installed on your system, Affymetrix recommends that, to avoid confusion, you do not select the GCOS library file directory as the library file directory for Expression Console. Expression Console downloads library files from NetAffx for analysis, but these are not registered with GCOS and are not sufficient to scan arrays.

Figure 2.2 Browse For Folder window to locate GeneChip library files folder

Library files can be downloaded from NetAffx™.
1. Select File → Download Library Files.
A dialog box opens requesting your account information for NetAffx (Figure 2.3).
To obtain a NetAffx account, go to www.affymetrix.com, click Register at the top of the Affymetrix main page and follow the instructions.

Figure 2.3 NetAffx Account Information dialog box

2. Enter your registered email address and password.
The NetAffx Library Files window opens (Figure 2.4). This window contains a complete list of Affymetrix library files that can be downloaded. Library files previously downloaded are marked currently installed.

Figure 2.4 NetAffx Library Files window - downloading a checked library file

3. Check the library files needed and click the Download button.
The appropriate files are downloaded to the folder you previously indicated. The green status bar at the bottom of the window highlights during the download process. If the Abort button is selected, the download process stops with a message indicating which library files failed to download.

When you create a new study and add CEL/CHP files that do not have corresponding library files loaded in the correct folder, you will be prompted to download the appropriate library files.

Figure 2.5 Status Window - indicating missing library file

Library Files – Copying Files Manually

For computers that are not connected to the internet and therefore cannot take advantage of the library file download option, it is possible to manually copy the necessary files to the computer with Expression Console:
1. Create a folder on the computer to hold the library files for the Expression Console application.
2. Copy the necessary files from the CD or other removable media to the library file folder. In order for the default report controls to be identified for the array type, this should be done with the application closed.
   A. For 3' Expression Arrays, only the .psi and .cdf files need to be copied to the directory.
   B. For Exon Arrays, minimally the .clf, .pgf, .bgp, and .qcc files need to be copied to the directory.

Do not create subdirectories within the library file folder. The Expression Console software does not look at subdirectories.

To continue, proceed to Chapter 3, Creating a Study.

Chapter 3
Creating a Study

This section describes how to create a study with probe level intensity files (CEL) and probe level summarization files (CHP) in the Affymetrix® Expression Console™ software.

Getting Started

To get started using the Affymetrix® Expression Console™ software, the user creates a study consisting of a collection of probe cell intensity files (CEL) and/or probe level summarization files (CHP) and their associated sample information for analysis and examination. The study table contains the following:
• File names (CEL or CHP)
• Sample attribute information from either the .xml files for GCOS (requires the use of DTT to transfer the data from the GCOS database) or .arr files (sample/array) from AGCC
• An indication of whether or not CHP files pass the user-defined tolerances for the array metrics. See Chapter 8, Controls and Thresholds.
• Whether the results are on a linear or log scale

By default, results from the PLIER and MAS 5.0 algorithms are on a linear scale; results from the RMA algorithm are on a log scale. The displayed scale can be changed under the Edit menu, but this does not change the scale of the data in the CHP file. See Appendix A, Algorithms.
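When values exported from CHP files produced by different algorithms are compared outside the application, they first need to be on the same scale. The following is a minimal sketch of that conversion, not part of the Expression Console software. It assumes that "log scale" means log base 2 (the usual convention for RMA output, not stated in this guide), and the function names and algorithm labels are illustrative only.

```python
import math

# Assumption: RMA values are log2; PLIER and MAS 5.0 values are linear.
LOG_SCALE_ALGORITHMS = {"RMA"}
LINEAR_ALGORITHMS = {"PLIER", "MAS5"}


def to_log2(signal, algorithm):
    """Return the signal on a log2 scale, given the algorithm that produced it."""
    name = algorithm.upper()
    if name in LOG_SCALE_ALGORITHMS:
        return signal  # already on the log scale
    if name in LINEAR_ALGORITHMS:
        if signal <= 0:
            raise ValueError("a linear signal must be positive to take its logarithm")
        return math.log2(signal)
    raise ValueError(f"unknown algorithm: {algorithm}")


def to_linear(signal, algorithm):
    """Return the signal on a linear scale, given the algorithm that produced it."""
    name = algorithm.upper()
    if name in LINEAR_ALGORITHMS:
        return signal  # already on the linear scale
    if name in LOG_SCALE_ALGORITHMS:
        return 2.0 ** signal
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For example, `to_log2(8.0, "PLIER")` and `to_log2(3.0, "RMA")` both describe the same signal level under these assumptions.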
Follow the steps below to create a study in the Expression Console software.

GCOS users must use DTT v1.1, with the Flat File option, to transfer the files to be analyzed by the Expression Console software from the GCOS database to an independent folder. More detailed instructions can be found at www.affymetrix.com; then go to Support/Technical/Tutorial/GCOS.

Study Window

Follow the steps below to create a study to analyze CEL files or view CHP file data.
1. Create a study by first selecting one of the following:
   - The toolbar icon
   - File → New Study
   - Toolbox → Study → Create New Study
For further analysis in an existing study, click the toolbar icon, or select File → Open Study, or Open → Existing Study from the Toolbox. The Affymetrix Study window opens.

Figure 3.1 Affymetrix Study window

2. To add CEL files from your data set of interest to the study window, do one of the following:
   - Click the Add Intensity Files button
   - Select File → Add to Study → Probe Cell Intensity Files
   - In the workflow toolbox, select Study → Add Intensity Files
The Select Probe Cell Intensity Files window opens. Sample attributes from .arr or .xml files are displayed for the associated CEL files, if available.

Figure 3.2 Affymetrix Study window with CEL files

Figure 3.3 Select Probe Cell Intensity Files window

3. To add previously created CHP files to the study window, click the Add Summarization Files button, select File → Add to Study → Probe Level Summarization Files, or go to the workflow toolbox and select Study → Add Summarization Files.
The Select Probe Level Summarization Files window opens and sample attributes from .chp files are displayed.
Use the Files of type menu selection at the bottom of the window to filter the CHP files based on the selected algorithm type.

Figure 3.4 Select Probe Level Summarization Files window

After CHP files are added to the study, either at the end of an analysis or directly by clicking the Add Summarization Files button, the metrics and controls associated with that algorithm run are compared against the user-defined thresholds. Any files that fail to meet any one of the criteria are highlighted.

On a system with AGCC, the Expression Console software uses the AGCC index to locate the matching ARR file; otherwise, it pairs the CEL/CHP and ARR files by matching the root names.

On a system with GCOS, the Expression Console application uses the sample attribute information from the .xml files, if the .xml files are in the same folder as the CEL or CHP files and have the same root name (e.g., 12345.xml and 12345.cel).

4. In the Select Probe Cell Intensity Files or Select Probe Level Summarization Files window, click the column heading of choice to sort the files based on that attribute.
5. Select the files to be added and click Open to add the selected CEL/CHP files to the study window.

Library Files

To analyze the selected study files, the Expression Console application automatically locates the associated library files in the folder indicated when the software was first set up. If no library path is specified, the Expression Console software defaults to the AGCC library files folder, if it is present.
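Before starting an analysis, it can help to confirm that the library folder is laid out the way the application expects. The sketch below is not part of the Expression Console software. It uses the file extensions listed in Chapter 2 for manually copied library files and the rule that subfolders are not searched. The function name and the `"3prime"` and `"exon"` labels are hypothetical, and the check looks only at extensions, so it cannot tell whether the files belong to the right array type.

```python
from pathlib import Path

# Extensions taken from the manual-copy instructions in Chapter 2.
REQUIRED_EXTENSIONS = {
    "3prime": {".cdf", ".psi"},
    "exon": {".clf", ".pgf", ".bgp", ".qcc"},
}


def check_library_folder(folder, array_kind):
    """Return (missing_extensions, library_files_hidden_in_subfolders)."""
    root = Path(folder)
    wanted = REQUIRED_EXTENSIONS[array_kind]

    # Only files directly inside the folder count; subfolders are not searched.
    present = {p.suffix.lower() for p in root.iterdir() if p.is_file()}
    missing = sorted(wanted - present)

    # Library-like files placed in subfolders would be ignored by the application.
    library_exts = set().union(*REQUIRED_EXTENSIONS.values())
    stray = sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.parent != root and p.suffix.lower() in library_exts
    )
    return missing, stray
```

An empty `missing` list with an empty `stray` list suggests the folder layout is consistent with the rules above; anything listed in `stray` should be moved up into the library folder itself.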
If the correct library files have not been downloaded to the selected library file directory, the following error message displays when you attempt to analyze your files:

Figure 3.5 Error message - opening CEL files for analysis

The status window at the bottom of the Expression Console application window lists the missing library files. If you need to establish a new path:
1. Select Edit → Set library path and locate the new directory.
All studies must be closed to activate the Set library path menu selection.
2. Select the CEL files to analyze and click Open.
The CEL files you selected are listed in the Affymetrix Study window with their accompanying attributes.

Figure 3.6 Affymetrix Study window with CEL files

Study Window Controls

The study window in the Expression Console application controls the files that are analyzed by the software. The default sort order for the tables and graphs is determined within the study window. In addition, a sample attribute can be selected that will be prepended to the array names for display in the tables and graphs.

File Consolidation

A study can contain files that are located in multiple directories on the computer. To aid in file management, the Expression Console application can move all of the files associated with a study to a single directory.

Studies cannot be consolidated when a study is open.
1. Select File → Utilities.
2. Select the File Consolidation Tab (Figure 3.7).
3. Browse to find the study to be consolidated.
4. Select an output directory to hold all of the associated files.
5. Click Consolidate and then Close.

Figure 3.7 File Consolidation Tab

File consolidation moves the files to the selected directory.
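The effect of consolidation can be pictured with the short sketch below. It is an illustration only and does not reproduce how the Expression Console application reads a study or moves its files. It assumes that the caller supplies the list of data files, that a "root name" is the file name without its last extension, and that sample attribute files (for example 12345.xml next to 12345.cel) should travel with their data file so that root-name pairing still works afterwards. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def consolidate(data_files, output_dir):
    """Move each data file, plus same-root-name siblings, into output_dir.

    Returns the destination paths. Refuses to overwrite existing files.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    moved = []
    for data_file in map(Path, data_files):
        # Siblings share the root name, e.g. 12345.cel and 12345.xml.
        siblings = [
            p for p in data_file.parent.iterdir()
            if p.is_file() and p.stem == data_file.stem
        ]
        for src in siblings:
            if src.parent.resolve() == out.resolve():
                continue  # already in the output directory
            dest = out / src.name
            if dest.exists():
                raise FileExistsError(f"{dest} already exists")
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved
```

Because files are moved rather than copied, anything that refers to the old locations needs to be updated afterwards; raising an error on a name collision avoids silently replacing a file that came from another directory.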
Analyzing Files

To analyze a 3' Expression Array data set, proceed to Chapter 4, 3' Expression Array Analysis. To analyze an Exon Array data set, proceed to Chapter 5, Exon Array Analysis.

Chapter 4
3' Expression Array Analysis

Probe cell intensity data from 3' Expression Arrays are analyzed in the Affymetrix® Expression Console™ software using the MAS5, RMA, and PLIER algorithms to create CHP files. Follow the instructions below to analyze 3' Expression Array data.

Analysis Controls

Affymetrix microarrays contain the hybridization, labeling and housekeeping controls that help determine the success of the hybridizations. For more information on the interpretation of these controls, see the whitepaper, Data Analysis Fundamentals, at www.affymetrix.com.

To aid in the examination of these controls, the Expression Console application displays summarized probe information in tabular format. In order for this feature to function properly, the controls must be identified prior to analysis. Once the controls have been identified for a particular GeneChip (probe array type), they are saved and only need to be updated if the user wishes to modify the controls. For most standard Affymetrix GeneChip® Arrays, a standard set of defaults has been provided; however, it is recommended that the user verify all controls before initiating an analysis. For details on defining the controls and their thresholds, refer to Chapter 8, Controls and Thresholds.
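The way metrics are compared against user-defined thresholds, with a file highlighted as soon as any one metric falls outside its bounds, can be sketched as follows. This is an illustration of the idea only, not the application's implementation. The function name, the data layout, and any metric names passed to it are hypothetical; the actual metrics and thresholds are defined in Chapter 8.

```python
def flag_files(metrics_by_file, thresholds):
    """Return {file_name: [names of metrics outside their (low, high) bounds]}.

    thresholds maps a metric name to a (low, high) pair; either bound may be
    None to leave that side open. Files with no failures are omitted.
    """
    flagged = {}
    for file_name, metrics in metrics_by_file.items():
        failures = []
        for name, (low, high) in thresholds.items():
            value = metrics.get(name)
            if value is None:
                continue  # metric not reported for this file; nothing to compare
            too_low = low is not None and value < low
            too_high = high is not None and value > high
            if too_low or too_high:
                failures.append(name)
        if failures:
            flagged[file_name] = failures
    return flagged
```

A file appears in the result if at least one metric is out of range, which mirrors the rule that a single failed criterion is enough for a CHP file to be highlighted.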
Analysis Algorithms

By default, the 3' Expression Array workflows are set as follows:
• PLIER workflow is Quantile normalization with PM-MM
• RMA workflow is Quantile normalization and has a general background correction
• MAS 5.0 workflow is set so that all probe sets are scaled to TGT = 500

For more details about the algorithms, see Appendix A, Algorithms and Appendix B, Algorithm Parameters and Outputs.

Analyze Data

To analyze 3' Expression Array data:
1. Create a new study or open an existing study. See Chapter 3, Creating a Study.
2. In the study window, check the CEL files for analysis.
3. To analyze the selected CEL files in the study, select Analysis → 3' Expression Arrays and select the appropriate workflow (MAS5, RMA, or PLIER) (Figure 4.1).

You can also analyze the controls alone for a cursory check of the success of the hybridization. This allows you to remove the obvious outliers prior to the time investment of analyzing the full array.

Figure 4.1 3' IVT Expression Array - MAS5 Algorithm analysis

The suffix dialog box for the selected workflow opens (Figure 4.2).

Figure 4.2 Expression Console - MAS5 - File name suffix dialog box

For advanced analysis, select Analysis → Advanced Expression Configuration and choose the appropriate Configuration or Mask file menu (Figure 4.3).

Figure 4.3 Expression Console application - Advanced Analysis

For details on how to utilize the advanced analysis features, see Chapter 9, Advanced Analysis.

If you want to run a script (an automated analysis with preset parameters) to generate CHP files, select Analysis → Expression Analysis Scripts and create a new script or open an existing script (Figure 4.4). For details on how to set up an analysis script, see Chapter 10, Analysis Scripts.
Figure 4.4 Expression Analysis scripts menu

4. Enter a suffix to identify the specific samples after analysis.
When you click OK, analysis begins. The progress of analysis is displayed in the Status window (Figure 4.5).

Figure 4.5 MAS5 Analysis files in the Status window

Results

Several minutes may be required for processing results, depending on the number of arrays selected. Once processing is complete, the study window shows a new group containing the newly summarized data (CHP files). The new group is named with the type of algorithm selected.

The files are listed in the status window at the bottom of your screen as they are generated, and the Affymetrix Study window (Figure 4.6) is populated with the corresponding CHP files.

Figure 4.6 Affymetrix Study window - MAS5 analysis results

Figure 4.7 Affymetrix Study window - MAS5 analysis results

Highlighted CHP files (Figure 4.7) contain metrics outside of defined thresholds.

Chapter 5
Exon Array Analysis

Probe cell intensity data (CEL) from Affymetrix GeneChip® Exon Arrays are analyzed in the Affymetrix® Expression Console™ software. The application uses the RMA-sketch workflow for both Exon and Gene level analyses to create CHP files. Follow the instructions below to analyze Exon Array data.

Affymetrix® Exon Arrays include the following:
• Human – HuEx-1_0-st-v2
• Mouse – MoEx-1_0-st-v1
• Rat – RaEx-1_0-st-v1

Specify Controls

To analyze data from the Affymetrix Exon Arrays:
1.
Download the appropriate library files, File → Download Library Files, (for example, human Exon Arrays require HuEx-1_0-st-v2 library files). To use custom library files for an analysis, see Chapter 9, Advanced Analysis. Create a new study or open an existing study. See Chapter 3, Creating a Study. 3. Add the CEL files you want to use to create CHP files by clicking the Add Intensity Files button. 2. 4. Select the level of analysis by clicking the Run Analysis button in the Affymetrix Study window to see the Available Analyses menu box or select Analysis to display a drop-down menu (Figure 5.1). Then perform one of the following: manual.book Page 44 Thursday, October 5, 2006 12:21 PM 44 Affymetrix® Expression Console™ Software v.1.0 – User Guide Figure 5.1 Select the Run Analysis button in the Affymetrix Study window - For the Gene level, select Analysis → Gene Level (Figure 5.2), or select the Gene Level from the drop-down menu (Figure 5.1). Then select the annotation level, Core or Extended. These describe transcript structure confidence levels (for the annotation used to define the transcript) that are annotations broadly defined as follows: – Core: limits analysis to exon-level probe sets that map to BLAT alignments of mRNA with annotated full-length CDS regions. – Extended: limits analysis to transcripts that are defined by exon-level probe sets that map to cDNA alignments and their annotations based on cDNA alignments, in addition to transcripts defined by the core category. – Full: analysis includes transcripts that consist of exon-level probe sets that map to sets of ab-initio gene predictions in addition to transcripts defined by the core and extended categories. For further explanation of confidence levels, see the Affymetrix whitepaper, Exon Probeset Annotations and Transcript Cluster Groupings, at www.affymetrix.com; then go to Support/Technical Documentation/White Papers. 
Figure 5.2 Gene Level probe set with confidence modes: Core, Extended, Full

- For the Exon level, select Analysis → Exon Level, or select one of the exon level analysis options from the drop-down menu (Figure 5.2). Then select the confidence level: Core, Extended, Full, or All. These are gene confidence levels for the input transcript annotations, broadly defined as follows:
  – Core: limits analysis to exons that consist of BLAT alignments of mRNA with annotated full-length CDS regions.
  – Extended: limits analysis to cDNA alignments and their annotations based on cDNA alignments, in addition to core defined exons.
  – Full: analysis uses exons derived from sets of ab-initio gene predictions, in addition to the core and extended exons.
  – All: consists of all three confidence levels plus probe sets that map to more than one gene or to genes that do not align to the genome.

For further explanation of confidence levels, see the Affymetrix whitepaper, Exon Probeset Annotations and Transcript Cluster Groupings, at www.affymetrix.com; then go to Support/Technical Documentation/White Papers.

- For Controls, select Analysis → Controls Only → Controls (Figure 5.3). A subset of probes, on all arrays, is identified as controls. This option summarizes expression values for controls only.

You can look at the controls alone to obtain a cursory look at the success of the hybridization. This allows you to remove the obvious outliers prior to the time investment of analyzing the full array.

Figure 5.3 Exon analysis - Controls Only

The RMA-sketch workflow is automatically selected by the software.

Figure 5.4 Available Analyses - Controls Only - Controls

5. Click OK. The suffix dialog box for the selected array and confidence level opens.
Figure 5.5 Expression Console - Exon Core - File name suffix dialog box

You can add a suffix to the file name in order to further identify your samples, or you can leave it blank. The summarization method is automatically included in the file name.

6. Enter a suffix to identify the specific samples after analysis.
7. Click OK; analysis begins.

The processing time for the CEL files depends on a number of factors, which include the number of CEL files, level of analysis, number of probes under consideration, amount of available RAM, and computer processor speed. See Chapter 2, Minimum Hardware Recommendations.

The progress of the analysis can be tracked in the status window (Figure 5.6 and Figure 5.7). Since Exon Arrays are processed as a batch, the CHP files are not added to the study until the completion of the analysis. The files are listed in the status window as they are generated; then the Affymetrix Study window (Figure 5.8) is populated with the corresponding CHP files.

Figure 5.6 Exon Core level analysis files in the Status window

Figure 5.7 Gene Core level analysis files in the Status window

Figure 5.8 Affymetrix Study window - Gene core level analysis results

The CHP files are ready for QC analysis within the Expression Console software or other compatible software. CHP files are located in the same directory as the first CEL file in your input list. Outliers (samples outside the threshold boundaries) are highlighted in orange (Figure 5.9).
Figure 5.9 Affymetrix Study Window - Gene core level analysis results with outliers

Chapter 6
QC Tables and Graphs

To help researchers establish quality control processes for gene expression analyses, Affymetrix has developed several controls. Researchers are encouraged to monitor these controls on a regular basis to assess assay data quality. Many of these control metrics are generated as a product of the primary analysis by the respective analysis algorithms. To help with monitoring these values, the Affymetrix® Expression Console™ software provides a collection of tools for viewing and graphing metrics associated with the array and the analysis. These metrics include but are not limited to:
• Hybridization controls
• Labeling controls
• Internal control genes (Housekeeping controls)
• Global array metrics
• Algorithm parameters

This section describes how to generate graphs and tables that allow the quality of individual hybridizations in a single study to be assessed easily, and gives some introductory guidance on their interpretation. For more detailed information about graph and table interpretation, refer to the whitepapers, Data Analysis Fundamentals and Exon Arrays Quality Analysis, at www.affymetrix.com.

In general, Affymetrix highly encourages users to create a running log of the parameters to monitor quality and potentially flag outlier samples. Evaluation of particular samples should be based on the examination of all sample and array performance metrics in light of this history. A good general rule for examining these types of quality control data is to look for outliers when compared to other highly related samples.
For example, tissue A may have an overall low level of gene expression, so that the percent of probes detected may normally be between 10% and 15%. Therefore, a %Present = 11 would not be an indication of a problem. However, if tissue B normally has a %Present between 45 and 50, and one sample has %Present = 11, that is an indication of a problem. Examination of which metrics or controls are outliers will provide insight into the source of the problem and possible solutions.

The Expression Console software can store user-defined thresholds for the QC metrics and highlight, in the reports and study tables, those metrics outside the thresholds and the arrays that contain them. Expression Console software comes with an initial set of defaults for both metrics and thresholds, which the user can modify according to individual preferences. For more information, see Chapter 9, Advanced Analysis.

Reports

Reports available in the Expression Console software provide a tabular view of all QC metrics, including:
• Algorithm parameters (See Appendix B, Algorithm Parameters and Outputs for a list of parameters, outputs, and their definitions.)
• Global analysis metrics
• Corner +, Corner -, Center +, and Center -, if available
• Count and percentages of detection calls
• Average signal value for each detection call type
• Signals, detections, and 3'/5' ratio for spike and housekeeping controls

Note that these metrics are algorithm specific. For details about the MAS 5.0 algorithm parameters, see the GeneChip® Operating System Software (GCOS) manual (PN 701439) and the Statistical Algorithms Reference Guide. For details about the PLIER algorithm parameters, see the Guide to Probe Logarithmic Intensity Error (PLIER) Estimation technical note and the Quality Assessment of Exon Arrays whitepaper at www.affymetrix.com.
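As a minimal sketch of the running-log idea, the check below compares a new %Present value (a detection call percentage from the report) against the range seen in earlier, highly related samples. The function name, the logged values, and the simple min/max rule are assumptions made for illustration; Expression Console applies its own user-defined thresholds, and this is not its implementation.

```python
def outside_history(history, new_value):
    """Return True when new_value falls outside the range of earlier values.

    history: %Present values logged for earlier, highly related samples
             (hypothetical data; the range rule is an illustrative assumption).
    """
    return not (min(history) <= new_value <= max(history))


# Hypothetical running logs of %Present for two tissues.
tissue_a_log = [10.0, 12.5, 15.0]   # tissue with low overall expression
tissue_b_log = [45.0, 47.0, 50.0]   # tissue with higher overall expression

flag_a = outside_history(tissue_a_log, 11.0)  # 11 lies within 10 to 15
flag_b = outside_history(tissue_b_log, 11.0)  # 11 lies far below 45 to 50
```

The same %Present value is unremarkable for one tissue and a warning sign for the other, which is why the comparison is made against related samples rather than against a single global cutoff.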
To view the full set of report metrics, select Report → View Full Report.

Full Report

A Full Report (Report → View Full Report) displays a table of the chosen algorithm parameters and the resulting array QC metrics contained within the CHP files. The Threshold Test column indicates whether all the metrics are within the user-specified thresholds or one or more metrics are outside them.

Figure 6.1 Full Report - 3' Expression Array with MAS5

Custom Report

To customize the report so that the table displays only a subset of columns (report metrics), select Report → New Report and check the categories you want to display in the table. The list of available metrics is based on the CHP files in the study. For example, if you have MAS 5.0 CHP files open, then parameters associated with PLIER are not visible.

Figure 6.2 New Report - 3' Expression Array with MAS5

The categories available for selection in the Report window depend on the expression array selected and the algorithms used to generate the CHP files in the study.

Delete A Custom Report

To delete a Custom Report:
1. Click File → Utilities.
2. Select the Parameter Management tab (Figure 6.3).
3. In the Custom Reports section, select the report to be deleted.
4. Select Delete and then Close.

Figure 6.3 Parameter Management Tab

Graphs

In addition to the tabular display of the metrics, the Expression Console software supports several different graphical views, including signal histograms, box plots, line graphs, and heat maps depicting the correlation between arrays.
The graphical displays, especially the box plots and heat maps, are useful in identifying outlier samples.

Figure 6.4 Graph menu items

Probe Cell Intensity View

The Probe Cell Intensity View (Graph → Probe Cell Intensity View) is a view of the CEL intensities arranged by physical position on the array (Figure 6.5). This view is used to quickly determine if any features on the array do not have a signal. For in-depth image inspection and gridding evaluation, it is recommended that customers use GCOS or the manual grid application available within the Affymetrix GeneChip Command Console (AGCC) software package.

Figure 6.5 Probe Cell Intensity View

The display area can be adjusted using the arrow keys, while the zoom level is controlled by the + and - buttons. Moving the mouse over a cell in the image displays the feature's XY coordinates and intensity value. Selecting a region (click and drag the mouse) zooms the display to show the selected region.

Right-clicking on the image enables the following options:
• Select a gray scale palette (Gray)
• Select a color scale palette that goes from black to red to yellow to white (Heat)
• Adjust the range for the color scale (Scale)
• Save the image in PNG format
• Copy the image to the clipboard (Copy to clipboard)

A particular file may appear solid black when you first open the Probe Cell Intensity View. Use the View → Zoom feature to zoom in on a particular area of investigation.

Signal Histogram

The signal histogram (Graph → Signal Histogram) is enabled when the CHP files are selected in the study window (Figure 6.6). This graph displays a histogram of the signal values for the selected CHP files as a bar or line graph. A legend of the input CHP files is displayed in the upper right corner of the histogram.
Figure 6.6 Signal Histogram - 3' IVT with MAS5 algorithm

A right-click popup menu provides the following options:
• Save the graphic as a PNG file (Save as PNG)
• Save the underlying data used for the histogram to a TXT file (Save data as TXT)
• Copy the image to the clipboard (Copy to clipboard)
• Use bars in the histogram to display the values (Bars only)
• Use lines in the histogram to display the values (Lines only)
• Use bars and lines to display the values (Bars + Lines)
• Edit the scale of the Y axis (Scale)

A slider just below the histogram allows you to adjust the zoom factor of the X axis, and a scroll bar allows you to adjust the visible region of the graph.

Box Plot

The box plot graph shows a standard box plot of data from either CHP or CEL files. A general rule to use when examining box plots is to look for individual arrays that are dramatically different from the others and, most importantly, from other replicates in the same group.

A right-click popup menu is provided for all of the box plots, containing:
• Save the graphic as a PNG file (Save as PNG)
• Save the data to a TXT file (Save data as TXT)
• Copy the image to the clipboard (Copy to clipboard)

A slider on the X axis adjusts the zoom factor, which is the number of files displayed on the X axis.

Box Plot – Probe Cell Intensity

The probe cell intensity box plots are generated from CEL file probe cell intensity values. This graph is generated by selecting Graph → Box Plot - Probe Cell Intensity. It shows a box plot of the probe intensity values for each array. Probe cell intensities are taken prior to analysis and summarization and have not been normalized; therefore, some differences in the distributions are to be expected. At the feature intensity level, the Exon Arrays have about 6 million probes; some delays in the generation of these graphs are expected.
Figure 6.7 Box Plot - Probe Cell Intensity - 3' IVT Expression Array

Relative Probe Cell Intensity

The distribution of the ratio of the intensity of each probe to the median probe intensity across all of the selected arrays is summarized in the Box Plot - Relative Probe Cell Intensity. Therefore, the plot compares the distribution of intensities on each array to the median probe intensity value for the group. As such, it is a good way to identify arrays with divergent probe intensity distributions relative to the other arrays in the study.

Figure 6.8 Box Plot - Relative Probe Cell Intensity - 3' IVT Expression Array

Figure 6.9 Box Plot - Relative Probe Cell Intensity - Exon Array

Box Plot – Signal

The Signal Box Plot is generated from summarized probe set signal values in CHP files. This graph is generated by selecting the Graph → Box Plot - Signal menu item.

Figure 6.10 Box Plot - Signal - 3' IVT Expression Array

Figure 6.11 Box Plot - Signal - Exon Array

Box Plot – Relative Signal

The Relative Signal Box Plot summarizes the distribution of the ratio of signal for each probe set to the median probe set signal across all the selected arrays. Therefore, the plot compares the distribution of probe set Signal values on each array to the median array for the group. As a result, it is a good way to identify arrays with divergent signal distributions relative to the other arrays in the study.
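The relative signal calculation described above can be sketched as follows. This is a minimal illustration, assuming the ratio is taken on the linear signal scale and that every array lists its probe sets in the same order; the function name and the sample values are hypothetical, and the software's exact computation may differ.

```python
from statistics import median


def relative_signal(signals_by_array):
    """Divide each probe set signal by the median signal for that probe set
    across all selected arrays.

    signals_by_array: dict mapping an array name to a list of probe set
    signals, with probe sets in the same order for every array (assumption).
    """
    arrays = list(signals_by_array)
    n_probe_sets = len(signals_by_array[arrays[0]])
    # Median signal of each probe set across the selected arrays.
    medians = [
        median(signals_by_array[name][i] for name in arrays)
        for i in range(n_probe_sets)
    ]
    return {
        name: [signals_by_array[name][i] / medians[i] for i in range(n_probe_sets)]
        for name in arrays
    }


# Hypothetical signals for three arrays and three probe sets.
signals = {
    "array1": [100.0, 200.0, 400.0],
    "array2": [110.0, 190.0, 420.0],
    "array3": [300.0, 600.0, 1200.0],  # consistently brighter array
}
ratios = relative_signal(signals)
```

An array whose ratios sit well away from 1 for most probe sets, such as array3 here, is the kind of divergent array a relative box plot is meant to expose.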
Figure 6.12 Box Plot - Relative Signal - 3' IVT Expression Array

MvA Plot

The MvA plot (Figure 6.13) is a comparison plot of M (magnitude of change) on the Y-axis versus A on the X-axis, where M = log(Signal array1) - log(Signal array2) and A is the average log(Signal). The Y-axis is displayed on a log2 scale with green threshold lines for +/- two-fold changes. The color coding of the plot indicates the density of probes represented by that data point. The following are displayed:
• The title of the graph, which indicates the two CHP files used to create the MvA plot
• The Pearson’s correlation (r2) value

Figure 6.13 MvA Plot - 3' Expression Array with MAS5 algorithm

Correlations

The relationship between two variables is described by their correlation. Two standard statistical measures of correlation (Pearson’s and Spearman) are provided in the Expression Console software. These are used to compare the signal estimates or detection p-values between two arrays. To simplify the interpretation of this data, the correlation values are presented as a heat map (Figure 6.14). The Spearman test is used with fewer samples since it is a rank-based test. Pearson’s test is used when there are more data, which allows normality to be confirmed.

For all of the heat maps, a right-click popup menu enables the following:
• The graphic to be saved in PNG format (Save as PNG)
• The underlying r2 values to be saved as a TXT file (Save data as TXT)
• The image to be copied to the clipboard (Copy to clipboard)
• A dialog box to adjust the min/max values to use to map values to a color scale (Edit Scale)

In addition, a slider is provided on either axis to adjust the zoom factor controlling the number of files to show on a given axis.
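The quantities behind the MvA plot and the correlation heat maps can be sketched for one pair of arrays as follows. This is an illustration only: it assumes base-2 logarithms (consistent with the log2 Y-axis), takes A as the mean of the two log signals, and computes the Pearson coefficient r directly; whether the software reports r or its square is not stated here. The function names and the signal values are hypothetical.

```python
import math


def mva(signal1, signal2):
    """Return (M, A) lists for two arrays of positive signal values."""
    m = [math.log2(x) - math.log2(y) for x, y in zip(signal1, signal2)]
    a = [(math.log2(x) + math.log2(y)) / 2 for x, y in zip(signal1, signal2)]
    return m, a


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical probe set signals for two arrays.
array1 = [64.0, 256.0, 1024.0, 32.0]
array2 = [64.0, 64.0, 1024.0, 128.0]

m_values, a_values = mva(array1, array2)
# On a log2 scale a two-fold change corresponds to |M| = 1.
beyond_two_fold = [abs(m) > 1 for m in m_values]
r = pearson_r(array1, array2)
```

Points with |M| greater than 1 fall outside the two-fold threshold lines; a heat map cell is simply the correlation value for one such pair of files, mapped to a color.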
A scroll bar is provided to adjust the visible region of the graph. Clicking on a cell in the graph displays the associated files in the lower left part of the window. The display includes the file names and the value.

Pearson’s Correlation (Signal)

Signal concordance is evaluated using the Pearson’s correlation coefficient (r2) value to compare the signal values in two CHP files. The heat map contains a pairwise comparison of the signal values from all the selected CHP files, where the r2 values have been converted into a pseudocolor scale.

Figure 6.14 Pearson’s Correlation (signal) - 3' IVT Expression Array

Pearson’s Correlation (Detection P-Value)

P-value concordance is evaluated using the Pearson’s correlation coefficient (r2) to compare the p-values from two CHP files. The resulting heat map contains a pairwise comparison of the detection p-values from all of the selected CHP files, where the r2 values have been converted into a pseudocolor scale. Only MAS5 and Exon level analyses calculate detection p-values to generate the graph associated with them.

Figure 6.15 Pearson’s Correlation (detection p-value) - 3' IVT Expression Array

Spearman Rank Correlation (Signal)

This is the Spearman rank correlation of the signal values between two CHP files. The heat map contains a pairwise comparison of the signal values from all the selected CHP files, where the Spearman r2 values have been converted into a pseudocolor scale.

Figure 6.16 Spearman Rank Correlation (signal) - 3' IVT Expression Array

Spearman Rank Correlation (Detection P-Value)

This is the Spearman rank correlation of the p-values between two CHP files.
P-values are only available in MAS5 CHP files and Exon Arrays analyzed at the exon level. The heat map contains a pairwise comparison of the detection p-values from all the selected CHP files, where the Spearman r2 values have been converted into a pseudocolor scale. Only MAS5 and Exon level analyses calculate detection p-values to generate the graph associated with them.

Figure 6.17 Spearman Rank Correlation (detection p-value) - 3' Expression Array

Line Graphs

Line graphs allow the user to graph metrics or signal values for specified groups of probe sets for selected CHP files.

Report Metrics

The report metrics graph is a line graph of the selected metrics (Graph → Line Graph - Report Metrics) calculated and stored in the header of the CHP file. These include the algorithm parameters, global array metrics, and spike-in and housekeeping control values that are stored in the CHP file. The available metrics for graphing depend on the metrics available for the CHP files in the current study. Metrics not present in the CHP file cannot be graphed.

Figure 6.18 Report Metrics - 3' Expression Array analyzed with MAS5

Probe List

The probe list graph is a line graph of the probe sets defined in a probe list file. Select Graph → Line Graph - Probe List; the Select the probe list window opens (Figure 6.19). When you select the probe list of interest, Expression Console software creates a line graph of the probe sets (Figure 6.20).
Figure 6.19 Select the probe list window

Figure 6.20 Probe List Line Graph - 3' Expression Array

Figure 6.21 Probe List Line Graph - Exon Array

New Probe List

The New Probe List window (Graph → New Probe List) enables users to create their own probe list by selecting the probe sets they want to investigate (Figure 6.23).

Figure 6.22 New Probe List - 3' Expression Array

Figure 6.23 New Probe List - Exon Array

Open Probe List

To open a probe list:
1. Select Graph → Open Probe List (Figure 6.24), and select the probe list.

Figure 6.24 Select the probe list

2. Click Open. The Probe List of interest opens for viewing (Figure 6.25).

Figure 6.25 View the Probe List

Delete a Probe List

To delete a probe list:
1. Select File → Utilities.
2. Select the Parameter Management tab (Figure 6.26).
3. In the Probe List section, select the probe list to be deleted.
4. Click Delete and then Close.

Figure 6.26 Parameter Management Tab - Delete a Probe List

Chapter 7
Exporting Data

The Affymetrix® Expression Console™ software enables the user to export the data in tables and graphs to various text or PDF documents.
Exporting

Tables and Graphs to PDF

All of the currently opened or displayed tables and graphs can be exported to PDF format. Select Export → Export All Tables/Graphs To PDF and browse to the chosen folder.

Figure 7.1 Tables and graphs in PDF format

Probe Set Results (pivot table) to TXT

The Export → Export Probe Set Results (pivot table) to TXT menu item displays a table of probe set results from the CHP files in the study.

1. Select Edit → Probe Level Summarizations Report Options. The Probe Level Summarizations Report Options dialog box opens (Figure 7.2).
2. Select the options to be displayed in the probe set results (pivot table):
   - Signal
   - Detection p-value (MAS5 and PLIER workflows)
   - Detection (MAS5)
   - #Pairs (MAS5)
   - #Used Pairs (MAS5)

Figure 7.2 Probe Level Summarization Report Options

P-values are only available when using MAS5 summarization for 3' Expression Arrays and for Exon Arrays analyzed at the exon level. No p-values are available for the gene level analysis, irrespective of the summarization method chosen.

3. Select Export → Export Probe Set Results.

Figure 7.3 Export Probe Set Results (pivot table) to TXT

A Report to GCOS RPT File

The Expression Console software is capable of creating GCOS formatted RPT files for MAS 5.0 CHP files. However, they cannot be generated by the other applications. The summarization workflows offered (i.e., RMA and PLIER-based workflows) are created with the Expression Console software, but not other applications. Use the Export → Export Report to GCOS RPT File menu item to save the CHP file information for all checked files in the current study to RPT files.
An RPT file is created for each CHP file in the same directory, using the same base file name.

Figure 7.4 Export Report to GCOS RPT File

A Study to TXT File

To export the Expression Console study table to TXT, right-click on the study window and select Export Study To TXT.

Figure 7.5 Export Study to TXT

A Table as TXT File

After running an analysis, select Report → View Full Report (or, alternatively, a user-customized report). When the report displays (Figure 7.6), right-click on the report and select Export Table As TXT.

Figure 7.6 Export table as text

Saving Images

Images of the graphs and plots in Expression Console can be exported for use in other applications in two ways:
• Copy the image to the clipboard by right-clicking the image and selecting Copy to clipboard.
• Save the image as PNG by right-clicking on the graph and selecting Save As PNG (Figure 7.7), or select File → Save As PNG. The application prompts the user for a name and location to save the file.

The Expression Console software allows you to save a graph as a PNG (Portable Network Graphic), which is an image format that enables the graphic to be easily moved into another application or document without loss of quality.

Figure 7.7 Save the histogram as a PNG

Chapter 8
Controls and Thresholds

This section describes how to identify and modify the controls for 3' Expression Arrays and how to set user-defined thresholds for both Exon and 3' Expression Arrays. (Currently, modification of the controls for Exon Arrays is not supported with the Expression Console™ software.)
For guidance on how to set thresholds for individual metrics for 3' Expression Arrays, refer to the whitepaper Data Analysis Fundamentals, and for Exon Arrays, refer to the whitepaper Exon Arrays Quality Analysis.

Report Controls

To help researchers establish quality control processes for gene expression analyses, Affymetrix has developed several controls which allow researchers to monitor assay data quality. These include but are not limited to:
• hybridization controls
• labeling controls
• internal control genes
• algorithm parameters
• algorithm outputs

The Expression Console software provides functionality to display and highlight individual metrics and the arrays containing metrics outside of user-defined tolerances.

In general, Affymetrix highly encourages users to create a running log of these parameters to monitor quality and potentially flag outlier samples. The QC functionality built into the Expression Console software can help with this. Evaluation of particular samples should be based on the examination of all sample and array performance metrics in light of the history of each metric's performance in an individual tissue and array type.

When analysis results are viewed within the software, the metrics for the report controls are compared to a set of user-definable thresholds. Any results identified as being outside of the selected thresholds are tagged as Outside of Bounds.

For 3' Expression Arrays, the Report Controls window (Figure 8.1) enables the user to modify the group of probe sets that constitute the spike and housekeeping controls. The probe array type is selected from the drop-down menu at the top of the window to modify the control probes for that probe array type only. The list of control probe sets on the array is displayed on the left side of the window.
By default, the probe sets are filtered to show only the AFFX controls using the Filter button at the bottom. The filter is case sensitive.

In order for the report to contain control information, the controls must be defined before the analysis is run. If additional or different controls are required, the analysis must be rerun after the controls are redefined.

Figure 8.1 3' IVT Expression Array Report Controls window

Spike Controls

The 20x Eukaryotic Hybridization Controls are spiked into the hybridization cocktail, independent of RNA sample preparation, and are therefore used to evaluate sample hybridization efficiency on gene expression arrays. The default spike controls are listed as:
• AFFX-r2-Ec-BioB
• AFFX-r2-Ec-BioC
• AFFX-r2-Ec-BioD
• AFFX-r2-P1-Cre

BioB is at the level of assay sensitivity (1:100,000 complexity ratio) and should be called Present at least 70% of the time. BioC, BioD, and Cre should always be called Present with increasing signal values.

Internal Control Genes (Housekeeping Genes)

Internal control genes, or housekeeping genes, are gene transcripts that are constitutively expressed in most samples. For 3' Expression Arrays, these transcripts serve as internal controls, are useful for monitoring the quality of the starting sample, and are subject to any variability in the labeling of the sample and hybridization to the array.

For Human, Mouse, and Rat 3' Expression Array types, β-actin and GAPDH are used to assess RNA sample and assay quality. Specifically, the signal values of the 3' probe sets for actin and GAPDH are most informative and, as a general recommendation, should be compared to the signal values of the corresponding 5' probe sets. The ratio of the 3' probe set to the 5' probe set should generally be less than 3 for the One Cycle Labeling protocol.
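The checks described above for the housekeeping and spike controls can be sketched as simple rules. This is an illustrative sketch, not the software's implementation: the function names, the "Present" call label, the dictionary layout, and the signal values are assumptions; only the ratio limit of 3 and the expected order BioC, BioD, Cre come from the text.

```python
def ratio_3_to_5(signal_3p, signal_5p):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    return signal_3p / signal_5p


def housekeeping_ok(signal_3p, signal_5p, max_ratio=3.0):
    """True when the 3'/5' ratio is below the general guideline of 3."""
    return ratio_3_to_5(signal_3p, signal_5p) < max_ratio


def spikes_ok(calls, signals, order=("BioC", "BioD", "Cre")):
    """True when every listed spike is called Present and the signals
    increase in the given order."""
    all_present = all(calls[name] == "Present" for name in order)
    values = [signals[name] for name in order]
    increasing = all(low < high for low, high in zip(values, values[1:]))
    return all_present and increasing


# Hypothetical values for one array.
gapdh_ok = housekeeping_ok(1200.0, 800.0)    # ratio 1.5
actin_ok = housekeeping_ok(3000.0, 500.0)    # ratio 6.0
spike_calls = {"BioC": "Present", "BioD": "Present", "Cre": "Present"}
spike_signals = {"BioC": 400.0, "BioD": 1500.0, "Cre": 6000.0}
hyb_ok = spikes_ok(spike_calls, spike_signals)
```

BioB is left out of the ordering check because the text only asks that it be called Present at least 70% of the time, which is a property of many arrays rather than of a single one.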
For more details on interpreting the housekeeping genes, see the whitepaper Data Analysis Fundamentals at www.affymetrix.com.

The Housekeeping controls are:
• GAPDH
• β-Actin

Control probe set names are unique to each array design.

Labeling Controls

Poly-A RNA controls can be used to monitor the entire target labeling process. Each eukaryotic GeneChip probe array contains probe sets from several B. subtilis genes that are absent in eukaryotic samples (lys, phe, thr, and dap). These Poly-A RNA controls are in vitro synthesized, and the polyadenylated transcripts for the B. subtilis genes are premixed at staggered concentrations. The Poly-A controls can be spiked into a complex RNA sample, carried through the sample preparation process, and evaluated like the internal control genes.

The GeneChip® Poly-A RNA Control Kit (P/N 900433) contains the following four exogenous, premixed control spikes:
• Lys: AFFX-r2-Bs-lys (1:100,000)
• Phe: AFFX-r2-Bs-phe (1:50,000)
• Thr: AFFX-r2-Bs-thr (1:25,000)
• Dap: AFFX-r2-Bs-dap (1:6,667)

All of the Poly-A controls should be called Present with increasing signal values in the order of lys, phe, thr, dap.

Identify / Remove Controls

Most 3' Expression Array types have a set of controls identified by default. Follow the steps below to change or set the report controls for the specified array type.
1. Select [icon], or Edit → 3' Expression Report Controls. The Report Controls dialog box opens (Figure 8.2).
2. Choose the array type from the Select Array Type drop-down menu at the top of the dialog box.
Figure 8.2 3' Expression Report Controls dialog box

3. In the probeset box, located on the left side of the window, highlight a probe set ID and select 5'> to make that probe set the 5' probe, M> the middle probe, or 3'> the 3' probe. The upper box is used to define the spike-in control probe set IDs and the lower box the housekeeping control probe set IDs. For each probe set ID, a minimum of two probesets must be identified (for example, a 3' and a 5').
4. After you identify the 3', M, or 5' probe sets, select the >> button to load them into the control list.
5. Add or remove probe set IDs by clicking the appropriate button.

Control metrics for the probe set IDs contained in the two boxes on the right-hand side of the window will be included in the report for any subsequent analyses performed.

To view all available probesets, clear the text box next to the Filter button, then click the Filter button. A list of all the probesets on the array is displayed. To view specific probeset controls instead, type the identifying letters and/or numbers in the text field and click the Filter button.

Delete Report Controls

To delete all of the controls for an array type:
1. Select File → Utilities.
2. Select the Parameter Management tab (Figure 8.3).
3. In the Report Controls section, select the array type.
4. Click Delete to remove the controls for that array type.

Figure 8.3 Parameter Management Tab - Delete Report Thresholds

Report Thresholds

The report thresholds (Figure 8.4) define boundary conditions against a set of metrics that are computed during the analysis step.
These metrics include report controls and statistics based on probe set Signal and Detection calls. Fixed thresholds are applied to selected metrics in one of three ways:
• Compare metric 1 to threshold value. Example: compare the Percent Present calls to a fixed value to confirm that the metric passes a minimum or maximum value.
• Compare metric 1 to metric 2. Example: compare two items, such as the average signal value for the Cre control against the average signal value for the bioD control, to confirm that the detected signals are consistent with the known relative abundance in the sample.
• Compare metric 1 to average of metric 1 across arrays by standard deviation. Example: compare the spread of values such as scaling factors across all selected arrays of the same type. The calculation uses the mean and standard deviation to describe the spread of the data being compared. The final in-bounds range is then determined as the mean plus and minus a user-defined factor times the standard deviation times the user-selected multiplier.

Figure 8.4 Report Thresholds window

Define / Modify Report Thresholds

The Report Thresholds listed are default settings and can be used without further modification on the part of the user. Report thresholds can be changed at any time, and the tables are dynamically updated to reflect the changes once they have been saved. If a desired metric is not present in the report for a 3' Expression Array type, the 3' Expression Controls must first be modified and then the analysis must be run.

To set report thresholds:
1. Select Edit → Report Thresholds, [icon], or Toolbox → Configurations → Specify report thresholds to open the window.
2. Select the comparison type:
   - Compare metric 1 to threshold value
   - Compare metric 1 to metric 2
   - Compare metric 1 to average of metric 1 across arrays by standard deviation
3. Select the comparison operator: less than, greater than, or equal to.
4. Select the item/value/range multiplier to compare to. The availability of this option is dependent on the comparison type you selected.
5. To delete a threshold item, click in the box so that it is highlighted in yellow, then click the Remove button.

To set thresholds for algorithm metrics, a CHP file analyzed with that algorithm must be loaded into the study window. If a CHP file is not selected, the software is unable to determine the metrics that are available for thresholding; therefore, thresholds for new metrics cannot be added without first selecting a CHP file.

Delete Report Thresholds

To delete all of the thresholds for an array type:
1. Select File → Utilities.
2. Select the Parameter Management tab (Figure 8.5).
3. In the Report Thresholds section, select the array type.
4. Click Delete to remove the report thresholds for that array type.

Figure 8.5 Parameter Management Tab - Delete Report Thresholds

Chapter 9 Advanced Analysis

Each analysis workflow provided in the Expression Console™ application has several parameters, which can be changed from the defaults provided by Affymetrix. This section describes how to change the parameters and save them to a configuration file. The file is then saved to the user profile for use in later analyses. Since Exon and 3' Expression Arrays have different algorithm options available, each array type has its own custom analysis configuration setup dialog.
Affymetrix Power Tools (www.affymetrix.com → Support/Developers' Network/Analysis/Affymetrix Power Tools) is a command line application that provides the ability to further customize algorithm parameters and is intended for use by more experienced users. Affymetrix Power Tools is also available as source code and binaries for several different computing platforms. These can be found at the same location on the Affymetrix web site.

3' Expression Array Configurations

This section describes how to set the algorithm parameters for MAS 5.0, PLIER, and RMA for use with 3' Expression Arrays. For additional information about setting MAS 5.0 parameters, see the white paper, Data Analysis Fundamentals, and the Statistical Algorithms Reference Guide on the Affymetrix web site. For explanations of the different normalization and background correction strategies for PLIER and RMA, see Appendix A, Algorithms.

New Configuration – MAS5

1. Select Analysis → Advanced Expression Configurations. The Custom Analysis Algorithms dialog box opens (Figure 9.1).

Figure 9.1 Advanced Analysis - MAS5 Custom Analysis Algorithms dialog box

2. Select the MAS5 algorithm and click OK. The New MAS5 Parameter settings dialog box opens (Figure 9.2).

Figure 9.2 New MAS5 Parameter Settings dialog box

3. Select the settings and click Save.
4. To change the detection parameters or mask out a subset of probes, click the Advanced button (Figure 9.2). The window expands to include Detection Parameters (Figure 9.3).

Figure 9.3 New MAS5 Parameter Settings - Advanced window

5. Enter new parameters and click Save. The Save window opens. If you select the Scale to select sets radio button, the Probe Set MSK file field is highlighted (Figure 9.4). Use the browse button to find the appropriate MSK file.

Figure 9.4 Scale to select sets

To create a new MSK file, see Create a New Mask (MSK) File.

6. Enter a name for this collection of parameters (Figure 9.5), and click Save.

Figure 9.5 Name the new configuration

7. To analyze array data with the new parameters, select Analysis → Execute Advanced Configuration, and select the appropriate file name (Figure 9.6).

Figure 9.6 Execute Advanced Configurations

A dialog box opens to announce analysis initiation (Figure 9.7).

Figure 9.7 Advanced Analysis Suffix Identifier

8. Enter a suffix for run identification, if needed, and click OK to start the analysis.

New Configuration – PLIER Workflow

1. Select Analysis → Advanced Expression Configurations. The Custom Analysis Algorithms dialog box opens (Figure 9.8).

Figure 9.8 Advanced Analysis - PLIER Custom Analysis Algorithms dialog box

2. Select the PLIER workflow and click OK. The New custom workflow for PLIER dialog box opens (Figure 9.9).

Figure 9.9 Advanced Configuration for PLIER

3. Select the settings and click Save. A dialog box opens to announce analysis initiation (Figure 9.10).

Figure 9.10 Advanced Analysis Suffix Identifier

4. Enter a suffix for run identification, if needed, and click OK to start the analysis.

New Configuration – RMA

1. Select Analysis → Advanced Expression Configurations. The Custom Analysis Algorithms dialog box opens.
2. Select the RMA algorithm and click OK. The New custom configuration for RMA dialog box opens (Figure 9.11).
Figure 9.11 Advanced Configuration for RMA

When RMA is selected, the RMA Background Correction is automatically applied to the PM-only probes. Since RMA has its own background correction algorithm, Expression Console software does not allow this to be modified.

3. Select the settings and click Save. A dialog box opens to announce analysis initiation (Figure 9.12).

Figure 9.12 Advanced Analysis Suffix Identifier

4. Enter a suffix for run identification, if needed, and click OK to start the analysis.

Delete an Advanced Configuration

To delete an advanced configuration:
1. Select File → Utilities.
2. Select the Parameter Management tab (Figure 9.13).
3. In the appropriate Configuration section, select the configuration to be deleted.
4. Click Delete.

Figure 9.13 Parameter Management Tab - Delete Configuration

MASK Files

Create a New Mask (MSK) File

The Expression Console application uses MSK files either to set the scaling factor or to exclude (mask) certain user-selected probe pairs from an expression analysis.

The Selected Probe Sets scaling option, in the advanced configuration for the MAS5 algorithm, adjusts the trimmed mean signal of the selected probe sets on a probe array to the user-specified target signal value. Expression Console software uses the user-selected probe sets (specified by a Scale Factor mask file) to calculate the trimmed mean signal and derive the scale factor for the probe array so that:

Target Signal = Scale Factor × Trimmed Mean Signal of selected probe sets

For a selected set of arrays, predefined MSK files are available from Affymetrix at www.affymetrix.com.

The other use for mask files is to allow user-selected probe pairs to be excluded or masked from an expression analysis.
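The scale factor relationship above can be illustrated with a short Python sketch. It is not the Expression Console implementation: the function names are hypothetical, the signal values are made up, and the trimming fraction (2% from each end) is an assumption chosen for the example rather than a value stated in this guide.

```python
# Minimal sketch of: Target Signal = Scale Factor x Trimmed Mean Signal.
# Function names and the 2% trimming fraction are assumptions for illustration.

def trimmed_mean(values, trim_fraction=0.02):
    """Mean of the values after dropping trim_fraction of them from each end."""
    if not values:
        raise ValueError("no signal values supplied")
    ordered = sorted(values)
    drop = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[drop:len(ordered) - drop] if drop else ordered
    return sum(kept) / len(kept)


def scale_factor(selected_signals, target_signal, trim_fraction=0.02):
    """Scale factor that brings the trimmed mean of the selected probe sets to the target."""
    mean_signal = trimmed_mean(selected_signals, trim_fraction)
    if mean_signal <= 0:
        raise ValueError("trimmed mean signal must be positive")
    return target_signal / mean_signal


if __name__ == "__main__":
    # Made-up signals for the probe sets listed in a hypothetical mask file.
    signals = [float(v) for v in range(1, 101)]
    sf = scale_factor(signals, target_signal=500.0)
    print(f"trimmed mean = {trimmed_mean(signals):.2f}, scale factor = {sf:.4f}")
```

Multiplying every signal on the array by the resulting factor brings the trimmed mean of the selected probe sets to the target signal, which is the adjustment the Selected Probe Sets scaling option describes.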
For human, mouse, and rat, a list of 100 normalization genes is provided to the user.

To create a MASK file:
1. Open Analysis → Advanced Expression Configurations and select New Mask (MSK) File (Figure 9.14).

Figure 9.14 Advanced Expression Configurations - New Mask File

The Select the Array Type dialog box opens.

Figure 9.15 Select the Array Type

2. Select the array type and click Open.

Figure 9.16 New MASK (MSK) File dialog box

A window opens displaying all probe sets for that array type (Figure 9.17).

Figure 9.17 Probe Sets for Expression Array types

3. Double-click on the probe sets of interest to highlight each set. When you click Save, an MSK file of the selected probe sets is created.
4. Make changes if necessary and click Save, or click the box to exit.

Open Mask (MSK) File

Mask files are specific to the probe array type. The Expression Console software will not open a mask file that is incompatible with the currently selected probe array type.

To view or edit an existing MSK file:
1. Select Analysis → Advanced Expression Configurations → Open Mask (MSK) File. The Select the MSK file dialog box opens.

Figure 9.18 Select the MSK File dialog box

2. Select the MSK file and click Open. The selected MSK file opens displaying the selected probe sets, which are highlighted.
3. Make changes if necessary and click Save, or click the box to exit.
MAS 5.0 CHP Files in the Command Console Format

By default, the Expression Console software stores the MAS5 analysis results (Signal, Detection, Detection P-Value, # of Probe Pairs, and # of Probe Pairs Used) in a CHP file using the new Affymetrix GeneChip® Command Console™ (AGCC) format. This new format provides additional features such as:
• An embedded unique file identifier
• A copy of the CEL and DAT headers (if the input CEL/DAT files are also in the AGCC file format)

The new file features provide the ability to trace the file's lineage independent of the file name.

To configure the software to write the MAS5 CHP files in the newer AGCC format:
• Select Analysis → Advanced Expression Configurations → Save MAS5 CHP files in Command Console format (Figure 9.19).

Figure 9.19 Save MAS5 CHP files in Command Console format

MAS 5.0 CHP Files in the GCOS Format

To support the transition of software applications to the newer Affymetrix file parsers (those capable of reading both the older CHP file format and the newer AGCC CHP file format), the Expression Console software has the ability to write the MAS5 analysis results to the older GCOS format CHP file.

To configure the software to write MAS5 CHP files in the older GCOS format:
• Select Analysis → Advanced Expression Configurations → Save MAS5 CHP files in GCOS format (Figure 9.20).

Figure 9.20 Save MAS5 CHP files in GCOS format

For more information about the Command Console (AGCC) file format and C++/Java parsers that read both GCOS and Command Console files, see www.affymetrix.com; then go to Support/Developer/Fusion.

Analysis results from PLIER and RMA cannot be stored in GCOS formatted CHP files.
Advanced Configuration Exon Analysis

1. Select Analysis → Advanced Exon Configurations. The Advanced Exon Setup dialog box opens (Figure 9.21).

Figure 9.21 Advanced Exon Setup dialog box

   A. Select the Analysis Type: The setup box in the upper left-hand corner enables users to select their analysis level for the configuration. For details on the different analysis levels, see the whitepaper, Exon Probeset Annotations and Transcript Cluster Groupings, at www.affymetrix.com. In the Expression Console software, each level is paired with a default set of library files for that analysis and array type provided by Affymetrix. To use other versions of the probe group file, the intensity layout file, background probes, or the QC probeset file, browse to the desired file on the appropriate line of the library file section at the bottom of the screen. It is always possible to reload the default files for an analysis by clicking the Default Files button. To use a different metaprobeset file for Gene level analysis or probeset file for Exon analysis, select Other and browse to the file you wish to use in the library file section.
   B. Select the Summarization Method. For more information about the summarization methods, see the algorithm details in Appendix A, Algorithms, and the whitepapers, Gene Signal Estimates from Exon Arrays and Guide to Probe Logarithmic Intensity Error (PLIER) Estimation, at www.affymetrix.com.
   C. Select the Background Correction. For more information about background correction, see the whitepaper, Exon Background Correction, at www.affymetrix.com.
   D. Select the Normalization Method. For more information about normalization, see the whitepaper, Gene Signal Estimates from Exon Arrays, at www.affymetrix.com.
   E. Confirm the library file selections in the bottom portion of the window. Note that selecting a confidence level in the Analysis Type automatically selects the appropriate file from Affymetrix. If you want to use your own custom meta-probeset file or probelist file, select Other in the analysis level to activate the browse feature for that line.
2. Click Save.

Figure 9.22 Name the new configuration

3. Enter a name for this configuration (Figure 9.22), and click Save.
4. To analyze array data with the new parameters:
   A. Select Analysis → Execute Advanced Configuration.
   B. Select the appropriate file name.

Figure 9.23 Execute Advanced Configurations

5. To delete an advanced configuration:
   A. Select File → Utilities.
      1) Select the Parameter Management tab (Figure 9.24).

Figure 9.24 Parameter Management Tab - Delete Configuration

      2) In the appropriate Configuration section, select the configuration to be deleted.
      3) Click Delete.

A dialog box opens to announce analysis initiation.

Figure 9.25 Advanced Analysis Suffix Identifier

   B. Enter a suffix for run identification, if needed, and click OK to start the analysis.

Chapter 10 Analysis Scripts

To streamline the analysis workflow within the Expression Console™ software, the user can create an analysis script. In the analysis script, the user selects and saves a specific analysis configuration and a set of graphs and tables that are automatically generated at the end of each analysis.
Expression Console analysis scripts include the following features:
• Data table can be exported as a tab-delimited TXT file
• Selected graphs and tables can be exported as PDF files
• User-defined scripts are associated with a user profile

Scripts are associated with the user profile open at the time the script is created. Analyses that have array-specific details contained within them, such as custom MAS 5.0 and custom Exon analysis scripts, are array-type specific.

Analysis Script Creation

To create an analysis script:
1. Select Analysis → Exon Analysis Scripts → New Script. The New Script dialog box (Figure 10.1) opens.

Figure 10.1 New Script dialog box

2. Select the workflow from the drop-down menu. If a study is open, the analysis methods in the list are restricted by the array type in the study.
3. Select the table and graphs to be generated after the completion of the analysis from the relevant sections.
4. If desired, select the probe set results table for a tab-delimited export of the expression data.
5. If desired, select the PDF option to save a PDF of the selected graphs and tables.

The application prompts the user to name the exported files at the time of the analysis script execution.

Analysis Script Deletion

To delete an analysis script:
1. Select File → Utilities.
2. Select the Parameter Management tab (Figure 10.2).
3. In the Scripts section, select the script to be deleted.
4. Click Delete.

Figure 10.2 Parameter Management Tab - Delete Analysis Script

Analysis Script Execution

To execute an analysis script:
1. Open a study.
2. Select the CEL files for analysis.
3. Select Analysis → Execute Analysis Script. The Select script menu box displays (Figure 10.3).

Figure 10.3 Select Script menu

If the script calls for any exports, the user is prompted for names and locations for saving files.

Appendix A Algorithms

This appendix briefly describes the MAS 5.0, RMA, and PLIER algorithms as they are used to analyze data in the Affymetrix® Expression Console™ software. References and links to publications that describe the MAS 5.0, RMA, and PLIER algorithms in detail are given. See Comparison of Algorithms for a side-by-side comparison of the assumptions, advantages, and disadvantages of each algorithm.

MAS 5.0 Algorithm

The MAS 5.0 algorithm uses Tukey's biweight estimator to provide a robust mean Signal value and Wilcoxon's signed rank test to calculate a significance (p-value) and Detection call for each probe set. Background estimation is provided by a weighted average of the lowest 2% of the feature intensities. Mismatch probes are used to adjust the perfect match (PM) intensity. Linear scaling of the feature level intensity values, using the trimmed mean, is the default to make the means equal for all arrays being analyzed.

The MAS 5.0 algorithm (also known as the Statistical Algorithm) analyzes each array independently. As a result, individual probe-specific affinities cannot be considered, and the ability to detect small changes between experiment and control samples is reduced in comparison to either RMA or PLIER.

The primary use of the MAS 5.0 algorithm is to obtain a quick report regarding the performance of the arrays and to identify any obvious problems before submitting the final set of arrays to one of the multichip analysis methods (RMA, PLIER).
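To give a feel for the robust averaging mentioned above, the following Python sketch implements a generic one-step Tukey biweight mean. It is a sketch under stated assumptions, not the Expression Console code: the function name is hypothetical, and the tuning constant c = 5 and the small epsilon are assumed values chosen for illustration. The full MAS 5.0 signal calculation also involves background and mismatch adjustment and a log transformation, none of which is shown here.

```python
# Minimal sketch of a one-step Tukey biweight mean.
# The tuning constant c and epsilon are assumptions, not values from this guide.
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """Robust average that smoothly down-weights values far from the median."""
    data = list(values)
    if not data:
        raise ValueError("no values supplied")
    center = statistics.median(data)
    # Median absolute deviation (MAD) measures the typical spread around the median.
    mad = statistics.median([abs(v - center) for v in data])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in data:
        u = (v - center) / (c * mad + epsilon)  # scaled distance from the median
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # bisquare weight
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Made-up values: four consistent measurements and one gross outlier.
    probe_values = [9.0, 10.0, 11.0, 10.0, 1000.0]
    print(f"plain mean      = {sum(probe_values) / len(probe_values):.2f}")
    print(f"biweight mean   = {tukey_biweight(probe_values):.2f}")
```

The outlier receives zero weight because it lies far outside the scaled spread of the other values, which is the "smooth downweighting of outliers" behavior attributed to MAS 5.0 in the comparison table later in this appendix.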
For a more detailed description of the MAS 5.0 algorithm, see the Statistical Algorithms Reference Guide at www.affymetrix.com; then go to Support/By Support Type/Technical Documentation/Technical Notes/Software.

RMA Algorithm

The Robust Multichip Analysis (RMA) algorithm fits a robust linear model at the probe level to minimize the effect of probe-specific affinity differences. This approach:
• Increases sensitivity to small changes between experiment and control samples.
• Minimizes variance across the dynamic range, but does compress calculated fold change values.

RMA consists of three steps:
1. Background adjustment
2. Quantile normalization
3. Summarization

This is a multi-chip analysis approach. Therefore, all arrays intended for comparison should be included together in the summarization step.

For a more detailed description of the RMA algorithm, see the publication, Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data, Biostatistics, April 2003; Vol. 4; Number 2: 249–264.

PLIER Algorithm

The Probe Logarithmic Intensity Error Estimation (PLIER) algorithm produces a signal by accounting for experimentally observed patterns in probe behavior and handling error appropriately at low and high signal values. Similar to RMA, the PLIER algorithm also uses data across all arrays submitted for analysis to minimize the effect of probe-specific affinity differences. Unlike RMA, which uses a simple global background correction approach, the standard PLIER configuration uses a probe-specific background, providing a higher degree of accuracy at the cost of increased signal variance. Exon Arrays use a more advanced and efficient background correction method utilizing surrogate mismatched probes.
(See the technical note, GeneChip® Exon Array Design, at www.affymetrix.com; then go to Support/By Support Type/Technical Documentation/Technical Notes/Software.)

Signal variance can be addressed following the PLIER analysis by applying a variance-stabilizing data transformation. A simplistic but effective approach is to add a value of 16 to each and every signal value (PLIER + 16 algorithm).

This is a multi-chip analysis approach. Therefore, all arrays intended for comparison should be included together in the summarization step.

For a more detailed description of the PLIER algorithm, see Guide to Probe Logarithmic Intensity Error (PLIER) Estimation at www.affymetrix.com; then go to Support/By Support Type/Technical Documentation/Technical Notes/Software.

Comparison of Algorithms

Table A.1 A comparison of the MAS5, RMA, and PLIER algorithms used with the Expression Console™ application

MAS5
Advantages: Single-array algorithm is independent of other data in the dataset:
• Less computationally intensive than RMA or PLIER
• Conservative
• Smooth downweighting of outliers
• Positive output values
• Minimal bias
Disadvantages: Not as sensitive as either RMA or PLIER to small changes in target abundance:
• Limited ability to adjust for probe-specific affinity differences
• Unstable variance at low end
• Lower precision in signal calculation
Assumptions:
• Single-array analysis
• Multiplicative error
• Signal is adjusted by the mismatch probes (PM–MM)
• Background imputed to handle negative differences

RMA
Advantages: Minimizes the variance seen across the arrays:
• Higher reproducibility of signal over single-array analyses
• Good differential change detection
• Variance stable on log scale
Disadvantages: Not as sensitive as PLIER in the ability to detect small fold changes:
• In cases where feature intensities disagree, may have more than one solution (mitigated by median polish)
• Positive bias contributed to signal values
• Compresses fold changes for low intensity probe sets
Assumptions:
• Multiple-array analysis
• Multiplicative error
• PM (perfect match) only
• Single background is used to adjust each intensity

PLIER
Advantages: Ability to detect small fold changes:
• Higher reproducibility of signal (lower coefficient of variation) without loss of accuracy
• Higher sensitivity to changes in abundance for targets near background
• Smoothly handles intensities below background
• Dynamic weighting of the most informative probes in an experiment to determine signal
• High degree of accuracy for signal and fold change calculations
• Lack of bias
Disadvantages: More variance in individual signals than seen with RMA:
• Computationally intensive
• In cases where feature intensities disagree, may have more than one solution (mitigated by median polish)
• Performance relative to amount of model data provided
• Variance not stable on log scale
Assumptions:
• Multiple-array analysis
• Mixed error model
• PM–MM, PM only, etc.
• Multiple background options

Appendix B Algorithm Parameters and Outputs

The Expression Console™ software displays column headings for individual parameters and outputs in the Report window (Report → View Full Report) for each of the 3' Expression Array algorithms (MAS 5.0, RMA, PLIER). This appendix gives column heading definitions.
MAS 5.0 Column Headings

Threshold Test: The minimum number of probe pairs a probe set must have in order for the probe set data to be included in the calculation of the report statistics
HZ: Number of horizontal zones used in background subtraction
VZ: Number of vertical zones used in background subtraction
BG: Minimum, maximum, average, and standard deviation of the background intensity calculated for the probe array
Alpha1 (α1): Significance level for the detection p-value in an analysis. Alpha1 is a user-modifiable parameter that is set in the New MAS5 Parameter Settings dialog box (Analysis → Advanced Expression Configurations → New Configuration). If the probe set detection p-value is < alpha1, the call is present.
Alpha2 (α2): Second significance level for the detection p-value in an analysis, and a user-modifiable parameter set in the same dialog box as Alpha1. If the probe set detection p-value is greater than or equal to alpha2, the call is absent. If alpha1 ≤ detection p-value < alpha2, the call is marginal.
Tau (τ): A user-modifiable parameter, ideally set to a value that is a little larger than the median of the discrimination scores of the probe sets whose targets are absent, to avoid false detected calls
Noise (Raw Q): The degree of pixel-to-pixel variation among the probe cells used to calculate the background
Scale Factor (SF): The scale factor specified in the Scaling tab of the Expression Analysis Settings dialog box or computed by the algorithm
Norm Factor (NF): The normalization factor computed by the algorithm
Scale Mask: Contains the name and location of the mask file if one was used for the analysis
RawQ: Baseline Noise. The degree of pixel-to-pixel variation among the probe cells used to calculate the background in the baseline probe array
BG Avg: The average background intensity calculated for the probe array
BG Std: The standard deviation of the background intensity calculated for the probe array
BG Max: The maximum background intensity calculated for the probe array
BG Min: The minimum background intensity calculated for the probe array
Noise Avg: The average noise calculated for the probe array
Noise Std: The standard deviation of the noise calculated for the probe array
Noise Max: The maximum noise calculated for the probe array
Noise Min: The minimum noise calculated for the probe array
Corner+ Avg: The average cell intensity for the sense probe cells used in the grid alignment process
Corner+ Count: The corner count for the sense probe cells used in the grid alignment process
Corner- Avg: The average cell intensity of the antisense probe cells used in the grid alignment process
Corner- Count: The corner count of the antisense probe cells used in the grid alignment process
Central- Avg: The average cell intensity for the nine probe cells that comprise the cross at the center of an antisense probe array
Central- Count: The central count for the nine probe cells that comprise the cross at the center of an antisense probe array
#Probe Sets Exceeding Probe Pair Threshold: The number of probe sets that exceed the probe pair threshold
Probe Pair Threshold: The minimum number of probe pairs a probe set must have in order for the probe set data to be included in the calculation of the report statistics
Control Direction: The direction (sense or antisense) of the target (sample)
#P: The number of probe sets present
%P: The percent of probe sets present
Signal(P): The average signal for the probe sets defined as present
#M: The number of probe sets whose detection call is marginal
%M: The percent of probe sets whose detection call is marginal
Signal(M): The average signal for probe sets whose detection call is marginal
#A: The number of probe sets absent
%A: The percent of probe sets absent
Signal(A): The average signal for the probe sets defined as absent
Signal(All): The average signal for all probe sets on the array

Column Headings for RMA and PLIER

Metrics are dependent on the analysis algorithm, array type, and the level; therefore, not all of the metrics are always present in the Full Report.
Raw Corner+ Avg: The average cell intensity, prior to any background adjustment, for the sense probe cells used in the grid alignment process.
Raw Corner+ Count: The number of sense probe cells used in grid alignment.
Raw Corner- Avg: The average cell intensity, prior to any background adjustment, for the antisense probe cells used in the grid alignment process.
Raw Corner- Count: The number of antisense probe cells used in grid alignment.
Raw Central- Avg: The average cell intensity, before background adjustment, for the antisense probe cells that comprise the cross at the center of the array.
Raw Central- Count: The number of antisense probe cells that comprise the cross at the center of the array.
Spike-probeID-signal: The signal for the probe sets that correspond to the labeling and hybridization spike controls.
pm_mean: The mean signal for all of the PM probes on the array.
mm_mean: The mean signal for all of the MM probes on the array.
bgrd_mean: The average signal for the probes used to calculate the background.
pos_vs_neg_auc: The area under the curve (AUC) for a receiver operating characteristic (ROC) curve comparing the intron controls to the exon controls by applying a threshold to the probe set summary. The ROC curve is generated by evaluating how well the probe set summary separates the positive controls from the negative controls (e.g., exon from intron). The assumption (which is only valid in part) is that the negative controls are a measure of false positives and the positive controls are a measure of true positives. An AUC of 1 reflects perfect separation, whereas an AUC of 0.5 reflects no separation. Note that the AUC of the ROC curve is equivalent to a rank sum statistic used to test for differences in the center of two distributions.
apt-opt-cdf-file: The cdf file used for the analysis.
apt-opt-probe-count: The total number of probes (features) on the array, whether or not the probes are used.
apt-opt-qc-groups-file: The file used to identify the control probes and probe sets for the analysis run.
#P: The number of probe sets present. For exon level analysis, the DABG p-value is less than or equal to 0.01.
%P: The percent of probe sets present. For exon level analysis, present means that the DABG probe level p-value is less than or equal to 0.01.
Signal(P): The average signal for the probe sets defined as present (Exon-level DABG <= 0.01).
#A: The number of probe sets absent. For exon level analysis, the DABG p-value is greater than 0.01.
%A: The percent of probe sets absent. For exon level analysis, the DABG p-value is greater than 0.01.
Signal(A): The average signal for the probe sets defined as absent (Exon-level DABG > 0.01).
Signal(All): The average signal for all probe sets on the array.

Probe Set Suffixes

A number of standard metrics are created for particular groups of probe sets on the arrays.
The key below lists the suffixes and their definitions for the following groups of probe sets:
- all_probeset (all probe sets within a group)
- bac_spike (bacterial spikes and hybridization controls)
- polya_spike (labeling controls)
- neg_control (negative control probes)
- pos_control (positive control probes)

KEY (Suffix for Output: Definition)

_probesets: From the QC Report, the number of probe sets actually analyzed.
_atoms: From the QC Report, the number of PM–MM or PM–GCBG probe pairs actually analyzed.
_mean: The mean signal value for all the probe sets.
_stdev: The standard deviation for all of the probe sets analyzed.
_mad_residual_mean: The mean absolute deviation of the residual for a chip versus all chips in the data set.
_mad_residual_stdev: The standard deviation of the residual for a chip versus all chips in the data set.
_rle_mean: The mean absolute relative log expression (RLE). This metric is generated by taking the probe set summary for a given chip and calculating the difference in log base 2 from the median value of that probe set over all the chips. The mean is then computed from the absolute RLE for all the probe sets for a given CEL file.
_rle_stdev: The standard deviation of the relative log expression (RLE). This metric is generated by taking the probe set summary for a given chip and calculating the difference in log base 2 from the median value of that probe set over all the chips. The standard deviation is then computed from the absolute RLE for all the probe sets for a given CEL file.
_percent_called: The percent of probe sets from the exon analysis called present (DABG <= 0.01).
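The Alpha1 and Alpha2 entries under MAS 5.0 Column Headings together describe a three-way rule that turns a detection p-value into a present, marginal, or absent call. The following is a minimal sketch of that rule only, not the software's implementation. The function names `detection_call` and `summarize_calls` are hypothetical, and the threshold values 0.05 and 0.065 are assumed for illustration; the real values are whatever is set in the New MAS5 Parameter Settings dialog box.

```python
from collections import Counter


def detection_call(p_value: float, alpha1: float, alpha2: float) -> str:
    """Map a detection p-value to a call using the Alpha1/Alpha2 rule.

    p < alpha1            -> "P" (present)
    alpha1 <= p < alpha2  -> "M" (marginal)
    p >= alpha2           -> "A" (absent)
    """
    if not 0.0 <= alpha1 <= alpha2 <= 1.0:
        raise ValueError("expected 0 <= alpha1 <= alpha2 <= 1")
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


def summarize_calls(calls):
    """Return {call: (count, percent)}, analogous to #P/%P, #M/%M, #A/%A."""
    if not calls:
        raise ValueError("no calls to summarize")
    counts = Counter(calls)
    total = len(calls)
    return {c: (counts[c], 100.0 * counts[c] / total) for c in ("P", "M", "A")}


# Illustrative thresholds only (assumed for this example).
ALPHA1, ALPHA2 = 0.05, 0.065

p_values = [0.01, 0.05, 0.06, 0.065, 0.5]
calls = [detection_call(p, ALPHA1, ALPHA2) for p in p_values]
summary = summarize_calls(calls)
print(calls)
print(summary)
```

Note that a p-value exactly equal to alpha1 is marginal, and a p-value exactly equal to alpha2 is absent, because the lower bound of the marginal range is inclusive and the upper bound is exclusive.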
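The _rle_mean and _rle_stdev definitions in the Probe Set Suffixes key describe a short computation: take each probe set summary in log base 2, subtract the median of that probe set over all chips, and then summarize the absolute differences for each chip. The sketch below follows that description using only the Python standard library. It is an illustration under stated assumptions, not the software's code: the function name `rle_metrics` and the input layout are hypothetical, the median is taken over log2 values, and the sample standard deviation is used because the guide does not say which estimator applies.

```python
import math
import statistics


def rle_metrics(signals):
    """Compute (rle_mean, rle_stdev) per chip.

    signals: dict mapping chip name -> list of probe set summaries on a
    linear scale, with probe sets in the same order for every chip.
    """
    chips = list(signals)
    n_probe_sets = len(signals[chips[0]])
    if any(len(signals[c]) != n_probe_sets for c in chips):
        raise ValueError("every chip must report the same probe sets")
    if n_probe_sets < 2:
        raise ValueError("need at least two probe sets for a standard deviation")

    # Log base 2 of every probe set summary.
    log2_signals = {c: [math.log2(v) for v in signals[c]] for c in chips}

    # Median of each probe set over all chips (assumption: median of log2 values).
    medians = [
        statistics.median(log2_signals[c][i] for c in chips)
        for i in range(n_probe_sets)
    ]

    metrics = {}
    for c in chips:
        # Absolute relative log expression for each probe set on this chip.
        abs_rle = [abs(x - m) for x, m in zip(log2_signals[c], medians)]
        # Assumption: sample standard deviation of the absolute RLE values.
        metrics[c] = (statistics.mean(abs_rle), statistics.stdev(abs_rle))
    return metrics


# Made-up signals for three chips and three probe sets.
example = {
    "chip_a": [64.0, 128.0, 256.0],
    "chip_b": [64.0, 256.0, 256.0],
    "chip_c": [128.0, 256.0, 1024.0],
}
for chip, (rle_mean, rle_stdev) in rle_metrics(example).items():
    print(chip, round(rle_mean, 3), round(rle_stdev, 3))
```

A chip whose probe sets all sit at the per-probe-set median, such as chip_b in this made-up data, has an RLE mean of zero. Larger values indicate a chip that deviates more from the rest of the data set.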