RNA spike-in controls & analysis methods for trustworthy genome-scale measurements
Sarah A. Munro, Ph.D.
Genome-Scale Measurements Group
ABRF Meeting, March 29, 2015

Overview
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls

How can we have trustworthy gene expression results?
• We’re simultaneously measuring thousands of RNA molecules in gene expression experiments
• But are we getting it right?

External RNA Controls Consortium (ERCC): initiated by industry, hosted by NIST
• Initiated by Janet Warrington, VP Clinical Genomics at Affymetrix
• Open to all interested parties
• Voluntary
• More than 90 participants
  – Industry, Academia, Government
  – All major microarray technology developers
  – Other gene expression assay developers

ERCC control sequences are in NIST Standard Reference Material 2374
• DNA sequence library
• 96 unique control sequences in DNA plasmids
• Controls intended to mimic mammalian mRNA
• In vitro transcription to make RNA controls
NIST SRM 2374 and related data files are available directly from NIST @ http://tinyurl.com/erccsrm

Making ERCC ratio mixtures with true positive and true negative ratios
[Diagram: NIST plasmid DNA library → in vitro transcription → RNA transcripts → pooling → mixtures with known abundance ratios]

Using ERCC ratio mixtures
[Diagram: Treated (n>3) and Control (n>3) samples → measurement process → expression measures → statistical analysis]
• Multiple steps
• Many people & labs
• Takes days to weeks

Example gene expression data
[Figure: Treated vs. Control]
• Are the RNA molecule ratios statistically different across the samples?
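
The ratio-mixture idea from the slides above (pooling transcripts so that the abundance ratio between two mixtures is known by design) can be sketched in a few lines. This is only an illustration under stated assumptions: the subpool names, design ratios, control names, and amounts below are hypothetical values chosen for the example, not the composition of any actual ERCC mixture. Controls whose design ratio differs from 1:1 act as true positives for a differential expression test; 1:1 controls act as true negatives.

```python
import math

# Hypothetical design: Mix 1 : Mix 2 abundance ratio for each subpool.
DESIGN_RATIOS = {"A": 4.0, "B": 1.0, "C": 0.5}


def make_mixtures(mix2_amounts, subpool_of):
    """Derive Mix 1 amounts from Mix 2 amounts and each control's subpool ratio."""
    mix1 = {
        control: amount * DESIGN_RATIOS[subpool_of[control]]
        for control, amount in mix2_amounts.items()
    }
    return mix1, dict(mix2_amounts)


def classify(control, subpool_of):
    """A 1:1 design ratio is a true negative; any other ratio is a true positive."""
    ratio = DESIGN_RATIOS[subpool_of[control]]
    return "true negative" if ratio == 1.0 else "true positive"


# Hypothetical controls and amounts (arbitrary units).
subpool_of = {"ctrl_1": "A", "ctrl_2": "B", "ctrl_3": "C"}
mix2_amounts = {"ctrl_1": 8.0, "ctrl_2": 64.0, "ctrl_3": 512.0}

mix1, mix2 = make_mixtures(mix2_amounts, subpool_of)
for control in sorted(subpool_of):
    log2_ratio = math.log2(mix1[control] / mix2[control])
    print(control, log2_ratio, classify(control, subpool_of))
```

Spiking Mix 1 into one sample type and Mix 2 into the other gives every downstream measurement a built-in set of known answers against which the observed ratios can be checked.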
Evaluate technical performance with ERCC true positive and true negative ratios
[Figure: Treated vs. Control]

Use erccdashboard to produce standard performance metrics for any experiment
• R package is available from:
  – Bioconductor
  – NIST GitHub Site
• Open source and open access for use in
  – Other analysis tools and pipelines
  – Commercial software

Gauge technical performance with 4 erccdashboard figures
• Developed as part of SEQC study, with ABRF partners
• Technology-independent ratio performance measures
• Assessed differences in performance across
  – Experiments
  – Laboratories
  – Measurement processes
Munro, S. A. et al. Nature Communications 5:5125 doi: 10.1038/ncomms6125 (2014).

Spike-in design for SEQC RNA Sequencing Experiments
• Ambion ERCC Ratio Mixtures
• 23 Controls per Subpool
• Design abundance spans 2^20 range within each Subpool

Experiment                   Samples                        Replicates for sequencing
Rat Experiment               Treated and Control Rat RNA    Biological Replicates
Interlaboratory Experiment   Human Reference RNA Samples    Technical Replicates

What is the dynamic range of my experiment?
[Figure: Rat Experiment and Interlaboratory Experiment panels; x-axis: Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA); y-axis: Log2 Normalized ERCC Counts]
• Typical Sequencing: ~40 million sequence reads per replicate
• Deep Sequencing: ~260 million sequence reads per replicate

What was the diagnostic power?
[Figure: ROC curves for Rat Experiment and Interlaboratory Experiment; x-axis: False Positive Rate; y-axis: True Positive Rate]
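
As a rough sketch of how an ROC curve and its area under the curve can be computed from spike-in controls: each control carries a known truth label (its design ratio either differs from 1:1 or does not) and a differential expression p-value. The p-values, labels, and function names below are hypothetical and chosen for illustration; this is not the erccdashboard implementation, which is not reproduced here.

```python
def roc_points(p_values, is_true_positive):
    """Sweep p-value thresholds and return (FPR, TPR) points, starting at (0, 0)."""
    n_pos = sum(is_true_positive)
    n_neg = len(is_true_positive) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(p_values)):
        # Controls called "different" at this threshold, split by truth label.
        tp = sum(1 for p, t in zip(p_values, is_true_positive) if p <= threshold and t)
        fp = sum(1 for p, t in zip(p_values, is_true_positive) if p <= threshold and not t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: three true-positive controls and three true-negative controls.
p_values = [0.001, 0.01, 0.2, 0.03, 0.5, 0.8]
labels = [True, True, True, False, False, False]

print(auc(roc_points(p_values, labels)))
```

Only controls that produce a p-value enter this calculation, so the set of controls detected in an experiment directly shapes the resulting curve.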
• Area Under the Curve (AUC) depends on the number of controls detected!
• AUC is a reasonable summary statistic…
• But we’d like to evaluate our diagnostic performance as a function of abundance…
[Figure: MA plot, Rat Experiment; x-axis: Log2 Normalized Average Counts; y-axis: Log2 Normalized Ratio of Counts]

LODR: Limit of Detection of Ratios
[Figure: panels labeled Rat Experiment and Reference RNA; x-axis: Average Counts; y-axis: DE Test P-values]
• Model P-values as a function of average signal
• Find P-value threshold based on chosen false discovery rate
  – Here FDR = 0.1
  – Default is FDR = 0.05
• Estimate LODR from intersection of model confidence interval upper bound and P-value threshold

LODR: Limit of Detection of Ratios
LODR provides
• Specified confidence in the differentially expressed transcripts above LODR (90% chance of <10% FDR)
• Guidance for experimental design: increase signal to bring transcripts above the LODR estimate

[Figure: MA plot, Rat Experiment, with the 4:1 LODR marked; x-axis: Log2 Normalized Average Counts; y-axis: Log2 Ratio of Normalized Counts]
• Increased sequencing depth shifts endogenous transcript ratio measurements above LODR

What are the LODR estimates for my experiment?
[Figure: Rat Experiment and Interlaboratory Experiment panels; x-axis: Average Counts; y-axis: DE Test P-values]

How do the endogenous samples relate to LODR?
[Figure: MA plots for Rat Experiment and Interlaboratory Experiment with the 4:1 LODR marked; x-axis: Log2 Normalized Average Counts; y-axis: Log2 Ratio of Normalized Counts]

How much technical variability & bias is there?
[Figure: Rat Experiment and Interlaboratory Experiment panels; y-axis: Log2 Ratio of Normalized Counts; annotations: Decreased Variability, Significant Ratio Bias]

mRNA Fraction Differences Between Samples Contribute to Bias in ERCC Ratios
[Diagram: spike-in added to total RNA (mRNA and rRNA) in Sample 1 and Sample 2, shown before and after mRNA enrichment. The RNA fractions are exaggerated for illustration purposes.]

[Summary of the four erccdashboard figures]
• Dynamic Range
• Diagnostic performance: AUC
• Limit of Detection of Ratios: LODR
• Variability, Bias, LODR & Sample Transcripts

EVALUATE REPRODUCIBILITY ACROSS LABORATORIES
[Figure annotations: Good Performance, Poor Performance]

Interlaboratory Analysis Using erccdashboard performance metrics
• Lab 1-6: Illumina + poly-A selection (Illumina kit)
• Lab 7-9: Life Tech + poly-A selection (Life Tech kit)
• Lab 10-12: Illumina + ribosomal RNA depletion

• Diagnostic performance was consistent within and amongst measurement processes
• Lab 7 was an outlier for diagnostic performance

Consistent LODR across 11 of 12 Labs
[Figure: x-axis: Laboratory; y-axis: LODR (Average Counts)]
• LODR agreement with AUC

Ratio bias is highly variable amongst experiments
• Ratio bias (r_m) can be attributed to mRNA fraction difference between samples:
  – R_s = nominal subpool ratio
  – (E1/E2)_s = empirical ratio
• Shippy et al. 2006
[Figure: x-axis: Laboratory; y-axis: mRNA fraction difference, Log(r_m)]
• Large standard errors indicate that mRNA fraction isn’t the only factor contributing to ERCC ratio bias
  – mRNA enrichment protocol is a factor…

Protocol-dependent bias from poly-A selection affects ERCC controls due to short poly-A tails
[Figure: panels for Lab 1-6 ILM Poly-A, Lab 7-9 LIF Poly-A, Lab 10-12 ILM Ribo]
• mRNA enrichment protocol biases vary across individual ERCCs but are consistent for a protocol

Results of the erccdashboard Publication
• Ratio performance measures for any technology platform and any experiment
  – Diagnostic Power
  – Novel LODR metric
  – Technical Variability & Bias
• Comparison across experiments
• Quantification of mRNA fraction differences between samples
• Demonstration of protocol-dependent bias

ERCC 2.0: A New Suite of RNA Controls
• Approached by industry and academia to build new RNA controls
• NIST-hosted open, public ERCC 2.0 workshop
  – Workshop report and presentations available: slideshare.net/ERCC-Workshop
• All interested parties are welcome to participate
  – Sequence contributions
  – Interlaboratory analysis
• New and Improved mRNA Mimics
• Transcript Isoforms
• miRNA

New and Improved mRNA Mimics
• Additional controls
• Expand distributions of RNA control properties
  – Length (> 2kb)
  – GC content
  – Poly-A tail length

Transcript Isoform Controls
• Transcript Design
  – Non-cognate Spike-in RNA Variants (SIRVs) developed by Lexogen
  – Cognate sequence selection in progress
    • Schizosaccharomyces pombe
• Mixture design
  – Dynamic Range
    • 24
  – Design Ratios
    • < 2:1
Lukas Paul, Lexogen

Small and miRNA Controls
• Needed for validation of clinical applications
  – Early Detection Research Network
  – TGen
• Other applications relevant to bacterial RNA-Seq
• Non-cognate miRNA controls
• Include some pre-miRNA
• Direct RNA control synthesis by Agilent – no need for DNA templates
Karol Thompson, FDA

Recap
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls

Acknowledgements
• All External RNA Controls Consortium participants
• NIST
  – Marc Salit
  – Steve Lund
  – P. Scott Pine
  – Justin Zook
  – David Duewer
  – Jerod Parsons
  – Jennifer McDaniel
  – Margaret Klein
• Empa
  – Matthias Roesslein
• SEQC study participants
• Co-authors on erccdashboard manuscript: S. P. Lund, P. S. Pine, H. Binder, D. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Łabaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit

For more information contact: sarah.munro@nist.gov