341: Introduction to Bioinformatics
Tutorial on Microarray Analysis using Bioconductor and R
(Sample Study)
February 11, 2011
Tools Used
• Bioconductor, http://www.bioconductor.org/, provides open source tools for the analysis and comprehension of high-throughput genomic data
• R, http://www.r-project.org/, is a free software environment for statistical computing and graphics
Data Used
• Resveratrol effect on lung carcinoma cell line: analysis of lung carcinoma A549 cells treated with resveratrol. Resveratrol is a phytoestrogen found in red wine. The results provide insight into the protective effect of resveratrol against lung cancer. Available from the Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2966
• 4 CEL files (microarray output) {GSM228717.CEL, GSM228718.CEL, GSM228719.CEL, GSM228720.CEL}, available from the link above
• 1 sample phenotype mapping {GDS296-pheno.csv}
Tasks
1. Import and normalise the data
2. Compare tissue treated with resveratrol with similar tissue treated with an ethanol control
3. Filter the most differentially expressed genes
4. Cluster and view the analysis
R Code
## Step 1 - Load packages
library(affy)
library(limma)
## Step 2 - Import Sample Data
setwd("C:/Users/asrowe/Documents/Tutorial/celfiles")  # folder holding the CEL files and the phenotype file
phenoData <- read.AnnotatedDataFrame("GDS296-pheno.csv", header=TRUE, sep="\t")
## Step 3 - Import and Normalise Data using functions from the Bioconductor affy package
eset <- justRMA(phenoData=phenoData)  # reads the CEL files in the working directory and applies RMA
## Step 4 - Differential expression filtering using the Bioconductor limma package
design <- model.matrix(~substance, pData(eset))  # intercept plus a Wine-versus-Control column
fit <- lmFit(eset, design)     # fit each probeset to the model
efit <- eBayes(fit)            # empirical Bayes adjustment
tt <- topTable(efit, coef=2)   # table of differentially expressed probesets
fix(tt)                        # view the table
## Step 5 - Hierarchical clustering and dendrogram using the R stats package
selected <- p.adjust(efit$p.value[, 2]) < 0.05  # select probesets by adjusted p-value (p.adjust defaults to Holm)
esetSel <- eset[selected, ]                     # keep only the selected probesets
heatmap(exprs(esetSel))                         # display the clustered heatmap
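Step 5 calls p.adjust without a method argument, which in base R means the Holm correction, whereas limma's topTable adjusts with Benjamini-Hochberg ("BH") by default. The sketch below uses made-up p-values (hypothetical, not results from this study) to show how the two adjustments can select different numbers of probesets, and adds a simple check because heatmap() needs at least two rows to draw.

```r
# Hypothetical raw p-values for six probesets (illustrative only)
p_raw <- c(0.0001, 0.004, 0.012, 0.03, 0.2, 0.7)

p_holm <- p.adjust(p_raw)                 # default method is "holm"
p_bh   <- p.adjust(p_raw, method = "BH")  # Benjamini-Hochberg (false discovery rate)

selected_holm <- p_holm < 0.05
selected_bh   <- p_bh < 0.05

# Compare raw and adjusted values side by side
print(data.frame(p_raw, p_holm, p_bh))

# heatmap() stops with an error on fewer than 2 rows, so check first
enough_rows <- sum(selected_holm) >= 2
print(enough_rows)
```

With only four arrays, it may be worth running this kind of check on the real adjusted p-values before calling heatmap().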
GDS296-pheno.csv (tab-separated, as read in Step 2)
Sample      substance
GSM228717   Control
GSM228718   Wine
GSM228719   Wine
GSM228720   Wine
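To see what the design matrix from Step 4 looks like for this phenotype table, the following base R sketch rebuilds the table by hand and calls model.matrix on it. The data frame name pheno is an assumption for illustration; in the tutorial the same information comes from pData(eset).

```r
# Phenotype table as listed above, typed in by hand for illustration
pheno <- data.frame(
  substance = c("Control", "Wine", "Wine", "Wine"),
  row.names = c("GSM228717", "GSM228718", "GSM228719", "GSM228720")
)

# Make the reference level explicit so the second coefficient is Wine minus Control
pheno$substance <- factor(pheno$substance, levels = c("Control", "Wine"))

# Column 1 is the intercept (Control level), column 2 flags the Wine samples
design <- model.matrix(~ substance, data = pheno)
print(design)
```

The second column is the one that coef=2 refers to in topTable, so the reported log fold changes are Wine relative to Control.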
Notes:
For more information on normalisation methods and why RMA (justRMA) is used, start here:
Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003). A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193.
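RMA combines background correction, quantile normalisation and median-polish summarisation. As a rough illustration of the quantile normalisation step only, the base R sketch below forces every column of a small made-up matrix onto the same distribution. The function name quantile_normalise and the toy data are hypothetical; this is not the affy implementation, and it breaks ties naively.

```r
quantile_normalise <- function(x) {
  # Sort each column, then average across columns at each rank
  sorted <- apply(x, 2, sort)
  rank_means <- rowMeans(sorted)
  # Replace each value by the mean for its rank within its own column
  ranks <- apply(x, 2, rank, ties.method = "first")
  normalised <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(normalised) <- dimnames(x)
  normalised
}

# Toy intensities: 4 probes (rows) on 3 arrays (columns)
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

xn <- quantile_normalise(x)
print(xn)
```

After normalisation every array holds the same set of values, only in a different probe order, which is what makes intensities comparable across arrays.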