Alternative Two Sample Tests in Bioinformatics

Xiaohui Zhong and Kevin Daimi
Department of Mathematics, Computer Science and Software Engineering
University of Detroit Mercy, 4001 McNichols Road, Detroit, MI 48221
{zhongk, daimikj}@udmercy.edu

Abstract— Bioinformatics is a multidisciplinary field, and statistics has gained immense popularity in bioinformatics research. The goal of this paper is to present a survey of two-sample tests applied to bioinformatics. The vast majority of these methods do not follow the classical two-sample test techniques, which require strict assumptions. Thus, unlike other classical surveys, this paper emphasizes the justifications behind the deviations from the standard approach, and the implementation of those deviations.

Index Terms— Statistical Methods, Sequence Analysis, Microarray, Two-sample Testing, Bootstrap Hypothesis Testing, Non-traditional Hypothesis Testing

I. INTRODUCTION

Bioinformatics is a rapidly growing discipline that has matured from the fields of Molecular Biology, Computer Science, Mathematics, and Statistics. It refers to the use of computers to store, compare, retrieve, analyze, and predict the sequence or the structure of molecules. According to Cohen [2], "The underlying motivation for many of the bioinformatics approaches is the evolution of organisms and the complexity of working with incomplete and noisy data." Bioinformatics is a multidisciplinary field in which teams from Biology, Biochemistry, Mathematics, Computer Science, and Statistics work together to gain insight into the functions of the cell [3], [10]. More precisely, bioinformatics is the marriage of biology and computer science for the purpose of analyzing biological data and consequently solving biological problems [12]. The need for collaboration in bioinformatics research and teaching is inevitable. "The explosive increase in biological information produced by large-scale genome sequencing and gene/protein expression projects has created a demand that greatly exceeds the supply of researchers trained both in biology and in computer science" [4]. According to the European Bioinformatics Institute [5], "Bioinformatics is an interdisciplinary research area that is the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, environment, energy and biotechnology."

The field of statistics plays a vital role in bioinformatics, and modified statistical techniques are constantly evolving. Statistics is the science of the collection, organization, presentation, analysis, and interpretation of data [16], [18]. Statistical methods that summarize and present data are referred to as descriptive statistics. Data modeling methods that account for randomness and uncertainty in the observations, and that draw inferences about the population of interest, fall within inferential statistics. When the focus is on biological and health science information, biostatistics is applicable [18]. The statistical techniques that have been applied include hypothesis tests, ANOVA, Bayesian methods, the Mann–Whitney test, and regression, tailored mainly to microarray data sets and taking into account multiple comparisons, cluster analysis, and beyond.
In bioinformatics, microarrays readily lend themselves to statistical analysis, and a number of techniques have been applied to them [15], [22]. The methods mentioned above assess statistical power based on the variation present in the data and the number of experimental replicates. They also help to minimize Type I and Type II errors in demanding analyses. While these methods sound familiar to people with a statistics background, they may be foreign to researchers in the field of bioinformatics. Statisticians, on the other hand, will enjoy seeing how these techniques are applied in bioinformatics once they become acquainted with what DNA or protein sequences are.

This paper aims to survey some basic statistical techniques, especially the different kinds of hypothesis testing techniques that have been developed lately and used in the context of bioinformatics. The goal of this survey is to pinpoint the motivations researchers had for modifying the classical two-sample tests when applying them to bioinformatics. The classical two-sample tests have strict assumptions, and the reasons that forced researchers to relax or violate some of these assumptions will be explored.

II. CLASSICAL TWO-SAMPLE t-TESTS

The classical two-sample t-test has been applied to only a few bioinformatics problems; the reason for this should become clear shortly. An example is the following scenario. When measuring the level of gene expression in a segment of DNA, the process usually requires several repeated experiments in order to obtain the measurements for one cell type. This is due to biological and experimental variability. The objective is to compare the levels of gene expression between two types of DNA based on the measured expression levels for these two types. Such a procedure is a typical classical two-sample t-test. Assume that $M_{t,i}$, $i = 1, \dots, n_t$, are the measurements from type $t$, $t = 1, 2$, respectively. The null hypothesis $H_0: \mu_1 = \mu_2$ is tested against the alternative hypothesis $\mu_1 \neq \mu_2$. The appropriate test statistic is

$$t = \frac{\bar{M}_1 - \bar{M}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad (1)$$

where

$$S^2 = \frac{\sum_{t=1}^{2}\sum_{i=1}^{n_t}(M_{t,i} - \bar{M}_t)^2}{n_1 + n_2 - 2}.$$

Under the assumption that the $M_{t,i}$ are $NID(\mu_t, \sigma_t^2)$ random variables (normally and independently distributed), the statistic $t$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom if the null hypothesis is true. While this test procedure is very simple, it requires very strict assumptions, some or all of which cannot be met in real-life applications. In some cases it is either not known, or hard to confirm, whether the variables $M_{t,i}$ are normally distributed. Even if they are normally distributed, the requirement that both normal populations share a common variance can be hard to fulfill. Another requirement mandates that these variables be independent, which is generally true for many gene expression measurements. In practice, some or all of these conditions are not satisfied, yet a decision on the equality of the two means is still needed. Thus, alternatives to the standard classical t-test are required. In this paper, we survey several modified tests appearing in the recent bioinformatics literature.
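To make formula (1) concrete, the following is a minimal Python sketch of the pooled-variance two-sample t-test. It is our own illustration, not code from any of the surveyed papers; the function name and example values are arbitrary, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats

def pooled_two_sample_t(m1, m2):
    """Classical two-sample t-test with pooled variance, as in formula (1)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    n1, n2 = len(m1), len(m2)
    # Pooled variance S^2 with n1 + n2 - 2 degrees of freedom (equal-variance assumption).
    s2 = (np.sum((m1 - m1.mean()) ** 2) + np.sum((m2 - m2.mean()) ** 2)) / (n1 + n2 - 2)
    t = (m1.mean() - m2.mean()) / np.sqrt(s2) * np.sqrt(n1 * n2 / (n1 + n2))
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)   # two-sided p-value under the t distribution
    return t, df, p

# Example: expression measurements for one gene under two cell types (made-up numbers).
t_stat, df, p_val = pooled_two_sample_t([1.2, 1.5, 1.1, 1.4], [1.9, 2.1, 1.7])
```

In practice one could equivalently call scipy.stats.ttest_ind with equal_var=True; the point of the sketch is only to spell out the quantities that the modified tests below replace.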
III. TWO SAMPLE TEST WITH INTRA-DEPENDENCY

Gilbert et al. [9] compared the genetic diversity of the virus between two groups of children who were infected with HIV at birth. The children were classified into a group of 9 slow/non-progressors (group 1) and a group of 12 progressors (group 2). Between 3 and 7 HIV gag P17 sequences were sampled from each child, and pairwise sequence distances were derived for each child's sample as the measures of diversity within a child. The goal was to assess whether the level of HIV genetic diversity differed between the two groups, in order to help identify the role of viral evolution in HIV pathogenesis. In what follows, we show why the authors had to deviate from the standard two-sample test. We first introduce and explain their statistical model.

Let $M_{kij}^g$ represent the distance between sequences $i$ and $j$ of child $k$ in group $g$, $g = 1$ or $2$. It was found that if a sequence is involved in two distances of a child's sequences, then the two distances are positively correlated; contrasts involving a common individual are also positively correlated. Therefore, the conditions for the classical t-test described in Section II are violated, and applying that procedure would produce biased results. The natural option is to perform the test on a subset of independent samples, in which case not all of the information is used. Thus, a new two-sample test that takes the correlations between samples into account was proposed. The details are as follows.

Assume that there are $n_g$ children in group $g$, $g = 1$ or $2$, and that child $k$ has $m_k^g$ sequences sampled. Then there are $N_g = \sum_{k=1}^{n_g} m_k^g(m_k^g - 1)/2$ pairwise distances in each group, as well as $Q_g$ covariance terms between distances that share a common sequence within an individual. The test statistic is similar to (1) above, $t = (\bar{M}_1 - \bar{M}_2)/\hat{\sigma}(\bar{M}_1 - \bar{M}_2)$. The main idea is to estimate the standard deviation $\sigma(\bar{M}_1 - \bar{M}_2)$ using the correlations between the $M_{kij}^g$, assuming the null hypothesis $H_0: \mu_1 = \mu_2$ is true. Here, the mean distances are defined as $\bar{M}_g = N_g^{-1}\sum_{k=1}^{n_g}\sum_{i<j} M_{kij}^g$.

Notice that the correlation occurs only within a group, and particularly within individuals, so the estimate of the variance within one group can be discussed without indexing on $g$ and $k$. Since there are $n(n-1)/2$ pairwise distances, the standard estimate for the variance of $\bar{M}$ is

$$\hat{\sigma}^2(\bar{M}) = (n(n-1)/2 - 1)^{-1}\sum_{i<j}(M_{ij} - \bar{M})^2.$$

But this estimate is too small, because it does not account for the positive correlations between distances sharing the same sequence. Another option is

$$\hat{\sigma}^2(\bar{M}) = (n-1)^{-1}\sum_{i<j}(M_{ij} - \bar{M})^2.$$

However, this one is too large unless the correlations between the sequences are perfectly linear. Therefore, something in between these two estimates could be a more accurate estimate of the variance. Because the correlation occurs only between pairwise distances sharing the same sequence, this variance can be estimated by splitting the covariance into two parts:

$$\hat{\sigma}^2(\bar{M}) = (n(n-1)/2)^{-1}\{2(n-2)\hat{\sigma}_1^2 + \hat{\sigma}_2^2\},$$

where $\sigma_1^2 = \mathrm{cov}(M_{ij}, M_{il})$ is the covariance of pairwise distances that share the same sequence, and $\sigma_2^2 = \mathrm{var}(M_{ij})$ is the variance of all pairwise distances. The empirical estimates of these two quantities are:

$$\hat{\sigma}_1^2 = \frac{\sum_{i<j<l}\{(M_{ij} - \bar{M})(M_{il} - \bar{M}) + (M_{ij} - \bar{M})(M_{jl} - \bar{M})\}}{n(n-1)(n-2)/3 - 1}, \qquad (2)$$

$$\hat{\sigma}_2^2 = (n(n-1)/2 - 1)^{-1}\sum_{i<j}(M_{ij} - \bar{M})^2. \qquad (3)$$
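Before extending the estimate to two groups, the within-group quantities can be written out as code. The sketch below is our own illustration: the function name and the symmetric distance-matrix input are assumptions, and the body simply transcribes the empirical estimates (2) and (3) and the combined variance of the mean for a single set of n sequences.

```python
import numpy as np
from itertools import combinations

def within_group_variance(D):
    """D: symmetric (n x n) matrix of pairwise distances among n sequences.
    Returns (sigma1_sq, sigma2_sq, var_mean) following formulas (2) and (3)."""
    n = D.shape[0]
    pairs = list(combinations(range(n), 2))          # all i < j
    dists = np.array([D[i, j] for i, j in pairs])    # the n(n-1)/2 distances
    m_bar = dists.mean()

    # (2): cross-products of distances sharing one sequence, two per triple i < j < l
    num = sum((D[i, j] - m_bar) * (D[i, l] - m_bar) +
              (D[i, j] - m_bar) * (D[j, l] - m_bar)
              for i, j, l in combinations(range(n), 3))
    sigma1_sq = num / (n * (n - 1) * (n - 2) / 3 - 1)

    # (3): ordinary variance of all pairwise distances
    sigma2_sq = np.sum((dists - m_bar) ** 2) / (n * (n - 1) / 2 - 1)

    # combined estimate of var(M_bar): between the "too small" and "too large" options
    var_mean = (2 * (n - 2) * sigma1_sq + sigma2_sq) / (n * (n - 1) / 2)
    return sigma1_sq, sigma2_sq, var_mean
```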
Since there are two groups, the estimate is modified to

$$\hat{\sigma}^2(\bar{M}_1 - \bar{M}_2) = \sum_{g=1}^{2} N_g^{-1}\sum_{k=1}^{n_g}\{2(m_k^g - 2)\hat{\sigma}_{g,1}^2 + \hat{\sigma}_{g,2}^2\}, \qquad (4)$$

where

$$\hat{\sigma}_{g,1}^2 = \left(\sum_{k=1}^{n_g} m_k^g(m_k^g - 1)(m_k^g - 2)/3 - 1\right)^{-1}\sum_{k=1}^{n_g}\sum_{i<j<l}\{(M_{kij}^g - \bar{M}_g)(M_{kil}^g - \bar{M}_g) + (M_{kij}^g - \bar{M}_g)(M_{kjl}^g - \bar{M}_g)\}$$

and

$$\hat{\sigma}_{g,2}^2 = (N_g - 1)^{-1}\sum_{k=1}^{n_g}\sum_{i<j}(M_{kij}^g - \bar{M}_g)^2.$$

Modified this way, the test statistic $t = (\bar{M}_1 - \bar{M}_2)/\hat{\sigma}(\bar{M}_1 - \bar{M}_2)$ has an asymptotic normal distribution, provided that $\sum_{g=1}^{2}\{N_g(N_g - 1)/2 - 2(N_g - 2)\hat{\rho}_g^2\}$ is large enough, where $\rho_g$ is the correlation coefficient of two pairwise distances sharing the same sequence in group $g$.
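Continuing the sketch, the two-group statistic can be assembled from per-group components roughly as follows. This is only an illustrative outline of our reading of formula (4), not the authors' code; the per-group pooling of cross-products and the helper names are our own choices.

```python
import numpy as np
from itertools import combinations

def group_components(children):
    """children: list of symmetric distance matrices, one per child in a group.
    Returns pooled (sigma_g1_sq, sigma_g2_sq, N_g, m_list, M_bar) for the group."""
    all_dists, cross_terms, n_cross, N_g, m_list = [], 0.0, 0.0, 0, []
    for D in children:
        m = D.shape[0]
        m_list.append(m)
        N_g += m * (m - 1) // 2
        all_dists.extend(D[i, j] for i, j in combinations(range(m), 2))
    M_bar = np.mean(all_dists)
    for D in children:
        m = D.shape[0]
        # two cross-products per triple, as in formula (2), centred at the group mean
        cross_terms += sum((D[i, j] - M_bar) * (D[i, l] - M_bar) +
                           (D[i, j] - M_bar) * (D[j, l] - M_bar)
                           for i, j, l in combinations(range(m), 3))
        n_cross += m * (m - 1) * (m - 2) / 3
    sigma_g1_sq = cross_terms / (n_cross - 1)
    sigma_g2_sq = sum((d - M_bar) ** 2 for d in all_dists) / (N_g - 1)
    return sigma_g1_sq, sigma_g2_sq, N_g, m_list, M_bar

def corrected_two_sample_t(group1, group2):
    """t = (M1_bar - M2_bar) / sigma_hat, with the variance built as in (4)."""
    var, means = 0.0, []
    for children in (group1, group2):
        s1, s2, N_g, m_list, M_bar = group_components(children)
        means.append(M_bar)
        var += sum(2 * (m - 2) * s1 + s2 for m in m_list) / N_g
    return (means[0] - means[1]) / np.sqrt(var)
```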
The authors provided comparative results for the DNA sequences of the 21 children described earlier. The classical two-sample t-test was performed on the differences based on synonymous distance, with sample means $\bar{D}_1 = 0.0113$ and $\bar{D}_2 = 0.00713$ and sample sizes $N_1 = 387$ and $N_2 = 523$, respectively. The result suggested a difference between the two groups with $p = 2.2\times 10^{-6}$. However, the correlations of the pairwise distances within individuals were estimated to be $\hat{\rho}_1 = 0.55$ and $\hat{\rho}_2 = 0.61$, respectively. The classical t-test ignores these positive correlations, which results in a smaller estimated variance for the difference of the means. When the newly developed procedure was applied, it produced $p = 0.56$, indicating that the difference between the mean distances of the two groups is not significant.

The above two-sample test provides an alternative to the traditional two-sample t-test that accommodates situations where data within a group may be correlated. This approach can have a significant impact in many areas. First, a new take on an existing statistical test is introduced. The method can be applied not only in bioinformatics but also in other fields, such as finance, engineering, chemistry, and behavioral science. Most importantly, it has distinct significance in the bioinformatics domain. For example, in the analysis of DNA sequences [6], one of the tasks is to test the similarity of two sequences, or whether genes are differentially expressed, by matching subsequences. One of the assumptions of such matching rules is that the occurrences of the nucleotides must be independent, an assumption that has been found to be inaccurate for many DNA sequences. The method above provides an alternative formula for the test statistic by calculating the variance of the mean of data that may be mutually dependent. Furthermore, the method for calculating the variance can be extended to building statistical models from data that may be interdependent.

IV. BOOTSTRAP AND PERMUTATION METHODS

The test discussed in the last section dealt with comparing means from two samples. With the advancement of biology and the other biosciences, collections of microscopic DNA spots attached to a solid surface, called microarrays, are now studied. With the power of computation, scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously. One of their objectives is to detect differentially expressed genes between two types of cells.

Suppose we have two types of cells, with $n_1$ and $n_2$ microarrays respectively. The $n_1$ arrays contain $m$ genes from the first type of cells, and the $n_2$ arrays contain the same $m$ genes from the second type. Let $M_{ij}^c$ be the expression value of the $i$th gene in the $j$th array of cell type $c$, $c = 1, 2$. Let $t_i$, $i = 1, 2, \dots, m$, be the two-sample test statistics calculated using formula (1). The null hypothesis for each test is $H_{i0}: \mu_{i1} = \mu_{i2}$. When this hypothesis is rejected (a positive result), the gene is declared differentially expressed between the two cell types. Assuming the cumulative distribution function of $t_i$ is $D_i(t)$ when the null hypothesis is true, the p-value of each test can be calculated as $p_i = 1 - D_i(|t_i|) + D_i(-|t_i|)$. These p-values are arranged in ascending order, $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$. Any gene tested with a p-value below a certain threshold is rejected (the test is positive). These genes are ranked in order of their p-values, with the smallest value being the most significant for further study.

The remaining task is to find the distributions $D_i(t)$. There are many different ways to identify these distribution functions. Under the classical assumptions that all $M_{ij}^c$ are normally and identically distributed, the distributions are either the Student t-distribution or the standard normal distribution. As discussed in the last section, such an assumption is either unrealistic or difficult to verify. As a result of increasing computing power, resampling methods such as the permutation and bootstrap methods are now widely used. These methods generate the empirical distributions $D_i$, which in turn yield the $p_i$. The classical bootstrapping/permutation resampling scheme is as follows. Calculate the test statistic $t_i$ from the original sample for each gene using formula (1). Put all $n_1 + n_2$ arrays in the same pool. Randomly draw $n_1$ arrays and assign them to cell type 1, and randomly draw $n_2$ arrays and assign them to cell type 2. If the draws are with replacement, this is the bootstrap method; if the draws are without replacement, it is the permutation method. Repeat the above steps $B$ times; in the case of permutation, not all possible permutations have to be considered. In this paper, the two methods are treated similarly. Calculate the t-statistic $t_i^b$, $b = 1, 2, \dots, B$, using formula (1) for each resample. Under the null hypothesis that there is no differentially expressed gene, the t-statistics should have the same distribution regardless of how the arrays are arranged. Hence, the empirical p-values can be calculated by

$$p_i = \frac{1}{B}\sum_{b=1}^{B}\frac{\#\{j: |t_j^b| \ge |t_i|,\ j = 1, 2, \dots, m\}}{m}. \qquad (5)$$

This scheme was discussed and applied in a number of papers [1], [8], [13], [19]-[21].
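As an illustration of the pooled resampling scheme and formula (5), here is a minimal permutation-based sketch. It is our own illustration: the genes-in-rows array layout, the function names, and the default number of resamples are assumptions, not part of the surveyed papers.

```python
import numpy as np

def t_stats(X1, X2):
    """Pooled-variance two-sample t statistics, formula (1), one per gene (row)."""
    n1, n2 = X1.shape[1], X2.shape[1]
    s2 = ((X1 - X1.mean(1, keepdims=True)) ** 2).sum(1)
    s2 += ((X2 - X2.mean(1, keepdims=True)) ** 2).sum(1)
    s2 /= (n1 + n2 - 2)
    return (X1.mean(1) - X2.mean(1)) / np.sqrt(s2) * np.sqrt(n1 * n2 / (n1 + n2))

def permutation_pvalues(X1, X2, B=1000, rng=None):
    """Empirical two-sided p-values following formula (5)."""
    rng = rng or np.random.default_rng(0)
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m, n1 = X1.shape[0], X1.shape[1]
    t_obs = np.abs(t_stats(X1, X2))
    pooled = np.hstack([X1, X2])                 # all n1 + n2 arrays in one pool
    counts = np.zeros(m)
    for _ in range(B):
        perm = rng.permutation(pooled.shape[1])  # draw the labels without replacement
        t_b = np.abs(t_stats(pooled[:, perm[:n1]], pooled[:, perm[n1:]]))
        # for each gene i, count resampled statistics t_j^b at least as extreme as t_i
        counts += (t_b[None, :] >= t_obs[:, None]).sum(axis=1) / m
    return counts / B
```

Drawing the column indices with replacement (for example with rng.choice) instead of permuting them would turn this into the bootstrap variant described above.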
An alternative scheme, described in [13] as the Posterior Mixing Scheme, is the following. Resample $n_1$ arrays from the type I cells and assign them to type I, and resample $n_2$ arrays from the type I cells and assign them to type II; using these data, calculate $t_i^{b1}$, $b = 1, 2, \dots, B$, for each gene using formula (1). Repeat the above two steps on the arrays from the type II cells to obtain $t_i^{b2}$, $b = 1, 2, \dots, B$. Finally, calculate

$$t_i^b = \frac{n_1}{n_1 + n_2}\, t_i^{b1} + \frac{n_2}{n_1 + n_2}\, t_i^{b2}. \qquad (6)$$

The $p_i$'s are then calculated with formula (5). It was concluded that the Posterior Mixing Scheme has better power [1 - P(Type II error)] than the classical scheme. To our knowledge, this formula for calculating the test statistics has not yet been employed in the bioinformatics literature. It should be appealing for researchers to investigate and validate it further, in order to obtain more accurate results for identifying differentially expressed genes.

Mukherjee et al. [17] took the bootstrap method for calculating these statistics a step further. In the bootstrap schemes described above, the bootstrap statistics $t_i^b$ are assumed to be normally distributed with empirical mean $\bar{t}_i = B^{-1}\sum_{b=1}^{B} t_i^b$ and standard deviation $\hat{\sigma}$. Formula (5) was not used to calculate the p-values. Instead, the expected p-value was calculated as

$$\bar{p}_i = E(p_i) = \int \left(1 - D_i(|x|) + D_i(-|x|)\right) G(x; \bar{t}_i, \sigma)\, dx,$$

where $D_i$ is the cumulative distribution function of $t_i$ and $G$ is the Gaussian density. This procedure was applied to some widely analyzed microarray data with $D_i$ replaced by the t-distribution with $n_1 + n_2 - 2$ degrees of freedom, and with the variance $\sigma^2$ set between 1 and 3. The rankings of the genes produced by this proposed bootstrap method and by the classical two-sample t-test were compared. Genes whose differential expression was subsequently confirmed by further, costly tests ranked an average of 25.5 places higher under the bootstrap method than under the classical method [17]. This shows that the bootstrap method provides a powerful alternative to the classical method by estimating the p-values more accurately.
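Such an expected p-value can be approximated numerically. The sketch below is our own simplification: it uses a plain grid integration, a t cumulative distribution function for D_i, a Gaussian weight centred at the bootstrap mean, and illustrative parameter values.

```python
import numpy as np
from scipy import stats

def expected_p_value(t_bar, sigma, df, half_width=10.0, n_grid=2001):
    """E(p_i): average the two-sided p-value over a Gaussian centred at t_bar."""
    x = np.linspace(t_bar - half_width, t_bar + half_width, n_grid)
    p_of_x = 1 - stats.t.cdf(np.abs(x), df) + stats.t.cdf(-np.abs(x), df)
    weights = stats.norm.pdf(x, loc=t_bar, scale=sigma)
    # simple Riemann-sum approximation of the integral
    return np.sum(p_of_x * weights) * (x[1] - x[0])

# Example (made-up values): bootstrap mean 2.3, sigma^2 = 2, n1 + n2 - 2 = 6
p_bar = expected_p_value(t_bar=2.3, sigma=np.sqrt(2.0), df=6)
```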
The bootstrap two-sample test is widely used by researchers for identifying differentially expressed genes, and is particularly suitable when the underlying distributions are unknown. For example, Troyanskaya et al. [21] used this procedure to perform 50,000 permutations on a data set comprised of normal lung and squamous cell lung tumor specimens, with Bonferroni-corrected p-values. The results were compared with those of the rank sum test and the ideal discriminator method, and it was concluded that the bootstrap two-sample test is the most appropriate for a high-sensitivity test [21]. Many other researchers, such as Pan [19], Ge [8], and Abul [1], have also used this method as an integral part of more comprehensive studies of microarrays.

The bootstrapping procedure requires intensive computation, and computer packages and algorithms have been developed to tackle issues of computation time, storage, and efficiency. Li et al. [14] developed an algorithm, FastPval, to efficiently calculate very low p-values from large numbers of resampled data. The software package SAFEGUI was designed to perform bootstrap-resampling t-tests for testing gene categories [7].

V. MULTIPLE TESTING WITH Q-VALUES

The modified two-sample tests and bootstrap two-sample tests introduced in the previous sections concentrated on finding the p-values of the tests so that genes can be ranked accordingly. Notice that the p-value is only the probability that the test statistic falls in the critical region controlled by the maximum tolerated Type I error for one test. In the case of multiple tests, such as the gene expressions in microarrays, the Type I error can be inflated. For example, assume that 1000 genes are represented in each array of the two types of cells, and that 20 of the 1000 genes are differentially expressed. To find these 20 genes, two-sample t-tests are performed for the 1000 genes using a p-value cutoff corresponding to a 5% Type I error. This will produce about 49 [5% of (1000 - 20)] falsely identified genes, which is more than the number of genes that are actually differentially expressed. Thus, the Type I error for the entire array is far greater than 5%, which is an undesirable result. A well-known classical procedure to correct this problem is the Bonferroni correction, which replaces the cut-off Type I error $\alpha$ by $\alpha/m$, where $m$ is the total number of tests [6], [21]. For $m = 1000$, the per-test Type I error becomes 0.00005, which forces the test to miss most of the significantly differentially expressed genes. The possible outcomes of any set of multiple tests can be described in the tabular format of Table 1 below; for the example above, $m = 1000$, $m_0 = 980$ null hypotheses are true, and $m_1 = 20$ genes are truly differentially expressed. Here $V$ denotes the number of false positives and $R$ the total number of tests declared positive.

TABLE 1
POSSIBLE OUTCOMES FOR 1000 GENES WITH 5% P-VALUE

                                 Declared negative   Declared positive   Total
  Null true (not expressed)              U                   V            m0
  Alternative true (expressed)           T                   S            m1
  Total                                m - R                 R            m

A variety of error-rate measures have been proposed in the development of procedures for microarray data. These include the per-comparison error rate (PCER), the family-wise error rate (FWER), the false discovery rate (FDR), and the positive false discovery rate (pFDR). They are stated as [8]:

PCER = E(V)/m
FWER = Pr(V >= 1)
FDR = E(V/R | R > 0) Pr(R > 0)
pFDR = E(V/R | R > 0)

Among these four measures, the most commonly used is the pFDR. Since this quantity is only meaningful and useful when R is positive, the rate is usually written simply as FDR = E(V/R), which is the notation used here. A straightforward estimate of the FDR is the ratio V/R, the number of false positives over the total number of tests declared positive. While traditional multiple-testing procedures deal with thousands of tests using a single cut-off value for the p-values, the false discovery rate takes into account the joint behavior of all the p-values. The false discovery rate is therefore a useful measure of the overall accuracy of a set of significant tests.

We will discuss a method using the q-value, developed by Storey et al. [20]. The q-value method takes the FDR into consideration, balancing the identification of as many significant features as possible against keeping a relatively low proportion of false positives. This method and an important application of it [1] are discussed below. A value similar to the p-value is defined by Storey et al. [20] as the q-value corresponding to a particular p-value. Assume the p-values $p_i$ for each test are calculated by one of the methods introduced in the previous sections. Then the q-value is calculated by

$$q(p_i) = \min_{p_i \le \lambda \le 1} \widehat{FDR}(\lambda) = \min_{p_i \le \lambda \le 1}\left\{\frac{V(\lambda)}{S(\lambda)}\right\},$$

where $V(\lambda) = \#\{\text{false positives with } p_i \le \lambda,\ i = 1, 2, \dots, m\}$ and $S(\lambda) = \#\{p_i \le \lambda,\ i = 1, 2, \dots, m\}$. The objective is to simultaneously control the q-value and the p-value so that the FDR does not get out of proportion. A procedure for finding the q-values, and the criteria for selecting the threshold in a sequential procedure, are described below [20].

1) Assume the test statistics are calculated by (1), with p-values $p_i$ calculated by (5), for $i = 1, 2, \dots, m$.
2) Arrange the p-values in ascending order, $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$, which is also the order of the genes in terms of their evidence against the null hypotheses.
3) Use one of the options described below to estimate the value of $\pi_0$.
4) Estimate $\hat{q}(p_{(m)}) = \min_{t \ge p_{(m)}} \dfrac{\hat{\pi}_0\, m\, t}{\#(p_i \le t)} = \hat{\pi}_0\, p_{(m)}$.
5) For $j = m-1, m-2, \dots, 1$, estimate
$$\hat{q}(p_{(j)}) = \min_{t \ge p_{(j)}}\left\{\frac{\hat{\pi}_0\, m\, t}{\#(p_i \le t)},\ \hat{q}(p_{(j+1)})\right\} = \min\left\{\frac{\hat{\pi}_0\, m\, p_{(j)}}{j},\ \hat{q}(p_{(j+1)})\right\}.$$

Now two lists, for p-values and q-values, are formed simultaneously:

$$p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}, \qquad \hat{q}(p_{(1)}) \le \hat{q}(p_{(2)}) \le \dots \le \hat{q}(p_{(m)}).$$

One can select the maximum index $1 \le k \le m$ in the above lists so that both the p-values and the q-values up to the $k$th gene satisfy their respective thresholds.
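A compact sketch of steps 4) and 5), assuming the p-values and an estimate of π0 are already available; the function and its name reflect our own reading of the procedure in [20].

```python
import numpy as np

def q_values(p, pi0):
    """Step-down q-value computation from the p-values (steps 4 and 5)."""
    p = np.asarray(p, float)
    m = len(p)
    order = np.argsort(p)
    p_sorted = p[order]
    q_sorted = np.empty(m)
    # step 4: the largest p-value
    q_sorted[m - 1] = pi0 * p_sorted[m - 1]
    # step 5: move toward smaller p-values, enforcing monotone q-values
    for j in range(m - 2, -1, -1):
        q_sorted[j] = min(pi0 * m * p_sorted[j] / (j + 1), q_sorted[j + 1])
    # return q-values in the original gene order
    q = np.empty(m)
    q[order] = q_sorted
    return q

# Example: q = q_values(p_values, pi0=0.8); genes with small q and small p are reported.
```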
The quantity $\pi_0$ in step 3 is the proportion of null genes (genes with no difference between the two cell types) among the total number $m$ of genes tested. Although estimating this quantity is a difficult task, three different ways have been developed to do so [20].

A. Rule of Thumb Method

Let

$$\hat{\pi}_0 = \frac{\#(p_i > \lambda)}{m(1 - \lambda)}$$

for some $\lambda$, $0 \le \lambda < 1$. The rationale for this estimate is that the null p-values are uniformly distributed, so beyond a certain value $\lambda$ the observed p-values come mostly from null genes. A simple rule of thumb is to choose $\lambda = 0.5$, which gives

$$\hat{\pi}_0 = \frac{\#(p_i > 0.5)}{0.5\, m}.$$

B. Bootstrap Method

Assume that all p-values are calculated from the original set of data. Calculate

$$\hat{\pi}_0(\lambda_k) = \frac{\#(p_i > \lambda_k)}{m(1 - \lambda_k)}$$

from these p-values for $\lambda_k = k\,\lambda_{\max}/M$, $k = 0, 1, \dots, M$, where $\lambda_{\max}$ is close to 1 and $M$ is the number of desired points. Let $\hat{\pi}_0^{*} = \min_{0 \le k \le M}\{\hat{\pi}_0(\lambda_k)\}$. Resample the data $B$ times, each time calculating

$$\hat{\pi}_0^{b}(\lambda_k) = \frac{\#(p_i^b > \lambda_k)}{m(1 - \lambda_k)}$$

from the bootstrap p-values for all $k$. Define the mean squared error to be

$$MSE(\lambda_k) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\pi}_0^{b}(\lambda_k) - \hat{\pi}_0^{*}\right)^2.$$

Then the estimate of the proportion of null genes is $\hat{\pi}_0 = \min\{\hat{\pi}_0(\lambda_{k'}), 1\}$, where $\lambda_{k'}$ is the value of $\lambda_k$ for which $MSE(\lambda_k)$ is minimum. A simple algorithm is given in [1].

C. Curve Fitting Method

The ideal estimate of $\pi_0$ is $\hat{\pi}_0(\lambda_{\max})$, where $\lambda_{\max}$ is close to 1, since genes in this region should be null. However, the value of $\hat{\pi}_0(\lambda)$ is very sensitive to changes in $\lambda$. To obtain a stable estimate, a natural cubic spline $f(\lambda)$ is fitted to the points $\{(\lambda_k, \hat{\pi}_0(\lambda_k)): \lambda_k \in \{0, \lambda_1, \dots, \lambda_{\max}\}\}$, and the estimate is $\hat{\pi}_0 = f(1)$. There have been two suggestions for fitting the curve. Storey et al. [20] suggested that the curve fitting should be weighted by $(1 - \lambda)$ to control the instability near 1. However, Abul et al. [1] suggested that the result with no weighting is better, to avoid underestimation. For any new set of data, both the weighted and the unweighted fits should be tried and the better estimate used.
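The rule-of-thumb and bootstrap estimates of π0 can be sketched as follows. This is our own illustration: for brevity it resamples the p-values themselves rather than recomputing them from resampled arrays, as the description above would, and the λ grid and default parameters are arbitrary choices.

```python
import numpy as np

def pi0_rule_of_thumb(p, lam=0.5):
    """pi0 estimate at a single lambda (lambda = 0.5 by default)."""
    p = np.asarray(p, float)
    return np.sum(p > lam) / (len(p) * (1.0 - lam))

def pi0_bootstrap(p, lam_max=0.95, M=19, B=100, rng=None):
    """Choose lambda by minimising the bootstrap MSE around the minimum estimate."""
    rng = rng or np.random.default_rng(0)
    p = np.asarray(p, float)
    lams = np.array([k * lam_max / M for k in range(M + 1)])
    pi0_hat = np.array([pi0_rule_of_thumb(p, l) for l in lams])
    pi0_star = pi0_hat.min()                            # plug-in target value
    mse = np.zeros(M + 1)
    for _ in range(B):
        pb = rng.choice(p, size=len(p), replace=True)   # resample the p-values
        pi0_b = np.array([pi0_rule_of_thumb(pb, l) for l in lams])
        mse += (pi0_b - pi0_star) ** 2
    best = int(np.argmin(mse / B))                      # lambda with smallest MSE
    return min(pi0_hat[best], 1.0)
```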
The procedure for estimating $\pi_0$ was extended to one-sided hypotheses [1] with some adjustments. For example, if the tests are right-sided (up-regulated genes), the formula for the t-statistics remains the same as (1). The corresponding p-values can also be calculated by the bootstrap process described in the last section; however, formula (5) should be modified to

$$p_i = \frac{1}{B}\sum_{b=1}^{B}\frac{\#\{j: t_j^b \ge t_i,\ j = 1, 2, \dots, m\}}{m}. \qquad (5')$$

This change can make $\lim_{\lambda \to 1}\hat{\pi}_0(\lambda) > 1$, which is meaningless. The adjustment is to set $\lambda_{\max}$ to the upper bound of the $\lambda$ values for which $\hat{\pi}_0(\lambda) \le 1$, that is, $\lambda_{\max} = \sup\{0 \le \lambda \le 1 \mid \hat{\pi}_0(\lambda) \le 1\}$. Bootstrapping or curve fitting is then deployed to estimate $\pi_0$, which is needed for finding the q-values. Experiments on artificial data demonstrated that this approach can provide very accurate estimates.

The procedure described above can guide researchers in fine-tuning the selection of genes for further experiments. By bounding the rate of false discoveries beforehand, the amount of wasted time and cost can be bounded at the same rate. This procedure has many applications in microarray experiments and gene analysis.

VI. CONCLUSIONS

Bioinformatics is being used in many fields such as molecular medicine, preventative medicine, gene therapy, drug development, and waste cleanup. The interdisciplinary nature of bioinformatics demands close collaboration between biologists, computer scientists, mathematicians, and statisticians. Statistics plays a significant role in various applications of bioinformatics. One of the important areas of statistics that has been heavily used is two-sample testing. The classical versions of these tests rest on rigorous assumptions. Researchers involved in bioinformatics concluded that the tests are not readily suitable for their work because of the nature of many bioinformatics applications, and they were consequently forced to weaken some or all of these assumptions. This paper surveyed a number of methods in which researchers relaxed these constraints; the assumptions that were relaxed, and the reasons behind the relaxations, were demonstrated.

It is our future goal to introduce studies dealing with variations of the formulas for two-sample tests, with a variety of methods for controlling the false discovery rate, such as selecting a proper sample size, with methods that take into account the dependency of sample data, and with the extension of these techniques to multi-sample testing. While many of these techniques were proposed based on particular sets of data or on artificial data, work needs to be done on different data sets to validate the results. More importantly, statisticians can help in seeking theoretical justification or support for these methods, and computer scientists can assist in developing more efficient algorithms to implement them. It is hoped that these methods will spark new ideas in future bioinformatics research.

REFERENCES

[1] O. Abul, R. Alhajj, and F. Polat, "A Powerful Approach for Effective Finding of Significantly Differentially Expressed Genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 3, No. 3, pp. 220-231, 2006.
[2] J. Cohen, "Bioinformatics: An Introduction for Computer Scientists," ACM Computing Surveys, Vol. 36, No. 2, pp. 122-158, 2004.
[3] J. Cohen, "Computer Science and Bioinformatics," Communications of the ACM, Vol. 48, No. 3, pp. 72-78, 2004.
[4] Editorial, "Training for Bioinformatics and Computational Biology," Bioinformatics, Vol. 17, No. 9, pp. 761-762, 2001.
[5] European Bioinformatics Institute, Available: http://www.ebi.ac.uk/2can/home.html.
[6] W. J. Ewens and G. R. Grant, Statistical Methods in Bioinformatics: An Introduction, New York: Springer-Verlag, 2001.
[7] D. M. Gatti, M. Sypa, I. Rusyn, F. A. Wright, and W. T. Barry, "SAFEGUI: Resampling-Based Tests of Categorical Significance in Gene Expression Data Made Easy," Bioinformatics, Vol. 25, No. 4, pp. 541-542, 2009.
[8] Y. Ge, S. Dudoit, and T. P. Speed, "Resampling-Based Multiple Testing for Microarray Data Analysis," Dept. of Statistics, University of California, Berkeley, Tech. Rep. 633, 2003.
[9] P. B. Gilbert, A. J. Rossini, and R. Shankarappa, "Two-Sample Tests for Comparing Intra-Individual Genetic Sequence Diversity between Populations," Biometrics, Vol. 61, No. 1, pp. 106-117, 2005.
[10] S. Gopal, A. Haake, R. P. Jones, and P. Tymann, Bioinformatics: A Computing Perspective, New York: McGraw Hill, 2009.
[11] Y. Ji, Y. Lu, and G. Mills, "Bayesian Models Based on Test Statistics for Multiple Hypothesis Testing Problems," Bioinformatics, Vol. 24, No. 7, pp. 943-949, 2008.
[12] M. LeBlanc and B. Dyer, "Bioinformatics and Computing Curricula 2001: Why Computer Science is Well Positioned in a Post Genomic World," ACM SIGCSE Bulletin, Vol. 36, No. 4, pp. 64-68, 2004.
[13] S. Lele and E. Carlstein, "Two-Sample Bootstrap Tests: When to Mix?" Department of Statistics, University of North Carolina at Chapel Hill, Tech. Rep. 2031.
[14] M. J. Li, P. C. Sham, and J. Wang, "FastPval: A Fast and Memory Efficient Program to Calculate Very Low P-values from Empirical Distribution," Bioinformatics, Vol. 26, No. 22, pp. 2897-2899, 2010.
[15] P. Liu and J. T. Hwang, "Quick Calculations for Sample Size While Controlling False Discovery Rate with Application to Microarray Analysis," Bioinformatics, Vol. 23, No. 6, pp. 739-746, 2007.
[16] V. Mantzapolis and X. Zhong, Probability and Statistics, Dubuque: Kendall Hunt Publishing Company, 2010.
[17] S. N. Mukherjee, P. Sykacek, S. J. Roberts, and S. J. Gurr, "Gene Ranking Using Bootstrapped P-Values," SIGKDD Explorations, Vol. 5, No. 2, pp. 16-22, 2003.
[18] M. Pagano and K. Gauvreau, Principles of Biostatistics, Belmont: Brooks/Cole, 2000.
[19] W. Pan, "A Comparative Review of Statistical Methods for Discovering Differentially Expressed Genes in Replicated Microarray Experiments," Bioinformatics, Vol. 18, No. 4, pp. 546-554, 2002.
[20] J. Storey and R. Tibshirani, "Statistical Significance for Genomewide Studies," Proceedings of the National Academy of Sciences of the United States of America, Vol. 100, No. 16, pp. 9440-9445, 2003.
[21] O. G. Troyanskaya, M. E. Garber, P. O. Brown, D. Botstein, and R. B. Altman, "Nonparametric Methods for Identifying Differentially Expressed Genes in Microarray Data," Bioinformatics, Vol. 18, No. 11, pp. 1454-1461, 2002.
[22] Y. Zhao and W. Pan, "Modified Nonparametric Approaches to Detecting Differentially Expressed Genes in Replicated Microarray Experiments," Bioinformatics, Vol. 19, No. 9, pp. 1046-1054, 2003.