The Influence of Model Averaging on Clade Posteriors: An Example
Syst. Biol. 57(6):905–919, 2008
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150802562392

The Influence of Model Averaging on Clade Posteriors: An Example Using the Triggerfishes (Family Balistidae)

ALEX DORNBURG,1 FRANCESCO SANTINI,2 AND MICHAEL E. ALFARO2

1 School of Biological Sciences, Washington State University, Pullman, Washington 99164, USA; E-mail: dornburgalex@yale.edu
2 Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA

Abstract. Although substantial uncertainty typically surrounds the choice of the best model in most phylogenetic analyses, little is known about how accommodating this uncertainty affects phylogenetic inference. Here we explore the influence of Bayesian model averaging on the phylogenetic inference of the triggerfishes (Family: Balistidae), a charismatic group of reef fishes. We focus on clade support as this area has received little attention and is typically one of the most important outcomes of phylogenetic studies. We present a novel phylogenetic hypothesis for the family Balistidae based on an analysis of two mitochondrial (12S, 16S) and three nuclear genes (TMO-4C4, Rhodopsin, RAG1) sampled from 26 ingroup species. Despite the presence of substantial model uncertainty in almost all partitions of our data, we found model-averaged topologies and clade posteriors to be nearly identical to those conditioned on a single model. Furthermore, statistical comparison of clade posteriors using the Wilcoxon signed-rank test revealed no significant differences. Our results suggest that although current model-selection approaches are likely to lead to overparameterization of the substitution model, the consequences of conditioning on this overparameterized model are likely to be mild.
Our phylogenetic results strongly support the monophyly of the triggerfishes but suggest that the genera Balistoides and Pseudobalistes are polyphyletic. Divergence time estimation supports a Miocene origin of the crown group. Despite the presence of several young species-rich subclades, statistical analysis of temporal diversification patterns reveals no significant increase in the rate of cladogenesis across geologic time intervals. [Balistidae; Bayesian methods; diversification rates; macroevolution; model averaging; molecular clock; Tetraodontiformes.]

The importance of model choice to phylogenetic inference is now widely recognized (reviewed in Sullivan and Joyce, 2005) and model selection procedures are commonly performed as a part of nearly all phylogenetic analyses. One limitation of most current approaches is that the pool of candidate models is small relative to the universe of reasonable models. For example, ModelTest (Posada and Crandall, 1998) currently considers only eight models with respect to substitution class, representing less than 4% of the possible 203 time-reversible models (Huelsenbeck et al., 2004). When the candidate pool is expanded to include the entire universe of substitution models, “unnamed” models frequently fit the data better than the best-fit model identified by ModelTest (Posada and Crandall, 1998) and the most parameter-rich substitution model, GTR, rarely emerges as the best model, even for large multigene data sets (Huelsenbeck et al., 2004). If this result is generally true, it implies that most phylogenetic analyses rely on overparameterized models with respect to substitution type. Indeed, Kelchner and Thomas (2006) found that over 60% of the publications they surveyed used a variant of GTR. The effects of this overparameterization have not been studied but theoretically should lead to an increase in the error variance surrounding each parameter (Burnham and Anderson, 2003).
This tradeoff could be particularly troublesome for phylogenetic inference as the topology itself is a parameter being inferred. A second, related concern for most commonly used model selection procedures is model uncertainty. Although substantial uncertainty typically surrounds the choice of the best model in most phylogenetic studies (Huelsenbeck et al., 2004; Alfaro and Huelsenbeck, 2006), the influence of this uncertainty on phylogenetic inference is not well understood. Theoretically, conditioning inference on a model that is only marginally better than other candidate models should lead to an underestimation of parameter variance and thus overconfidence (e.g., Hoeting et al., 1999; Buckley et al., 2002; Alfaro and Huelsenbeck, 2006). Empirically, studies suggest that accommodation of model uncertainty has only a minor effect on tree topology (e.g., Posada and Crandall, 2001; Beier et al., 2004; Nylander, 2004; Posada and Buckley, 2004). However, model uncertainty may have an especially pronounced effect on clade posterior probabilities as these have been shown to be particularly sensitive to model violation and underparameterization (Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Sullivan and Swofford, 1997). Bayesian model averaging is a computationally tractable alternative to the current practice of conditioning analyses on a single phylogenetic model (Huelsenbeck et al., 2004; Posada and Buckley, 2004; Pagel and Meade, 2006; Posada, 2008). Using this method, each model contributes to the phylogenetic inference in proportion to its posterior probability (Huelsenbeck et al., 2004; Nylander, 2004). Despite the potential promise of this approach, it has been implemented in only a small number of studies (e.g., Beier et al., 2004; Huelsenbeck et al., 2004; Nylander, 2004; Lee and Hugall, 2006; Alfaro and Huelsenbeck, 2006). 
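The idea that each model contributes in proportion to its posterior probability can be illustrated with a small weighted average. The sketch below is only an illustration: the model names, model posteriors, and per-model clade posteriors are hypothetical values, and the function name is our own. In a reversible-jump sampler this averaging happens implicitly, because trees are sampled while the chain moves between models.

```python
def model_averaged_posterior(model_posteriors, clade_given_model):
    """Average a clade's posterior probability over candidate models.

    model_posteriors:  {model name: P(model | data)}
    clade_given_model: {model name: P(clade | model, data)}
    """
    total = sum(model_posteriors.values())
    if total <= 0:
        raise ValueError("model posteriors must sum to a positive value")
    # Renormalize so the weights sum to one, then take the weighted sum.
    return sum(
        (p / total) * clade_given_model[name]
        for name, p in model_posteriors.items()
    )


# Hypothetical values, chosen only to show the arithmetic.
model_posteriors = {"model_A": 0.57, "model_B": 0.30, "model_C": 0.13}
clade_given_model = {"model_A": 0.96, "model_B": 0.94, "model_C": 0.90}

averaged = model_averaged_posterior(model_posteriors, clade_given_model)
print(round(averaged, 4))
```

When one model dominates the posterior, the averaged value stays close to the value conditioned on that model alone.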
Here we use Bayesian model averaging and model averaging based on Akaike weights (Akaike, 1973) to explore the influence of model uncertainty on the phylogenetic inference of triggerfishes, a charismatic group of reef fishes.

Triggerfish Phylogenetics

Members of the family Balistidae (order Tetraodontiformes) are mostly tropical in distribution (Kuiter and Debelius, 2006) and are some of the most conspicuous members of the diurnal reef community. The 42 species in this clade exhibit a high degree of ecological diversity. Owing to their small and powerful jaws, as well as a novel feeding repertoire that includes buccal manipulation (Wainwright and Friel, 2000), triggerfish are able to exploit a wide variety of invertebrate prey. Indeed, several balistids have been recognized as “keystone species” that control populations of Acanthaster planci, a sea star known to cause severe damage to coral reefs at high population densities (Ormond et al., 1973; Chen et al., 2001).

Balistid intrafamilial relationships are poorly understood despite two previous phylogenetic studies based on morphological characters. Matsuura (1979), in his analysis of the osteology of extant balistoids, suggested that Canthidermis represents the sister group to all remaining triggerfishes due to its possession of “primitive” scale bones. He divided the remaining balistids into three lineages: (1) Abalistes; (2) a Rhinecanthus + Sufflamen clade supported by a modified interhyal bone; and (3) an unresolved clade formed by all remaining genera. Tyler (1980) performed an evolutionary taxonomic investigation of the osteology and external features of both extant and fossil balistids and hypothesized that the Oligocene genera Balistomorphus and Oligobalistes represented ancestors of the modern crown group.
Given the absence of a preopercular groove, Tyler (1980) further hypothesized Rhinecanthus and Balistapus to be the “most generalized” and hence most primitive extant triggerfish. Odonus and Xanthichthys were considered to be the “most derived” taxa, united by the presence of fanglike second medial teeth and by a strongly supraterminal mouth. The presence of a slightly supraterminal mouth identified Melichthys as the sister group to these previous two genera. The relationships of the remaining genera were not resolved (Tyler, 1980: fig. 66). Both of these studies assumed all balistid genera to be monophyletic though this hypothesis has never been explicitly tested. Subsequent morphological investigations of tetraodontiform relationships (Winterbottom, 1974; Leis, 1984; Santini and Tyler, 2003, 2004) included only a small number of balistids, leaving many questions about triggerfish relationships unanswered.

Recently Holcroft (2005) included 14 balistids as part of a molecular study of tetraodontiform relationships. Her analysis retrieved four major groupings: (1) Melichthys niger; (2) Balistoides conspicillum; (3) a clade composed of Xanthichthys auromarginatus and Balistoides viridescens; (4) a clade of all remaining balistids including Canthidermis, Rhinecanthus, and Balistapus. Her topology is incongruent with many of the relationships presented in prior morphological studies, providing strong evidence for a polyphyletic Balistoides and placing Melichthys niger as sister to the remaining balistids, rather than Canthidermis and Abalistes (Matsuura, 1979) or Balistapus and Rhinecanthus (Tyler, 1980).

Age and Evolutionary History of the Triggerfishes

The fossil record of the Tetraodontiformes suggests that triggerfish are a relatively young group. Stem balistoids are not known until the Oligocene (35 Ma) despite a tetraodontiform fossil record that extends back to the Cretaceous.
This observation led Tyler and Santini (2002) to speculate that triggerfishes and filefishes were the last of the crown tetraodontiform families to appear. Recently two studies have presented conflicting estimates of balistid divergence times based on fossil-calibrated molecular data. Yamanoue et al. (2006) dated the split between balistids and their sister group, the monacanthids, at approximately 129.5 Ma, in the early Cretaceous but, due to the presence of only one triggerfish in their data set, could not determine the time of origin of the crown group. More recently, Alfaro et al. (2007) reanalyzed a data set containing Holcroft’s (2005) data, as well as newly sequenced taxa, in conjunction with 11 fossil calibration points and estimated that the split between the triggerfish and their sister taxon was about 90 million years younger (late Eocene, ∼40 Ma), with crown triggerfish first appearing between the late Oligocene and the early Miocene (∼25 Ma).

Given the large discrepancy between these divergence time estimates, it is not surprising that they support conflicting hypotheses as explanations for the origin and subsequent diversification of triggerfishes. Alfaro et al. (2007) have suggested that reef association in triggerfishes is correlated with higher than expected rates of diversification within the family. A Jurassic or early Cretaceous origin of the group would undermine this hypothesis because scleractinian coral reefs are not known to extend past the Tertiary.

Questions about diversification patterns of the triggerfishes at a finer scale have yet to be addressed. For example, Floeter et al. (2007) recently suggested two potentially important “speciation bursts” for numerous Atlantic reef fish groups corresponding to the late Miocene through Pliocene (8 to 2 Ma) and the Pleistocene (<1.5 Ma). It is currently not known if triggerfish diversification shows a similar pattern of increase.
In fact, despite their circumtropical distribution (Table 1), to our knowledge there are no previous hypotheses of triggerfish biogeography or diversification. Here we test whether triggerfish subclades experienced significant increases in diversification rates over their evolutionary history.

Objective

We statistically compare single-model and model-averaged posterior probabilities for clades of triggerfishes to investigate the influence of model uncertainty on phylogenetic confidence. We adopt a Bayesian approach using a reversible-jump Markov chain Monte Carlo sampler that allows model posteriors to be easily calculated (Green, 1995) and compare this to a procedure based on Akaike weights (Posada and Buckley, 2004). We also present the first detailed molecular phylogenetic study of the triggerfishes and integrate our data from multiple nuclear and mitochondrial loci with previously published morphological data to produce a total evidence topology for the family. Finally, we quantify temporal patterns of balistid cladogenesis using a relaxed clock method and use our chronogram as a framework to statistically test several hypotheses of balistid diversification.

TABLE 1. List of balistid taxa examined in this study, locality data, voucher numbers, and GenBank accession numbers.
                                                             GenBank accession number
Taxon                           Locality          Voucher    12S        16S        Rhodopsin  Tmo-4C4    RAG1
Abalistes stellatus             Marine Wholesale  PW-1324    AY700248   AY679632   EU108845   EU108823   AY700318
Balistapus undulatus            Fiji              MEA 264    EU108802   EU108813   EU108849   EU108826   EU108869
Balistes capriscus              Alabama, USA      CU-90721   AY700238   AY679622   EU108846   EU108824   AY700308
Balistes polylepis              GenBank           KU 28370   AY700239   AY679623   -          -          AY700309
Balistes punctatus              Ghana             MEA 264    EU108801   EU108812   EU108848   EU108827   EU108868
Balistes vetula                 Smithsonian       5243       AY700240   AY679624   EU108850   EU108828   AY700310
Balistoides conspicillum        Marine Wholesale  -          AY700241   AY679625   EU108847   EU108825   AY700311
Balistoides viridescens         GenBank           KU uncat.  AY700250   AY679634   -          -          AY700320
Canthidermis maculata           Alabama, USA      CU-90732   AY700242   AY679626   EU108851   EU108829   AY700312
Melichthys niger                Oahu, HI          MEA 312    AY700243   AY679627   EU108852   -          AY700313
Melichthys vidua                Marine Wholesale  MEA 168    EU108803   EU108814   EU108853   EU108830   EU108870
Odonus niger                    Fiji              MEA 129    EU108804   EU108815   EU108854   EU108831   EU108871
Pseudobalistes flavimarginatus  Marine Wholesale  -          EU108805   EU108816   EU108855   EU108832   EU108872
Pseudobalistes fuscus           Indian Ocean      MEA 115    AY700244   AY679628   EU108856   EU108833   AY700314
Rhinecanthus aculeatus          Fiji              MEA 194    AY700247   AY679631   EU108857   EU108834   AY308790
Rhinecanthus assasi             Red Sea           MEA 167    AY700245   AY679629   EU108858   EU108835   AY700315
Rhinecanthus lunula             Marine Wholesale  MEA 142    EU108806   EU108817   EU108859   EU108836   EU108873
Rhinecanthus rectangulus        Sri Lanka         MEA 110    EU108807   EU108818   EU108860   EU108837   EU108874
Rhinecanthus verrucosus         Solomon Islands   MEA 288    EU108808   EU108819   EU108861   EU108838   EU108875
Sufflamen albicaudatum          Solomon Islands   MEA 145    EU108809   EU108820   EU108862   EU108839   EU108876
Sufflamen bursa                 Marine Wholesale  MEA 198    AY700249   AY679633   EU108863   EU108840   AY700319
Sufflamen chrysopterum          Marine Wholesale  PW-1325    AY700251   AY679634   EU108864   EU108841   -
Sufflamen fraenatum             GenBank           -          NC_004416  NC_004416  -          -          AY700321
Xanthichthys auromarginatus     Oceania           MEA 164    AY700246   AY679630   EU108865   EU108842   AY700316
Xanthichthys mento              Caribbean         MEA 116    EU108810   EU108821   EU108866   EU108843   EU108877
Xanthichthys ringens            Marine Wholesale  MEA 132    EU108811   EU108822   EU108867   EU108844   EU108878

METHODS

Sampling

Samples were obtained through tissue loan, marine wholesale, and field collection with voucher specimens deposited into the collection of the Charles R. Conner Museum at Washington State University (Table 1). Additional sequences were downloaded from GenBank (Table 1). Filefish (monacanthids) are uncontroversially recognized as the sister group to the triggerfishes (Winterbottom, 1974; Matsuura, 1979; Rosen, 1984; Santini and Tyler, 2003, 2004; Holcroft, 2005; Yamanoue et al., 2006; Alfaro et al., 2007) and we included three species to serve as outgroups in our study. Our ingroup sample included 26 species and 11 genera of balistids. This includes all genera of extant triggerfish lineages, except the most recently described rare genus Xenobalistes (Matsuura, 1981).

DNA Extraction, PCR Amplification, and Sequencing

Muscle tissue samples were stored in 70% ethanol prior to use. DNA was extracted for most taxa using the Chelex (Bio-Rad) protocol described in Walsh et al. (1991). Additional extractions for Balistes vetula, Pseudobalistes fuscus, and Balistes capriscus utilized the PureGene extraction kit and protocol (Gentra Systems). We used the polymerase chain reaction (PCR; Saiki, 1990) to amplify two mitochondrial genes, 12S rDNA (∼833 bp) and 16S rDNA (∼563 bp), and three nuclear genes, Rhodopsin (∼564 bp), Tmo4C4 (∼575 bp), and RAG1 (∼1471 bp).
One microliter of genomic template was used per 25-µL reaction, containing 5 µL of 5× GoTaq Flexi PCR buffer (Promega), 2 µL MgCl2 (25 mM), 0.5 µL dNTPs (8 µM), 1.25 µL of each primer (Table 2), and 0.125 µL of Promega GoTaq Flexi DNA polymerase (5 U/µL). Amplification of all gene fragments was conducted with an initial denaturing step at 94°C for 1 to 2 min; 37 cycles with a 0.5- to 1.5-min 94°C denaturing; a 45- to 75-s 48.5°C to 60°C annealing; and a 1- to 2-min 72°C extension, followed by an additional 5-min 72°C extension and a 10-min 23°C cool down. PCRs were performed on two MJ Research PTC-200 Peltier thermal cyclers and a Bio-Rad iCycler. All products were stored at −4°C after amplification.

Excess dNTPs and unincorporated primers were removed from PCR products using ExoSap (Amersham Biosciences). Purified products were cycle-sequenced using the BigDye Terminator v.3.1 cycle sequencing kit (Applied Biosystems) with each gene’s original amplification primers or additional internal primers (Table 2). The cycle sequencing protocol consisted of 25 cycles with a 10-s 94°C denaturation, 5 s of 50°C annealing, and a 4-min 60°C extension. Sequences were produced at the Washington State University Center for Integrated Biotechnology Core Laboratory using an ABI 377 and an ABI 3100.

Sequence Alignment

12S and 16S rDNA sequences were aligned by eye to secondary structure models used in previously published studies of labrid fishes (Streelman et al., 2002; Clements et al., 2004; Westneat and Alfaro, 2005). Ambiguously aligned regions were identified by eye and removed prior to analysis for both mitochondrial genes.

TABLE 2. Primers used for PCR amplification and sequencing.
Gene        Primer      Reference
12s rDNA    Phe2-L      Holcroft, 2005
            12sd-R      Holcroft, 2005
            12d-L       Holcroft, 2005
            12Sb-H      Holcroft, 2005
16s rDNA    16SAR-F     Holcroft, 2005
            16SBR-R     Holcroft, 2005
Rag1        R-2510F     This study
            R-3261R     This study
            R-3098F     This study
            DDRAG1F     This study
            DDRag1R     This study
Rhodopsin   RH-545      Chen et al., 2003
            RH-1073     Chen et al., 2003
Tmo-4C4     TMO-FL-6A   Clements et al., 2004
            TMO-F1-5    Clements et al., 2004
            TMO-RL-3    Clements et al., 2004

Alignment of the protein-coding genes (Rhodopsin, Rag1, and Tmo4C4) was trivial and done using a text editor (BBEdit, BareBones Software). Gene matrices were edited in Se-Al v.2.0 (Rambaut, 1996). We trimmed sequences to the size of the smallest fragment for each gene to minimize missing characters in the data matrix. Our final data matrix consisted of 754 bp of 12S, 475 bp of 16S, 404 bp of Rhodopsin, 545 bp of Tmo4C4, and 1205 bp of Rag1 for a total of 3383 characters used in analysis. Sequences were checked using NCBI’s BLAST and have been deposited in GenBank (Table 1). All aligned data matrices have been deposited in TreeBase (accession numbers: SN3541-20316 (12S), SN3541-20318 (16S), SN3541-20315 (RAG1), SN3541-20313 (Rhodopsin), SN3541-20310 (Tmo4C4), SN3541-20309 (concatenated data)).

Bayesian Analysis

We ran all MCMC chains in the analyses below for 20 million generations, sampling every 1000, as our preliminary analysis revealed this to be sufficient to ensure convergence of the chains. Convergence was assessed by visual inspection of the state likelihoods, potential scale reduction factors, and the average deviation of clade splits between replicate runs. To further ensure that inference was based on samples from the target distribution, we discarded the first 25% of the 20,000 trees as burn-in.
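The sampling arithmetic and one of the convergence diagnostics named above can be sketched in a few lines. This is a minimal illustration and not the code used in the study: the function names are our own, and the potential scale reduction factor follows the standard Gelman and Rubin form for a single parameter, which may differ in detail from the value MrBayes reports.

```python
from statistics import mean, variance


def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total samples, samples kept after burn-in)."""
    n_samples = generations // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_samples - n_burnin


def psrf(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in samples,
    one list per independent run.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equal-length chains")
    within = mean([variance(c) for c in chains])       # W
    between = n * variance([mean(c) for c in chains])  # B
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5


# Settings stated in the text: 20 million generations, sampled every 1000,
# with the first 25% of samples discarded.
total, kept = retained_samples(20_000_000, 1000, 0.25)
print(total, kept)
```

Values of the factor near 1 suggest that the independent runs are sampling the same distribution, while values well above 1 suggest the runs have not yet converged.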
Morphological Analysis

We assembled a list of 34 potentially informative characters by selecting all the characters that defined the familial clades of the Balistidae and Monacanthidae identified in Santini and Tyler (2003), as well as the characters used in Matsuura (1979) and Tyler (1980). We scored taxa using osteological descriptions from the literature as well as new cleared and stained specimens (Appendix S1, available at http://www.systematicbiology.org). The morphological matrix was analyzed in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003) using the default prior settings for morphological data. Visual inspection of model parameters and potential scale reduction factors revealed that the chain appeared to reach stationarity after 200,000 generations.

TABLE 2 (Sequence column, in the primer order of the table): AAA GCA TAA CAC TGA AGA TGT TAA GAT G GGG TTG GTA AAT CTC GTG C GCT GGC ACG AGT TTT ACC GGC C AGG AGG GTG ACG GGC GGT GTG T CGC CTG TTT ATC AAA AAC AT CCG GTC TGA ACT CAG ATC ACG T TGG CCA TCC GGG TMA ACA C CCC TCC ATY TCN CGM ACC ATC TT TGT GCC TGA TGY TYG TDG AYG ART TTC ACC AGT TTG AAT GGC AGC C AAC GCC TGA AYA GTT TAT TTG C GCA AGC CCA TCA GCA ACT TCC G CCR CAG CAC ARC GTG GTG ATC ATG GAA AAG AGT GTT TGA AAA TGA CCT CCG GCC TTC CTA AAA CCT CTC CAT CGT GCT CCT GGG TGA CAA AGT

Bayesian Analysis of Molecular and Combined Data

We analyzed each gene partition independently and performed a partitioned mixed model analysis of the combined data. For each data set we assigned default priors to all model parameters (topology: uniform; revmat: Dirichlet (1.0, 1.0, 1.0, 1.0, 1.0, 1.0); statefreq: Dirichlet (5.0, 5.0, 5.0, 5.0); pinvar: uniform (0.0, 1.0); Brlengths: exp (10.0); shape: uniform (0, 1)). We selected models of evolution for these analyses using two approaches (see Table 4): direct calculation of model posterior probabilities using RJ-MCMC (Huelsenbeck et al., 2004) and Akaike weights (Akaike, 1973) calculated using ModelTest (Posada and Crandall, 1998).
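For readers unfamiliar with Akaike weights, the calculation can be sketched as follows. Each weight is the relative likelihood exp(-delta/2) of a model, where delta is its AIC difference from the best model, normalized to sum to one across the candidate set. The AIC scores below are hypothetical and the function names are our own; ModelTest reports these quantities directly.

```python
import math


def akaike_weights(aic_scores):
    """Convert {model: AIC score} into {model: Akaike weight}."""
    best = min(aic_scores.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def credible_set(weights, level=0.95):
    """Smallest set of top-ranked models whose cumulative weight reaches level."""
    chosen, cumulative = [], 0.0
    for model, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


# Hypothetical AIC scores for three candidate models.
scores = {"GTR+I+G": 1000.0, "HKY+I+G": 1002.0, "K81uf+I": 1010.0}
weights = akaike_weights(scores)
print({m: round(w, 4) for m, w in weights.items()})
print(credible_set(weights))
```

A weight near 1 for the best model indicates little model uncertainty under this criterion, whereas weights spread over several models indicate that no single model is clearly preferred.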
We also performed a combined analysis of the morphological and molecular data by concatenating the morphological and gene matrices and using the MCMC chain parameters as described above. The mixed-partition analysis utilized the default settings for morphology and the best-fit model according to the AIC criterion.

Single-Model and Model-Averaged Bayesian Analysis

We used MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003) and a custom-written reversible jump MCMC sampler (Huelsenbeck et al., 2004) to perform single-model and model-averaged Bayesian analyses, respectively. Priors on all parameters of the model (with the exception of the substitution model itself) were identical for both sets of analysis: uniform on topology, flat Dirichlet (5.0, 5.0, 5.0, 5.0) on nucleotide frequencies, exp (10) on branch lengths, and a uniform (0, 50) prior on the gamma-shape parameter. Our reversible jump sampler further applied a uniform prior on the 203 possible substitution models.

We performed parallel single-model and model-averaged analyses using the following partitions of our data: all individual gene partitions, a concatenated mitochondrial gene (12S+16S) data matrix, a concatenated nuclear gene (Rhodopsin, RAG1, Tmo-4C4) data matrix, and the entire concatenated data. This approach allowed us to compare the effects of accommodating model uncertainty on data sets of varying length. We additionally performed seven single-model analyses on the following data partitioning schemes for the concatenated data set: (1) One substitution model; (2) rDNA and lumped nuclear genes; (3) rDNA stem regions, rDNA loop regions, lumped nuclear genes; (4) rDNA and a model for each nuclear gene; (5) by gene; (6) rDNA stems, rDNA loops, each nuclear gene; (7) rDNA stems, rDNA loops, codon positions for each gene.
We calculated marginal likelihoods and computed Bayes factor scores for each of these analyses to account for topological uncertainty as discussed in Brandley et al. (2005). Single-model clade posterior probabilities were regressed on model-averaged posteriors using the comparetree command in MrBayes and KaleidaGraph (Synergy Software). Statistical significance between the model-averaged posteriors and those contingent on a single model of sequence evolution was assessed using the nonparametric Wilcoxon signed-rank test (Simmons et al., 2004).

Single-Model and Model-Averaged Maximum Likelihood Analysis

We analyzed the concatenated molecular data set to assess the effect of nucleotide substitution model overparameterization on bootstrap support values. Corrected Akaike (AICc) scores and Akaike weights for all possible models of nucleotide substitution were calculated and the best-fit model from ModelTest was compared to the 95% interval of credible models from the candidate pool. For each selected model, including the best-fit model, we used PAUP* 4.01b (Swofford, 2003) to perform 1000 bootstrap replicates. Each replicate incorporated two random sequence additions and the TBR branch-swapping algorithm. To save on computational time, a time limit of 4 h was assigned to each replicate. For the model-averaged analysis, the bootstrap value at each node was multiplied by the weight of the model and all values were summed to obtain a model-averaged bootstrap measure of support. These values were compared to the best-fit model’s support values and tested for significance using the Wilcoxon signed-rank test.

Divergence Time Estimation

We constrained three nodes in the balistid tree (Table 3) for divergence time analysis. Two of these calibrations were based on the fossil record (Table 3). The split between the Balistidae and the Monacanthidae (see Fig. 3, node 1) was based on four fossil stem balistids dated to 35 Ma: Balistomorphus orbiculatus, B. ovalis, B.
spinosus, and Oligobalistes robustus (Tyler and Santini, 2002). We assigned a prior minimum age of 35 Myr to this calibration to reflect the age of these fossils and further assigned a mean age of 50 Myr (reflecting the appearance of several other tetraodontiform families in the fossil record) and an upper bound of 70 Myr (reflecting the appearance of the first stem tetraodontiforms) after Alfaro et al. (2007). We used soft upper bounds (i.e., upper bounds indicate the 95% cumulative density of the prior) on all fossil constraints to avoid artificially truncating the posterior distribution of our divergence time estimates (e.g., Yang and Rannala, 2006).

TABLE 3. Calibrations used in this study.

Node                                   Minimum age/95% HPD (Myr)  Source
MRCA of Monacanthidae and Balistidae   35/70                      Fossil
Crown Balistidae                       22.9/29.9                  Secondary constraint (Alfaro et al., 2007)
Split Balistes and Canthidermis        5/50                       Fossil

We used the fossil Balistes procapriscus from the late Miocene to assign a minimum age of 5 Myr to the crown age of Balistes (Fig. 3, node 3; Santini and Tyler, 2003). We initially used this fossil to date the split between Balistes and Pseudobalistes fuscus following Alfaro et al. (2007); however, preliminary analysis revealed P. fuscus to be nested within Balistes. Based on this, we reassigned the calibration to the crown Balistes. We assigned an upper bound of 50 Myr to this calibration.

We assigned a secondary constraint to the age of crown balistids (Fig. 3, node 2) corresponding to the 95% credible interval estimate for balistids from Alfaro et al. (2007). Our normally distributed prior assigned a mean age of 22.9 Myr to the split (d = 4.2 Myr). This age is congruent with the current paleontological evidence: no balistid fossils are known older than the middle Miocene (Schultz, 2004), whereas the stem balistids are at least 35 million years old (Tyler and Santini, 2002).
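The relationship between a normal prior and a soft upper bound placed at its 95% cumulative density can be checked numerically. The sketch below assumes that the 4.2 Myr value quoted for the crown balistid prior is a standard deviation, which the text does not state explicitly, so the result is illustrative only.

```python
from statistics import NormalDist

# Assumption: the 4.2 Myr value is the standard deviation of the normal prior.
crown_prior = NormalDist(mu=22.9, sigma=4.2)

# Age below which 95% of the prior density lies (a "soft" upper bound).
upper_95 = crown_prior.inv_cdf(0.95)

# Prior mass that remains above the soft bound (5% by construction).
mass_above = 1.0 - crown_prior.cdf(upper_95)

print(round(upper_95, 2), round(mass_above, 3))
```

Under this assumption the bound falls at roughly 29.8 Myr, close to the 29.9 Myr listed for crown Balistidae in Table 3. Because the bound is soft, ages above it keep a small but nonzero prior probability and the posterior is not cut off at that age.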
We estimated divergence times using the concatenated data under a model of uncorrelated log-normally distributed rates using BEAST (Drummond et al., 2006). A Yule (pure-birth) prior was assigned to rates of cladogenesis. Based on results from the MrBayes analysis, we partitioned our data into seven regions to allow a separate substitution model to be used for each ribosomal stem and loop region and also an individual model for each nuclear gene. We ran three independent analyses of 20 million generations, assessing convergence using Tracer 1.3 (Rambaut and Drummond, 2007). The first 25% of the generations were discarded as burn-in and the effective sample size (ESS) for model parameters was also assessed to check for good mixing of the MCMC (ESS exceeded 200 for all model parameters in our analysis).

Diversification Statistics

All diversification statistics were implemented in the software package R (R Development Core Team, 2006), using functions in the package Geiger (Harmon et al., 2008) and APE (Paradis et al., 2004). The global diversification rate of the Balistidae (λG) as well as the diversification rate of five focal subclades was calculated using the method-of-moments estimator from Magallon and Sanderson (2001). We further tested whether diversification rates of subclades with a Pliocene/Pleistocene crown age differed significantly from λG using the method of Magallon and Sanderson (2001). To account for the pull of the present (Pybus and Harvey, 2000) in these estimates, we used extinction rates ranging from 0 to 0.5 in our estimates (see Table 7). These values represented the confidence interval obtained using the birth-death function to calculate relative extinction in the package Geiger (Harmon et al., 2008).
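The method-of-moments estimator for a crown group can be written compactly. The sketch below follows our reading of the crown-group formula in Magallon and Sanderson (2001), with n extant species, crown age t, and relative extinction fraction eps. The species count of 42 comes from the text, but the crown age of 22.9 Myr is simply the prior mean used for calibration, so the printed rates are illustrative and are not the estimates reported in the paper.

```python
import math


def crown_rate(n, t, eps=0.0):
    """Method-of-moments net diversification rate for a crown group.

    n:   number of extant species in the clade
    t:   crown age (same time unit as the returned rate's denominator)
    eps: relative extinction fraction (extinction / speciation), 0 <= eps < 1
    """
    if n < 2 or t <= 0 or not 0 <= eps < 1:
        raise ValueError("require n >= 2, t > 0 and 0 <= eps < 1")
    inner = (
        0.5 * n * (1 - eps ** 2)
        + 2 * eps
        + 0.5 * (1 - eps) * math.sqrt(n * (n * eps ** 2 - 8 * eps + 2 * n * eps + n))
    )
    return (math.log(inner) - math.log(2)) / t


# Illustrative inputs: 42 species, crown age set to the 22.9 Myr prior mean.
for eps in (0.0, 0.5):
    print(eps, round(crown_rate(42, 22.9, eps), 4))
```

With eps = 0 the expression reduces to (ln n - ln 2) / t, the pure-birth case. Higher assumed extinction fractions lower the estimated net rate for the same clade size and age.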
To test for a nonconstant triggerfish diversification rate given our incomplete taxonomic sampling, we used the MCCR test function (Pybus and Harvey, 2000; Pybus et al., 2002) based on 20,000 simulations in Geiger (Harmon et al., 2008). The MCCR assumes no significant difference in diversification rates between lineages. We tested this assumption using the relative cladogenesis statistic (Nee et al., 1992).

To test for significantly elevated rates of cladogenesis during specific time intervals, we used a novel function (Brock, unpublished) to calculate Kendall-Moran estimates of diversification rate (r) based on a pure-birth process (Baldwin and Sanderson, 1998; Nee, 2001) for each major geologic period, as well as each subdivision of the Miocene. This test accounts for incomplete taxon sampling and the impact of extinction on the distribution of waiting times (“pull of the present”; e.g., Pybus and Harvey, 2000). We compared observed values of r for each time interval to a null distribution generated by the simulation of 20,000 birth-death trees under global estimates of triggerfish diversification and extinction rates.

RESULTS

Bayesian Analysis

The average standard deviation of the clade splits between independent runs was less than 0.1% and potential scale reduction factors (Gelman and Rubin, 1992) for all parameters were approximately 1.00 for all analyses, suggesting that we adequately sampled the target distributions. Comparison of Bayes factor scores revealed that assigning separate models to each ribosomal stem or loop region, as well as each individual gene (seven separate partitions total), fit our data best. Analysis of the concatenated data set recovered a well-resolved phylogeny of the balistids and revealed five major clades: (1) Balistes (including Pseudobalistes fuscus); (2) Rhinecanthus; (3) Sufflamen; (4) Canthidermis + Abalistes; (5) all remaining balistids not in clades 1 to 4 (Fig. 1).
The genus Balistes (clade 1) is strongly supported as the sister group to the remaining balistids. Pseudobalistes fuscus appears deeply nested within this group, a placement that renders the genus Balistes paraphyletic. Sufflamen (clade 3) is strongly supported as sister to Rhinecanthus (clade 2). The relationship of R. lunula, R. rectangulus, and R. verrucosus was unresolved, though these taxa formed the sister group to the remaining Rhinecanthus species. Clade 5 shows Balistoides and Pseudobalistes to be polyphyletic as currently defined. In the latter case, the two species of Pseudobalistes are recovered in clades 1 and 5, respectively. The two species of Balistoides are placed within two different subclades of clade 5, with strong support for a sister relationship between Balistoides viridescens and Pseudobalistes flavimarginatus. The remaining Balistoides in our study, B. conspicillum, forms the sister group to Melichthys. This placement leaves B. conspicillum placed deep within a clade consisting of Balistapus, Odonus, and Melichthys.

TABLE 4. Uncertainty in model choice. Best-fit models selected using the AIC (Akaike, 1973) implemented in ModelTest (Posada and Crandall, 1998) and PAUP (Swofford, 2003) compared to the posterior probability of the most visited model by the RJ-MCMC sampler (Huelsenbeck et al., 2004) for all data sets. Probability of a model is equal to the frequency it was visited by the RJ-MCMC sampler.

                     Model averaging                      ModelTest
Gene partition       Model selected       Probability     Model selected   Weight (AIC)
12S                  GTR+G                0.50800         GTR+I+G          0.9956
16S                  1, 1, 1, 2, 3, 2^a   0.84673         SYM+I+G          0.7895
RAG1                 GTR+G                0.24566         GTR+I+G          0.6931
Rhodopsin            1, 1, 1, 1, 2, 1^a   0.29684         HKY+I+G          0.3185
Tmo-4C4              1, 1, 1, 1, 2, 1^a   0.61910         K81uf+I          0.1586
12S + 16S            GTR+G                0.70176         GTR+I+G          0.9968
Concatenated data    1, 2, 3, 4, 5, 4^a   0.57004         GTR+I+G          0.9990

^a Unnamed models represented by substitution rate matrix (see Huelsenbeck et al., 2004).
The combined morphological/molecular tree (data not shown) was perfectly congruent with the tree based on molecular data only (Fig. 1). We attribute this to a nearly complete lack of resolution provided by analysis of the morphological data set alone (data not shown). Posterior probabilities were qualitatively similar between the molecular and the molecular + morphological data and not significantly different by the Wilcoxon signed-rank test (P = 0.232).

Single-Model and Model-Averaged Bayesian Analysis

Our analysis revealed that the most probable model was not always congruent with the model chosen by ModelTest (Table 4). In four of our seven data sets, the model with the highest posterior probability was an unnamed model. Additionally, the posterior probability of the model most visited by the RJ-MCMC sampler was lower than the AIC weight of the “best-fit” model chosen in ModelTest for five of the seven data sets. Our model-averaged topology (Fig. 1) was qualitatively similar to the topology conditioned on a single model, with no conflicts between strongly supported relationships (PP > 95%). Visual inspection of clade support values revealed little difference between model-averaged and single-model analyses (Fig. 1) for highly supported nodes (PP > 95%). Although qualitative differences were more obvious for lower support values (see Fig. 4), a Wilcoxon signed-rank test revealed that these were not statistically significant (Table 5). Further tests restricted to PPs <90% and <50% were also not significant (P > 0.4 for all, data not shown).

Single-Model and Model-Averaged Maximum Likelihood Analysis

Maximum likelihood analysis recovered a single best topology (−LnL = −14,338.09) qualitatively similar to the tree inferred by Bayesian methods (Fig. 2).
Visual inspection of single-model versus model-averaged bootstrap values reveals little fluctuation around highly supported nodes. Additionally, support values below 90 appear not to experience high levels of fluctuation. The Wilcoxon signed-rank test reveals no statistically significant difference in bootstrap support values between single-model and model-averaged analyses (P > 0.97). Additional tests reveal no statistically significant difference in bootstrap values greater than 90 (P > 0.99) or in values between 50 and 90 (P > 0.84).

FIGURE 1. Fifty percent majority-rule consensus tree resulting from the single-model and model-averaged Bayesian analyses of the molecular concatenated data set. Posterior probabilities greater than 0.95 inferred by the single-model analysis are shown above each node; model-averaged PPs are depicted below. Branch lengths are in substitution units (scale bar: 0.03 substitutions/site) based on analysis of the molecular data by the single model only. The branch leading to Sufflamen fraenatum has been scaled by 50% to fit into this figure. Clade numbers (1 to 5) represent identified clades in the text: (1) Balistes; (2) Rhinecanthus; (3) Sufflamen; (4) Canthidermis + Abalistes; (5) all remaining balistids.

TABLE 5. Results of Wilcoxon signed-rank test. Comparison of posterior probabilities inferred by model averaging, MrBayes, and BEAST. All data sets analyzed produced statistically nonsignificant differences in phylogenetic inference between model-averaged results and those conditioned on a single substitution model.

Data set            Methods compared               P-value
Tmo-4C4             Model averaging vs. MrBayes    0.1344
Rhodopsin           Model averaging vs. MrBayes    0.1814
RAG1                Model averaging vs. MrBayes    0.4911
12S                 Model averaging vs. MrBayes    0.1817
Concatenated data   Model averaging vs. MrBayes    0.1013
Concatenated data   BEAST vs. model averaging      0.3210
Concatenated data   BEAST vs. MrBayes              0.2487

Divergence Time Estimation

Our BEAST topology (Fig. 3) revealed the same major clades as the model-averaged consensus tree (Fig. 1), with some topological differences (see below). Additionally, this analysis recovered slightly higher support at some nodes (though these differences were not statistically significant; Table 5). We recover a crown age for the balistids of approximately 11.3 Myr (Table 6, node 2). Our chronogram places the stem ages of A. stellatus and Canthidermis maculata (Table 6, nodes 3 and 4) at approximately 10.0 and 9.9 Myr, respectively, indicating that these genera are among the oldest extant balistid lineages. Our analysis recovers Balistes deeply nested within the balistids, though this placement is weakly supported. Crown Sufflamen originated approximately 6.1 Ma (Table 6, node 5), whereas the crown age of Rhinecanthus (Table 6, node 10) indicates the group to be relatively young, appearing approximately 3.3 Ma.
Eleven of the 24 nodes appear within the past 4 Myr, with Rhinecanthus, Xanthichthys, and Melichthys all being relatively young genera that originated in the last 4 to 2 Myr.

Diversification Statistics

We estimated a diversification rate (λG) of 0.25 for crown triggerfishes. The MCCR test (Pybus and Harvey, 2000; Pybus et al., 2002) failed to reject a hypothesis of constant diversification rates for the triggerfishes (P = 0.10). Our global extinction rate was estimated to be 0 and the same log-likelihood score (−14.42) was given to the pure-birth model using Magallon and Sanderson’s (2001) equation. Based on these results, we were unable to reject a pure-birth model for balistid diversification. Our Kendall-Moran estimates of speciation rate (Baldwin and Sanderson, 1998; Nee, 2001) revealed fluctuations in diversification rate across time intervals; however, none of these estimates were significantly different from λG (Table 7). Assessing rates of cladogenesis between lineages revealed no statistically significant rapid radiations for any of the major clades, including major lineages with crown ages in the Pliocene/Pleistocene such as Xanthichthys and Rhinecanthus.

TABLE 6. Median divergence estimates for balistid nodes.

Node                                                                                              Median age (Myr)/95% HPD
1. MRCA Monacanthidae + Balistidae                                                                36.6/(35.2, 39.7)
2. Crown Balistidae                                                                               11.3/(8.2, 15.9)
3. MRCA Canthidermis + Abalistes + Sufflamen + Rhinecanthus                                       10.0/(7.1, 13.7)
4. MRCA Abalistes + Sufflamen + Rhinecanthus                                                      9.9/(7.2, 13.7)
5. Crown Sufflamen                                                                                6.2/(4.0, 9.1)
6. MRCA S. bursa + S. chrysopterum + S. albicaudatum                                              3.9/(2.4, 5.9)
7. MRCA S. chrysopterum + S. albicaudatum                                                         1.8/(0.7, 3.0)
8. MRCA Sufflamen + Rhinecanthus                                                                  8.9/(6.0, 12.8)
9. MRCA R. assasi + R. aculeatus                                                                  1.5/(0.7, 2.7)
10. Crown Rhinecanthus                                                                            3.3/(2.0, 5.3)
11. MRCA R. rectangulus + R. verrucosus + R. lunula                                               1.9/(1.0, 3.0)
12. MRCA Rhinecanthus verrucosus + R. lunula                                                      1.6/(0.8, 3.0)
13. Crown Balistes                                                                                7.9/(5.8, 10.8)
14. MRCA “B.” fuscus + B. vetula + B. capriscus + B. polylepis                                    6.7/(4.8, 9.4)
15. MRCA B. vetula + B. capriscus + B. polylepis                                                  5.1/(3.3, 7.3)
16. MRCA B. capriscus + B. polylepis                                                              1.5/(0.6, 2.5)
17. MRCA Xanthichthys + “Pseudobalistes” + Balistes + Balistapus + Melichthys + Odonus + Balistoides   10.2/(7.6, 14.2)
18. Crown “Pseudobalistes”                                                                        0.9/(0.3, 1.8)
19. MRCA Xanthichthys + “Pseudobalistes”                                                          6.3/(4.0, 9.2)
20. Crown Xanthichthys                                                                            2.0/(1.0, 3.2)
21. MRCA X. auromarginatus + X. ringens                                                           0.9/(0.3, 1.7)
22. MRCA Xanthichthys + “Pseudobalistes” + Balistapus + Melichthys + Odonus + Balistoides         9.1/(6.5, 12.5)
23. MRCA Balistapus + Odonus                                                                      5.7/(3.4, 8.1)
24. MRCA Odonus + Melichthys + Balistoides + Balistapus                                           7.7/(5.4, 10.7)
25. MRCA Balistoides + Melichthys                                                                 1.7/(4.1, 9.0)
26. Crown Melichthys                                                                              2.5/(1.3, 4.1)

DISCUSSION

Despite often substantial uncertainty surrounding model choice and frequently overparameterized substitution models, our analysis revealed that model averaging had only a modest influence on inference of triggerfish phylogeny and clade support. This suggests that although the widespread use of GTR substitution models in phylogenetics is probably not statistically justified, phylogenetic inference is likely robust to both overparameterization of and uncertainty surrounding the substitution model.
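For reference, the model-averaged clade support discussed here can be written in the standard Bayesian model averaging form (see Hoeting et al., 1999). The notation is introduced only for this illustration and does not appear elsewhere in the text: C is a clade, D the data, and M_1, ..., M_K the candidate substitution models.

```latex
P(C \mid D) \;=\; \sum_{k=1}^{K} P(C \mid D, M_k)\, P(M_k \mid D)
```

A single-model analysis reports only one term, P(C | D, M_k), for the chosen model. The two quantities can therefore differ noticeably only when the conditional clade posteriors disagree across models that each carry appreciable posterior weight.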
FIGURE 2. Maximum likelihood consensus tree estimated by single-model and model-averaged analysis. Single-model bootstrap support greater than 90 is shown above each branch; model-averaged support is depicted below. Clade numbers (1 to 5) represent the major clades referenced in the text.

FIGURE 3. Fifty percent majority-rule consensus chronogram resulting from the concatenated data set. Posterior probabilities greater than 0.95 are indicated at each node by squares. Branch lengths are in units of time corresponding to upper and lower scale bars. Upper scale bar marks major geological intervals of interest; lower scale bar displays time (Ma) since present. All numbered nodes correspond to Table 6, where ages and 95% HPD are given. Nodes 1 (insert), 2, and 3 correspond with calibration points referenced in the text (Table 3).

FIGURE 4. Linear regressions of model-averaged posteriors compared to those inferred in MrBayes for six gene partitions: (a) expected regression of two identical runs in GTR + G only, (b) expected regression of two identical model-averaged analyses, (c) Rhodopsin, (d) concatenated molecular data set, (e) 12S, (f) RAG1. Posteriors include all possible bipartitions, analyzed using the comparetree command in MrBayes.

TABLE 7. Kendall-Moran estimates of speciation rate (λ). Median divergence time estimates for selected Balistid nodes (with 95% HPD). Node numbers correspond to numbered nodes in Figure 4.

Geological time division   KM estimate of λ   P-value
Middle Miocene             0.3213             0.3533
Late Miocene               0.1762             0.4308
Pliocene                   0.1136             0.6125
Pleistocene                0.16607            0.3729

In addition, our analyses provide the first species-level phylogeny for triggerfishes and suggest several novel hypotheses of their evolutionary history. Below we consider the value of model averaging in phylogenetics as well as the implications of our combined data phylogeny and chronogram for the triggerfishes.

Model-Averaging

Our results are consistent with several previous studies that have shown topology to be generally robust to model uncertainty (e.g., Posada and Crandall, 2001; Beier et al., 2004; Nylander, 2004; Posada and Buckley, 2004). Although it has been suggested that model-averaged posterior probabilities differ from single-model posteriors (Beier et al., 2004), our statistical analyses suggest that these differences are not likely to be significant. Thus, with respect to substitution parameters, both topology and clade support appear to be relatively robust to the uncertainty surrounding model choice. One reason for this may be that the current practice of selecting models from a relatively small fraction of the possible substitution models still leads investigators to reasonably good models for their data.
In our study, even though the GTR model did not receive the highest posterior probability for every data partition (Table 4), it frequently appeared within the 95% credible interval. Previously, Beier et al. (2004) noted visual differences between low model-averaged clade support values and those conditioned on a single model. As other studies have suggested (Huelsenbeck and Rannala, 2004; Lemmon and Moriarty, 2004), slight overparameterization does not seem to cause substantial problems in phylogenetic inference. However, the effects of gross overparameterization on the inference of posterior probabilities and other parameters have not been systematically addressed, and we suggest that assessments of model adequacy and uncertainty are appropriate for statistical phylogenetic studies of complex data sets. Although we observed differences in clade support values of as much as 18% for poorly supported nodes in some of our data sets, none of these differences were significant. The Wilcoxon signed-rank test is conservative by nature, and we do not know how qualitatively different PPs could be without yielding a significant result. Given that many analyses are potentially overparameterized with respect to the substitution model (see Kelchner and Thomas, 2006), our results should not be taken to mean that accommodation of model uncertainty is not helpful. The method may provide more reliable estimates of branch lengths or of substitution rates for studies specifically focused on these parameters. Indeed, we observed qualitative differences between branch lengths estimated by single-model and model-averaged analyses for data sets used in this study (analysis by A.D. and M.E.A.). Furthermore, reversible-jump algorithms could allow for averaging across more divergent models. Accommodating uncertainty surrounding partitioning strategies, for example, might have a profound influence on topology and clade posteriors.
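The paired comparisons of clade support reported in Table 5 rely on the Wilcoxon signed-rank test. The Python sketch below shows one common way to compute it for paired support values, using average ranks for ties and a large-sample normal approximation without continuity correction. It is an illustration under those stated assumptions, not the software used in this study, and the posterior probabilities in the example are invented.

```python
import math


def wilcoxon_signed_rank(x, y, ndigits=6):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped; ties share average ranks.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    # Round so that float noise does not break ties between equal differences.
    diffs = [round(a - b, ndigits) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Invented clade posteriors for the same bipartitions under two analyses.
single_model = [1.00, 0.99, 0.97, 0.88, 0.71, 0.64, 0.52, 0.35]
model_averaged = [1.00, 0.98, 0.98, 0.84, 0.75, 0.58, 0.55, 0.30]
print(wilcoxon_signed_rank(single_model, model_averaged))
```

With only a handful of bipartitions the normal approximation is rough, so an exact null distribution would be preferable for small samples.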
Phylogenetic Relationships of the Balistidae

Triggerfish represent one of the most conspicuous components of the diurnal coral reef fauna worldwide, yet until now little was known about their interspecific phylogenetic relationships. Our results conflict with several prior phylogenetic hypotheses concerning the oldest lineages of the balistids. The deeply nested placement and young age of Melichthys refute Holcroft’s (2005) analysis, which proposed Melichthys niger to be sister to the rest of the triggerfish. Our analyses refute Tyler’s (1980) hypothesis that Balistapus represents one of the oldest lineages of the triggerfish and instead suggest that the absence of a preopercular groove represents a secondary loss. The long branches leading to Canthidermis and Abalistes tentatively support Matsuura’s (1979) proposal that these taxa represent the oldest extant lineages of the balistids, though incomplete sampling prevented us from obtaining a crown age estimate for either lineage. Our results also contradict Tyler’s (1980) suggestion that Xanthichthys and Odonus are sister groups and suggest instead that a strongly supraterminal mouth and second medial teeth have evolved independently in each of these lineages. Additionally, the sister relationship between Sufflamen and Rhinecanthus recovered by our analysis was originally proposed by Matsuura (1979) and suggests that the modified interhyal observed in these lineages either arose in a recent common ancestor or was lost in other triggerfish clades. We recovered an early split between a clade of Balistes (+ Pseudobalistes fuscus) and the remaining triggerfish. Our results indicate that three currently recognized genera (Balistoides, Balistes, and Pseudobalistes) are nonmonophyletic and in need of nomenclatural revision. We propose a taxonomic regression of Pseudobalistes fuscus to Balistes fuscus (Bloch and Schneider, 1801) to retain the monophyly of the genus Balistes.
To resolve the polyphyly of Balistoides, we suggest a revision from Balistoides viridescens (Fraser-Brunner, 1935) to Pseudobalistes viridescens. This is a classification originally proposed in Bloch and Schneider (1801) that is strongly supported by our data and retains the monophyly of the genus Pseudobalistes. This classification also identifies Balistoides as a monotypic genus comprising B. conspicillum, a genetically unique member of the family Balistidae.

Divergence Times and Diversification

Although we use the same calibrations in our analysis as were used in a recent study of all tetraodontiforms (Alfaro et al., 2007), we recover a slightly younger crown group age for triggerfish (11.3 versus 25 Ma). This discrepancy is nonsignificant, however, as the 95% highest posterior density intervals overlap between the two studies. Our more recent estimate is still in accord with the known balistid fossil record (Tyler and Santini, 2002; Schultz, 2004) and also corresponds to the origins of several other species of marine fish and invertebrates (e.g., Streelman et al., 2002; Klanten et al., 2004; Barber and Bellwood, 2005; Read et al., 2006; Wallace and Rosen, 2006). We consider our estimate of the triggerfish crown age to be inconsistent with a Jurassic/Cretaceous split between the filefish and triggerfish (Yamanoue et al., 2006), as this would imply a 100-Myr-long fuse between the origin of the triggerfishes and their subsequent diversification. This discrepancy is expected and described in Alfaro et al. (2007), who argue that some of the calibration points used by Yamanoue et al. (2006) are fossils that have been erroneously dated. For example, the oldest gadiform fossil is 61 million years old and not 161 as stated by Yamanoue et al. (2006).
Additionally, some of the other calibration points used by Yamanoue and colleagues are secondary calibration points recovered from the mammal/bird split. For these reasons we argue that the available tetraodontiform fossil data more strongly favor the younger stem and crown balistid ages recovered in our analysis than those presented by Yamanoue et al. (2006). Our chronogram reveals that many extant lineages are very young. Almost all triggerfish clades had formed by the Middle/Late Miocene, yet 19 of the 26 taxa sampled originated during the Pliocene/Pleistocene. Although visual inspection might suggest a recent elevation in the global diversification rate of triggerfish, statistical analysis shows that this trend is not significant. Instead, our results suggest that triggerfish as a whole did not experience elevated rates of diversification during paleoclimatic events associated with the Pliocene/Pleistocene, as has been suggested in other groups (Taylor and Dodson, 1994; Palumbi et al., 1997; LaJeunesse, 2005; Floeter et al., 2007), but that their speciation rates may have stayed more consistent for alternate, and as yet unknown, reasons. Similarly, triggerfish subclades with crown ages in the Pliocene and Pleistocene are not diversifying more quickly than other subclades. This result underscores the need for rigorous statistical testing of macroevolutionary patterns, as visual inspection alone may not be enough to deduce patterns of change in diversification rate. Further ecological correlates of these results are difficult to explore. Despite the conspicuous nature of the balistids, robust published studies of the group’s ecology are disappointingly sparse (but see Bean et al., 2002), and further studies are needed before we are able to understand the ecological correlates underlying the group’s diversification.
We propose that the triggerfish may be a model group with which to study macroevolutionary patterns in marine fish, given the young age of the group, its circumglobal distribution, the ecological dependency on reefs of most of its members, and the availability of several well-preserved fossils. Further sampling may reveal novel morphological innovations or shed light on novel patterns of diversification that may be correlated with historical biogeographic hypotheses, including historical changes in currents and reef ecology as a result of climatic fluctuations during the Pliocene.

ACKNOWLEDGMENTS

We are incredibly grateful to all the people and institutions that have contributed to this work. We would especially like to thank Lindsay Godfrey for all the help sequencing, Chad Brock for help with diversification statistics, and Devin Drown for the help designing primers. We would also like to thank Hugo Alamillo, Barbara Banbury, Magnus Wood, and the rest of the Alfaro lab for their constant support during this project. This project would not have been possible without the tissue loans from Peter Wainwright at UC Davis, Jeffrey Hunt at the Smithsonian Institution, Mark McGrouther at the Australian Museum, and John Friel at the Cornell University Museum of Vertebrates. A.D. received support for this project from a Washington State University Undergraduate Research Grant in Zoology, a WSU Center for Integrated Biotechnology Fellowship, and an NSF Undergraduate Research in Biology and Mathematics Fellowship (UBM 0531870). Additional support was provided by an NSF ITR grant (EB0336148) and by NSF DEB 0445453 to M.E.A.

REFERENCES

Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. Pages 267–281 in Second Annual Symposium on Information Theory (B. N. Petrov, and F. Csaki, eds.). Akademi Kiado, Budapest. Alfaro, M. E., C. D. Brock, and F. Santini. 2007. Do reefs drive diversification in marine fish?
Examples from the pufferfishes and their allies. Evolution 61:2104–2126. Alfaro, M. E., and J. P. Huelsenbeck. 2006. Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty. Syst. Biol. 55:89–96. Baldwin, B. G., and M. J. Sanderson. 1998. Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proc. Natl. Acad. Sci. USA 95:9402–9406. Barber, P. H., and D. R. Bellwood. 2005. Biodiversity hotspots: Evolutionary origins of biodiversity in wrasses (Halichoeres: Labridae) in the Indo-Pacific and new world tropics. Mol. Phylogenet. Evol. 35:235–253. Bean, K., G. P. Jones, and M. J. Caley. 2002. Relationships among distribution, abundance, and microhabitat specialization in a guild of coral reef triggerfish (family Balistidae). Mar. Ecol. Prog. Ser. 233:263–272. Beier, B. A., J. A. A. Nylander, M. W. Chase, and M. Thulin. 2004. Phylogenetic relationships and biogeography of the desert plant genus Fagonia (Zygophyllaceae), inferred by parsimony and Bayesian model averaging. Mol. Phylogenet. Evol. 33:91–108. Brandley, M., A. Schmitz, and T. W. Reeder. 2005. Partitioned Bayesian analyses, partition choice, and phylogenetic relationships of scincid lizards. Syst. Biol. 54:373–390. Buckley, T. R., P. Arensburger, C. Simon, and G. K. Chambers. 2002. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. Syst. Biol. 51:4–18. Burnham, K. P., and D. R. Anderson. 2003. Model selection and multimodel inference, a practical information-theoretic approach. Springer, New York. Chen, T. C., R. F. G. Ormond, and H. K. Mok. 2001. Feeding and territorial behavior in juveniles of three co-existing triggerfishes. J. Fish Biol. 59:524–532. Chen, W. J., C. Bonillo, and G. Lecointre. 2003. Repeatability of clades as a criterion of reliability: A case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Mol. Phylogenet. Evol. 26:262–288.
Clements, K. D., M. E. Alfaro, J. Fessler, and M. W. Westneat. 2004. Relationships of the temperate Australasian labrid fish tribe Odacini. Mol. Phylogenet. Evol. 32:575–587. Drummond, A. J., S. Y. W. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88. Felsenstein, J. 1978. Cases in which parsimony and compatability methods will be positively misleading. Syst. Zool. 27:401–410. Floeter, S. R., L. A. Rocha, D. R. Robertson, J. C. Joyeux, W. F. Smith-Vaniz, P. Wirtz, A. J. Edwards, J. P. Barreiros, C. E. L. Ferreira, J. L. Gasparini, A. Brito, J. M. Falcon, B. W. Bowen, and G. Bernardi. 2007. Atlantic reef fish biogeography and evolution. J. Biogeogr. 35:22–47. Fraser-Brunner, A. 1935. Notes on the Plectognath fishes. I. A synopsis of the genera of the family Balistidae. Ann. Mag. Nat. Hist. Ser. 10:658–663. Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Stat. Sci. 7:457–511. Green, P. J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732. Harmon, L., J. Weir, C. D. Brock, and W. Challenger. 2008. GEIGER: Investigating evolutionary radiations. Bioinformatics 24:129–131. Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky. 1999. Bayesian model averaging: A tutorial. Stat. Sci. 14:382–417. Holcroft, N. I. 2005. A molecular analysis of the interrelationships of tetraodontiform fishes (Acanthomorpha: Tetraodontiformes). Mol. Phylogenet. Evol. 34:525–544. Huelsenbeck, J. P., and D. M. Hillis. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247–264. Huelsenbeck, J. P., B. Larget, and M. E. Alfaro. 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol. Biol. Evol. 2004:1123–1133. Huelsenbeck, J. P., and B. Rannala. 2004.
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst. Biol. 53:904–913. Kelchner, S. A., and M. A. Thomas. 2006. Model use in phylogenetics: Nine key questions. Trends Ecol. Evol. 22:87–94. Klanten, S. O., L. van Herwerden, J. H. Choat, and D. Blair. 2004. Patterns of lineage diversification in the genus Naso (Acanthuridae). Mol. Phylogenet. Evol. 32:221–235. Kuiter, R. H., and H. Debelius. 2006. World atlas of marine fishes. Hollywood Import and Export, Inc., Frankfurt. LaJeunesse, T. C. 2005. “Species” radiations of symbiotic dinoflagellates in the Atlantic and Indo-Pacific since the Miocene-Pliocene transition. Mol. Biol. Evol. 22:570–581. Lee, M. S. Y., and A. F. Hugall. 2006. Model type, implicit data weighting, and model averaging in phylogenetics. Mol. Phylogenet. Evol. 38:848–857. Leis, J. M. 1984. Tetraodontiformes: Relationships. Pages 459–463 in Ontogeny and systematics of fishes (H. G. Moser, W. J. Richards, D. M. Cohen, M. P. Fahay, A. W. Kendall Jr., and S. L. Richardson, eds.). Amer. Soc. Ichthyol. Herp. Lawrence, Kansas. Lemmon, A. R., and E. C. Moriarty. 2004. The importance of proper model assumption in Bayesian phylogenetics. Syst. Biol. 53:265–277. Magallon, S., and M. J. Sanderson. 2001. Absolute diversification rates in angiosperm clades. Evolution 55:1762–1780. Matsuura, K. 1979. Phylogeny of the superfamily Balistoidea (Pisces: Tetraodontiformes). Memoirs of the Faculty of Fisheries, Hokkaido University 26:49–149. Matsuura, K. 1981. Xenobalistes tumidipectoris, a new genus and species of triggerfish (Tetraodontiformes, Balistidae) from the Marianas Islands. Bull. Natl. Sci. Mus. Ser. A 7:191–200. Nee, S. 2001. Inferring speciation rates from phylogenies. Evolution 55:661–668. Nee, S., A. O. Mooers, and P. H. Harvey. 1992. Tempo and mode of evolution revealed from molecular phylogenies. Proc. Natl. Acad. Sci. USA 89:8322–8326. Nylander, J. A. A. 2004. 
Bayesian phylogenetics and the evolution of gall wasps. Comprehensive summaries of Uppsala dissertations from the Faculty of Science and Technology, 937. University of Uppsala, Sweden. Ormond, R. F. G., A. C. Campbell, S. M. Head, R. J. Moore, P. S. Rainbow, and A. P. Sanders. 1973. Formation and breakdown of aggregations of the crown of thorns starfish Acanthaster planci (L.) in the Red Sea. Nature 246:167–169. Pagel, M., and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167:808–825. Palumbi, S. R., G. Grabowsky, T. Duda, L. Geyer, and N. Tachino. 1997. Speciation and population genetic structure in tropical Pacific Sea urchins. Evolution 51:1506–1517. Paradis, E., J. Claude, and K. Strimmer. 2004. APE; analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290. Posada, D. 2008. jModelTest: Phylogenetic model averaging. Mol. Biol. Evol. 25:1253–1256. Posada, D., and T. R. Buckley. 2004. Model selection and model averaging in phylogenetics: Advantages of AIC and Bayesian approaches over likelihood ratio tests. Syst. Biol. 53:793–808. Posada, D., and K. A. Crandall. 1998. ModelTest: Testing the model of DNA substitution. Bioinformatics 14:817–818. Posada, D., and K. A. Crandall. 2001. Selecting the best-fit model of nucleotide substitution. Syst. Biol. 50:580–601. Pybus, O. G., and P. H. Harvey. 2000. Testing macro-evolutionary models using incomplete molecular phylogenies. Philos. Trans. R. Soc. Lond. B 267:2267–2272. Pybus, O. G., A. Rambaut, E. C. Holmes, and P. H. Harvey. 2002. New Inferences from tree shape: Number of missing taxa and population growth rates. Syst. Biol. 51:881–888. Rambaut, A. 1996. Se-Al. Available at http://beast.bio.ed.ac.uk/software/seal Rambaut, A., and A. J. Drummond. 2007. Tracer v1.4. Available from http://beast.bio.ed.ac.uk/tracer Read, C. I., D. R. Bellwood, and L. van Herwerden. 2006.
Ancient origins of Indo-Pacific coral reef fish biodiversity: A case study of the leopard wrasses (Labridae: Macropharyngodon). Mol. Phylogenet. Evol. 38:808–819. Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574. Rosen, D. E. 1984. Zeiformes as primitive plectognath fishes. Am. Mus. Novit. 2782:1–45. Saiki, R. S. 1990. Amplification of genomic DNA. Pages 13–20 in PCR protocols (M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White, eds.). Academic Press, San Diego, California. Santini, F., and J. C. Tyler. 2003. A phylogeny of the families of fossil and extant tetraodontiform fishes (Acanthomorpha, Tetraodontiformes), Upper Cretaceous to recent. Zool. J. Linn. Soc. 139:565–617. Santini, F., and J. C. Tyler. 2004. The importance of even highly incomplete fossil taxa in reconstructing the phylogenetic relationships of the Tetraodontiformes (Acanthomorpha: Pisces). Integr. Comp. Biol. 44:349–357. Schultz, O. 2004. A triggerfish (Osteichthyes: Balistidae: Balistes) from the Badenian (Middle Miocene) of the Vienna and the Styrian Basin (Central Paratethys). Ann. Naturhist. Mus. Wien 106A:345–369. Simmons, M. P., K. M. Pickett, and M. Miya. 2004. How meaningful are Bayesian support values? Mol. Biol. Evol. 21:188–199. Streelman, J. T., M. Alfaro, M. W. Westneat, D. R. Bellwood, and S. A. Karl. 2002. Evolutionary history of the parrotfishes: Biogeography, ecomorphology, and comparative diversity. Evolution 56:961–971. Sullivan, J., and P. Joyce. 2005. Model selection in phylogenetics. Annu. Rev. Ecol. Evol. Syst. 36:445–466. Sullivan, J., and D. L. Swofford. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mammal. Evol. 2:77–86. Swofford, D. L. 2003. PAUP* 4.00: Phylogenetic analysis using parsimony (*and other methods). Version 4.0. Sinauer Associates, Sunderland, Massachusetts. Taylor, E. B., and J. J. Dodson. 1994.
A molecular analysis of relationships and biogeography within a species complex of Holarctic fish (genus Osmerus). Mol. Ecol. 3:235–248. Tyler, J. C. 1980. Osteology, phylogeny, and higher classification of the fishes of the order Plectognathi (Tetraodontiformes). NOAA Technical Report NMFS Circular 431:1–422. Tyler, J. C., and F. Santini. 2002. Review and reconstruction of the tetraodontiform fishes from the Eocene of Monte Bolca, Italy, with comments on related Tertiary taxa. Studi e Ricerche sui Giacimenti Terziari di Bolca, Museo Civico di Storia Naturale di Verona 9:47–119. Wainwright, P. C., and J. P. Friel. 2000. Effects of prey type on motor pattern variance in tetraodontiform fishes. J. Exp. Zool. 286:563–571. Wallace, C. C., and B. R. Rosen. 2006. Diverse staghorn corals (Acropora) in high-latitude Eocene assemblages: Implications for the evolution of modern diversity patterns of reef corals. Proc. R. Soc. B 2006:975–982. Walsh, P. S., D. A. Metzger, and R. Higuchi. 1991. Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques 10:506–513. Westneat, M. W., and M. E. Alfaro. 2005. Phylogenetic relationships and evolutionary history of the reef fish family Labridae. Mol. Phylogenet. Evol. 36:370–390. Winterbottom, R. 1974. The familial phylogeny of the Tetraodontiformes (Acanthopterygii: Pisces) as evidenced by their comparative myology. Smithson. Contrib. Zool. 155:1–201. Yamanoue, Y., M. Miya, J. G. Inoue, K. Matsuura, and M. Nishida. 2006. The mitochondrial genome of spotted green pufferfish Tetraodon nigroviridis (Teleostei: Tetraodontiformes) and divergence time estimation among model organisms in fishes. Genes Genet. Syst. 81:29–39. Yang, Z., and B. Rannala. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23:212–226.
First submitted 21 September 2007; reviews returned 13 November 2007; final acceptance 4 September 2008 Associate Editor: Frank Anderson