In addition to the 54 fish species used in Azuma et al. (2008), 22 atherinomorphs (including 14 medakas) were newly added in this study, bringing the total number of species analyzed to 76 (table S1). The 14 medakas represented all three known species groups (celebensis, javanicus and latipes species groups), with the last comprising two species (Oryzias luzonensis and O. latipes) and four regional populations of O. latipes (Shanghai, South Korea, southern and northern Japanese populations).
SPECIMENS AND DNA EXTRACTION
A portion of epaxial musculature (ca. 0.25 g) was excised from fresh specimens of each species and immediately preserved in 99.5% ethanol. Total genomic DNA was extracted from the ethanol-preserved tissue using a DNeasy (Qiagen) or Aquapure genomic DNA isolation kit (Bio-Rad Laboratories, Inc.) following the manufacturers’ protocols.
PCR AND SEQUENCING
Whole mitogenomes of the eight medaka species were amplified in their entirety using a long PCR technique (Cheng et al. 1994). Seven fish-versatile PCR primers for the long PCR were used in the following four combinations: L2508-16S + H12293-Leu; L2508-16S + H15149-CYB; L8343-Lys + H1065-12S; and L12321-Leu + S-LA-16S-H (for locations and sequences of these primers, see Inoue et al. 2000, 2001; Ishiguro et al. 2001; Kawaguchi et al. 2001; Miya & Nishida 2000) to amplify the entire mitogenome in two reactions. Long PCR reaction conditions followed Miya & Nishida (1999). Long PCR products diluted with TE buffer (1:19) were subsequently used as templates for short PCR reactions employing fish-versatile PCR primers in various combinations to amplify contiguous, overlapping segments of the entire mitogenome. The short PCR reactions were carried out following previously described protocols (Miya & Nishida 1999). The products were then purified using ExoSAP-IT enzyme (GE Healthcare Bio-Sciences Corp.) and sequenced with dye-labeled terminators (BigDye terminator ver. 1.1/3.1, Applied Biosystems) and the primers used in the short PCRs. Sequencing reactions were conducted following the manufacturer’s instructions, followed by electrophoresis on ABI Prism 3100 or 3130 DNA sequencers (Applied Biosystems). A list of PCR primers used in this study is available from MM upon request.
SEQUENCE EDITING AND ALIGNMENT
Each sequence electropherogram was edited with EditView (ver. 1.01; Applied Biosystems) and the multiple sequences were concatenated using AutoAssembler (ver. 2.1; Applied Biosystems). The concatenated sequences were carefully checked and annotated using DNASIS (ver. 3.2; Hitachi Software Engineering) and a sequence file was created for each gene.
Mitogenome sequences from the 22 atherinomorphs were concatenated with the pre-aligned sequences used in Azuma et al. (2008) in FASTA format and subjected to multiple alignment using MAFFT ver. 6 (Katoh & Toh 2008). The aligned sequences were imported into MacClade ver. 4.08 (Maddison & Maddison 2000) and the resulting gaps in the pre-aligned sequences were manually removed to reproduce the alignment used in Azuma et al. (2008). The dataset comprises 6966 positions from first and second codon positions of the 12 protein-coding genes (excluding the ND6 gene), 1673 positions from the two rRNA genes and 1407 positions from the 22 tRNA genes (total 10,046 positions). The third codon positions of the protein-coding genes were entirely excluded because their extremely accelerated rates of change may cause a high level of homoplasy (Miya & Nishida 2000) and overestimation of divergence times (Benton & Ayala 2003).
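The exclusion of third codon positions and the position totals quoted above can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and are not part of the original workflow, which used MacClade; only the three partition sizes are taken from the text.

```python
def drop_third_codon_positions(cds: str) -> str:
    """Keep only first and second codon positions of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Take the first two bases of every codon and skip the third.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


# Partition sizes as reported in the text (aligned positions).
PARTITION_SIZES = {
    "codon positions 1 and 2": 6966,
    "rRNA": 1673,
    "tRNA": 1407,
}


def total_positions(sizes: dict) -> int:
    """Sum the aligned positions over all partitions."""
    return sum(sizes.values())


if __name__ == "__main__":
    # Toy in-frame sequence (hypothetical): codons ATG, GCA, TTT.
    print(drop_third_codon_positions("ATGGCATTT"))  # ATGCTT
    print(total_positions(PARTITION_SIZES))         # 10046
```

The second value reproduces the total of 10,046 positions given for the dataset.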
Unambiguously aligned sequences were divided into four partitions (first codon positions, second codon positions, rRNA genes and tRNA genes) and subjected to partitioned maximum-likelihood (ML) analysis using RAxML ver. 7.0.4 (Stamatakis 2006). A general time reversible model with sites following a discrete gamma distribution (GTR + Γ; the model recommended by the program's author) was used and a rapid bootstrap (BS) analysis was conducted with 1000 replications (-f a option). This option performs the BS analysis using GTRCAT, a GTR approximation that optimizes individual per-site substitution rates and classifies those rates into a certain number of rate categories. After the BS analysis, the program uses every fifth BS tree as a starting point to search for the ML tree under the GTR + Γ model of sequence evolution to obtain more stable likelihood values.
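As an illustration of the four-partition scheme, the following Python sketch writes partition definitions in the "DNA, name = start-end" form read by RAxML. The block order within the alignment and the partition names are assumptions made for this example. The lengths follow from the reported totals, with the 6966 codon positions split equally between first and second positions, which holds because both positions are retained for the same codons.

```python
def raxml_partition_lines(partitions):
    """Build RAxML-style partition definitions for contiguous blocks.

    `partitions` is a sequence of (name, length) pairs listed in the order
    in which the blocks appear in the concatenated alignment.
    """
    lines = []
    start = 1  # RAxML partition coordinates are 1-based and inclusive
    for name, length in partitions:
        end = start + length - 1
        lines.append(f"DNA, {name} = {start}-{end}")
        start = end + 1
    return lines


# Assumed block order (hypothetical); lengths derived from the text.
PARTITIONS = [
    ("codon1", 3483),
    ("codon2", 3483),
    ("rRNA", 1673),
    ("tRNA", 1407),
]

if __name__ == "__main__":
    for line in raxml_partition_lines(PARTITIONS):
        print(line)
```

Under these assumptions the last block ends at position 10046, matching the total alignment length.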
DIVERGENCE TIME ESTIMATION
A relaxed molecular-clock method for dating analysis developed by Thorne and Kishino (2002) was used to estimate divergence times. This method accommodates unlinked rate variation across different loci (“partitions” in this study), allows the use of time constraints on multiple divergences, and uses a Bayesian MCMC approach to approximate the posterior distribution of divergence times and rates based on a single tree topology estimated by another method (the ML tree in this study). A series of programs in the multidistribute package (v9/25/2003) was used for these analyses.
Baseml in PAML ver. 3.14 was used to estimate model parameters for each partition separately under the F84 + Γ model of sequence evolution (the most parameter-rich model implemented in multidistribute). Based on the outputs from baseml, branch lengths and the variance-covariance matrix were estimated for each partition using estbranches in multidistribute. Finally, multidivtime in multidistribute was used to perform Bayesian MCMC analyses to approximate the posterior distribution of substitution rates, divergence times, and 95% credible intervals. In this step, multidivtime uses the estimated branch lengths and the variance-covariance matrices from all partitions without information from the aligned sequences.
After a burn-in period of 100,000 cycles, the Markov chain was sampled every 100 cycles until a total of 10,000 samples had been collected. To diagnose possible failure of the Markov chains to converge to their stationary distribution, at least two replicate MCMC runs were performed with different random seeds for each analysis.
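The chain length implied by these settings, and one simple way of comparing replicate runs, can be sketched in Python as follows. The comparison function and the example node ages are hypothetical illustrations. The text does not specify how agreement between replicate runs was assessed.

```python
def total_mcmc_cycles(burnin: int, sample_interval: int, n_samples: int) -> int:
    """Total cycles needed: burn-in plus one sample every `sample_interval` cycles."""
    return burnin + sample_interval * n_samples


def max_relative_difference(run_a, run_b) -> float:
    """Largest relative difference between paired node-age estimates of two runs."""
    return max(abs(a - b) / ((a + b) / 2) for a, b in zip(run_a, run_b))


if __name__ == "__main__":
    # Settings reported in the text.
    print(total_mcmc_cycles(burnin=100_000, sample_interval=100, n_samples=10_000))

    # Hypothetical posterior mean ages from two replicate runs (not study data).
    run_1 = [100.0, 50.0]
    run_2 = [101.0, 50.0]
    print(max_relative_difference(run_1, run_2))
```

With the reported settings the chain runs for 1,100,000 cycles in total. A small maximum relative difference between replicates is consistent with, but does not prove, convergence.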
Application of multidivtime requires values for the mean of the prior distribution for the time separating the ingroup root from the present (rttm) and its standard deviation (rttmsd), which we set conservatively to 4.45 (= 445 Mya) and 4.45, respectively. The tip-root branch lengths were calculated using TreeStat v. 1.1 for all terminals and their average was divided by rttm (4.45) to estimate the rate at the root node (rtrate) and its standard deviation (rtratesd), which were set to 0.074 and 0.074, respectively. The prior mean and standard deviation of the Brownian motion constant (brownmean and brownsd) were both set to 0.5, specifying a relatively flexible prior.
The multidivtime program allows for both minimum (lower) and maximum (upper) time constraints, and it has been argued that multiple calibration points provide more realistic divergence time estimates overall. We therefore sought to obtain optimal phylogenetic coverage of calibration points across our tree, although we could set maximum constraints based on fossil records only for the three basal splits between Sarcopterygii and Actinopterygii, Polypteriformes and Actinopteri, and Acipenseriformes and Neopterygii (A–C in figure 1; table S2). We also set lower and upper time constraints for three nodes in the cichlid divergences, which show excellent congruence with Gondwanan continental fragmentation, assuming that cichlids have never dispersed across oceans. Accordingly, we set a total of 27 time constraints based on both the fossil record and biogeographic events, as shown in figure 1 and table S2.
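The calculation of rtrate described above amounts to dividing the mean tip-root branch length by rttm. A minimal Python sketch follows. The tip-root lengths are hypothetical values chosen only to illustrate the arithmetic and are not the TreeStat output of this study. The function name is likewise an assumption.

```python
from statistics import mean


def root_rate_prior(tip_root_lengths, rttm: float):
    """Return (rtrate, rtratesd) with the SD set equal to the mean, as in the text."""
    rtrate = mean(tip_root_lengths) / rttm
    return rtrate, rtrate


if __name__ == "__main__":
    # Hypothetical tip-root path lengths (substitutions per site).
    lengths = [0.30, 0.33, 0.36]
    rtrate, rtratesd = root_rate_prior(lengths, rttm=4.45)  # 4.45 = 445 Mya
    print(round(rtrate, 3), round(rtratesd, 3))
```

With a mean tip-root length of about 0.33, the result rounds to 0.074 per time unit of 100 Myr, the value reported for rtrate.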
2. RESULTS
GENOME ORGANIZATION
The whole mitogenome sequences from the eight medaka species reported here for the first time were registered in DDBJ/EMBL/GenBank (table S1 in ESM). The genome contents (including 13 protein-coding, two rRNA and 22 tRNA genes and the control region) and gene orders were identical to those of typical vertebrates.
3. ACKNOWLEDGMENTS
We thank Y. Azuma, Y. Yamanoue and other members of the Marine Molecular Biology Laboratory, Ocean Research Institute, The University of Tokyo, for their invaluable advice and discussions. Sincere thanks also go to J.L. Thorne for his advice on performing the multidivtime analysis.
4. REFERENCES
Azuma, Y. et al. 2008 Mitogenomic evaluation of the historical biogeography of cichlids toward reliable dating of teleostean divergences. BMC Evol. Biol. 8, 215. (doi: 10.1186/1471-2148-8-215)
Benton, M.J. & Ayala, F. 2003 Dating the tree of life. Science 300, 1698–1700. (doi: 10.1126/science.1077795)
Benton, M.J. & Donoghue, P.C. 2007 Paleontological evidence to date the tree of life. Mol. Biol. Evol. 24, 26–53. (doi:10.1093/molbev/msl150)
Cheng, S., Higuchi, R. & Stoneking, M. 1994 Complete mitochondrial genome amplification. Nat. Genet. 7, 350–351. (doi:10.1038/369684a0)
Hurley, I.A. et al. 2007 A new time-scale for ray-finned fish evolution. Proc. R. Soc. B 274, 489–498. (doi: 10.1098/rspb.2006.3749)
Inoue, J.G., Miya, M., Tsukamoto, K. & Nishida, M. 2000 Complete mitochondrial DNA sequence of the Japanese sardine Sardinops melanostictus. Fish. Sci. 66, 924–932. (doi: 10.1111/j.1444-2906.2000.00148.x)
Inoue, J.G., Miya, M., Tsukamoto, K. & Nishida, M. 2001 Complete mitochondrial DNA sequence of the Japanese anchovy Engraulis japonicus. Fish. Sci. 67, 828–835. (doi: 10.1046/j.1444-2906.2001.00329.x)
Inoue, J.G. et al. 2009 The historical biogeography of the freshwater knifefishes using mitogenomic approaches: a Mesozoic origin and out-of-India migration of the Asian notopterids (Actinopterygii: Osteoglossomorpha). Mol. Phylogenet. Evol. (doi: 10.1016/j.ympev.2009.01.020)
Ishiguro, N.B., Miya, M. & Nishida, M. 2001 Complete mitochondrial DNA sequence of ayu, Plecoglossus altivelis. Fish. Sci. 67, 474–481. (doi: 10.1046/j.1444-2906.2001.00283.x)
Janvier, P. 1996 Early vertebrates. Oxford: Oxford University Press.
Katoh, K. & Toh, H. 2008 Recent developments in the MAFFT multiple sequence alignment program. Briefings Bioinformat. 9, 286–298. (doi:10.1093/bib/bbn013)
Kawaguchi, A., Miya, M. & Nishida, M. 2001 Complete mitochondrial DNA sequence of Aulopus japonicus (Teleostei: Aulopiformes), a basal Eurypterygii: longer DNA sequences and higher-level relationships. Ichthyol. Res. 48, 213–223. (doi: 10.1007/s10228-001-8139-0)
Masters, J.C., de Wit, M.J. & Asher, R.J. 2006 Reconciling the origins of Africa, India and Madagascar with vertebrate dispersal scenarios. Folia Primatol. 77, 399–418. (doi: 10.1159/000095388)
Miya, M. & Nishida, M. 1999 Organization of the mitochondrial genome of a deep-sea fish Gonostoma gracile (Teleostei: Stomiiformes): first example of transfer RNA gene rearrangements in bony fishes. Mar. Biotechnol. 1, 416–426. (doi:10.1007/PL00011798)
Miya, M. & Nishida, M. 2000 Use of mitogenomic information in teleostean molecular phylogenetics: a tree-based exploration under the maximum-parsimony optimality criterion. Mol. Phylogenet. Evol. 17, 437–455. (doi:10.1006/mpev.2000.0839)
Parenti, L.R. 2008 A phylogenetic analysis and taxonomic revision of ricefishes, Oryzias and relatives (Beloniformes, Adrianichthyidae). Zool. J. Linn. Soc. 154, 494–610.
Patterson, C. 1993 Osteichthyes: Teleostei. In The fossil record 2 (ed. M.J. Benton), pp. 621–656. London: Chapman & Hall.
Smith, A.G., Smith, D.G. & Funnell, B.M. 1994 Atlas of Mesozoic and Cenozoic coastlines. Cambridge: Cambridge University Press.
Storey, B.C. 1995 The role of mantle plumes in continental breakup: case histories from Gondwanaland. Nature 377, 301–308. (doi:10.1038/377301a)
Thorne, J.L. & Kishino, H. 2002 Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51, 689–702. (doi: 10.1080/10635150290102456)
Tyler, J.C. & Sorbini, L. 1996 New superfamily and three new families of tetraodontiform fishes from the Upper Cretaceous: the earliest and most morphologically primitive plectognaths. Smithson. Contrib. Paleobiol. 82, 1–59.
Wilson, M.V.H., Brinkman, D.B. & Neuman, A.G. 1992 Cretaceous Esocoidei (Teleostei): early radiation of the pikes in North American fresh waters. J. Paleontol. 66, 839–846.
Yamanoue, Y., Miya, M., Inoue, J.G., Matsuura, K. & Nishida, M. 2006 The mitochondrial genome of spotted green pufferfish Tetraodon nigroviridis (Teleostei: Tetraodontiformes) and divergence time estimation among model organisms in fishes. Genes Genet. Syst. 81, 29–39.
Yang, Z. 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556.
Zhu, M. et al. 2006 A primitive fish provides key characters bearing on deep osteichthyan phylogeny. Nature 441, 77–80. (doi:10.1038/nature04563)
Supplementary Table S1. List of species used in this study with DDBJ/GenBank/EMBL accession numbers. Taxonomic treatment of species of the family Adrianichthyidae follows Parenti (2008)
Node T: U 145, L 112. The upper and lower bounds of separation between the Indo-Madagascar landmass and Gondwanaland (Smith et al. 1994; Storey 1995; Masters et al. 2006)
Node U: U 120, L 100. The upper and lower bounds of separation between the African and South American landmasses (Smith et al. 1994; Storey 1995)
Supplementary Table S3. Comparisons of divergence time estimates between the present study and previous studies
Node                                      This study       Azuma et al. (2008)   Yamanoue et al. (2006)
Sarcopterygii vs. Actinopterygii          428 (419–442)    429 (417–449)         470 (415–524)
Teleostei vs. Neopterygii                 364 (346–378)    365 (348–378)         390 (340–442)
Euteleostei vs. Otocephala                289 (269–310)    288 (268–307)         315 (270–363)
Cyprinus vs. Danio                        153 (125–183)    147 (120–174)         167 (131–208)
Acanthopterygii vs. Paracanthopterygii    209 (191–225)    207 (190–224)         223 (191–264)
Percomorpha vs. Berycomorpha              200 (185–217)    198 (183–215)         206 (174–245)
Oryzias vs. Tetraodontiformes             180 (166–195)    176 (163–191)         184 (154–221)
Oryzias vs. Cichlidae                     150 (139–161)    152 (141–165)         ——
Gasterosteus vs. Tetraodontidae           173 (159–189)    170 (156–185)         192 (153–235)
Takifugu vs. Tetraodon                    78 (63–93)       78 (65–93)            73 (57–94)
* Estimated with biogeography-based time constraints on cichlid divergence
FIGURE LEGEND
Figure S1. Maximum likelihood tree from analysis of whole mitogenome sequences (10,046 positions excluding third codon positions) from 76 fish species using RAxML ver. 7.0.4. Numerals beside internal branches indicate bootstrap probabilities based on 1000 replicates. Scale indicates expected number of substitutions per site.