Estimation of Species Trees

Abstract

Over recent decades, gene trees have often been interpreted as species phylogenies. However, the extensive gene tree discordance found in multi‐locus datasets has called this interpretation into question, and a variety of new methods that explicitly consider species trees have been proposed in recent years. Some of these explicitly model the evolutionary processes that can lead to true gene tree discordance, namely incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. Choosing the most appropriate species tree method for the data at hand is not straightforward, because methods differ in their data prerequisites, model assumptions, analytical strategies and computational implementations.

Key Concepts:

  • At least three phylogenetic layers can be distinguished: species trees, locus trees and gene trees. These depict, respectively, the histories of the sampled species, loci and gene copies.

  • Traditional phylogenetic inference has focused on the reconstruction of gene trees, assumed to be accurate proxies for species history.

  • True species, locus and gene trees can be incongruent due to evolutionary processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. Estimation error can make this incongruence appear larger than it really is.

  • Extensive gene tree incongruence unveiled in multi‐locus datasets has encouraged the development of methods that explicitly reconstruct species trees. The supermatrix approach combines loci into a super‐alignment and estimates the corresponding supergene tree. Supertree methods combine gene trees to obtain an estimate of the species tree. Other methods co‐estimate gene and species trees in a single step using full probabilistic models.

  • Different species tree methods have distinct data requirements, mainly related to the handling of paralogs, the number of sampled species, missing taxa and the number of loci.

  • Rooting and reconstruction uncertainty must be carefully considered before choosing a species tree method.
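
As a rough illustration of the supermatrix approach mentioned above, the following Python sketch concatenates per‐locus alignments into a single super‐alignment, padding species that are missing from a locus with '?'. The function name `build_supermatrix` and the toy sequences are hypothetical, and the sketch assumes one aligned sequence per species per locus (no paralogs).

```python
from typing import Dict, List


def build_supermatrix(loci: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-locus alignments (species name -> aligned sequence).

    Species absent from a locus are padded with the `missing` character so
    that every row of the resulting super-alignment has the same length.
    """
    # Collect every species seen in at least one locus.
    species = sorted({name for locus in loci for name in locus})
    parts = {name: [] for name in species}

    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be a non-empty alignment of equal-length sequences")
        length = lengths.pop()
        for name in species:
            # Use the real sequence when present, otherwise a block of missing data.
            parts[name].append(locus.get(name, missing * length))

    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data (hypothetical): species B is missing from the second locus.
loci = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTA"},
    {"A": "GG", "C": "GA"},
]
result = build_supermatrix(loci)
for name, sequence in result.items():
    print(name, sequence)
```

The resulting super‐alignment would then be passed to a standard phylogenetic inference program to estimate the supergene tree. The padding step is where the missing‐taxa requirement noted above becomes visible in practice.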

Keywords: supermatrix; supertree; incomplete lineage sorting; gene duplication and loss; horizontal gene transfer; hybridisation; multispecies coalescent; reconciliation

Figure 1.

Species, locus and gene trees. The species tree (wide tree in the background) represents the history of the sampled species (A, B and C); its internal nodes correspond to speciation events. Species tree branches represent evolving populations, described by their size (width) and time (branch length). The locus tree (green strips) evolves inside the species tree, depicting the history of the sampled loci (A0, B0, C0). Its branches represent populations of evolving loci, described in the same way as in the species tree. Internal nodes can correspond either to speciations or to locus‐related events such as duplications, losses and transfers (not shown). The gene tree (thin black lines) evolves within the locus tree and represents the history of the sampled gene copies, with branch lengths usually indicating the amount of evolution (substitutions per site).

Figure 2.

Evolutionary processes that can lead to species tree/gene tree discordance. Only the species tree (wide tree in the background) and gene tree (thin lines) are used in this representation (the locus tree is omitted for clarity). Different evolutionary events are indicated with colours: incomplete lineage sorting/deep coalescence (black), gene duplication (orange) and loss (light blue), horizontal gene transfer (violet) and hybridisation (red). Branch and gene copy colours indicate the locus (0=green, 1=orange). Dashed lines represent lost lineages, either by gene loss or replacement with a foreign copy (xenolog).
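
To give a sense of how much discordance incomplete lineage sorting alone can produce, the sketch below evaluates the standard three‐species result under the multispecies coalescent: with one gene copy sampled per species and an internal species tree branch of length T in coalescent units, the gene tree topology matches the species tree with probability 1 - (2/3)e^(-T). The function name and the example branch lengths are illustrative assumptions, and the calculation ignores duplication, loss, transfer and hybridisation.

```python
import math


def concordance_probability(t_coalescent: float) -> float:
    """Probability that a three-taxon gene tree matches the species tree.

    `t_coalescent` is the length of the internal species tree branch in
    coalescent units; one gene copy per species is assumed.
    """
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    # The two lineages fail to coalesce in the internal branch with
    # probability exp(-T); in that case only 1 of the 3 equally likely
    # topologies is concordant, so 2/3 of those cases are discordant.
    return 1.0 - (2.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical internal branch lengths, from very short to long.
for t in (0.0, 0.5, 1.0, 5.0):
    p = concordance_probability(t)
    print(f"T = {t:.1f}: P(concordant) = {p:.3f}, P(discordant) = {1.0 - p:.3f}")
```

With T = 0 the three topologies are equally likely, so concordance is only 1/3; the probability approaches 1 as the internal branch gets longer.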



Further Reading

Knowles LL and Kubatko LS (2010) Estimating Species Trees: Practical and Theoretical Aspects. Hoboken, NJ: Wiley‐Blackwell.


How to Cite
Mallo, Diego, de Oliveira Martins, Leonardo, and Posada, David (Nov 2014) Estimation of Species Trees. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0025781]