Phylogeny Reconstruction

Abstract

Phylogenies provide the necessary comparative framework to study biological problems from an evolutionary perspective. Molecular phylogenies reconstructed with probabilistic methods (i.e. maximum likelihood and Bayesian inference) are the most popular because they enable statistical inference from a potentially large number of characters (sequences). The main steps to infer phylogenies from sets of homologous sequences using probabilistic methods are sequence masking, multiple sequence alignment, alignment trimming and phylogenetic inference under the concatenation or coalescent approaches. Commonly used software is listed for each step. Given the increasing use of genomic data, I comment on the specifics of phylogenomics and emphasise the importance of assessing data quality and systematic error. Understanding the properties, assumptions, strengths and limitations of the available phylogenetic methods is essential for selecting appropriate inference methods and for interpreting their results correctly.
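
As a minimal illustration of what maximum likelihood estimation means in this setting, the sketch below estimates the evolutionary distance between two aligned DNA sequences. It assumes the Jukes-Cantor (JC69) substitution model, which is not named in this article and is chosen here only because it has a closed-form solution; the function names are hypothetical and the sequences are made up.

```python
import math


def count_differences(seq_a, seq_b):
    """Count mismatching sites between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site, d > 0) under JC69."""
    if d <= 0:
        raise ValueError("d must be positive")
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site shows the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    # Each site contributes the stationary frequency (1/4) of its first base.
    log_l = n_sites * math.log(0.25) + (n_sites - n_diff) * math.log(p_same)
    if n_diff > 0:
        log_l += n_diff * math.log(p_diff)
    return log_l


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences are saturated; the distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACGTTT"   # toy data, 2 differences in 10 sites
    k = count_differences(a, b)
    d_hat = jc69_ml_distance(len(a), k)
    print(f"ML distance: {d_hat:.4f}")
    print(f"log-likelihood at the estimate: {jc69_log_likelihood(d_hat, len(a), k):.4f}")
```

Full phylogenetic inference optimises the same kind of likelihood over tree topologies, branch lengths and model parameters at once, which is why dedicated software is needed.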

Key Concepts

  • Phylogenies are hypotheses of the evolutionary history of organisms (or their sequences).
  • Phylogenetic trees are essential for comparative studies in ecology and evolution.
  • Probabilistic phylogenetic methods provide a statistical framework for estimating historical patterns, inferring parameters of evolutionary processes and testing hypotheses.
  • Coalescent methods that account for gene tree discordance are an alternative to data concatenation.
  • Phylogenomics provides increased resolving power but might require special attention to data quality and systematic error.
  • Mixture models can increase data fit and reduce some systematic errors.
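
To make the gene tree discordance mentioned above concrete, the following sketch computes the expected gene tree frequencies for three species under the multispecies coalescent. It assumes a species tree ((A,B),C) whose internal branch has length t in coalescent units, one sampled lineage per species and no gene flow; the function name and taxon labels are hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units (t >= 0).
    The A and B lineages fail to coalesce on that branch with probability
    exp(-t); the three topologies are then equally likely.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        probs = three_taxon_gene_tree_probs(t)
        print(t, {tree: round(p, 3) for tree, p in probs.items()})
```

Short internal branches therefore yield many gene trees that disagree with the species tree, which is the situation coalescent methods are designed to handle.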

Keywords: Bayesian; coalescence; evolution; maximum likelihood; molecular evolution; molecular phylogenetics; phylogenetic tree; systematics

Figure 1. Flowchart of the main steps for phylogeny reconstruction.
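
Two of the steps in this workflow, alignment trimming and concatenation, can be sketched in a few lines. This is a toy illustration rather than a substitute for dedicated tools: the gap-fraction rule, the 0.5 threshold and the function names are assumptions made for the example, and alignments are represented as plain dictionaries from taxon name to aligned sequence.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of gaps exceeds the threshold."""
    if not alignment:
        return {}
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    keep = [
        i for i in range(n_cols)
        if sum(alignment[n][i] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def concatenate(alignments):
    """Join per-gene alignments into a supermatrix, padding absent taxa with gaps."""
    taxa = sorted(set().union(*alignments))
    supermatrix = {t: "" for t in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for t in taxa:
            supermatrix[t] += aln.get(t, "-" * length)
    return supermatrix


if __name__ == "__main__":
    gene1 = {"A": "AC-T", "B": "A--T", "C": "AG-T"}   # toy data
    gene2 = {"A": "GG", "C": "GA"}                    # taxon B is missing
    matrix = concatenate([trim_gappy_columns(gene1), gene2])
    for taxon, seq in matrix.items():
        print(taxon, seq)
```

Under the coalescent approach the per-gene alignments would instead be analysed separately and their gene trees summarised into a species tree.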

Further Reading

Bleidorn C (2017) Phylogenomics: an Introduction, p 222. Springer.

Kapli P, Yang Z and Telford MJ (2020) Phylogenetic tree building in the genomic age. Nature Reviews Genetics 21: 428–444.

Lemey P, Salemi M and Vandamme A‐M (eds.) (2009) The Phylogenetic Handbook, 2nd edn, p 723. Cambridge University Press: New York.

Scornavacca C, Delsuc F and Galtier N (eds.) (2020) Phylogenetics in the Genomic Era, 568 pp. Authors open access book, hal‐02535070.

How to Cite
Irisarri, Iker (Oct 2020) Phylogeny Reconstruction. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0029211]