Phylogeny Reconstruction

Abstract

Phylogenies provide the comparative framework needed to study biological problems from an evolutionary perspective. Selecting data sources and inference methods that yield robust phylogenetic hypotheses is therefore essential. Molecular phylogenies reconstructed with probabilistic methods (i.e. maximum likelihood and Bayesian inference) are by far the most popular, because they provide statistically powerful estimates of phylogeny from a potentially large number of characters (sequences). Working within a probabilistic framework also makes statistical tools available for a posteriori analyses of the reconstructed phylogenies. Most recent advances in phylogenetics, including sophisticated evolutionary models, methods that reconcile different gene genealogies, the analysis of genome‐scale data (phylogenomics) and the development of new and more effective computational methods, take advantage of probabilistic methods of phylogeny inference. Hence, an understanding of the properties, strengths and limitations of probabilistic methods is fundamental.

Key Concepts:

  • Phylogenetic trees represent the evolutionary history of organisms (or sequences) in terms of relationships and the amount of genetic differentiation.

  • Phylogenies are essential to fully understand the evolutionary dimension in any biological discipline.

  • Maximum likelihood and Bayesian inference are probabilistic methods for the statistical estimation of phylogenies based on explicit models of sequence evolution.

  • Probabilistic methods of phylogeny reconstruction provide a statistical framework for estimating historical patterns, inferring intrinsic parameters of evolutionary processes and testing hypotheses.

  • Methods that attempt to reconcile different gene trees with the underlying species phylogeny have recently been developed as an alternative to data concatenation.

  • The intersection of genome‐scale sequence data and probabilistic methods of phylogeny reconstruction has given rise to phylogenomics, which is emerging as a very promising field in evolutionary studies.

Keywords: phylogenetics; evolution; phylogenetic tree; molecular evolution; evolutionary model; phylogenomics; maximum likelihood; Bayesian inference; species tree reconciliation
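
To make the idea of likelihood‐based estimation under an explicit model of sequence evolution more concrete, the following is a minimal sketch for the simplest possible case: two aligned DNA sequences and a single branch length. It assumes the Jukes-Cantor (JC69) substitution model, which is chosen here only for illustration and is not prescribed by the article. The function names (jc69_log_likelihood, ml_distance) and the toy sequences are hypothetical. Real analyses optimise tree topology, branch lengths and model parameters jointly over many sequences.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of a branch length d (expected substitutions per site)
    separating two aligned sequences, under the JC69 model (an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the stationary frequency of the base observed in seq_a
        log_l += math.log(0.25) + math.log(p_same if x == y else p_diff)
    return log_l


def ml_distance(seq_a, seq_b, lo=1e-6, hi=5.0, tol=1e-8):
    """Maximum likelihood estimate of d by golden-section search on [lo, hi].
    Assumes the log-likelihood has a single maximum on that interval."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    while b - a > tol:
        if jc69_log_likelihood(seq_a, seq_b, x1) > jc69_log_likelihood(seq_a, seq_b, x2):
            b = x2   # the maximum lies in [a, x2]
        else:
            a = x1   # the maximum lies in [x1, b]
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Toy sequences made up for illustration (2 differences in 10 sites).
    s1 = "ACGTACGTAC"
    s2 = "ACGTACGTTG"
    p = sum(x != y for x, y in zip(s1, s2)) / len(s1)
    closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    print("numerical ML estimate:", ml_distance(s1, s2))
    print("JC69 closed form:     ", closed_form)
```

For JC69 the maximum likelihood distance has a known closed form, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, so the numerical search can be checked against it. More complex models generally lack such a closed form, which is why numerical optimisation (or sampling, in the Bayesian case) is used in practice.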

Figure 1.

Flowchart of the phylogenetic pipeline.

Further Reading

Felsenstein J (2004) Inferring Phylogenies. Sunderland, Massachusetts: Sinauer Associates, 664 pp.

Gascuel O (ed.) (2005) Mathematics of Evolution and Phylogeny, 416 pp. New York: Oxford University Press.

Lemey P, Salemi M and Vandamme A‐M (eds) (2009) The Phylogenetic Handbook, 2nd edn, 723 pp. New York: Cambridge University Press.

Swofford DL, Olsen GJ, Waddell PJ and Hillis DM (1996) Phylogenetic inference. In: Hillis DM, Moritz C and Mable BK (eds) Molecular Systematics, pp. 407–514. Sunderland, Massachusetts: Sinauer Associates.

Yang Z (2006) Computational Molecular Evolution. New York: Oxford University Press, 357 pp.

How to Cite
Irisarri, Iker, and Zardoya, Rafael (Jun 2013) Phylogeny Reconstruction. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0001521]