Methodologies for Phylogenetic Inference

Abstract

Phylogenetic inference from homologous molecular sequences is key to hypothesis testing and problem solving not only in evolutionary biology but also in a wide variety of other fields, from medicine to ecology. Model‐based phylogenetic methods rely on Markov substitution models to describe molecular evolution as a stochastic process of character substitution over time on a phylogenetic tree relating the sequences. Model parameters are estimated with standard statistical inference methods, namely, Bayesian and maximum likelihood approaches. A typical phylogenetic analysis first infers a multiple sequence alignment. Given this alignment, a phylogenetic tree is then estimated together with branch lengths and model parameters. Ideally, the alignment and phylogeny should be estimated simultaneously, not least so that alignment uncertainty is taken into account.
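
As an illustration of what a Markov substitution model specifies, the block below writes out the standard relation between a rate matrix and transition probabilities, followed by the Jukes–Cantor (JC69) special case. The symbols Q, t and P are notation assumed here for illustration, and JC69 is chosen only because it is the simplest nucleotide model, not because the article singles it out.

```latex
% Q: instantaneous substitution rate matrix (rows sum to zero)
% t: branch length;  P(t): matrix of transition probabilities along the branch
P(t) = e^{Qt}, \qquad
P_{ij}(t) = \Pr(\text{state } j \text{ at time } t \mid \text{state } i \text{ at time } 0)

% Jukes-Cantor (JC69): equal rates and equal base frequencies,
% with t measured in expected substitutions per site
P_{ij}(t) =
\begin{cases}
\tfrac{1}{4} + \tfrac{3}{4}\, e^{-4t/3}, & i = j \\[4pt]
\tfrac{1}{4} - \tfrac{1}{4}\, e^{-4t/3}, & i \neq j
\end{cases}
```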

Key Concepts

  • Sequence Alignment estimates an assignment of homologous molecular characters, that is, nucleotides, amino acids or codons related by common ancestry.
  • Phylogenetic Tree is the hierarchical representation of evolutionary relationships between homologous molecular sequences. The leaves of the tree usually represent the present‐day sequences, while the internal nodes represent their common ancestors.
  • Model of Molecular Evolution is the mathematical description of the process of sequence change through time, such as character substitutions, insertions and deletions.
  • Phylogenetic Likelihood is the probability of observing the sequence data given the model of molecular evolution and the phylogenetic tree, viewed as a function of the model parameters and the tree.
  • Frequentist Phylogenetic Inference relies on optimised phylogenetic likelihood to estimate parameters of a model of molecular evolution and a phylogenetic tree.
  • Bayesian Phylogenetic Inference relies on phylogenetic likelihood and a prior probability distribution of parameters to obtain a posterior probability distribution of parameters of a model of molecular evolution and a phylogenetic tree.
  • Branch Support quantifies the uncertainty of phylogenetic inference by assigning statistical confidence to the inferred partitions (i.e. clades) on a phylogenetic tree.
  • Alignment Uncertainty quantifies the statistical confidence of sequence alignment, which compounds the uncertainty of phylogenetic inference.
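
To make the notion of phylogenetic likelihood concrete, the following is a minimal sketch of Felsenstein's pruning algorithm on a fixed tree. It assumes the JC69 nucleotide model, a rooted tree encoded as nested tuples of (child, branch length) pairs, and gaps treated as missing data; the function names, the tree encoding and the example sequences are hypothetical choices for illustration, not part of any particular software package.

```python
import math

NUCLEOTIDES = "ACGT"

def jc69_matrix(t):
    """JC69 transition probabilities for branch length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * decay, 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def conditional_likelihoods(node, column):
    """Probability of the data below `node`, for each possible state at `node`."""
    if isinstance(node, str):  # leaf: `node` is a sequence name
        char = column[node]
        if char not in NUCLEOTIDES:  # gap or unknown character: missing data
            return [1.0] * 4
        return [1.0 if n == char else 0.0 for n in NUCLEOTIDES]
    result = [1.0] * 4
    for child, branch_length in node:  # internal node: tuple of (child, length)
        probs = jc69_matrix(branch_length)
        below = conditional_likelihoods(child, column)
        for i in range(4):
            result[i] *= sum(probs[i][j] * below[j] for j in range(4))
    return result

def log_likelihood(tree, alignment):
    """Log-likelihood of an alignment, assuming sites evolve independently."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in alignment.items()}
        root = conditional_likelihoods(tree, column)
        # JC69 stationary frequencies are 1/4 for every nucleotide.
        total += math.log(sum(0.25 * value for value in root))
    return total

# Hypothetical three-sequence example (names, lengths and data are made up).
inner = (("human", 0.1), ("mouse", 0.1))
tree = ((inner, 0.05), ("chick", 0.3))
alignment = {"human": "ACG-TA", "mouse": "ACGGTA", "chick": "A--GTA"}
print(log_likelihood(tree, alignment))
```

A maximum likelihood analysis would search over topologies, branch lengths and model parameters for the values that maximise this quantity, whereas a Bayesian analysis would combine it with a prior to obtain a posterior distribution.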

Keywords: sequence alignment; phylogenetic tree; substitution model; insertion–deletion model; molecular evolution; alignment uncertainty; likelihood; branch support

Figure 1.

Overview of a standard workflow for phylogenetic inference. Molecular sequences are modelled as evolving on a phylogenetic tree (phylogeny) according to a character substitution and insertion–deletion (indel) process. The tree topology describes common ancestry by speciation and gene duplication. The branch lengths represent the amount of change. Characters in the observed sequences related by substitutions only are termed homologous. A multiple sequence alignment (MSA) is a matrix where each row consists of one sequence, padded with gaps to reflect indels, so that homologous characters are matched in the same column.
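
The matrix view of an MSA described above can be sketched in a few lines. This is an illustrative sketch only: the helper names, the use of "-" as the gap character and the toy sequences are assumptions made here.

```python
GAP = "-"  # assumed gap symbol

def to_matrix(named_rows):
    """Check that all aligned rows have equal length and return (names, rows)."""
    names = list(named_rows)
    width = len(named_rows[names[0]])
    for name in names:
        if len(named_rows[name]) != width:
            raise ValueError(f"row {name!r} does not have length {width}")
    return names, [named_rows[name] for name in names]

def columns(rows):
    """Return the alignment columns; each column groups putatively homologous characters."""
    return ["".join(chars) for chars in zip(*rows)]

def ungapped(row):
    """Recover the original, unaligned sequence from an aligned row."""
    return row.replace(GAP, "")

# Hypothetical toy alignment.
msa = {"human": "ACG-TA", "mouse": "ACGGTA", "chick": "A--GTA"}
names, rows = to_matrix(msa)
print(columns(rows))      # ['AAA', 'CC-', 'GG-', '-GG', 'TTT', 'AAA']
print(ungapped(rows[0]))  # ACGTA
```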

Ideally, owing to their interrelation, homology and phylogeny (blue box) should be estimated jointly using a model of substitutions and indels (J). Amongst other advantages, the joint approach allows uncertainty in the MSA to be taken into account (for instance, by marginalising it out).
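
Marginalising the alignment out can be written schematically as a sum over all possible alignments. The symbols below (S for the unaligned sequences, A for an alignment, τ for the tree with branch lengths, θ for the substitution and indel parameters) are notation assumed here for illustration.

```latex
% S: observed (unaligned) sequences     A: a candidate alignment of S
% \tau: tree topology and branch lengths
% \theta: substitution and indel model parameters
P(S \mid \tau, \theta) \;=\; \sum_{A} P(S, A \mid \tau, \theta)
```

The sequential approach can be seen as replacing this sum with a single term evaluated at one estimated alignment, which is why it cannot reflect alignment uncertainty.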

However, a sequential approach (green boxes) is prevalent. Here, first an MSA is reconstructed (A), often followed by filtering (F) – that is, removing unreliable columns – promoted as a way to increase the signal‐to‐noise ratio of the MSA. The actual tree estimation step (T) typically assumes a substitution model and treats the gaps in the MSA as missing data.
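
The filtering step and the treatment of gaps as missing data can be sketched as follows. The gap-fraction criterion and its threshold are assumptions chosen for simplicity; actual filtering tools use a variety of reliability criteria, and the function names are hypothetical.

```python
GAP = "-"           # assumed gap symbol
NUCLEOTIDES = "ACGT"

def filter_columns(rows, max_gap_fraction=0.5):
    """Drop columns whose fraction of gaps exceeds the threshold (illustrative criterion)."""
    kept = [
        col for col in zip(*rows)
        if col.count(GAP) / len(col) <= max_gap_fraction
    ]
    # Rebuild one string per sequence from the retained columns.
    return ["".join(col[i] for col in kept) for i in range(len(rows))]

def leaf_likelihoods(char):
    """Leaf vector for tree estimation: a gap is compatible with every state."""
    if char == GAP:
        return [1.0] * len(NUCLEOTIDES)  # missing data
    return [1.0 if n == char else 0.0 for n in NUCLEOTIDES]

# Hypothetical toy alignment: three columns each contain one gap out of three rows.
rows = ["ACG-TA", "ACGGTA", "A--GTA"]
print(filter_columns(rows, max_gap_fraction=0.3))  # ['ATA', 'ATA', 'ATA']
print(leaf_likelihoods("-"))                       # [1.0, 1.0, 1.0, 1.0]
```

Treating a gap as missing data means it contributes no information about the state at that leaf; the indel event itself is not modelled.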


Further Reading

Felsenstein J (2003) Inferring Phylogenies, 2nd edn. Sinauer Associates (664 pages) ISBN‐10: 0‐87‐893177‐5.

Semple C and Steel M (2003) Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications 24. New York: Oxford University Press (256 pages) ISBN‐13: 978‐0‐19‐850942‐4.

Yang Z (2006) Computational Molecular Evolution. Oxford Series in Ecology and Evolution. Oxford University Press (376 pages) ISBN‐10: 0‐19‐856702‐2.

How to Cite
Gil, Manuel, and Anisimova, Maria (Apr 2015) Methodologies for Phylogenetic Inference. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0025545]