Phylogenetic Likelihood

Abstract

The input data for any phylogenetic analysis is a set of characters belonging to different individuals or loci that are assumed to share a common ancestor. Given a set of aligned deoxyribonucleic acid (DNA) or protein sequences, the likelihood of a phylogenetic tree depicting their ancestral relationships is proportional to the probability that the alignment was generated along this tree. The likelihood can be used not only as an objective criterion for finding the optimal phylogenetic tree, but also to compare trees and evolutionary models, always within a probabilistic framework. The likelihood is the central ingredient of any statistical phylogenetic analysis, as it connects the data (the alignment) with the model, which includes the tree, the branch lengths and other evolutionary assumptions.
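
The relationship between the likelihood and the alignment can be written compactly. This uses the notation of the key concepts: X is the alignment, τ is the tree (topology plus branch lengths) and θ holds the substitution-model parameters. It also assumes, as most standard substitution models do, that alignment sites evolve independently. Under that assumption, the probability of an alignment of n sites with columns x_1 to x_n factorises into a product over sites, so the log-likelihood is a sum:

```latex
L(\tau, \theta) \propto P(X \mid \tau, \theta) = \prod_{i=1}^{n} P(x_i \mid \tau, \theta),
\qquad
\ell(\tau, \theta) = \log L(\tau, \theta) = \sum_{i=1}^{n} \log P(x_i \mid \tau, \theta) + \text{const.}
```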

Key Concepts:

  • The phylogenetic likelihood is the probability of the DNA sequence alignment X given a model of nucleotide substitution with parameters θ and phylogenetic tree τ (topology plus branch lengths).

  • The phylogenetic likelihood can be calculated in the same way for amino acid or protein-coding (codon) sequences. In all cases it is based on the instantaneous rate of change from one state to another.

  • The phylogenetic likelihood is the basis for any probabilistic phylogenetic inference, for both classical and Bayesian analyses.

  • Many substitution models are available. The likelihood allows us to compare them and to find the best-fitting model, as well as the best phylogenetic tree.

  • In maximum likelihood phylogenetic estimation, the objective is to find the set of parameter values, particularly tree topology and branch lengths, that maximize the likelihood function.

  • In a Bayesian setting, the posterior probability of a particular set of phylogenetic parameter values (e.g. topology and branch lengths) is proportional to their likelihood multiplied by the prior probability of these values. The objective is then to describe these values with their associated posterior probabilities.
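
Several of the key concepts can be made concrete with the simplest nucleotide substitution model, Jukes-Cantor (JC69). In this model every nucleotide changes to each of the other three at the same instantaneous rate. The sketch below makes the following assumptions:

- Branch lengths are measured in expected substitutions per site, so each off-diagonal rate of the rate matrix Q is 1/3.
- Transition probabilities P(t) = exp(Qt) are built from Q by a truncated Taylor series and checked against the JC69 closed form.
- The maximum likelihood branch length separating two sequences is found by golden-section search.

All function names and the two example sequences are hypothetical illustrations, not a published implementation. For two sequences under JC69, the ML estimate should agree with the analytical distance d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites.

```python
import math

N_STATES = 4  # nucleotides A, C, G, T


def jc69_rate_matrix():
    """JC69 rate matrix Q, scaled so one time unit = one expected substitution per site."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(N_STATES)] for i in range(N_STATES)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm_taylor(m, terms=40):
    """exp(M) by a truncated Taylor series; adequate only when the entries of M are small."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]  # term is now M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def jc69_transition(t):
    """Closed-form JC69 transition probabilities P(t) for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(N_STATES)] for i in range(N_STATES)]


def pairwise_log_likelihood(s1, s2, t):
    """log P(s1, s2 | t): per site, equilibrium frequency 1/4 times P_xy(t)."""
    p = jc69_transition(t)
    same, diff = p[0][0], p[0][1]
    return sum(math.log(0.25) + math.log(same if a == b else diff) for a, b in zip(s1, s2))


def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def ml_branch_length(s1, s2):
    """Numerical ML estimate of the branch length separating two sequences."""
    return golden_section_max(lambda t: pairwise_log_likelihood(s1, s2, t), 1e-6, 5.0)


def jc69_distance(s1, s2):
    """Analytical JC69 distance (the ML estimate for two sequences, if p < 3/4)."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    t = 0.3
    q_t = [[q * t for q in row] for row in jc69_rate_matrix()]
    print("exp(Qt) row 0:   ", expm_taylor(q_t)[0])
    print("closed-form row 0:", jc69_transition(t)[0])
    s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"
    print("numerical ML t:", ml_branch_length(s1, s2))
    print("analytical d:  ", jc69_distance(s1, s2))
```

The example sequences differ at 2 of 10 sites (p = 0.2), so both estimates should come out at about 0.2326 substitutions per site. A general-purpose matrix exponential is used here only to show where P(t) comes from. Practical programs use closed forms or eigendecompositions of Q instead.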

Keywords: likelihood; tree inference; substitution models; dating; molecular adaptation; phylogenetic analysis; probability

Figure 1.

Alignment and tree. Three aligned DNA sequences (1, 2 and 3) are connected by a tree with internal nodes 4 and 5.
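
For a tree like the one in Figure 1, the per-site likelihood can be computed with Felsenstein's pruning algorithm. The algorithm multiplies conditional (partial) likelihoods from the tips towards the root instead of enumerating every combination of ancestral states at nodes 4 and 5. The sketch below makes these assumptions:

- The tree is rooted, with sequences 1 and 2 joining at node 4 and node 4 joining sequence 3 at the root, node 5.
- Substitution follows JC69.
- The branch lengths and the five-site alignment are hypothetical values chosen for illustration, not values taken from the figure.

As a check, the pruning result is compared with the explicit sum over all 16 ancestral-state combinations: P(x_i) = sum over x5, x4 of pi(x5) P_x5,x4(t_54) P_x4,x1(t_41) P_x4,x2(t_42) P_x5,x3(t_53).

```python
import math
from itertools import product

BASES = "ACGT"
PI = [0.25, 0.25, 0.25, 0.25]  # JC69 equilibrium base frequencies

# Hypothetical branch lengths (expected substitutions per site) for the ASSUMED
# topology ((1,2)4,3)5, with node 5 as the root; keys are "parent-child".
BRANCH = {"4-1": 0.2, "4-2": 0.2, "5-4": 0.1, "5-3": 0.3}


def jc69_transition(t):
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


P = {edge: jc69_transition(t) for edge, t in BRANCH.items()}


def tip_partial(base):
    """Conditional likelihood vector of a tip: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]


def through_branch(p, child):
    """For each parent state x, sum over child states y of P_xy(t) * L_child(y)."""
    return [sum(p[x][y] * child[y] for y in range(4)) for x in range(4)]


def site_likelihood_pruning(b1, b2, b3):
    # Node 4 combines tips 1 and 2; the root (node 5) combines node 4 and tip 3.
    l4 = [u * v for u, v in zip(through_branch(P["4-1"], tip_partial(b1)),
                                through_branch(P["4-2"], tip_partial(b2)))]
    l5 = [u * v for u, v in zip(through_branch(P["5-4"], l4),
                                through_branch(P["5-3"], tip_partial(b3)))]
    return sum(PI[x] * l5[x] for x in range(4))


def site_likelihood_brute_force(b1, b2, b3):
    """Explicit sum over all 16 combinations of ancestral states at nodes 4 and 5."""
    i1, i2, i3 = (BASES.index(b) for b in (b1, b2, b3))
    total = 0.0
    for x5, x4 in product(range(4), repeat=2):
        total += (PI[x5] * P["5-4"][x5][x4] * P["4-1"][x4][i1]
                  * P["4-2"][x4][i2] * P["5-3"][x5][i3])
    return total


def log_likelihood(alignment, site_fn):
    """Sum of per-site log-likelihoods, assuming sites evolve independently."""
    return sum(math.log(site_fn(*column)) for column in zip(*alignment))


# Hypothetical alignment of sequences 1, 2 and 3 (five sites)
ALIGNMENT = ["ACGTA", "ACGTT", "ACCTT"]

if __name__ == "__main__":
    print("pruning:    ", log_likelihood(ALIGNMENT, site_likelihood_pruning))
    print("brute force:", log_likelihood(ALIGNMENT, site_likelihood_brute_force))
```

Both functions should print the same log-likelihood. Pruning needs work proportional to the number of nodes per site, whereas the explicit sum grows exponentially with the number of internal nodes. This is why pruning is the standard way to evaluate phylogenetic likelihoods.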


Further Reading

Felsenstein J (2003) Inferring Phylogenies, 2nd edn. Sunderland, MA, USA: Sinauer Associates.

Yang Z (2006) Computational Molecular Evolution. Oxford, UK: Oxford University Press.

How to Cite
de Oliveira Martins, Leonardo, Mallo, Diego, and Posada, David (Sep 2013) Phylogenetic Likelihood. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005141]