Selection of Phylogenetic Models of Molecular Evolution

Abstract

The use of different models of molecular evolution can change the conclusions derived from the evolutionary analysis of deoxyribonucleic acid (DNA) and protein sequence alignments. Several methods have been developed to select the probabilistic model of nucleotide substitution or amino acid replacement that best fits the data at hand. Simulation studies indicate that these techniques identify the generating model accurately, and in recent years they have been implemented in a number of programs, such as ModelTest, ProtTest and, more recently, jModelTest. These programs also provide tools for quantifying model selection uncertainty and for model averaging and multimodel inference.
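The selection and model-averaging steps mentioned above can be sketched with standard information criteria. The Python below is a minimal, hypothetical sketch and not the ModelTest, ProtTest or jModelTest implementation: `FittedModel`, `criterion_weights` and `model_average` are illustrative names, and the log-likelihoods are invented for a hypothetical 5-taxon, 1000-site alignment. It computes AIC = -2 lnL + 2k, the small-sample AICc and BIC = -2 lnL + k ln(n), turns score differences into relative weights (Akaike weights when applied to AIC) as one way to express model selection uncertainty, and uses those weights to average a parameter across models. Counting branch lengths as free parameters and using the number of sites as the sample size are assumptions; other conventions exist.

```python
import math
from dataclasses import dataclass


@dataclass
class FittedModel:
    name: str
    lnL: float  # maximized log-likelihood of the alignment under this model
    k: int      # number of free parameters (here including branch lengths)


def aic(m: FittedModel) -> float:
    # Akaike information criterion: -2 lnL + 2k
    return -2.0 * m.lnL + 2.0 * m.k


def aicc(m: FittedModel, n: int) -> float:
    # Small-sample correction of AIC; requires n > k + 1
    return aic(m) + 2.0 * m.k * (m.k + 1) / (n - m.k - 1)


def bic(m: FittedModel, n: int) -> float:
    # Bayesian information criterion: -2 lnL + k ln(n)
    return -2.0 * m.lnL + m.k * math.log(n)


def criterion_weights(scores: dict[str, float]) -> dict[str, float]:
    # Relative weights exp(-delta/2), normalized to sum to 1 (lower score = better)
    best = min(scores.values())
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def model_average(estimates: dict[str, float], weights: dict[str, float]) -> float:
    # Average a parameter over the models that include it, renormalizing their weights
    wsum = sum(weights[name] for name in estimates)
    return sum(weights[name] * value for name, value in estimates.items()) / wsum


# Hypothetical fits to a 5-taxon, 1000-site alignment
# (the 7 branch lengths are counted as free parameters in every model).
models = [
    FittedModel("JC69", -2500.0, 7),
    FittedModel("K80", -2470.0, 8),
    FittedModel("HKY", -2460.0, 11),
    FittedModel("GTR", -2457.0, 15),
]
n_sites = 1000  # using alignment length as the sample size is itself a convention

aic_scores = {m.name: aic(m) for m in models}
aicc_scores = {m.name: aicc(m, n_sites) for m in models}
bic_scores = {m.name: bic(m, n_sites) for m in models}
aic_w = criterion_weights(aic_scores)

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
# Hypothetical kappa estimates from the two models that have this parameter
kappa_avg = model_average({"K80": 4.0, "HKY": 4.4}, aic_w)

print("best AIC:", best_aic, "| best BIC:", best_bic)
print({name: round(w, 3) for name, w in aic_w.items()})
print("model-averaged kappa:", round(kappa_avg, 3))
```

With these invented numbers AIC and AICc favor HKY, while BIC, whose penalty grows with ln(n), favors the simpler K80, which illustrates how different criteria can select different models for the same data.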

Key Concepts:

  • Models of molecular evolution allow us to calculate the probabilities of change between nucleotide or amino acid states in aligned sequences.

  • The use of different models of evolution can change the outcome of the phylogenetic analysis.

  • Different datasets can be best fitted by distinct models.

  • Model selection techniques are quite accurate at identifying the generating model in simulations.

  • Programs like jModelTest and ProtTest facilitate the routine selection of models of evolution in phylogenetics.
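As an illustration of how a model assigns probabilities of change, the following minimal sketch uses the Jukes-Cantor (1969) model, the simplest nucleotide substitution model, under which P_ii(t) = 1/4 + 3/4 e^(-4t/3) and P_ij(t) = 1/4 - 1/4 e^(-4t/3) for a branch of length t measured in expected substitutions per site. The function names and the toy two-sequence alignment are hypothetical. Real phylogenetic analyses evaluate such probabilities on every branch of a tree (for example with Felsenstein's pruning algorithm) rather than for a single pair of sequences.

```python
import math


def jc69_prob(i: str, j: str, t: float) -> float:
    # Probability that base i is observed as base j after a branch of length t
    # (expected substitutions per site) under the Jukes-Cantor model.
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def jc69_pair_lnL(seq1: str, seq2: str, t: float) -> float:
    # Log-likelihood of two aligned sequences separated by distance t,
    # assuming equal base frequencies (1/4) and independent sites.
    return sum(math.log(0.25 * jc69_prob(a, b, t)) for a, b in zip(seq1, seq2))


def jc69_distance(seq1: str, seq2: str) -> float:
    # Closed-form maximum likelihood distance d = -3/4 ln(1 - 4p/3),
    # where p is the proportion of differing sites (requires p < 3/4).
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


s1, s2 = "ACGTACGTAC", "ACGTACGTTT"  # toy alignment: 2 differences in 10 sites
d = jc69_distance(s1, s2)
print("ML distance:", round(d, 4))
print("lnL at d:", round(jc69_pair_lnL(s1, s2, d), 3))
# Each row of the transition probability matrix P(t) sums to 1.
print(sum(jc69_prob("A", b, 0.1) for b in "ACGT"))
```

The distance returned by `jc69_distance` is the value of t that maximizes `jc69_pair_lnL`, so the same probabilities serve both to score an alignment and to estimate branch lengths.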

Keywords: model selection; likelihood ratio tests; AIC; BIC; decision theory (DT); model averaging
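Likelihood ratio tests compare two nested models through the statistic 2(lnL1 - lnL0), which is usually referred to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below is hypothetical (the function names and log-likelihoods are invented) and uses only the Python standard library, computing the chi-square upper tail for integer degrees of freedom with a standard recurrence. The chi-square reference is only approximate when the null model fixes a parameter at the boundary of its range, for example a proportion of invariable sites of zero; the null distribution is then a mixture of chi-square distributions and the plain test is conservative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    # Upper-tail probability of a chi-square variable with integer df, using
    # Q(x; nu + 2) = Q(x; nu) + (x/2)**(nu/2) * exp(-x/2) / Gamma(nu/2 + 1),
    # started from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    if df < 1:
        raise ValueError("df must be a positive integer")
    if df % 2 == 0:
        q, start = math.exp(-x / 2.0), 2
    else:
        q, start = math.erfc(math.sqrt(x / 2.0)), 1
    for nu in range(start, df, 2):
        q += (x / 2.0) ** (nu / 2.0) * math.exp(-x / 2.0) / math.gamma(nu / 2.0 + 1.0)
    return q


def lrt(lnL_null: float, lnL_alt: float, df: int) -> tuple[float, float]:
    # Likelihood ratio test of a null model nested within an alternative model.
    # The statistic is clamped at 0 to absorb tiny optimization errors.
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    return stat, chi2_sf(stat, df)


# Hypothetical maximized log-likelihoods: JC69 (null) nested within K80,
# which adds one free parameter (the transition/transversion ratio).
stat, p = lrt(-2500.0, -2470.0, df=1)
print(f"LRT statistic = {stat:.1f}, P = {p:.2e}")
```

Here the extra parameter of K80 is strongly supported. Because each test only compares one pair of nested models, selecting among many candidates this way requires a sequence of tests, which is one reason information criteria are often preferred.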

Figure 1.

Console of jModelTest 2, simultaneously running four different models on four different threads.


Further Reading

Johnson JB and Omland KS (2004) Model selection in ecology and evolution. Trends in Ecology & Evolution 19: 101–108.

Kelchner SA and Thomas MA (2007) Model use in phylogenetics: nine key questions. Trends in Ecology & Evolution 22: 87–94.

Posada D (2009) Selecting models of molecular evolution. In: Vandamme AM, Salemi M and Lemey P (eds) The Phylogenetic Handbook, 2nd edn. Cambridge, UK: Cambridge University Press.

Posada D (2009) Selection of models of DNA evolution with jModelTest. In: Posada D (ed.) Bioinformatics for DNA Sequence Analysis. Clifton, NJ, USA: Humana Press.

How to Cite
Posada, David (Jun 2012) Selection of Phylogenetic Models of Molecular Evolution. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0022845]