Molecular Phylogeny Reconstruction


Molecular phylogenetics is the inference of evolutionary relationships among individuals, populations, species and higher taxonomic entities from molecular data. By modelling patterns of molecular change in protein and deoxyribonucleic acid (DNA) sequences over time, scientists now routinely reconstruct the evolutionary histories of species and evaluate the confidence levels of those inferences. Many approaches to estimating phylogenies exist, and comparing their results is key to determining the robustness of an inference. Molecular phylogenetic inferences have not only supported traditional phylogenies but also helped to resolve difficult questions about branching orders within many evolutionary lineages. Because of the vast and growing databases of molecular sequence information, this area promises to be an important key to understanding the history and relationships of all life forms on this planet.

Key Concepts

  • Molecular data are a powerful source of information to reconstruct relationships among individuals, populations, species and higher taxonomic groups.
  • Large‐scale sequencing projects are providing enormous amounts of molecular data to reconstruct detailed phylogenetic trees.
  • Reconstructing phylogenetic trees is a multistep process that requires the identification of homologous sequences, their alignment and finally the reconstruction of lineage relationships.
  • Evaluating the accuracy of phylogenetic trees is fundamental. This is achieved by comparing the phylogenies obtained with different approaches and by assessing the statistical support (e.g. bootstrap support values) for each phylogeny.
  • Phylogenetic approaches vary in their accuracy based on evolutionary processes, substitution models, rate variation and other biological factors that are intrinsic to each species.
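
The steps named in the key concepts (aligned homologous sequences, a distance estimate, and a bootstrap measure of support) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the four sequences are invented, the Jukes‐Cantor correction stands in for whichever substitution model an analysis would actually use, and "the closest pair of sequences" is only a crude proxy for a clade in a full tree search. It is not the method of any particular program.

```python
import math
import random

# Invented toy alignment (not the data behind Figure 1).
# '-' marks an alignment gap and '?' marks missing data.
ALIGNMENT = {
    "A": "ACGTACGTAC-TACGTTGCA",
    "B": "ACGTACGTACGTACGTTGCT",
    "C": "ACGAACGTTCGTAC?TTGAA",
    "D": "TCGAACCTTCGTTCGTAGAA",
}


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping gaps and missing data."""
    compared = differing = 0
    for x, y in zip(seq1, seq2):
        if x in "-?" or y in "-?":
            continue  # pairwise deletion of uninformative sites
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def closest_pair(alignment):
    """Return the pair of names with the smallest corrected distance.

    Ties are broken in favour of the first pair in sorted name order.
    """
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = jukes_cantor(p_distance(alignment[a], alignment[b]))
            if best is None or d < best[0]:
                best = (d, (a, b))
    return best[1]


def bootstrap_support(alignment, pair, replicates=200, seed=1):
    """Fraction of column-resampled alignments in which `pair` is closest."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Sample alignment columns with replacement, same length as original.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {
            name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()
        }
        hits += closest_pair(resampled) == target
    return hits / replicates


if __name__ == "__main__":
    pair = closest_pair(ALIGNMENT)
    print(pair, bootstrap_support(ALIGNMENT, pair))
```

Resampling whole columns, rather than individual characters, keeps each site's pattern across sequences intact, which is the idea behind bootstrap support values on a tree. A real analysis would rebuild the full tree for every replicate and record how often each clade recurs.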

Keywords: phylogeny; evolution; molecular evolution; sequence analysis; bioinformatics

Figure 1. An alignment of a portion of the γ‐fibrinogen gene sequence from five mammals. Insertion–deletion mutations predicted by sequence alignment are shown with hyphens (‐) and the missing data are shown with question marks (?).
Figure 2. Rooted (A) and unrooted (B) tree of five sequences. Branch lengths are drawn proportional to evolutionary distance, which can be expressed in units of time or as the number of substitutions.
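
The relationship between rooted and unrooted trees can be sketched in Python as follows. The tree is hypothetical (it is not the tree in Figure 2): five sequences `s1` to `s5`, three internal nodes `x`, `y`, `z`, and made‐up branch lengths. The sketch assumes branch lengths are additive along a path, and it shows that placing a root on a branch changes the direction of ancestry but not the path‐length distance between any two sequences.

```python
from collections import defaultdict

# Hypothetical unrooted tree of five sequences, stored as
# (node, node, branch_length) edges. x, y and z are internal nodes.
UNROOTED = [
    ("s1", "x", 0.10), ("s2", "x", 0.12),
    ("x", "y", 0.05),
    ("s3", "y", 0.20),
    ("y", "z", 0.07),
    ("s4", "z", 0.15), ("s5", "z", 0.18),
]


def build_tree(edges):
    """Return an adjacency map {node: {neighbour: branch_length}}."""
    tree = defaultdict(dict)
    for u, v, length in edges:
        tree[u][v] = length
        tree[v][u] = length
    return tree


def patristic(tree, start, goal):
    """Sum of branch lengths on the unique path between two nodes."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in tree[node].items():
            if nxt != parent:  # a tree has no cycles, so only avoid going back
                stack.append((nxt, node, dist + length))
    raise KeyError(goal)


def root_on_edge(edges, u, v, dist_from_u, root="root"):
    """Insert a root node on edge (u, v), `dist_from_u` away from u."""
    rooted = []
    found = False
    for a, b, length in edges:
        if {a, b} == {u, v}:
            if not 0.0 <= dist_from_u <= length:
                raise ValueError("root must lie on the branch")
            rooted.append((u, root, dist_from_u))
            rooted.append((root, v, length - dist_from_u))
            found = True
        else:
            rooted.append((a, b, length))
    if not found:
        raise KeyError((u, v))
    return rooted


if __name__ == "__main__":
    unrooted = build_tree(UNROOTED)
    rooted = build_tree(root_on_edge(UNROOTED, "y", "z", 0.03))
    print(patristic(unrooted, "s1", "s5"), patristic(rooted, "s1", "s5"))
```

Rooting on the branch between `y` and `z` splits the five sequences into two groups descending from the root, (`s1`, `s2`, `s3`) and (`s4`, `s5`), while every pairwise distance stays the same up to floating‐point rounding.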


Further Reading

Drummond AJ and Rambaut A (2006) BEAST v1.4. Available from:

Felsenstein J (1993) PHYLIP (phylogeny inference package). Version 3.6a. Distributed by the author, Department of Genetics, University of Washington, Seattle.

Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates: Sunderland, MA.

Hall B (2007) Phylogenetic Trees Made Easy, 3rd edn. Sinauer Associates: Sunderland, MA.

Lemey P, Salemi M and Vandamme AM (2009) The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge University Press: Cambridge.

Miyamoto MM and Cracraft J (1991) Phylogenetic Analysis of DNA Sequences. Oxford University Press: New York.

Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press: New York.

Nei M and Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University Press: New York.

Swofford DL (2001) PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods) 4.0 Beta. Sinauer Associates: Sunderland, MA.

Yang Z (2006) Computational Molecular Evolution. Oxford University Press: Oxford.

How to Cite
Battistuzzi, Fabia U, and Kumar, Sudhir (Oct 2020) Molecular Phylogeny Reconstruction. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0029212]