Phylogenetic Relationships Deduced from Whole Genome Comparisons


During the second half of the twentieth century, gene sequences were used to study both the evolution of genes and the history of species. A decade ago, progress in sequencing techniques made it possible to use whole genome sequences to reconstruct species phylogeny. Several phylogenomic methods have been developed to exploit these large amounts of data, and those based on homologous or orthologous characters have been favoured. Among these, the supertree and supermatrix approaches have been very successful at resolving parts of the tree of life. However, these methods discard a large proportion of the evolutionary information present in genomes. New integrative models of genome evolution could make better use of whole genome sequences and thus improve the resolution of the tree of life.

Key concepts

  • Genomes contain a very large amount of information relevant for reconstructing their history and the history of species.

  • Among phylogenomic methods, only those based on homologous/orthologous characters can reconstruct a species tree.

  • Rare genomic changes, supertree and supermatrix methods have been the most extensively used.

  • Genome‐sized datasets have confirmed key phylogenies and have also provided new insights into important parts of the tree of life.

  • New methods that combine different types of genomic information could further improve the resolution of the tree of life.
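As a rough illustration of the supermatrix idea named in the key concepts, the following minimal Python sketch concatenates per-gene alignments into a single matrix and pads species that lack a gene with gap characters. The function name `build_supermatrix`, the toy sequences and the use of `-` for missing data are assumptions made for this example and are not taken from any published pipeline.

```python
from typing import Dict, List


def build_supermatrix(gene_alignments: List[Dict[str, str]],
                      missing: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments (species -> aligned sequence)."""
    # Every species seen in at least one alignment, sorted for determinism.
    species = sorted({sp for aln in gene_alignments for sp in aln})
    supermatrix = {sp: "" for sp in species}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share one length")
        width = lengths.pop()
        for sp in species:
            # A species without this gene receives a block of missing data.
            supermatrix[sp] += aln.get(sp, missing * width)
    return supermatrix


# Hypothetical toy data: gene 2 was not sampled for species B.
genes = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTA"},
    {"A": "GG", "C": "GA"},
]
matrix = build_supermatrix(genes)
for name, row in matrix.items():
    print(name, row)
```

The concatenated matrix would then be passed to a tree-building program. The sketch assumes that each input alignment already contains only orthologous sequences, which is the difficult step in practice.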

Keywords: phylogenomics; evolution; gene duplication; lateral gene transfer; trans‐specific polymorphisms

Figure 1.

The many roads to genome trees. Genome trees can be obtained by different types of methods. Two families can be defined: those that need homology/orthology relationships between characters (lower part of the figure), and those that do not (upper part).

Figure 2.

Orthologous characters, and three evolutionary scenarios in which character history may differ from the species history. (a) Orthologous characters are characters whose history corresponds to the history of speciations. Here the species tree (large tubular structure) and the gene tree (blue lines) have the same topology, which groups A with B. (b) Duplications (orange circle) and losses complicate the history of characters, and can hide the history of speciations. Here a duplication preceding speciation 1 and character losses after speciation 1 lead to a character history that differs from the species history and groups B with C, a situation named ‘hidden paralogy’. (c) A lateral transfer from an ancestor of species C to an ancestor of species B also leads to a history in which B and C are clustered. (d) Trans‐specific polymorphisms can also render character trees different from the species tree. At any given time within a species, a character can exist as several alleles. At speciation 1, by chance, the ancestors of species B and C possess the blue form of the character, and the ancestor of species A possesses the red form. The resulting character tree is different from the species tree.
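The discordance described in this caption can be checked mechanically. The minimal Python sketch below encodes rooted trees as nested tuples (an encoding chosen only for this example), lists the clades of each tree, and reports whether a character tree has the same topology as the species tree. The names `clades` and `same_topology` are hypothetical helpers, not functions from an existing library.

```python
def clades(tree):
    """Return the clades of a rooted tree as a set of frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a species name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # every internal node defines one clade
        return members

    leaves(tree)
    return found


def same_topology(tree1, tree2):
    """Two rooted trees share a topology when they define the same clades."""
    return clades(tree1) == clades(tree2)


species_tree = (("A", "B"), "C")          # panel (a): A groups with B
orthologous_tree = (("A", "B"), "C")      # character tree matching the species tree
discordant_tree = ("A", ("B", "C"))       # panels (b) to (d): B groups with C

print(same_topology(species_tree, orthologous_tree))
print(same_topology(species_tree, discordant_tree))
# Clades of the character tree that are absent from the species tree.
conflict = clades(discordant_tree) - clades(species_tree)
print(conflict)
```

Topology comparison alone detects that a character tree conflicts with the species tree. It cannot say whether hidden paralogy, lateral transfer or trans‐specific polymorphism caused the conflict, because all three scenarios yield the same grouping of B with C.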

Figure 3.

Integrative models for inferring species phylogeny from whole genomes. All the information contained in genomes could be used together, through appropriate models of evolution. Models of sequence evolution exploit substitutions and possibly insertions/deletions; models of gene family evolution exploit duplications, losses, transfers and trans‐specific polymorphisms; models of character evolution exploit rare genomic changes; and models of gene order evolution exploit genome rearrangements.




Further Reading

Felsenstein J (2004) Inferring Phylogenies. Sunderland, MA: Sinauer Associates.

Rannala B and Yang Z (2008) Phylogenetic inference using whole genomes. Annual Review of Genomics and Human Genetics 9: 217–231.

Snel B, Huynen MA and Dutilh BE (2005) Genome trees and the nature of genome evolution. Annual Review of Microbiology 59: 191–209.

How to Cite

Boussau, Bastien (Sep 2009) Phylogenetic Relationships Deduced from Whole Genome Comparisons. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0021784]