Phylogenomics of the Human Genome

Abstract

The availability of the complete sequence of the human genome has paved the way to investigating the evolution of our species from the perspective of each individual gene encoded in our genome. Here I survey recent advances towards the reconstruction and analysis of the human phylome, the complete collection of molecular phylogenies reconstructed for each human protein‐coding gene. Initial analyses of this large phylogenomic dataset have not only produced a complete catalogue of orthologues and paralogues of human genes across 35 other eukaryotic species, but have also revealed interesting evolutionary information. For instance, we traced the history of duplication events in the lineages leading to our species and found that expansions of gene families involved in certain biological processes correlate with physiological adaptations experienced by our ancestors. Another notable finding is the large degree of topological diversity among different gene trees, which emphasizes the difficulty of resolving certain evolutionary relationships.

Key concepts

  • Evolution by gene duplication. Gene duplication is one of the main processes by which a genome can acquire novel functions. Processes such as the retrotransposition of messenger RNAs and the duplication of chromosome segments, entire chromosomes and even whole genomes provide duplicate copies of genomic regions, which constitute the raw material on which evolution can act to create novel functions. Most duplicate genes are eventually lost through pseudogenization. However, certain mutations may confer a novel function on one of the duplicates, or specialize each duplicate in one of the functions carried out by the ancestral gene. If these changes provide a selective advantage, the mutations may be favoured by selection and both gene copies are then maintained. Such duplications leave a footprint in the genome that can be interpreted through phylogenetic analysis, revealing when each gene family experienced duplications. By analysing the functions encoded by families that expanded at different evolutionary periods, we can infer which evolutionary innovations may have been involved.

  • Species phylogeny. The evolutionary relationships among a group of species are usually represented by a species phylogeny, or tree of life. Such phylogenies were historically reconstructed by grouping organisms on the basis of morphological characters. Since the advent of sequencing techniques, species phylogenies have been inferred from the reconstructed evolutionary histories of genes shared by the species of interest. However, different genes may support conflicting topological arrangements of the same species. Much current research addresses how the information contained in entire genomes can be used to resolve particularly contentious evolutionary relationships.

  • Orthology prediction. Reliable orthology prediction, that is, establishing the correspondence between genes in different genomes, is central to comparative genomics. Most automated methods for predicting orthology are based on pairwise comparisons, for instance finding pairs of genes in different genomes that are reciprocally each other's most similar gene. However, orthology is defined by phylogenetic criteria, and inspecting the evolutionary history of a gene family is therefore the most appropriate way to unravel orthology relationships. Automated phylogeny‐based orthology prediction has recently emerged as a feasible alternative for genome‐wide studies. The species‐overlap algorithm used in this study was specifically designed to cope with the large topological diversity found among gene trees.
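The pairwise reciprocal best‐hit criterion mentioned under orthology prediction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: similarity scores are supplied as a dictionary (in practice they would come from a homology search, for example Smith-Waterman alignments), and the function names and toy gene names (hs for human, dr for zebrafish) are hypothetical.

```python
from typing import Dict, List, Tuple

# Similarity scores between genes of genome A and genome B, keyed by
# (gene_in_A, gene_in_B); in practice these might be alignment scores.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, query_side: int) -> Dict[str, str]:
    """Map each gene on one side to its highest-scoring partner on the other.

    query_side=0: genes of A -> best hit in B; query_side=1: genes of B -> best hit in A.
    Ties are resolved in favour of the pair encountered first.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for pair, score in scores.items():
        query, target = pair[query_side], pair[1 - query_side]
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs in which each gene is the other's best hit."""
    a_to_b = best_hits(scores, 0)
    b_to_a = best_hits(scores, 1)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical toy scores between human (hs) and zebrafish (dr) genes.
toy_scores = {
    ("hsTP53", "drP53"): 310.0,
    ("hsTP63", "drP53"): 250.0,
    ("hsTP63", "drP63"): 420.0,
    ("hsTP73", "drP63"): 300.0,
}
print(reciprocal_best_hits(toy_scores))
# [('hsTP53', 'drP53'), ('hsTP63', 'drP63')]
```

In this toy example hsTP73 receives no predicted orthologue, because its only hit, drP63, prefers hsTP63. Situations involving paralogues, like this one, are where purely pairwise criteria can struggle and where inspecting the gene tree can help.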

Keywords: phylogenomics; human genome; phylogenetics; orthology; phylome

Figure 1.

Schematic representation of a phylogenetic pipeline (green rectangles). To build a phylome from a seed genome, this pipeline must be repeated for each protein encoded in that genome. Each seed protein is used as a query to search for putative homologues among the proteins encoded in a set of predefined, fully sequenced genomes. After spurious hits have been filtered out, each group of sequences is aligned with a multiple sequence alignment program, and the resulting alignment is trimmed to retain only the most informative columns. Finally, the alignment derived from each seed sequence is used to infer a high‐quality phylogeny, a step that includes evolutionary model testing and estimation of the gamma parameter. The text on the left side summarizes some of the main programs (Capella‐Gutiérrez et al., 2009; Castresana, 2000; Guindon and Gascuel, 2003; Ronquist and Huelsenbeck, 2003; Smith and Waterman, 1981) that can be used to automate each step in the pipeline, as well as some of the most important parameters and decisions to consider.
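The trimming step can be illustrated with a deliberately simple filter that discards alignment columns containing too many gaps. This is a sketch of a single criterion only, similar in spirit to gap‐based trimming but not a reimplementation of trimAl or Gblocks, which also take residue conservation into account; the function name and the default threshold of 0.1 are assumptions chosen for illustration.

```python
from typing import List, Sequence


def trim_gappy_columns(alignment: Sequence[str],
                       max_gap_fraction: float = 0.1) -> List[str]:
    """Keep only columns in which at most max_gap_fraction of rows are gaps.

    `alignment` is a list of aligned sequences of equal length; '-' marks a gap.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    # Indices of columns whose gap fraction does not exceed the threshold.
    keep = [
        col for col in range(length)
        if sum(row[col] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    # Rebuild every sequence from the retained columns only.
    return ["".join(row[col] for col in keep) for row in alignment]


toy_alignment = ["MKV-LT", "MKVALT", "M--ALS"]
print(trim_gappy_columns(toy_alignment))       # ['MLT', 'MLT', 'MLS']
print(trim_gappy_columns(toy_alignment, 0.5))  # ['MKV-LT', 'MKVALT', 'M--ALS']
```

The threshold trades alignment length against noise: a strict value keeps fewer but more reliable columns, which is one of the parameters the figure suggests weighing.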

Figure 2.

Phylogenetic tree representing the evolutionary relationships of p53 and related proteins. Shaded boxes indicate vertebrate members of the p53, p73 and p73L subfamilies. Duplication nodes, inferred when their two daughter partitions contain at least one species in common, are marked with a red or orange circle; all other nodes represent speciation events. Orthology is therefore inferred between all proteins within each shaded box, and between each of these proteins and the two C. intestinalis proteins. Adapted from Gabaldón (2008).
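The node‐labelling rule described in this caption, the species‐overlap criterion, is simple enough to sketch directly. The version below assumes a rooted, fully bifurcating tree encoded as nested Python tuples with hypothetical "gene@species" leaf names; it does not handle rooting, polytomies or the other refinements a genome‐wide pipeline would need, and the gene names in the example are invented.

```python
from itertools import product
from typing import List, Tuple, Union

# Rooted binary gene tree: a leaf is a "gene@species" string and an internal
# node is a (left, right) tuple. This encoding is assumed for the sketch.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    """Extract the species tag from a 'gene@species' leaf name."""
    return leaf.split("@", 1)[1]


def leaves(tree: Tree) -> List[str]:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def species_overlap(tree: Tree) -> Tuple[int, List[Tuple[str, str]]]:
    """Label every internal node with the species-overlap rule.

    A node is a duplication if its two daughter partitions share a species;
    otherwise it is a speciation, and every gene in one partition is
    predicted to be orthologous to every gene in the other.
    Returns (number of duplication nodes, sorted orthologous pairs).
    """
    duplications = 0
    orthologs: List[Tuple[str, str]] = []
    stack: List[Tree] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        left, right = node
        left_genes, right_genes = leaves(left), leaves(right)
        left_species = {species_of(g) for g in left_genes}
        right_species = {species_of(g) for g in right_genes}
        if left_species & right_species:
            duplications += 1
        else:
            orthologs.extend((min(a, b), max(a, b))
                             for a, b in product(left_genes, right_genes))
        stack.extend([left, right])
    return duplications, sorted(orthologs)


# Hypothetical toy tree: two vertebrate paralogues plus one Ciona gene.
toy_tree = ((("TP53@human", "TP53@mouse"), ("TP73@human", "TP73@mouse")),
            "CiP53@ciona")
n_dup, pairs = species_overlap(toy_tree)
print(n_dup)       # 1
print(len(pairs))  # 6
```

As in the figure, the single duplication separates the two vertebrate subfamilies, genes within each subfamily are paired as orthologues, and every vertebrate gene is paired with the Ciona gene. Recomputing leaf sets at each node is quadratic, which is acceptable for an illustration but not for large trees.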

Figure 3.

Estimated number of duplication events that occurred at each major transition along the eukaryotic lineage leading to humans. Horizontal bars indicate the average number of duplications per gene. Boxes on the right list some of the Gene Ontology biological process terms that are significantly over‐represented among the gene families duplicated at each stage. Reproduced with permission from Huerta‐Cepas et al. (2007).
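Deciding whether a Gene Ontology term is significantly over‐represented among the families duplicated at a given stage is commonly done with a one‐sided hypergeometric (Fisher's exact) test against a background set. The sketch below assumes that test, omits the multiple‐testing correction a real analysis would require, and does not claim to reproduce the exact procedure used for this figure; the function name and the counts in the example are invented.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for term enrichment.

    N: gene families in the background set
    K: background families annotated with the GO term
    n: families duplicated at the stage of interest
    k: duplicated families annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k, k+1, ... annotated families.
    upper_tail = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return upper_tail / comb(N, n)


# Hypothetical counts: 3 of 4 duplicated families carry a term that
# annotates 5 of the 20 background families.
print(overrepresentation_p(k=3, n=4, K=5, N=20))  # about 0.032
```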

Figure 4.

The three alternative phylogenetic relationships among arthropods, chordates and nematodes. The pie chart indicates the fraction of trees in the human phylome that support each alternative evolutionary scenario. Pictures (from left to right) show a soybean cyst nematode (Heterodera sp.), a European honey bee (Apis mellifera) and the African clawed frog (Xenopus laevis). All pictures are in the public domain and were downloaded from Wikipedia (www.wikipedia.org).
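Tallying how many gene trees support each of the three arrangements can be sketched as follows. The sketch assumes that each gene tree has already been pruned to a single leaf per group (labelled here simply "arthropods", "chordates" and "nematodes") and rooted with an outgroup; trees in which no pair of groups forms a clade, such as an unresolved polytomy, are left out of the fractions. The function names and the toy trees are invented for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Iterator, Optional, Tuple, Union

# Rooted tree: a leaf is a label, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]
GROUPS = frozenset({"arthropods", "chordates", "nematodes"})


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Labels of all leaves below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every internal node, root first."""
    if isinstance(tree, str):
        return
    yield leaf_set(tree)
    for child in tree:
        yield from clades(child)


def supported_pair(tree: Tree) -> Optional[FrozenSet[str]]:
    """Return the two groups that form a clade excluding the third, if any."""
    for clade in clades(tree):
        ingroups = clade & GROUPS
        if len(ingroups) == 2:
            return ingroups
    return None


def topology_fractions(trees: Iterable[Tree]) -> Dict[FrozenSet[str], float]:
    """Fraction of resolved trees supporting each pair of groups."""
    counts = Counter(supported_pair(t) for t in trees)
    counts.pop(None, None)  # drop trees that resolve none of the pairs
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


toy_trees = [
    ("fungi", ("chordates", ("arthropods", "nematodes"))),
    ("fungi", (("chordates", "arthropods"), "nematodes")),
    ("fungi", (("chordates", "arthropods"), "nematodes")),
    ("fungi", (("chordates", "nematodes"), "arthropods")),
    ("fungi", ("chordates", "arthropods", "nematodes")),  # unresolved
]
for pair, fraction in topology_fractions(toy_trees).items():
    print(sorted(pair), fraction)
```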


References

Berglund‐Sonnhammer AC, Steffansson P, Betts MJ and Liberles DA (2006) Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. Journal of Molecular Evolution 63: 240–250.

Capella‐Gutiérrez S, Silla‐Martínez J and Gabaldón T (2009) trimAl: a tool for automated alignment trimming in large‐scale phylogenetic analyses. Bioinformatics 25(15): 1972–1973.

Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17: 540–552.

Castresana J (2007) Topological variation in single‐gene phylogenetic trees. Genome Biology 8: 216.

Dagan T and Martin W (2006) The tree of one percent. Genome Biology 7: 118.

Delsuc F, Brinkmann H and Philippe H (2005) Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6: 361–375.

Eisen JA (1998) Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8: 163–167.

Fitch WM (1970) Distinguishing homologous from analogous proteins. Systematic Zoology 19: 99–113.

Flicek P, Aken BL, Beal K et al. (2008) Ensembl 2008. Nucleic Acids Research 36: D707–D714.

Gabaldón T (2005) Evolution of proteins and proteomes, a phylogenetics approach. Evolutionary Bioinformatics Online 1: 51–56.

Gabaldón T (2008) Large‐scale assignment of orthology: back to phylogenetics? Genome Biology 9: 235.

Gabaldón T and Huynen MA (2003) Reconstruction of the proto‐mitochondrial metabolism. Science 301: 609.

Gabaldón T and Huynen MA (2005) Lineage‐specific gene loss following mitochondrial endosymbiosis and its potential for function prediction in eukaryotes. Bioinformatics 21(suppl 2): ii144–ii150.

Gabaldón T and Huynen MA (2007) From endosymbiont to host‐controlled organelle: the hijacking of mitochondrial protein synthesis and metabolism. PLoS Computational Biology 3: e219.

Gabaldón T, Marcet‐Houben M and Huerta‐Cepas J (2008) Reconstruction and analysis of large‐scale phylogenetic data, challenges and opportunities. New York: Nova Science Publishers.

Guindon S and Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52: 696–704.

van der Heijden RT, Snel B, van Noort V and Huynen MA (2007) Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinformatics 8: 83.

Huerta‐Cepas J, Dopazo H, Dopazo J and Gabaldón T (2007) The human phylome. Genome Biology 8: R109.

Keeling PJ, Burger G, Durnford DG et al. (2005) The tree of eukaryotes. Trends in Ecology and Evolution 20: 670–676.

Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.

Ohno S (1970) Evolution by Gene Duplication. London: Allen & Unwin.

Page RD and Charleston MA (1997) From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution 7: 231–240.

Rokas A and Carroll SB (2006) Bushes in the tree of life. PLoS Biology 4: e352.

Ronquist F and Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

Sicheritz‐Ponten T and Andersson SG (2001) A phylogenomic approach to microbial evolution. Nucleic Acids Research 29: 545–552.

Smith TF and Waterman MS (1981) Identification of common molecular subsequences. Journal of Molecular Biology 147: 195–197.

Venter JC, Adams MD, Myers EW et al. (2001) The sequence of the human genome. Science 291: 1304–1351.

Zmasek CM and Eddy SR (2001) A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17: 821–828.

Further Reading

Salemi M and Vandamme AM (2003) The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny. Cambridge: Cambridge University Press.


How to Cite
Gabaldón, Toni (Dec 2009) Phylogenomics of the Human Genome. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0021555]