Evolution of Protein Domains


Analysis of protein structures reveals that they are made up of independent globular substructures known as protein domains. The fold that these domains assume is typically the same in evolutionarily related proteins. However, exceptions to this rule allow us to begin to determine the process by which novel folds can develop from ancestral folds and possibly even how the first folds came into existence. Various lines of research have shown that thermodynamic stability, designability, functional flexibility and structural drift all play important roles in shaping the distribution and variation of structural families in nature.

Keywords: protein evolution; protein structure; designability; the last universal common ancestor; protein domains; folds.

Figure 1.

The CATH hierarchy. The four major hierarchical levels in the CATH structural classification – (C)lass, (A)rchitecture, (T)opology or fold level and (H)omologous superfamily. Three of the most highly populated architectures in the classification are illustrated.

Figure 2.

Structure is not always more conserved than sequence. In this case, domains 1du2a00 (CATH code:; blue) and 1se7a00 (CATH code:; blue) superpose badly and have a low structural similarity (SSAP score = 55.48). However, the alignment produced with the sequence alignment software MUSCLE shows clear sequence similarity between the domains (sequence identity = 60%).
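A sequence identity figure such as the one quoted in this caption can be computed from a pairwise alignment. The following is a minimal sketch, assuming identity is counted over columns in which neither sequence has a gap; the caption does not state which denominator was used, and other conventions (for example, dividing by the length of the shorter sequence) give different values. The function name `percent_identity` and the toy aligned strings are illustrative only and are not the 1du2a00 or 1se7a00 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment (equal length),
    for example as written out by an alignment program.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either row
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        return 0.0  # nothing to compare; avoid division by zero
    return 100.0 * identical / compared


# Toy example (hypothetical sequences): 6 gap-free columns, 5 of them identical.
toy_identity = percent_identity("MKT-AILV", "MKSQAI-V")
print(f"{toy_identity:.1f}%")
```

Counting only gap-free columns keeps the measure symmetric between the two domains, which is convenient when they differ greatly in length.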

Figure 3.

Large insertions and conserved cores. Highlighted in red are two examples of domains from the (ATP)‐Grasp family. These examples vary significantly in size, with the largest example (left) containing many inserts and embellishments. Despite this variation, the cores are recognizably similar and the location of the active site (the yellow and green residues) appears to be conserved.

Figure 4.

Identifiable metabolic paths of LUCA. Using homology data derived from structure rather than sequence, it has been possible to derive a more complex view of the abilities of the LUCA. Sequence‐based approaches had only identified systems involved in information transfer (i.e. translation and transcription). Figure was adapted by Stathis Sedaris from Ranea J, Sillero A, Thornton J and Orengo C (2006) Protein superfamily evolution and the Last Common Universal Ancestor. Journal of Molecular Evolution 63(4): 513–525.



Andreeva A, Howarth D, Brenner SE et al. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 32: D226–D229.

Chandonia JM and Brenner SE (2006) The impact of structural genomics: expectations and outcomes. Science 311: 347–351.

Grishin NV (2001) KH domain: one motif, two folds. Nucleic Acids Research 29: 638–643.

Hausrath AC and Goriely A (2006) Repeat proteins predicted by a continuous representation of protein space. Protein Science 15: 753–760.

Krishna SS and Grishin NV (2005) Structural drift: A possible path to protein change. Bioinformatics 5: 197.

Mirkin BG, Fenner TI, Galperin MY and Koonin EV (2003) Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evolutionary Biology 3: 2.

Pearl F, Todd A, Sillitoe I et al. (2005) The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Research 33: D247–D251.

Privalov PL and Khechinashvili NN (1974) A thermodynamic approach to the problem of stabilisation of globular protein structure: a calorimetric study. Journal of Molecular Biology 86: 665–684.

Ranea JA, Sillero A, Thornton JM and Orengo CA (2006) Protein superfamily evolution and the last common universal ancestor (LUCA). Journal of Molecular Evolution 63: 513–525.

Reeves GA, Dallman TJ, Redfern OC et al. (2006) Structural diversity of domain superfamilies in the CATH database. Journal of Molecular Biology 360: 725–741.

Riechmann L and Winter G (2006) Early protein evolution: building domains from ligand‐binding polypeptide segments. Journal of Molecular Biology 363: 460–468.

Todd AE, Marsden RI, Thornton JM and Orengo CA (2005) Progress of structural genomics initiatives: an analysis of solved target structures. Journal of Molecular Biology 348: 1235–1260.

Wetlaufer DB (1973) Nucleation, rapid folding, and globular intrachain regions in proteins. Proceedings of the National Academy of Sciences of the USA 70: 697–701.

Zhang Y and Skolnick J (2005) The protein structure prediction problem could be solved using the current PDB library. Proceedings of the National Academy of Sciences of the USA 102: 1029–1034.

How to Cite
Yeats, Corin A, and Orengo, Christine A (Sep 2007) Evolution of Protein Domains. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0020202]