Protein Structure Classification

To understand and map the universe of protein structures, it is necessary to collate, annotate and classify these structures in a rational scheme. Approaches to this problem include the identification of protein domains, phylogenetic and phenetic classification, and hierarchical and nearest-neighbour clustering. Powerful new sequence-searching methods allow structural assignments to be made for genomic sequences.

Keywords: protein structure classification; common folds; protein architecture; structural comparison; genomes

Figure 1. Schematic representation of the Class (C), Architecture (A) and Topology/fold (T) levels in the CATH database.
Figure 2. CATHerine wheel plot showing the distribution of nonhomologous structures (i.e. a single representative from each homologous superfamily (H-level) in CATH) among the different classes (C), architectures (A) and fold families (T) in the CATH database. Protein classes are shown in pink (mainly α), yellow (mainly β) and green (α–β). Within each class, the angle subtended by a given segment reflects the proportion of structures within the identified architectures (inner circle) or fold families (outer circle). The superfold families are indicated in paler colours and illustrated with a MOLSCRIPT drawing of a representative from the family.
Figure 3. Schematic showing the difference between shear (sliding) and hinge motions.
 Further Reading
    Branden C and Tooze J (1999) Introduction to Protein Structure, 2nd edn. New York: Garland Publishing.
    Bourne PE and Weissig H (eds) (2003) Structural Bioinformatics. Hoboken, NJ: Wiley-Liss.
    Orengo CA, Thornton JM and Jones DT (eds) (2003) Bioinformatics: Genes, Proteins & Computers. Oxford: BIOS Scientific.
    Schulz GE and Schirmer RH (1979) Principles of Protein Structure. New York: Springer-Verlag.

How to Cite
Pearl, Frances MG, Orengo, Christine A, and Thornton, Janet M (Sep 2007) Protein Structure Classification. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0003033.pub2]