Protein Structure Prediction and Databases


Three‐dimensional structures of proteins are the key to understanding their molecular function. Protein structures are determined most reliably by experiment. Recent advances in experimental techniques have led to a large increase in the numbers of both protein sequences and 3D structures. Yet the number of experimentally resolved protein 3D structures is three orders of magnitude lower than that of sequences. This calls for computational support of protein structure prediction. Today, several databases complement the comparatively small set of experimentally resolved protein structures with much larger sets of computer‐generated protein models.

Key Concepts:

  • Protein structure prediction relies heavily on experimental data on protein structures; the volume of such data is the prime determinant of the quality of protein structure predictions.

  • The three major types of methods for protein structure prediction are homology, or template‐based, modelling; fold recognition, or threading; and de novo, or ab initio, prediction.

  • Homology modelling is the most reliable class of methods, but it requires experimental knowledge of the structure of a homologous – and thus structurally similar – protein, called the template.

  • Sensitive sequence similarity search tools are used for detection of potential templates.

  • The protein structure is modelled step‐wise: (1) aligning the target protein to the template, (2) placing the aligned target residues onto their respective template residues, (3) placing the side chains of nonconserved residues, healing backbone breaks and modelling loops that form gaps in the alignment, and (4) refining the model.

  • The two most popular computational tools for homology modelling are MODELLER and SWISS‐MODEL; the two protein model databases based on them are ModBase and the SWISS‐MODEL Repository, respectively.

  • The Protein Modelling Portal unites data from these and other databases, and provides an independent system for model evaluation called CAMEO.
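
The first two modelling steps listed above (aligning the target to the template, then placing aligned target residues onto their template counterparts) can be illustrated with a toy sketch. It is not how MODELLER or SWISS‐MODEL work internally: the function name `thread_onto_template`, the one‐point‐per‐residue coordinates and the example sequences are all assumptions made for illustration, and the alignment is taken as given rather than computed. Sequence identity is computed here over aligned columns only, which is one of several conventions in use.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def thread_onto_template(
    target_aln: str,
    template_aln: str,
    template_coords: List[Coord],
) -> Tuple[List[Optional[Coord]], float]:
    """Copy template coordinates onto aligned target residues (hypothetical helper).

    Returns one entry per target residue (None where the template offers no
    counterpart) and the sequence identity over aligned columns.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    if sum(c != "-" for c in template_aln) != len(template_coords):
        raise ValueError("need one coordinate per template residue")

    model: List[Optional[Coord]] = []
    template_index = 0  # position in the ungapped template
    aligned = identical = 0

    for target_res, template_res in zip(target_aln, template_aln):
        if target_res != "-" and template_res != "-":
            # Aligned column: the target residue inherits the template position.
            model.append(template_coords[template_index])
            aligned += 1
            identical += target_res == template_res
        elif target_res != "-":
            # Insertion in the target: no template position, left for loop modelling.
            model.append(None)
        if template_res != "-":
            template_index += 1

    identity = identical / aligned if aligned else 0.0
    return model, identity


# Made-up example: six template residues spaced along the x axis.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
model, identity = thread_onto_template("MKT-LLV", "MRTAL-V", coords)

# Target residues without coordinates correspond to alignment gaps (step 3).
unmodelled = [i for i, c in enumerate(model) if c is None]
print(f"identity = {identity:.2f}, residues needing loop modelling: {unmodelled}")
```

In this sketch the residues returned as `None` are exactly those that step (3) would have to build, and refinement in step (4) would start from the copied coordinates.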

Keywords: protein structure; protein structure prediction; protein structure databases; structural genomics; protein structure modelling

Figure 1.

Protein sequence space is depicted in two dimensions for all proteins containing up to 1000 amino acids. There are over 10^1300 such protein sequences. Islands of proteins capable of adopting a unique and stable fold are depicted in pink; protein clusters occurring in nature are depicted by violet circles.
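
The size of the sequence space quoted in this caption can be checked with a short calculation. The sketch below assumes an alphabet of 20 standard amino acids and counts every sequence of length 1 to 1000; both choices are assumptions, since the caption does not state how the number was derived.

```python
# Assumptions: 20 standard amino acids, sequence lengths from 1 to 1000.
ALPHABET_SIZE = 20
MAX_LENGTH = 1000

# Direct count: ALPHABET_SIZE ** n sequences exist for each length n.
total = sum(ALPHABET_SIZE ** n for n in range(1, MAX_LENGTH + 1))

# The same count from the closed form of the geometric series.
closed_form = (ALPHABET_SIZE ** (MAX_LENGTH + 1) - ALPHABET_SIZE) // (ALPHABET_SIZE - 1)

# Python integers are exact, so the decimal digit count gives the order of magnitude.
order_of_magnitude = len(str(total)) - 1
print(f"about 10^{order_of_magnitude} sequences")
```

Under these assumptions the count comes out at roughly 10^1301, dominated almost entirely by the sequences of maximal length.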

Figure 2.

Structural superposition of a homology model of BdcA from E. coli (orange, UniProt ID P39333) with the structural template used for its creation (tropinone reductase TR‐1 from D. stramonium, blue, PDB ID 1AE1; sequence identity between template and target 30%). The structurally dissimilar regions are indicated with stars.



Further reading

Biasini M, Bienert S, Waterhouse A et al. (2014) SWISS‐MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research 42(Web Server issue): W252–W258.

Cooper S, Khatib F, Treuille A et al. (2010) Predicting protein structures with a multiplayer online game. Nature 466: 756–760.

Edwards YJ and Cottage A (2003) Bioinformatics methods to predict protein structure and function. Molecular Biotechnology 23: 139–166.

Gough J (2002) The SUPERFAMILY database in structural genomics. Acta Crystallographica 58: 1897–1900.

Gu J and Bourne P (eds.) (2009) Homology modeling fold recognition methods, De novo protein structure prediction: methods and applications. In: Structural Bioinformatics, 2nd edn. Hoboken: John Wiley & Sons, Inc.

Lees JG, Lee D, Studer RA et al. (2013) Gene3D: Multi‐domain annotations for protein sequence and comparative genome analysis. Nucleic Acids Research 42(D1): D240–D245.

Lewis TE, Sillitoe I, Andreeva A et al. (2013) Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Research 41(D1): D499–D507.

Roy A, Kucukural A and Zhang Y (2010) I‐TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 725–738.

How to Cite
Kalinina, Olga V, and Lengauer, Thomas (Sep 2014) Protein Structure Prediction and Databases. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0006214.pub2]