Protein Structure Prediction


The goal of protein structure prediction is to estimate, by computational methods, the spatial position of every atom of a protein molecule from its amino acid sequence. Depending on the availability of homologous templates in the Protein Data Bank (PDB) library, structure prediction approaches are categorised into template‐based modelling (TBM) and free modelling (FM). TBM is so far the only reliable method for high‐resolution structure prediction; the main challenges in the field are constructing correct folds without using template structures and, when templates are available, refining the template models closer to the native state. Nevertheless, the usefulness of structure predictions at various levels of accuracy has been convincingly demonstrated in biological and medical applications.

Key Concepts:

  • Evolution is a general principle to guide protein structure and function predictions.

  • Proteins of similar sequence have similar 3D structure.

  • The function of a protein is determined by its 3D structure.

  • TBM using homologous templates has the highest accuracy.

  • Template structures can be refined by combining multiple templates.

  • Current physics‐based ab initio folding can only fold small proteins.

  • Threading is an efficient tool for detecting distantly homologous templates.

  • Membrane protein structure prediction is challenging due to the lack of templates.

  • Proteins contain disordered regions, which do not possess a stable structure but have important functional implications.
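
The concept that proteins of similar sequence have similar 3D structure can be illustrated with a minimal sketch that computes the sequence identity of two pre‐aligned sequences. The function names, the example sequences and the 0.30 identity cutoff are assumptions made only for illustration; they are not taken from the article, and real pipelines choose between TBM and FM using threading and profile alignment scores rather than raw identity alone.

```python
def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of aligned (non-gap) columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    return identical / aligned if aligned else 0.0


def modelling_category(identity: float, cutoff: float = 0.30) -> str:
    """Hypothetical routing rule; the cutoff is illustrative, not from the article."""
    return "TBM" if identity >= cutoff else "FM"


# Made-up, pre-aligned target and template sequences ('-' marks a gap).
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"

identity = sequence_identity(target, template)
print(f"identity = {identity:.2f}, category = {modelling_category(identity)}")
```

Gap columns are excluded from the denominator here; other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be stated whenever an identity is reported.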

Keywords: ab initio folding; fold recognition; comparative modelling; structure‐based function annotation; membrane protein; CASP

Figure 1.

Pipeline of a typical composite protein structure prediction approach.

Figure 2.

An example of template‐based modelling by the I‐TASSER server for the PAS domain from Burkholderia thailandensis (PDBID: 3mqo). (a) Initial target model, which contains multiple gaps, built by copying Cα coordinates from a nonhomologous template (PDBID: 3lyx) identified by MUSTER; (b) full‐length model constructed by the I‐TASSER Monte Carlo assembly simulations; (c) final atomic structural model after atomic‐level structure refinement. The grey background cartoon shows the X‐ray structure.

Figure 3.

Approximate correspondence of the structure prediction algorithms, model accuracy, and the biological usefulness.

Figure 4.

Predicted protein–ligand complexes using I‐TASSER and BSP‐SLIM in GPCR‐Dock 2010. (a) Dopamine D3 receptor/eticlopride complex; (b) CXCR4 chemokine receptor with compound IT1t; and (c) CXCR4 chemokine receptor with peptide CVX15. The native ligand binding pose is shown in green and the predicted ligand pose in red.



Further Reading

Baker D and Sali A (2001) Protein structure prediction and structural genomics. Science 294: 93–96.

Elofsson A and von Heijne G (2007) Membrane protein structure: prediction versus reality. Annual Review of Biochemistry 76: 125–140.

Fink AL (2005) Natively unfolded proteins. Current Opinion in Structural Biology 15(1): 35–41.

Zhang Y and Skolnick J (2005) The protein structure prediction problem could be solved using the current PDB library. Proceedings of the National Academy of Sciences of the USA 102(4): 1029–1034.

Zhang Y (2008) Progress and challenges in protein structure prediction. Current Opinion in Structural Biology 18: 342–348.

How to Cite

Roy, Ambrish, and Zhang, Yang (Aug 2012) Protein Structure Prediction. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0003031.pub2]