Protein Homology Modelling

Abstract

Protein structure prediction aims to model the three‐dimensional (3D) structure of proteins that have not yet been structurally characterised, starting from their amino acid sequence. Motivated by the observation that homologous proteins with related amino acid sequences have similar 3D structures, protein homology modelling uses comparative methods to generate models for a target protein based on one or more related proteins with known 3D structure (templates). The coordinates of the model are generated from alignments between the target and template amino acid sequences, which define the correspondence between residues in the two proteins. Ultimately, the quality of a computational model determines its usefulness for specific biomedical applications. Model quality estimation methods are therefore used to identify unreliable or erroneous regions in the resulting models and to estimate the overall accuracy of a model. Homology modelling (or comparative modelling) is currently the most accurate computational method available to routinely generate models of sufficient quality for various applications in life science research. Comparative protein modelling methods have been completely automated in recent years, and several Internet servers offer protein modelling services that are reliable and easy to use, even for nonexperts in computational biology.

Key Concepts:

  • Protein structure prediction aims to model the three‐dimensional structure of so far structurally uncharacterised proteins (‘target’) based on their amino acid sequence.

  • Homologous proteins with related amino acid sequences have similar three‐dimensional structures.

  • Protein homology modelling uses information from one or more related proteins with known three‐dimensional structure (‘template’) to generate models for the target protein.

  • Sensitive sequence searching methods are applied to identify template proteins with known structures in large databases.

  • An alignment between the target's and template's amino acid sequences describes the correspondence between residues in both proteins.

  • The coordinates of the model are constructed by extracting positional information from the corresponding structural template.

  • Segments of the target protein not covered by template information (e.g. insertions or deletions in the alignment) have to be constructed using de novo modelling methods.

  • Model quality estimation methods are used to identify unreliable or erroneous regions in the resulting models.

  • Ultimately, the quality of a structural model determines its usefulness for specific biomedical applications.
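The role of the target–template alignment in the concepts above can be illustrated with a minimal sketch. It assumes a pairwise alignment given as two gapped strings of equal length (with '-' as the gap character) and a list of template Cα coordinates. The function names are hypothetical and do not correspond to any particular modelling package. Only positional information for aligned residues is copied; target residues without a template counterpart are reported as segments that would have to be built by other means.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def map_alignment(target_aln: str, template_aln: str) -> List[Optional[int]]:
    """For each target residue, give the index of the aligned template residue.

    Target residues aligned to a gap in the template are mapped to None.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: List[Optional[int]] = []
    template_index = 0
    for t, s in zip(target_aln, template_aln):
        if t != "-":
            mapping.append(template_index if s != "-" else None)
        if s != "-":
            template_index += 1
    return mapping


def transfer_coordinates(
    mapping: Sequence[Optional[int]], template_ca: Sequence[Coord]
) -> List[Optional[Coord]]:
    """Copy template coordinates onto aligned target residues (None if uncovered)."""
    return [template_ca[i] if i is not None else None for i in mapping]


def uncovered_segments(mapping: Sequence[Optional[int]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) target index ranges lacking template coordinates."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, m in enumerate(mapping):
        if m is None and start is None:
            start = i
        elif m is not None and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(mapping) - 1))
    return segments
```

For the toy alignment `MKT-AYLL` (target) against `MRTGA--L` (template), `map_alignment` yields `[0, 1, 2, 4, None, None, 5]` and `uncovered_segments` reports `[(4, 5)]`, the two inserted target residues. The sketch deliberately ignores deletions in the target (template residues with no target counterpart), which in practice also require the backbone around the gap to be rebuilt.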

Keywords: protein structure prediction; protein structure modelling; bioinformatics; computational structural biology; structural genomics; homology modelling; functional genomics

Figure 1.

Historical view on the structural coverage of the Escherichia coli proteome by experimental structures and homology models. The plot shows in a retrospective analysis which structure information – experimental structures or models of various levels of target–template sequence identity – was available for the residues in the proteome of the model organism E. coli at a given point in time (Guex et al., 2009).

Figure 2.

Schematic homology modelling workflow. The flowchart illustrates the classical steps to construct a homology model. Starting from the sequence of the target protein, one or more related structures (templates) are identified (template selection) within a template library. Target and template sequences are aligned (target–template alignment) and one or more alternative models are then constructed based on alternative target–template alignments. Finally, the quality of the obtained models is estimated to rank models by their expected accuracy. If necessary, the procedure can be repeated by selecting different templates and creating alternative target–template alignments until a satisfactory result is obtained.
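The iterative character of this workflow can be sketched as a simple loop. The sketch assumes that the alignment, model building and quality estimation steps are supplied by the caller as functions; `align`, `build`, `estimate_quality` and `threshold` are hypothetical placeholders, not the interface of any existing server or program. A higher quality score is assumed to indicate a better model.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Template = TypeVar("Template")
Alignment = TypeVar("Alignment")
Model = TypeVar("Model")


def model_until_satisfactory(
    target_seq: str,
    templates: Iterable[Template],
    align: Callable[[str, Template], Alignment],
    build: Callable[[Alignment, Template], Model],
    estimate_quality: Callable[[Model], float],
    threshold: float,
) -> Optional[Tuple[Model, float]]:
    """Try templates in order; stop once a model reaches the quality threshold.

    Returns the best (model, score) pair seen, or None if no template was given.
    """
    best: Optional[Tuple[Model, float]] = None
    for template in templates:
        alignment = align(target_seq, template)   # target-template alignment
        model = build(alignment, template)        # model building
        score = estimate_quality(model)           # model quality estimation
        if best is None or score > best[1]:
            best = (model, score)
        if score >= threshold:                    # satisfactory result obtained
            break
    return best
```

Returning the best model seen so far, rather than only a model that passes the threshold, lets the caller decide whether an imperfect model is still useful for the application at hand.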

Figure 3.

Model accuracy and modelling errors. In this retrospective analysis, a model of the PAS domain of a transcriptional regulator in the LuxR family from Burkholderia thailandensis (cartoon representation) is shown in comparison with its experimental control structure (3MQO chain B, gray tubes). The model was based on the experimental structure of the homologous protein CPS_1291 from Colwellia psychrerythraea as template (3LYX chain B), sharing an overall 14% sequence identity. In this particular example, the overall fold has been modelled correctly with only one loop deviating significantly from the control structure, illustrating a successful template‐based model despite very low sequence identity. The model has been generated as part of the 9th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction CASP9 (http://predictioncenter.org/casp9/) in the category of template‐based modelling.

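Sequence identity figures such as the one quoted above are derived from a target–template alignment. The following minimal sketch counts identical residue pairs over alignment columns in which neither sequence has a gap. This choice of denominator is an assumption made for illustration; other conventions exist (for example, normalising by the length of the shorter sequence), and the sketch does not claim to reproduce the definition used for the figure.

```python
def sequence_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for t, s in zip(target_aln, template_aln):
        if t == "-" or s == "-":
            continue  # skip columns containing a gap
        aligned += 1
        if t.upper() == s.upper():
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned
```

For the toy alignment `MKT-AYLL` against `MRTGA--L`, five columns are gap‐free and four of them are identical, so the function returns 80.0.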
References

Altschul SF, Madden TL, Schaffer AA et al. (1997) Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Research 25(17): 3389–3402.

Armougom F, Moretti S, Poirot O et al. (2006) Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D‐Coffee. Nucleic Acids Research 34(Web Server issue): W604–W608.

Arnold K, Kiefer F, Kopp J et al. (2009) The Protein Model Portal. Journal of Structural and Functional Genomics 10(1): 1–8.

Barkan DT, Hostetter DR, Mahrus S et al. (2010) Prediction of protease substrates using sequence and structure features. Bioinformatics 26(14): 1714–1722.

Battistini S, Ricci C, Lotti EM et al. (2010) Severe familial ALS with a novel exon 4 mutation (L106F) in the SOD1 gene. Journal of the Neurological Sciences 293(1‐2): 112–115.

Baud O, Etter S, Spreafico M et al. (2011) The mouse eugenol odorant receptor: structural and functional plasticity of a broadly tuned odorant binding pocket. Biochemistry 50(5): 843–853.

Benkert P, Biasini M and Schwede T (2011) Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 27(3): 343–350.

Benkert P, Kunzli M and Schwede T (2009) QMEAN server for protein model quality estimation. Nucleic Acids Research 37(Web Server issue): W510–W514.

Berman H, Henrick K, Nakamura H and Markley JL (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Research 35(Database issue): D301–D303.

Berman HM, Westbrook JD, Gabanyi MJ et al. (2009) The protein structure initiative structural genomics knowledgebase. Nucleic Acids Research 37(Database issue): D365–D368.

Blundell TL, Sibanda BL, Sternberg MJ and Thornton JM (1987) Knowledge‐based prediction of protein structures and the design of novel molecules. Nature 326(6111): 347–352.

Buchan DW, Ward SM, Lobley AE et al. (2010) Protein annotation and modelling servers at University College London. Nucleic Acids Research 38(Web Server issue): W563–W568.

Canutescu AA and Dunbrack RL Jr (2003) Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Science 12(5): 963–972.

Chailyan A, Marcatili P, Cirillo D and Tramontano A (2011) Structural repertoire of immunoglobulin lambda light chains. Proteins 79: 1513–1524.

Chien EY, Liu W, Zhao Q et al. (2010) Structure of the human dopamine D3 receptor in complex with a D2/D3 selective antagonist. Science 330(6007): 1091–1095.

Chothia C (1992) Proteins. One thousand families for the molecular biologist. Nature 357(6379): 543–544.

Chothia C and Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO Journal 5(4): 823–826.

Cozzetto D, Kryshtafovych A and Tramontano A (2009) Evaluation of CASP8 model quality predictions. Proteins 77(suppl. 9): 157–166.

Cuff A, Redfern OC, Greene L et al. (2009) The CATH hierarchy revisited‐structural divergence in domain superfamilies and the continuity of fold space. Structure 17(8): 1051–1062.

Das R and Baker D (2008) Macromolecular modeling with rosetta. Annual Review of Biochemistry 77: 363–382.

DePristo MA, de Bakker PI, Lovell SC and Blundell TL (2003) Ab initio construction of polypeptide fragments: efficient generation of accurate, representative ensembles. Proteins 51(1): 41–55.

Eddy SR (2009) A new generation of homology search tools based on probabilistic inference. Genome Informatics 23(1): 205–211.

Fiser A, Do RK and Sali A (2000) Modeling of loops in protein structures. Protein Science 9(9): 1753–1773.

Forrest LR, Tavoulari S, Zhang YW, Rudnick G and Honig B (2007) Identification of a chloride ion binding site in Na+/Cl‐dependent transporters. Proceedings of the National Academy of Sciences of the USA 104(31): 12761–12766.

Ginalski K, Elofsson A, Fischer D and Rychlewski L (2003) 3D‐Jury: a simple approach to improve protein structure predictions. Bioinformatics 19(8): 1015–1018.

Godzik A (2011) Metagenomics and the protein universe. Current Opinion in Structural Biology 21(3): 398–403.

Guex N, Peitsch MC and Schwede T (2009) Automated comparative protein structure modeling with SWISS‐MODEL and Swiss‐PdbViewer: a historical perspective. Electrophoresis 30(suppl. 1): S162–S173.

Hildebrand A, Remmert M, Biegert A and Soding J (2009) Fast and accurate automatic structure prediction with HHpred. Proteins 77(suppl. 9): 128–132.

Hooft RW, Vriend G, Sander C and Abola EE (1996) Errors in protein structures. Nature 381(6580): 272.

Kalyanaraman C, Imker HJ, Fedorov AA et al. (2008) Discovery of a dipeptide epimerase enzymatic function guided by homology modeling and virtual screening. Structure 16(11): 1668–1677.

Karchin R, Diekhans M, Kelly L et al. (2005) LS‐SNP: large‐scale annotation of coding non‐synonymous SNPs based on multiple information sources. Bioinformatics 21(12): 2814–2820.

Kelley LA and Sternberg MJ (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nature Protocols 4(3): 363–371.

Kiefer F, Arnold K, Kunzli M, Bordoli L and Schwede T (2009) The SWISS‐MODEL Repository and associated resources. Nucleic Acids Research 37(Database issue): D387–D392.

Krieger E, Joo K, Lee J et al. (2009) Improving physical realism, stereochemistry, and side‐chain accuracy in homology modeling: four approaches that performed well in CASP8. Proteins 77(suppl. 9): 114–122.

Krivov GG, Shapovalov MV and Dunbrack RL Jr (2009) Improved prediction of protein side‐chain conformations with SCWRL4. Proteins 77(4): 778–795.

Kufareva I, Rueda M, Katritch V, Stevens RC and Abagyan R (2011) Status of GPCR modeling and docking as reflected by community‐wide GPCR Dock 2010 Assessment. Structure 19(8): 1108–1126.

Levitt M (2009) Nature of the protein universe. Proceedings of the National Academy of Sciences of the USA 106(27): 11079–11084.

Li YY, Hou TJ and Goddard WA 3rd (2010) Computational modeling of structure‐function of G protein‐coupled receptors with applications for drug design. Current Medicinal Chemistry 17(12): 1167–1180.

MacCallum JL, Hua L, Schnieders MJ et al. (2009) Assessment of the protein‐structure refinement category in CASP8. Proteins 77(suppl. 9): 66–80.

Masmoudi S, Antonarakis SE, Schwede T et al. (2001) Novel missense mutations of TMPRSS3 in two consanguineous Tunisian families with non‐syndromic autosomal recessive deafness. Human Mutation 18(2): 101–108.

McGuffin LJ and Roche DB (2010) Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments. Bioinformatics 26(2): 182–188.

Melo F and Feytmans E (1998) Assessing protein structures with a non‐local atomic interaction energy. Journal of Molecular Biology 277(5): 1141–1152.

Moult J (2005) A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Current Opinion in Structural Biology 15(3): 285–289.

Murray PS, Li Z, Wang J et al. (2005) Retroviral matrix domains share electrostatic homology: models for membrane binding function throughout the viral life cycle. Structure 13(10): 1521–1531.

North B, Lehmann A and Dunbrack RL Jr (2011) A new clustering of antibody CDR loop conformations. Journal of Molecular Biology 406(2): 228–256.

Peitsch MC (1996) ProMod and Swiss‐Model: internet‐based tools for automated comparative protein modelling. Biochemical Society Transactions 24(1): 274–279.

Peitsch MC (2002) About the use of protein models. Bioinformatics 18(7): 934–938.

Pieper U, Eswar N, Webb BM et al. (2009) MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Research 37(Database issue): D347–D354.

Pieper U, Webb BM, Barkan DT et al. (2011) ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Research 39(Database issue): D465–D474.

Raman S, Vernon R, Thompson J et al. (2009) Structure prediction for CASP8 with all‐atom refinement using rosetta. Proteins 77(suppl. 9): 89–99.

Rasmussen SG, Choi HJ, Fung JJ et al. (2011a) Structure of a nanobody‐stabilized active state of the beta(2) adrenoceptor. Nature 469(7329): 175–180.

Rasmussen SG, Devree BT, Zou Y et al. (2011b) Crystal structure of the beta(2) adrenergic receptor‐Gs protein complex. Nature doi:10.1038/nature10361 [Epub ahead of print].

Rohl CA, Strauss CE, Chivian D and Baker D (2004) Modeling structurally variable regions in homologous proteins with rosetta. Proteins 55(3): 656–677.

Rossi KA, Nayeem A, Weigelt CA and Krystek SR Jr (2009) Closing the side‐chain gap in protein loop modeling. Journal of Computer‐Aided Molecular Design 23(7): 411–418.

Roy A, Kucukural A and Zhang Y (2010) I‐TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 5(4): 725–738.

Sali A and Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology 234(3): 779–815.

Schwede T, Kopp J, Guex N and Peitsch MC (2003) SWISS‐MODEL: an automated protein homology‐modeling server. Nucleic Acids Research 31(13): 3381–3385.

Schwede T, Sali A, Eswar N and Peitsch MC (2008) Protein structure modeling. In: Schwede T and Peitsch MC (eds) Computational Structural Biology: Methods and Applications, pp. 3–36. Singapore: World Scientific Publishing Company Ltd.

Schwede T, Sali A, Honig B et al. (2009) Outcome of a workshop on applications of protein models in biomedical research. Structure 17(2): 151–159.

Sellers BD, Zhu K, Zhao S, Friesner RA and Jacobson MP (2008) Toward better refinement of comparative models: predicting loops in inexact environments. Proteins 72(3): 959–971.

Sippl MJ (1995) Knowledge‐based potentials for proteins. Current Opinion in Structural Biology 5(2): 229–235.

Soding J (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics 21(7): 951–960.

Soto CS, Fasnacht M, Zhu J, Forrest L and Honig B (2008) Loop modeling: sampling, filtering, and scoring. Proteins 70(3): 834–843.

Sutcliffe MJ, Haneef I, Carney D and Blundell TL (1987) Knowledge based modelling of homologous proteins, part I: three‐dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Engineering 1(5): 377–384.

Terwilliger TC, Stuart D and Yokoyama S (2009) Lessons from structural genomics. Annual Review of Biophysics 38: 371–383.

UniProt Consortium (2010) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research 38(Database issue): D142–D148.

de Vries RP, de Vries E, Moore KS et al. (2011) Only two residues are responsible for the dramatic difference in receptor binding between swine and new pandemic H1 hemagglutinin. Journal of Biological Chemistry 286(7): 5868–5875.

Wallace DF, Harris JM and Subramaniam VN (2010) Functional analysis and theoretical modeling of ferroportin reveals clustering of mutations according to phenotype. American Journal of Physiology – Cell Physiology 298(1): C75–C84.

Wallner B and Elofsson A (2006) Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Science 15(4): 900–913.

Warne T, Moukhametzianov R, Baker JG et al. (2011) The structural basis for agonist and partial agonist action on a beta(1)‐adrenergic receptor. Nature 469(7329): 241–244.

Xu F, Wu H, Katritch V et al. (2011) Structure of an agonist‐bound human A2A adenosine receptor. Science 332: 322–327.

Xu J, Jiao F and Yu L (2008) Protein structure prediction using threading. Methods in Molecular Biology 413: 91–121.

Yang Y and Zhou Y (2008) Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins 72(2): 793–803.

Yue P, Melamud E and Moult J (2006) SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7: 166.

Zhang Y and Skolnick J (2005) The protein structure prediction problem could be solved using the current PDB library. Proceedings of the National Academy of Sciences of the USA 102(4): 1029–1034.

Further Reading

Bordoli L, Kiefer F, Arnold K et al. (2009) Protein structure homology modelling using SWISS‐MODEL workspace. Nature Protocols 4: 1–13.

Cavasotto CN and Phatak SS (2009) Homology modeling in drug discovery: current trends and applications. Drug Discovery Today 14: 676–683.

Fiser A (2010) Template‐based protein structure modeling. Methods in Molecular Biology 673: 73–94.

Proteins (2011) Special issue on CASP9 – Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction. This is a collection of papers describing the assessment of the current state of the art in protein structure prediction.


How to Cite
Peitsch, Manuel C, and Schwede, Torsten (Nov 2011) Protein Homology Modelling. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005273.pub2]