Structural Databases of Biological Macromolecules

Abstract

A biological macromolecule's function is determined by the chemical and physical characteristics of its three‐dimensional (3D) shape, or ‘structure’. For this reason, knowing the structure of a biomolecule greatly aids our understanding of living systems and disease. The Protein Data Bank (PDB) began as an archive of the structural data available for biological macromolecules. Advances in the underlying technologies have been mirrored in the continued development of the PDB and in the specialist structural databases and databases of structural characteristics that have evolved alongside it. New resource portals such as the Protein Structure Initiative (PSI) Structural Biology Knowledgebase (SBKB) also bring together available genomic, structural, and functional information, reducing the time needed to obtain the latest information on proteins whose structures have been determined. This article describes selected structural databases and resources available to the public today.

Key Concepts:

  • Atomic‐level structural information about a protein can help explain its role in living systems.

  • The Protein Data Bank archive is the sole provider of primary structural data of biological macromolecules worldwide.

  • The four members of the Worldwide Protein Data Bank (wwPDB) consortium (the RCSB PDB, PDBe, PDBj and BMRB) maintain the PDB archive and provide tools for exploring and understanding the structural entries.

  • Other value‐added databases classify information derived from these structures by combining the structural and/or biological aspects of each biomolecule.

  • Meta‐portals such as the Structural Biology Knowledgebase integrate all available genetic, structural, functional, and experimental information for all structurally determined proteins to enable a better understanding of sequence–structure–function relationships.

Keywords: structural biology; databases; crystallography; nuclear magnetic resonance; electron cryomicroscopy; structural genomics; structural proteomics

Figure 1.

Growth of the contents of the Protein Data Bank (as of May 2012). The number of structures deposited each year is shown in grey and the total number of structures available in black. Values for 2012 are projections of the yearly deposition and total counts, based on deposition trends up to May 2012. This chart is regularly updated at http://www.rcsb.org.

Figure 2.

Example of a structure query using the Structural Biology Knowledgebase. Users can search the SBKB by protein or DNA sequence, Protein Data Bank (PDB) ID, UniProt Accession Code (AC) or text. Red links direct users to the primary data resources. (Top) The summary of search results includes matching structures, theoretical models, structure determination targets, protocols, and available DNA clones from the PSI Materials Repository. (Middle) The Structures tab organises the links to primary data resources. (Bottom) The SBKB's annotation notebook provides links to over 150 key biological databases; biological categories in the right‐hand tabs that have no existing annotations are greyed out.
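
The caption lists four kinds of query: protein or DNA sequence, PDB ID, UniProt AC and free text. The following Python sketch is a rough illustration of how a single search box could route a query to one of these kinds. It is not the SBKB's implementation. The function name classify_query, the length cut-off MIN_SEQUENCE_LENGTH and the letter sets are hypothetical. The only formats it assumes are the classic four‐character PDB ID (a digit 1–9 followed by three alphanumerics) and the accession pattern published by UniProt.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics (e.g. "1LYZ").
PDB_ID = re.compile(r"^[1-9][A-Z0-9]{3}$")

# UniProt accession pattern as documented by UniProt (e.g. "P69905").
UNIPROT_AC = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Hypothetical alphabets and cut-off; short words are treated as text, not sequence.
DNA_LETTERS = set("ACGTU")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_SEQUENCE_LENGTH = 10


def classify_query(query: str) -> str:
    """Guess the kind of search a free-form query represents (hypothetical helper)."""
    q = "".join(query.split()).upper()  # drop whitespace/newlines, normalise case
    if PDB_ID.match(q):
        return "pdb_id"
    if UNIPROT_AC.match(q):
        return "uniprot_ac"
    if len(q) >= MIN_SEQUENCE_LENGTH:
        letters = set(q)
        # DNA is checked first because A, C, G and T are also amino-acid letters.
        if letters <= DNA_LETTERS:
            return "dna_sequence"
        if letters <= PROTEIN_LETTERS:
            return "protein_sequence"
    return "text"
```

For example, classify_query("1lyz") returns "pdb_id" and classify_query("P69905") returns "uniprot_ac". A phrase such as "hen egg-white lysozyme" falls through to "text". A real portal would probably let the user pick the query type explicitly, since a heuristic like this can misclassify short sequences or unusual words.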

Further Reading

Arnold E, Himmel DM and Rossmann MG (eds) (2012) International Tables for Crystallography, vol. F: Crystallography of Biological Macromolecules [Chapters 21 and 24]. West Sussex, UK: John Wiley & Sons, Ltd.

Hall SR and McMahon B (eds) (2006) International Tables for Crystallography, vol. G: Definition and exchange of crystallographic data. West Sussex, UK: John Wiley & Sons, Ltd.

Anonymous (2007) Structural Genomics Supplement. Structure 16(1): 1–160.

Anonymous (2012) Nucleic Acids Databases Issue. Nucleic Acids Research 40: D1–D1317.

Web Links

Canadian Bioinformatics.ca Links Directory. http://bioinformatics.ca/links_directory/category/protein

Structural Biology Knowledgebase: portal to information on structurally targeted proteins, structures, theoretical models, methods, materials, technologies, and the Protein Structure Initiative. http://sbkb.org

Structural data resources directories: ExPASy Bioinformatics Resource Portal. http://expasy.org

Worldwide Protein Data Bank: member organizations serve as data deposition, processing, and distribution centers for PDB data. http://www.wwpdb.org

How to Cite
Gabanyi, Margaret J, and Berman, Helen M (Sep 2012) Structural Databases of Biological Macromolecules. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005252.pub2]