Protein Databases

A wide range of protein databases is available, covering fields as diverse as protein sequences, protein domains, posttranslational modifications and protein–protein interactions. Such resources are crucial to proteomics research.

Keywords: protein; knowledgebase; proteomics; proteome; database

Web Links
    The ExPASy list of biomolecular servers. This site lists the major databases (over one thousand) of interest to proteomics research http://www.expasy.org/alinks.html
    BIND. The Biomolecular Interaction Network Database stores full descriptions of interactions, molecular complexes and pathways, including protein–protein interactions http://bind.mshri.on.ca/
    BRENDA. The main collection of enzyme functional data available to the scientific community, maintained and developed at the Institute of Biochemistry at the University of Cologne http://www.brenda.uni-koeln.de/
    CATH. A hierarchical domain classification of protein structures derived from PDB (see below) http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
    dbSNP. The Single Nucleotide Polymorphism database is a repository of the genetic variations discovered as the human genome is being deciphered http://www.ncbi.nlm.nih.gov/SNP
    DIP. The Database of Interacting Proteins is curated both manually by expert curators and automatically. It provides a comprehensive and integrated tool for browsing and extracting information on protein–protein interactions http://dip.doe-mbi.ucla.edu/
    ENZYME. An enzyme nomenclature database based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) http://us.expasy.org/enzyme/
    GlycoSuiteDB. A database of glycoprotein glycan structures derived from the scientific literature. When a glycan structure is known to be attached to a specific protein, direct links are made to the Swiss-Prot and TrEMBL databases (see below) http://www.glycosuite.com/
    GPCRDB. An information system that collects and disseminates data on G protein-coupled receptors http://www.gpcr.org/7tm
    HGMD. The Human Gene Mutation Database is a comprehensive database of gene lesions underlying human inherited disease http://www.hgmd.org/
    HGVS. The Human Genome Variation Society was created to promote the discovery and free publication of information on variations in human genes by fostering a central repository for such variations http://www.hgvs.org/
    IMGT. The international ImMunoGeneTics database is a high-quality integrated information system that specializes in immunoglobulins, T cell receptors and major histocompatibility complex molecules of all vertebrate species http://imgt.cines.fr:8104/
    InterPro. An integrated documentation resource for protein families, domains and functional sites, which was developed to rationalize the complementary efforts of the individual protein signature database projects that form the InterPro core (see Table 1) http://www.ebi.ac.uk/interpro/
    KEGG. The Kyoto Encyclopedia of Genes and Genomes strives to computerize the current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes http://www.genome.ad.jp/kegg/
    MEROPS. Provides data on individual proteases, on protease families and on the clans into which the families are grouped http://merops.iapc.bbsrc.ac.uk/
    MINT. The Molecular INTeraction database stores functional interactions between biological molecules http://cbm.bio.uni.oma2.it/mint/
    OMIM. The Online Mendelian Inheritance in Man database is a collection of human genes and genetic disorders maintained by the McKusick–Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and the National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD). Each entry offers a wealth of textual information, some of which can be useful in the context of protein studies http://www.ncbi.nlm.nih.gov/omim/
    PDB. The Protein Data Bank is a collection of 3D structures of proteins, nucleic acids and other biological macromolecules. PDB is a resource of critical importance in the discovery of new pharmacological agents, new catalysts, new biomaterials and possibly nanodevices http://www.rcsb.org/pdb/
    SCOP. Provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins whose structure is known http://scop.mrc-lmb.cam.ac.uk/scop/
    SWISS-2DPAGE. Contains 2D-PAGE and SDS-PAGE reference maps and information on identified proteins from a variety of human biological samples. It is maintained collaboratively by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics http://www.expasy.org/ch2d/
    Swiss-Prot. A non-redundant curated protein knowledge resource that provides a high level of annotation. Besides the bare protein sequence, a Swiss-Prot entry describes the function of a protein, its domain structure, posttranslational modifications and variants, and links to other databases http://www.expasy.org/sprot
    TrEMBL. Consists of computer-annotated entries in Swiss-Prot format derived from the translation of coding sequences in the European Molecular Biology Laboratory nucleotide sequence database. TrEMBL entries are therefore preliminary Swiss-Prot entries that have not yet been manually annotated http://www.ebi.ac.uk/swissprot
    WORLD-2DPAGE. A complete index of 2D-PAGE databases and services http://www.expasy.org/ch2d/2d-index.html

How to Cite
Gerritsen, Vivienne Baillie, and Bairoch, Amos (Sep 2005) Protein Databases. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005251]