Protein Coding

Protein-coding genes can be organized into families of similar function, structure and sequence, according to their shared evolutionary histories. Individual proteins are modularly constructed of domains, which are often rearranged on evolutionary timescales to create functionally novel proteins.

Keywords: genetic code; codon usage; gene family; domain; alternative splicing

Figure 1. Universal genetic code. Codon triplets are read off in the order: left, top, right. For example, AUG is methionine.
Figure 2. Effective number of codons plotted against G+C content at the third codon position, with one point for each of 7765 experimentally confirmed human genes. The line is a theoretical upper bound that is based on the genetic code.
Figure 3. Amino acid identities in the globin superfamily. The matrix element M(I, J) refers to the number of identically matched amino acids between row I and column J, given as a percentage of the protein found in row I.
Figure 4. Number of distinct domain architectures in the first four sequenced eucaryotic genomes, shown according to cellular environment as intracellular, extracellular and transmembrane. (Adapted with permission from Lander et al. 2001.)
Figure 5. Alternative splice forms for neurexin 3 (NRXN3). Different versions of an exon are indicated by a letter suffix. All conceivable exon combinations are listed, each marked with a ‘y’ (yes) or ‘n’ (no) to indicate whether or not that combination has been observed.
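The percentage defined in the Figure 3 caption is asymmetric, because the same count of identical residues is divided by the length of the row protein. The following is a minimal sketch of that calculation, assuming the two proteins are supplied as a pairwise alignment with '-' marking gaps. The function name and the toy sequences are hypothetical and are not taken from the globin data.

```python
def percent_identity(row_aln: str, col_aln: str, gap: str = "-") -> float:
    """Identical aligned residues as a percentage of the ungapped row protein."""
    if len(row_aln) != len(col_aln):
        raise ValueError("aligned sequences must have equal length")
    # Count alignment columns where both sequences carry the same residue.
    matches = sum(1 for a, b in zip(row_aln, col_aln) if a == b and a != gap)
    # The denominator is the length of the row protein, gaps excluded.
    row_length = sum(1 for a in row_aln if a != gap)
    if row_length == 0:
        raise ValueError("row sequence contains no residues")
    return 100.0 * matches / row_length


# Hypothetical toy alignment (not real globin sequences).
short_aln = "MKV--LSA"   # 6 residues
long_aln = "MKVHHLTA"    # 8 residues

m_short_long = percent_identity(short_aln, long_aln)  # row = short protein
m_long_short = percent_identity(long_aln, short_aln)  # row = long protein
print(round(m_short_long, 1), round(m_long_short, 1))
```

Because the two toy proteins differ in length, M(I, J) and M(J, I) differ even though the number of matched residues is the same in both directions.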
 References
    Bateman A, Birney E, Cerruti L, et al. (2002) The Pfam protein families database. Nucleic Acids Research 30: 276–280.
    Brenner SE, Chothia C and Hubbard TJ (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the United States of America 95: 6073–6078.
    Brett D, Pospisil H, Valcarcel J, Reich J and Bork P (2002) Alternative splicing and genome complexity. Nature Genetics 30: 29–30.
    Burge C and Karlin S (1997) Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268: 78–94.
    Hardison R (1998) Hemoglobins from bacteria to man: evolution of different patterns of gene expression. Journal of Experimental Biology 201: 1099–1117.
    Hubbard T, Barker D, Birney E, et al. (2002) The Ensembl genome database project. Nucleic Acids Research 30: 38–41.
    Lander ES, Linton LM, Birren B, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
    Patthy L (1999) Genome evolution and the evolution of exon shuffling – a review. Gene 238: 103–114.
    Rowen L, Young J, Birditt B, et al. (2002) Analysis of the human neurexin genes: alternative splicing and the generation of protein diversity. Genomics 79: 587–597.
    Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87: 23–29.
 Further Reading
    Grabowski PJ and Black DL (2001) Alternative RNA splicing in the nervous system. Progress in Neurobiology 65: 289–308.
    Harrison PM and Gerstein M (2002) Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. Journal of Molecular Biology 318: 1155–1174.
    Kaessmann H, Zöllner S, Nekrutenko A and Li WH (2002) Signatures of domain shuffling in the human genome. Genome Research 12: 1642–1650.
    Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418: 236–243.
    Meyerowitz EM (2002) Plants compared to animals: the broadest comparative study of development. Science 295: 1482–1485.
    Ohno S (1970) Evolution by gene duplication. Berlin/New York: Springer-Verlag.
    Pawson T, Raina M and Nash P (2002) Interaction domains: from simple binding events to complex cellular behavior. FEBS Letters 513: 2–10.
    Ponting CP and Russell RR (2002) The natural history of protein domains. Annual Review of Biophysics and Biomolecular Structure 31: 45–71.
    Stein L (2001) Genome annotation: from sequence to biology. Nature Reviews Genetics 2: 493–503.
    Yu J, Hu S, Wang J, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92.
 Web Links
    ePath Hemoglobin alpha 1 (HBA1); LocusID: 3039. LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=3039
    ePath Hemoglobin beta (HBB); LocusID: 3043. LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=3043
    ePath Hemoglobin delta (HBD); LocusID: 3045. LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=3045
    ePath Hemoglobin zeta (HBZ); LocusID: 3050. LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=3050
    ePath Neurexin 3 (NRXN3); LocusID: 9369. LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=9369
    ePath Hemoglobin alpha 1 (HBA1); MIM number: 141800. OMIM: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?141800
    ePath Hemoglobin beta (HBB); MIM number: 141900. OMIM: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?141900
    ePath Hemoglobin delta (HBD); MIM number: 142000. OMIM: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?142000
    ePath Hemoglobin zeta (HBZ); MIM number: 142310. OMIM: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?142310
    ePath Neurexin 3 (NRXN3); MIM number: 600567. OMIM: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?600567
How to Cite
Wong, Gane Ka‐Shu (Sep 2005) Protein Coding. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005017]