Protein Coding

Abstract

A protein‐coding gene is composed of a series of nucleotide triplets, the codons, that encode not only the protein sequence but also the start and stop signals. There are 64 (4³) codons in the canonical genetic code, which encode 20 amino acids with redundancy. Hence, there are synonymous codons that encode the same amino acid, and they are used at different frequencies in different species. The resulting codon‐usage biases reveal a complex interplay of mutation and selection. Protein‐coding genes can be organised into families of similar function, structure and sequence, according to their shared evolutionary histories. Individual proteins are modularly constructed from domains, which are often rearranged on evolutionary timescales to create functionally novel proteins.
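
The codon arithmetic in the abstract can be checked with a short sketch. It assumes the standard genetic code written in the DNA alphabet (T rather than U) and uses the common compact layout in which the bases are ordered T, C, A, G; the names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are illustrative and do not come from the article.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, DNA alphabet, bases ordered T, C, A, G.
# Each character of AMINO_ACIDS is the one-letter amino acid for the codon
# at the same position in that ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Enumerate all 4**3 = 64 triplets and pair each with its meaning.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: how many synonymous codons encode each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print(len(CODON_TABLE), "codons")
print(len(degeneracy) - 1, "amino acids, plus stop")
print("stop codons:", stop_codons)
for aa, n in sorted(degeneracy.items(), key=lambda item: (-item[1], item[0])):
    print(aa, n)
```

Because 61 sense codons are shared among 20 amino acids, most amino acids have two or more synonymous codons, and it is among these alternatives that codon‐usage bias is measured.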

Key Concepts:

  • A protein‐coding gene consists of a series of nucleotide triplets.

  • The genetic code defines the relationship between codons and amino acids.

  • The genetic code can be organised into two halves and four quarters, which manifest distinct physicochemical features.

  • Codon usage bias, a phenomenon in which synonymous codons (encoding the same amino acid) are used at different frequencies in different species, results from a complex interplay between mutation and selection.

  • Protein‐coding genes are organised into families of similar function, structure and sequence, according to their shared evolutionary histories.

  • Individual proteins are modularly constructed from domains, which are often rearranged on evolutionary timescales to create functionally novel proteins.

Keywords: genetic code; codon usage; gene family; domain; alternative splicing

Figure 1.

A content‐centric organisation of the genetic code based on GC and purine contents. The genetic code is divided into two halves (shaded in light pink and blue) and four quarters (letters highlighted in blue, green, orange and red). R and Y stand for purine and pyrimidine, respectively. N represents any of the four nucleotides and St indicates stop codon.

Figure 2.

Effective number of codons plotted against G+C content at the third codon position, with one point for each of the 7765 experimentally confirmed human genes. The line is a theoretical upper bound that is based on the genetic code.
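
A curve of this kind can be sketched with the expectation given by Wright (1990) for the effective number of codons, Nc, when codon choice is determined solely by the G+C content at the third codon position (GC3): Nc = 2 + s + 29 / (s² + (1 − s)²), where s is GC3 as a fraction. Whether the line in the figure is exactly this curve is an assumption; the function name `expected_nc` is illustrative.

```python
def expected_nc(gc3: float) -> float:
    """Expected effective number of codons for a given GC3 fraction.

    Uses Wright's (1990) approximation, which assumes that synonymous
    codon choice reflects third-position G+C content only (no selection).
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


# Tabulate the curve at a few GC3 values; it peaks near GC3 = 0.5.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"GC3 = {s:.2f} -> expected Nc = {expected_nc(s):.1f}")
```

Genes that fall well below the curve use fewer codons than their base composition alone would predict, which is the usual reading of such a plot.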

Figure 3.

Amino acid identities in the globin superfamily. The matrix element M(I, J) refers to the number of identically matched amino acids between row I and column J, given as a percentage of the protein found in row I.
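
Because each entry is normalised by the length of the protein in row I, such a matrix is generally not symmetric. A minimal sketch, assuming pre‐aligned sequences of equal length with '-' as the gap character; the function `identity_matrix` and the two toy sequences are hypothetical and are not globin data from the article.

```python
def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity M(I, J), normalised by the ungapped length of row I.

    `aligned` maps a sequence name to its row of a multiple alignment;
    all rows are assumed to have equal length, with '-' marking gaps.
    """
    names = list(aligned)
    matrix = {}
    for row in names:
        # Length of the row protein itself, ignoring alignment gaps.
        row_length = sum(ch != "-" for ch in aligned[row])
        for col in names:
            # Count columns where both sequences carry the same residue.
            matches = sum(
                a == b and a != "-"
                for a, b in zip(aligned[row], aligned[col])
            )
            matrix[row, col] = 100.0 * matches / row_length
    return matrix


# Hypothetical toy alignment: the second sequence lacks two residues.
toy = {"long": "MVLSPADK", "short": "MVLS--DK"}
m = identity_matrix(toy)
for (row, col), value in m.items():
    print(f"M({row}, {col}) = {value:.1f}%")
```

In this toy case the six shared residues cover 75% of the longer sequence but 100% of the shorter one, so M(I, J) and M(J, I) differ.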

Figure 4.

Number of distinct domain architectures in the first four sequenced eukaryotic genomes, shown according to cellular environment as intracellular, extracellular and transmembrane. Adapted with permission from Lander et al. (2001).

Figure 5.

Alternative splice forms for NRXN3. Different versions of an exon are indicated by a letter suffix. All conceivable exon combinations are listed with a ‘y’ (yes) or ‘n’ (no) to indicate whether each has been observed.

References

Bateman A, Birney E, Cerruti L et al. (2002) The Pfam protein families database. Nucleic Acids Research 30: 276–280.

Brenner SE, Chothia C and Hubbard TJ (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the USA 95: 6073–6078.

Brett D, Pospisil H, Valcarcel J, Reich J and Bork P (2002) Alternative splicing and genome complexity. Nature Genetics 30: 29–30.

Bulmer M (1991) The selection‐mutation‐drift theory of synonymous codon usage. Genetics 129: 897–907.

Burge C and Karlin S (1997) Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268: 78–94.

Hardison R (1998) Hemoglobins from bacteria to man: evolution of different patterns of gene expression. Journal of Experimental Biology 201: 1099–1117.

Hubbard T, Barker D, Birney E et al. (2002) The Ensembl genome database project. Nucleic Acids Research 30: 38–41.

Keren H, Lev‐Maor G and Ast G (2010) Alternative splicing and evolution: diversification, exon definition and function. Nature Reviews Genetics 11: 345–355.

Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.

Mattick JS and Makunin IV (2006) Non‐coding RNA. Human Molecular Genetics 15: R17–R29.

Patthy L (1999) Genome evolution and the evolution of exon shuffling – a review. Gene 238: 103–114.

Pertea M and Salzberg SL (2010) Between a chicken and a grape: estimating the number of human genes. Genome Biology 11: 206.

Rowen L, Young J, Birditt B et al. (2002) Analysis of the human neurexin genes: alternative splicing and the generation of protein diversity. Genomics 79: 587–597.

Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87: 23–29.

Yu J, Yang Z, Kibukawa M et al. (2002) Minimal introns are not ‘junk’. Genome Research 12: 1185–1189.

Zhang Z and Yu J (2011) On the organizational dynamics of the genetic code. Genomics, Proteomics & Bioinformatics 9: 21–29.

Zhang Z, Li J, Cui P et al. (2012) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 13: 43.

Zhu J, He F, Wang D et al. (2010) A novel role for minimal introns: routing mRNAs to the cytosol. PLoS ONE 5: e10144.

Further Reading

Liu XQ (2000) Protein‐splicing intein: genetic mobility, origin, and evolution. Annual Review of Genetics 34: 61–76.

Martinez M (2011) Plant protein‐coding gene families: emerging bioinformatics approaches. Trends in Plant Science 16: 558–567.

Plotkin JB and Kudla G (2011) Synonymous but not the same: the causes and consequences of codon bias. Nature Reviews Genetics 12: 32–42.

Zhang Z and Yu J (2012) The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. Genomics, Proteomics & Bioinformatics 10: 175–180.

How to Cite
Zhang, Zhang, Wong, Gane Ka‐Shu, and Yu, Jun (Jun 2013) Protein Coding. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005017.pub2]