Protein Primary Structure


The order of the amino acids in a polypeptide chain is referred to as its amino acid sequence or primary structure. It is specified by the nucleotide sequence of the encoding gene and is assembled in cells through the process called translation. The primary structure determines the shape into which a polypeptide chain naturally folds and, consequently, its biochemical function. However, highly regulated enzymatic activities in cells can drive the recoding of nucleotide messages. Moreover, during or after biosynthesis, specific amino acid residues often undergo specific chemical modifications. These posttranslational modifications, which can modulate the physical and chemical properties, folding, stability and ultimately the function of polypeptide chains, are considered features of the primary structure of proteins. A large community of scientists has worked to establish complementary methods for determining this first level in the hierarchy of protein structure. Thanks to these methodological efforts, millions of protein primary structures have been determined to date and deposited in large, specialised public databases that are freely accessible for studies of structural and evolutionary relationships.

Key Concepts

  • Proteins are built up from L‐alpha amino acid units
  • The amino acid linear arrangement determines the 3D structure and consequently the biochemical function of a protein
  • Amino acids are polymerised to a unique sequence by a biosynthetic process, according to which a nucleotide sequence is translated into an amino acid sequence
  • Recoding activities are known that lead to alternative protein products in a regulated and controlled process
  • Posttranslational modifications can additionally modify an amino acid sequence to drive a sophisticated modulation of conformation and function of proteins
  • Biotechnological production of therapeutic proteins demands a highly accurate verification of the primary structure of products
  • Depending on the genome information available, the determination of the protein primary structure can be accomplished either by direct methods on isolated or partially resolved proteins or by the indirect deciphering of their nucleic acid sequences

Keywords: alpha amino acids; translation; recoding; posttranslational modifications; protein sequence databases

Figure 1. The 20 canonical amino acids. Structures are shown in the outer ring, whereas their three‐letter and one‐letter abbreviations are reported in the inner rings. Colours distinguish the hydrophobic (orange), polar (green), acidic (violet) and basic (blue) properties of their side chains. At the centre of the wheel, the structure of a generic L‐alpha amino acid is reported.
Figure 2. Diagram of the polymerisation of L‐alpha amino acids into polypeptide chains. The scheme shows how three generic L‐alpha amino acids (R1, R2 and R3: side chains) condense through their amino and carboxyl groups to generate two peptide bonds in a head‐to‐tail mode. The loss of water that accompanies each condensation of an amino group with a carboxyl group is also indicated. Each internal unit is referred to as an (amino acid) residue. The two amino acids at the ends of the primary structure retain a free amino group and a free carboxyl group, and are named the N‐terminal and C‐terminal amino acids, respectively. By convention, in any protein primary structure, residue numbering starts from the unit that retains the free amino group.
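As a rough numerical illustration of the condensation summarised in Figure 2, the sketch below computes the mass of a short peptide in two equivalent ways: as the sum of its residue masses plus one water, and as the sum of the free amino acid masses minus one water per peptide bond. The function names and the example tripeptide are illustrative assumptions, not part of the article; the monoisotopic masses are rounded standard values, and only a small subset of residues is included.

```python
# Monoisotopic masses in daltons (rounded standard values).
# Assumption: only five residues are listed, enough for the example.
WATER = 18.01056
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "L": 113.08406}


def free_amino_acid_mass(aa):
    """Mass of the free amino acid: the residue plus one water."""
    return RESIDUE_MASS[aa] + WATER


def peptide_mass(seq):
    """Sum of residue masses plus one water (free N- and C-terminal groups)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def peptide_mass_by_condensation(seq):
    """Free amino acid masses minus one water per peptide bond formed."""
    n_bonds = len(seq) - 1  # n amino acids joined head-to-tail give n - 1 bonds
    return sum(free_amino_acid_mass(aa) for aa in seq) - n_bonds * WATER


# Hypothetical tripeptide Gly-Ala-Ser: two peptide bonds, two waters lost.
tripeptide = "GAS"
print(round(peptide_mass(tripeptide), 5))                  # 233.10116
print(round(peptide_mass_by_condensation(tripeptide), 5))  # 233.10116
```

Both routes give the same value because each of the two peptide bonds in the tripeptide releases exactly one water molecule.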
Figure 3. Translation: a simplified cartoon. In the ubiquitous process known as translation, L‐alpha amino acids (forms) are selected from the pool of soluble metabolites and assembled into a primary structure from the N‐terminus to the C‐terminus. In the first step of translation (a), each aminoacyl‐tRNA synthetase (blue area) specifically links one of the 20 canonical amino acids to a cognate tRNA, that is, a tRNA from the cytosolic pool (here represented by the purple structure) whose nucleotide triplet (anticodon) is complementary to one of the mRNA triplets (codons) encoding this specific amino acid. In the cartoon, nucleotides are highlighted by means of coloured rectangles. Following this reaction, the amino acid approaches the translational machinery linked to the adaptor molecule (tRNA) through a bond with a high energy of hydrolysis. In the activation step, energy is provided by the hydrolysis of ATP (not shown). Once formed, aminoacyl‐tRNAs are recognised by a dedicated protein (the elongation factor, yellow area in the cartoon) and, if the anticodon is complementary to the nucleotide triplet exposed at the ribosome platform, a new peptide bond is formed with the growing amino acid sequence produced by the previous elongation steps. Simultaneously with the formation of the peptide bond, a complex and coordinated movement of the ribosome exposes the next codon correctly, so that a new elongation step can start. In the elongation cycles, energy is provided by the hydrolysis of GTP (not shown). Specific elements within the mRNA sequence ensure that initiation, reading and termination are carried out correctly. Among these, the canonical start codon (AUG) and one of the standard stop codons (UAG) are highlighted in the cartoon on the mRNA molecule (green line).
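The decoding logic behind Figure 3 can be sketched in a few lines: find the start codon, read the message three nucleotides at a time, and stop at the first stop codon. This is a minimal illustration using the standard genetic code only. The function name and the example mRNA are hypothetical, and the sketch deliberately ignores recoding events, the sequence context that selects the true initiation site, and posttranslational modifications.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order at each
# position; "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Assumption: the first AUG found sets the reading frame. Real initiation
    depends on additional sequence elements that are not modelled here.
    """
    start = mrna.find("AUG")
    if start < 0:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the chain
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical message: GG | AUG GCU UCU UAG | CC
print(translate("GGAUGGCUUCUUAGCC"))  # MAS (Met-Ala-Ser, N- to C-terminus)
```

The output is written in one‐letter code from the N‐terminus to the C‐terminus, matching the direction in which the chain is assembled.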
Figure 4. A summary of the main approaches to determine a protein primary structure. (a) Once a coding sequence (DNA/RNA) has been determined, the primary structure of the expressed protein can be deduced by translating the three‐nucleotide‐based message into the amino acid sequence. This requires knowledge of the structural elements within the coding sequence that determine the reading frame used by the ribosome. (b) The complete process for determining the protein primary structure by a chemical approach is schematically represented. The ordered amino acid arrangement of a protein from its N‐terminus can be obtained directly on an isolated protein (the blue structure at the top), following an iterative chemical process referred to as Edman degradation (the central box). This technique demands strongly denatured proteins: the preliminary steps for denaturing the protein and breaking any disulphide bridges are indicated in the upper part of the diagram. Because proteins are longer than the limits of the Edman degradation, independent cleavages are required to generate small peptides (blue and red bars) that can then be sequenced individually. Finally (at the bottom), the entire primary structure may be reconstructed by overlapping the amino acid sequences provided by each peptide pattern. Preliminary steps providing knowledge of the molecular weight of the intact protein and of the N‐terminal and C‐terminal sequences, necessary to be sure that the entire protein is accounted for, are also included in the diagram. (c) Instrumentation and principle of sequencing of polypeptides by tandem mass spectrometry are shown. A simplified scheme of a typical tandem mass spectrometer is reported at the top of the panel. The mechanisms of fragmentation of parent ions (referred to as MSMS analysis) and the proposed nomenclature for some of the expected product ions are reported at the left of the panel. Finally, the right of the panel shows a simple example of how an amino acid sequence can be reconstructed from an MSMS spectrum. The amino acid arrangement along the primary structure is inferred by measuring the difference in mass between contiguous product ions. The three‐letter abbreviations and the residue masses are reported in Table. Xle is a three‐letter abbreviation often employed to indicate a residue that may be either of the two isomeric amino acids leucine and isoleucine.
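The mass‐difference principle described for panel (c) of Figure 4 can be illustrated with a short sketch: given the masses of contiguous product ions from a single ion series, each gap is matched to the closest residue mass. The function name, the tolerance and the example peak list are assumptions made for illustration; the residue masses are rounded monoisotopic values, and real spectra need noise filtering, charge‐state handling and the assignment of ions to a series, none of which is attempted here.

```python
# Monoisotopic residue masses in daltons (rounded standard values).
# Leucine and isoleucine are isomers with identical mass, hence "Xle".
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Xle": 113.08406,
    "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858, "Lys": 128.09496,
    "Glu": 129.04259, "Met": 131.04049, "His": 137.05891, "Phe": 147.06841,
    "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Assign a residue to each gap between contiguous product ions.

    Assumptions: all peaks are singly charged and belong to one ion series;
    `tol` (in daltons) is an arbitrary illustrative tolerance. A gap that
    matches no residue within the tolerance is reported as None.
    """
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        name, mass = min(RESIDUE_MASS.items(),
                         key=lambda item: abs(item[1] - delta))
        calls.append(name if abs(mass - delta) <= tol else None)
    return calls


# Hypothetical ladder: successive gaps of 57.02146, 113.08406 and 71.03711.
ladder = [147.11280, 204.13426, 317.21832, 388.25543]
print(read_ladder(ladder))  # ['Gly', 'Xle', 'Ala']
```

The direction in which the residues are read depends on the ion series: ions that retain the N‐terminus grow toward the C‐terminus as their mass increases, and ions that retain the C‐terminus grow in the opposite direction.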


Further Reading

Barrett GC and Elmore DT (1998) Amino Acids and Peptides. Cambridge, UK: Cambridge University Press.

Kinter M and Sherman NE (2000) Protein Sequencing and Identification Using Tandem Mass Spectrometry. Hoboken, NJ: John Wiley & Sons, Inc.

Nierhaus KH and Wilson D (eds) (2004) Protein Synthesis and Ribosome Structure: Translating the Genome. Weinheim: Wiley‐VCH.

Smith BJ (ed) (2002) Methods in Molecular Biology: Protein Sequencing Protocols, 2nd edn. Totowa, NJ: Humana Press.

Walsh G (2014) Proteins: Biotechnology and Biochemistry, 2nd edn. Hoboken, NJ: Wiley‐Blackwell.

How to Cite
Schininà, Eugenia M, and Barra, Donatella (Mar 2015) Protein Primary Structure. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0001332]