Genome, Proteome and the Quest for a Full Structure–Function Description of an Organism


Completion of the Human Genome Project provided the deoxyribonucleic acid (DNA) sequence of a human genome – a complete blueprint of our organism. Genomic information is processed into ribonucleic acid (RNA) and then into proteins, which carry out most of the cellular processes that sustain life. All of these molecules – DNA, RNA and proteins – have unique three‐dimensional shapes, and the atomic details of these shapes, together with the information they carry, define their function. Every cell, and ultimately the entire organism, may be viewed as a gigantic three‐dimensional jigsaw puzzle in which all of the pieces have to fit together for the whole system to function. Elucidating the shapes of the molecular elements of the cell helps decipher the rules that dictate how they hierarchically form larger objects, such as molecular machines, organelles, cells and organs; how the shapes of individual molecules and their assemblies change upon regulation; which changes cause disease; and, eventually, how they can be repaired or targeted by drugs.

Key Concepts:

  • Genes are coded in a four‐letter code by linear strings of DNA and define sequences and shapes of all downstream products (RNA and proteins).

  • DNA has a limited number of three‐dimensional shapes that it can take (three main forms currently known), but even small variations of these shapes are important for gene regulation and interactions between DNA and other molecules.

  • Products of DNA transcription (RNA) and translation (proteins) have complex but unique three‐dimensional shapes that are defined by their sequence and post‐translational modifications as well as interaction with other molecules.

  • Experimental determination of DNA, RNA and protein three‐dimensional structures by X‐ray crystallography, NMR spectroscopy and other techniques provides a molecular‐level understanding of the fundamental processes of life.

  • The number of basic shapes (or folds) that RNA and proteins can adopt is limited by steric constraints; however, this number is very large compared with the limited number of DNA shapes: hundreds (RNA) or thousands (protein) of shapes are now known, and (probably) thousands of others are still possible.

  • Regulation of DNA, RNA or protein function often involves changes to their structure and/or dynamics.

  • DNA, RNA and protein molecules form functional networks where neighbours in the network influence and regulate each other. Analysis and simulation of such networks is the subject of systems biology, which ultimately provides a perspective for predictive modelling of biological systems.

  • DNA, RNA and protein molecules can form higher order complexes and assemblies – a process driven by a mutual compatibility of their shapes and/or chemical composition. Such complexes define the many nodes in functional cellular networks.

  • Experimentally determined structures of DNA, RNA and protein complexes and assemblies provide a detailed picture of how regulation of cellular processes is carried out.
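
The first key concept, a four‐letter DNA code that determines the sequences of RNA and protein, can be illustrated with a short Python sketch. It is a simplified illustration rather than a bioinformatics tool: it assumes the standard genetic code, takes the coding (sense) strand as input, reads a single frame starting at the first base, and ignores splicing, alternative start sites and post‐translational modification. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G codon order.
# "*" marks a stop codon. (Assumption: the standard code applies.)
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # hypothetical example
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))
```

For the example sequence the sketch yields the peptide MAIVMGR, stopping at the first stop codon (UGA). The sequence alone says nothing about the three‐dimensional shape of the product, which is the information that experimental structure determination adds.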

Keywords: genome; proteome; DNA; RNA; protein; 3D structures; ORF; pathway; proteomics

Figure 1.

Artistic rendering of a complete bacterial cell (Mycoplasma mycoides). Illustration by David S. Goodsell, The Scripps Research Institute. Reprinted with permission from David S. Goodsell.

Figure 2.

(a) Three forms of DNA, from left to right, A‐DNA, B‐DNA and Z‐DNA. Reproduced from the Wikimedia Commons. © Richard Wheeler. (b) Structural diversity of RNA. Structures of a riboswitch (3vrs) and a U2/U6 RNA spliceosome complex (2lkr). Over 900 different RNA structures have been determined experimentally. (c) A small sample of the diverse varieties of protein structures. Clockwise from the top left: haemoglobin structure (one of the first protein structures determined experimentally by X‐ray crystallography), TIM barrel (the most prevalent protein fold), CheY protein (the second most prevalent protein fold), Greek‐key barrel of plastocyanin (a popular fold present in human antibodies and many cell–cell interaction proteins). Figures not to scale.

Figure 3.

Largest macromolecular structures determined at atomic resolution. (a) One of a number of ribosome structures determined by X‐ray crystallography (example shown: ribbon representation of a yeast ribosome structure taken from the RCSB PDB, based on the PDB 3u5c coordinates determined in the Yusupov lab; Ben‐Shem et al., 2011). (b) Structure of a nuclear pore complex, determined by Sali A and Rout MP by assembling individual atomic models using a number of constraints derived from several types of experimental methods. Adapted from Alber et al. (2007). © Nature Publishing Group. Reproduced with permission from Sali A, University of California San Francisco, and Rout MP, The Rockefeller University.

Figure 4.

Structural view of a functional molecular network. Here, a model of the core metabolic network of a thermophilic bacterium, Thermotoga maritima, is shown. A systems biology model of metabolism allows simulation of bacterial behaviour in multiple conditions, whereas experimental structure determination and modelling provide three‐dimensional models of all of the nodes (proteins) in the network. Adapted from an original figure by Zhang et al. (2009). © AAAS.

Figure 5.

Three‐dimensional structures of many players in other networks have been determined. Here, examples are shown from the still not fully understood network of interactions between pathogens and commensal microbes on one side and the human immune system on the other. Clockwise from the upper left: structures of the anthrax lethal factor (PDB code 1j7n), a main virulence factor of Bacillus anthracis; the innate immunity receptor TLR5 in complex with bacterial flagellin (PDB code 3v47); and several secreted proteins from the human microbiome solved by the Joint Center for Structural Genomics. The cataloguing of secreted proteins from human commensal bacterial flora is still in its early stages, but NIH PSI centres have solved structures of more than 200 such proteins.

References

Alber F, Dokudovskaya S, Veenhoff LM et al. (2007) The molecular architecture of the nuclear pore complex. Nature 450(7170): 695–701.

Alberts B (2002) Molecular Biology of the Cell, 4th edn. New York: Garland Science.

Altschul SF, Gish W, Miller W et al. (1990) Basic local alignment search tool. Journal of Molecular Biology 215(3): 403–410.

Ben‐Shem A, Garreau de Loubresse N, Melnikov S et al. (2011) The structure of the eukaryotic ribosome at 3.0 Å resolution. Science 334(6062): 1524–1529.

Berman HM, Kleywegt GJ, Nakamura H et al. (2012) The Protein Data Bank at 40: reflecting on the past to prepare for the future. Structure 20(3): 391–396.

Campbell ID (2002) Timeline: the march of structural biology. Nature Reviews. Molecular Cell Biology 3(5): 377–381.

Delcher AL, Harmon D, Kasif S et al. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27(23): 4636–4641.

Dickerson RE, Drew HR, Conner BN et al. (1982) The anatomy of A‐, B‐, and Z‐DNA. Science 216(4545): 475–485.

ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306(5696): 636–640.

Krebs JE, Goldstein ES and Kilpatrick ST (2013) Lewin's Genes XI, 11th edn. Burlington, MA: Jones & Bartlett Learning.

Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing and analysis of the human genome. Nature 409(6822): 860–921.

Lieberman‐Aiden E, van Berkum NL, Williams L et al. (2009) Comprehensive mapping of long‐range interactions reveals folding principles of the human genome. Science 326(5950): 289–293.

Lipman DJ and Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693): 1435–1441.

Lo Conte L, Ailey B, Hubbard TJ et al. (2000) SCOP: a structural classification of proteins database. Nucleic Acids Research 28(1): 257–259.

Lucic V, Forster F and Baumeister W (2005) Structural studies by electron tomography: from cells to molecules. Annual Review of Biochemistry 74: 833–865.

Rhodes G (2000) Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Models, 2nd edn. San Diego, CA: Academic Press.

Ridley M (2006) Genome: The Autobiography of a Species in 23 Chapters. New York, NY: Harper Perennial.

Rohs R, West SM, Sosinsky A et al. (2009) The role of DNA shape in protein‐DNA recognition. Nature 461(7268): 1248–1253.

Sali A, Glaeser R, Earnest T et al. (2003) From words to literature in structural proteomics. Nature 422(6928): 216–225.

Venter JC, Adams MD, Myers EW et al. (2001) The sequence of the human genome. Science 291(5507): 1304–1351.

Ward AB, Sali A and Wilson IA (2013) Integrative structural biology. Science 339(6122): 913–915.

Watson JD and Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171(4356): 737–738.

Wüthrich K (1990) Protein structure determination in solution by NMR spectroscopy. Journal of Biological Chemistry 265(36): 22059–22062.

Zhang Y, Thiele I, Weekes D et al. (2009) Three‐dimensional structural view of the central metabolic network of Thermotoga maritima. Science 325: 1544–1549.

Further Reading

Berman HM, Westbrook J, Feng Z et al. (2000) The Protein Data Bank. Nucleic Acids Research 28: 235–242.

Gu J and Bourne PE (2009) Structural Bioinformatics, 2nd edn. Hoboken: Wiley‐Blackwell.

Web Links

John‐Marc Chandonia and Sung‐Hou Kim, Structural proteomics of minimal organisms: Conservation of protein fold usage and evolutionary implications.

Proteopedia – A scientific ‘wiki’ bridging the rift between three‐dimensional structure and function of biomacromolecules. Genome Biology 2008, 9:R121. PMID:18673581.

How to Cite

Godzik, Adam, and Wilson, Ian A (Mar 2013) Genome, Proteome and the Quest for a Full Structure–Function Description of an Organism. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0003024.pub2]