Sequencing the Human Genome: Novel Insights into Its Structure and Function

Abstract

The availability of the human genome sequence has had an enormous impact on biomedical research. Discoveries emanating directly from the elucidation of the human genome sequence include the unexpectedly low total number of genes, the existence of numerous transcribed but noncoding sequences, and the multiplicity of low‐copy repeats and segmental duplications. The Human Genome Project has also spawned new research projects such as the Encyclopedia of DNA Elements (ENCODE), the Haplotype Map (HapMap), the 1000 Genomes Project and the Human Variome Project, which together are helping to reveal the remarkable complexity and diversity of the human genome. Finally, comparison of the human reference genome sequence with the genomes of archaic humans such as Neanderthals and Denisovans, or with those of other vertebrates, has opened up numerous research avenues in evolutionary biology.

Key Concepts

  • The Human Genome Project has yielded a high‐quality sequence of the human genome, which has been available to researchers for more than 10 years.
  • The 3‐billion‐nucleotide reference assembly has been a prerequisite for the annotation of >22 800 different protein‐coding genes as well as thousands of RNA genes and transcripts of unknown function.
  • Approximately 16 000 pseudogenes have also been identified, some of which could encode novel proteins or regulatory RNAs.
  • Tens of millions of single‐nucleotide polymorphisms have been described as well as tens of thousands of structural variants.
  • The genome has been found to contain a multitude of structural features such as segmental duplications, transposon‐derived repeats, processed pseudogenes, simple sequence repeats and blocks of tandemly repeated sequences at the centromeres and telomeres.
  • Now that the basic structure of the human genome has been elucidated, the focus has shifted to understanding the function of its components.
  • The human genome contains noncoding ultraconserved elements, which may act as long‐range enhancers of gene expression. The recognition of such elements is changing our conception of the nature of the gene.

Keywords: Human Genome Project (HGP); ENCODE project; HapMap project; nonprotein‐coding genes; gene definition; ultraconserved elements; segmental duplications

How to Cite
Kehrer‐Sawatzki, Hildegard, and Cooper, David N (Nov 2016) Sequencing the Human Genome: Novel Insights into Its Structure and Function. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0001899.pub3]