Development and Role of the Human Reference Sequence in Personal Genomics

Abstract

Genome maps, like geographical maps, need to be interpreted carefully. Although maps are essential to exploration and navigation, they cannot be completely accurate. Humans have been mapping the world for several millennia, but genomes have been mapped and explored for just a single century, and the greatest advances towards a sequence reference map of the human genome have come in the past 30 years. After the deoxyribonucleic acid (DNA) sequence of the human genome was completed in 2003, the reference sequence underwent several improvements and today provides the underlying comparative resource for a multitude of genetic assays and biochemical measurements. However, the ability to simplify genetic analysis through a single comprehensive map remains an elusive goal.

Key Concepts:

  • Maps are incomplete and contain errors.

  • DNA sequence data are interpreted through biochemical experiments or comparisons to other DNA sequences.

  • A reference genome sequence is a map that provides the essential coordinate system for annotating the functional regions of the genome and comparing differences between individuals' genomes.

  • The reference genome sequence is always a product of understanding at a set point in time and continues to evolve.

  • DNA sequences evolve through duplication and mutation and, as a result, contain many repeated sequences of different sizes, which complicates data analysis.

  • DNA sequence variation occurs on large and small scales with respect to the length of the DNA differences, and includes single base changes, insertions, deletions, duplications and rearrangements.

  • DNA sequences within the human population undergo continual change and vary highly between individuals.

  • The current reference genome sequence is a collection of sequences, an assembly, that includes sequences assembled into chromosomes, sequences that are part of structurally complex regions that cannot be assembled, patches (fixes) that cannot be included in the primary sequence, and highly variable sequences that are organised into alternate loci.

  • Genetic analysis is error prone and the data require validation because the methods for collecting DNA sequences create artifacts and the reference sequence used for comparative analyses is incomplete.
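The points above about repeated sequences and the reference coordinate system can be illustrated with a small sketch. It is a toy model rather than a method described in this article: the reference string, the read sequences and the helper names `index_kmers` and `map_read` are all hypothetical, and exact string matching stands in for the alignment algorithms used in practice.

```python
from collections import defaultdict
from typing import Dict, List


def index_kmers(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-base substring of the reference to its 1-based start positions."""
    positions = defaultdict(list)
    for start in range(len(reference) - k + 1):
        positions[reference[start:start + k]].append(start + 1)
    return dict(positions)


def map_read(read: str, index: Dict[str, List[int]]) -> List[int]:
    """Return every reference position at which the read matches exactly."""
    return index.get(read, [])


# Hypothetical toy reference in which the segment "ACGTAC" is duplicated.
reference = "TTGACGTACGGCATCCACGTACAAG"
index = index_kmers(reference, k=6)

unique_hits = map_read("GGCATC", index)    # read from a unique region
repeat_hits = map_read("ACGTAC", index)    # read from the duplicated segment
variant_hits = map_read("GGCTTC", index)   # read carrying a single base change

print(unique_hits, repeat_hits, variant_hits)
```

A read from the unique region receives one coordinate, a read from the duplicated segment receives two equally good coordinates, and a read that differs from the reference by one base finds no exact match at all. Real aligners tolerate mismatches and report a mapping quality, but the underlying ambiguity in repeated regions remains.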

Keywords: DNA sequencing; human genome; genomics; reference sequence; reference assembly; next generation sequencing; massively parallel

Figure 1.

Genomics Genealogy. Genomic mapping evolved from defining genomic distances as statistical values to defining them as base positions through a series of advancements over a 70‐year period. Modern DNA sequencing provided high‐resolution maps that can now be utilised in a multitude of next generation sequencing (NGS)-based experiments. In one branch, new reference genomes continue to be developed through de novo DNA sequencing. In this branch, DNA and RNA can also be isolated from different environments to sample the genes and organisms that may be present. The other two branches rely on a reference sequence to compare sequence differences between individuals or cells within tissues (Variation Assays), or to measure genetic controls and biochemical processes (Functional Genomics), with the ultimate goal of linking genotype to phenotype in highly detailed ways. Numerous variation and functional genomics assays have been described (Mason et al., 2014; Shendure and Lieberman Aiden, 2012).


References

Alkan C, Sajjadian S and Eichler EE (2010) Limitations of next‐generation genome sequence assembly. Nature Methods 8(1): 61–65.

Aronson SJ, Clark EH, Varugheese M et al. (2012) Communicating new knowledge on previously reported genetic variants. Genetics in Medicine 14(8): 713–719.

Campbell PJ, Stephens PJ, Pleasance ED et al. (2008) Identification of somatically acquired rearrangements in cancer using genome‐wide massively parallel paired‐end sequencing. Nature Genetics 40(6): 722–729.

Christensen KD and Green RC (2013) How could disclosing incidental information from whole‐genome sequencing affect patient behavior? Personalized Medicine 10(4): 377–386.

Church DM, Schneider VA, Graves T et al. (2011) Modernizing reference genome assemblies. PLoS Biology 9(7): e1001091.

Dennis MY, Nuttle X, Sudmant PH et al. (2012) Evolution of human‐specific neural SRGAP2 genes by incomplete segmental duplication. Cell 149(4): 912–922.

Dolgin E (2010) Personalized investigation. Nature Medicine 16(9): 953–955.

Drosopoulos WC, Kosiyatrakul ST, Yan Z, Calderano SG and Schildkraut CL (2012) Human telomeres replicate using chromosome‐specific, rather than universal, replication programs. Journal of Cell Biology 197(2): 253–266.

Dunham I, Kundaje A, Aldred SF et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414): 57–74.

Eid J, Fehr A, Gray J et al. (2009) Real‐time DNA sequencing from single polymerase molecules. Science 323(5910): 133–138.

Eyre‐Walker A and Keightley PD (1999) High genomic deleterious mutation rates in hominids. Nature 397(6717): 344–347.

Feuk L, Carson AR and Scherer SW (2006) Structural variation in the human genome. Nature Reviews Genetics 7(2): 85–97.

Floutsakou I, Agrawal S, Nguyen TT et al. (2013) The shared genomic architecture of human nucleolar organizer regions. Genome Research 23(12): 2003–2012.

Frampton GM, Fichtenholtz A, Otto GA et al. (2013) Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nature Biotechnology 31(11): 1023–1031.

Gole J, Gore A, Richards A et al. (2013) Massively parallel polymerase cloning and genome sequencing of single cells using nanoliter microwells. Nature Biotechnology 31(12): 1126–1132.

Green RC, Berg JS, Grody WW et al. (2013) ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genetics in Medicine 15(7): 565–574.

Greenman C, Stephens P, Smith R et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446(7132): 153–158.

Harper AR and Topol EJ (2012) Pharmacogenomics in clinical practice and drug development. Nature Biotechnology 30(11): 1117–1124.

Huang L, Popic V and Batzoglou S (2013) Short read alignment with populations of genomes. Bioinformatics (Oxford, England) 29(13): i361–i370.

Kahvejian A, Quackenbush J and Thompson JF (2008) What would you do if you could sequence everything? Nature Biotechnology 26(10): 1125–1133.

Kidd JM, Gravel S, Byrnes J et al. (2012) Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation. American Journal of Human Genetics 91(4): 660–671.

Kimmel SE, French B, Kasner SE et al. (2013) A pharmacogenetic versus a clinical algorithm for warfarin dosing. New England Journal of Medicine 369(24): 2283–2293.

Korbel JO, Urban AE, Affourtit JP et al. (2007) Paired‐end mapping reveals extensive structural variation in the human genome. Science 318(5849): 420–426.

Ledford H (2010) Life hackers. Nature 467(7316): 650–652.

Lee H and Schatz MC (2012) Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics (Oxford, England) 28(16): 2097–2105.

Li H, Ruan J and Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18(11): 1851–1858.

MacArthur DG and Tyler‐Smith C (2010) Loss‐of‐function variants in the genomes of healthy humans. Human Molecular Genetics 19(R2): R125–R130.

MacDonald JR, Ziman R, Yuen RK, Feuk L and Scherer SW (2013) The database of genomic variants: a curated collection of structural variation in the human genome. Nucleic Acids Research 42(1): D986–D992.

Mardis ER, Ding L, Dooling DJ et al. (2009) Recurring mutations found by sequencing an acute myeloid leukemia genome. New England Journal of Medicine 361(11): 1058–1066.

Marques‐Bonet T and Eichler EE (2009) The evolution of human segmental duplications and the core duplicon hypothesis. Cold Spring Harbor Symposia on Quantitative Biology 74: 355–362.

Mason CE, Porter SG and Smith TM (2014) Characterizing multi‐omic data in systems biology. Advances in Experimental Medicine and Biology 799: 15–38.

Metzker ML (2010) Sequencing technologies – the next generation. Nature Reviews Genetics 11(1): 31–46.

Nagarajan N and Pop M (2013) Sequence assembly demystified. Nature Reviews Genetics 14(3): 157–167.

Navin N, Kendall J, Troge J et al. (2011) Tumour evolution inferred by single‐cell sequencing. Nature 472(7341): 90–94.

Nelson MR, Wegmann D, Ehm MG et al. (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337(6090): 100–104.

Newburger DE, Kashef‐Haghighi D, Weng Z et al. (2013) Genome evolution during progression to breast cancer. Genome Research 23(7): 1097–1108.

O'Connor TD, Kiezun A, Bamshad M et al. (2013) Fine‐scale patterns of population stratification confound rare variant association tests. PLoS One 8(7): e65834.

Pleasance ED, Cheetham RK, Stephens PJ et al. (2010) A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463(7278): 191–196.

Salzberg SL and Pertea M (2010) Do‐it‐yourself genetic testing. Genome Biology 11(10): 404.

Saunders CJ, Miller NA, Soden SE et al. (2012) Rapid whole‐genome sequencing for genetic disease diagnosis in neonatal intensive care units. Science Translational Medicine 4(154): 154ra135.

Sharon D, Tilgner H, Grubert F and Snyder M (2013) A single‐molecule long‐read survey of the human transcriptome. Nature Biotechnology 31(11): 1009–1014.

Shastry BS (2005) Pharmacogenetics and the concept of individualized medicine. Pharmacogenomics Journal 6(1): 16–21.

Shendure J and Lieberman Aiden E (2012) The expanding scope of DNA sequencing. Nature Biotechnology 30(11): 1084–1094.

Tennessen JA, Bigham AW, O'Connor TD et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090): 64–69.

The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65.

Treangen TJ and Salzberg SL (2011) Repetitive DNA and next‐generation sequencing: computational challenges and solutions. Nature Reviews Genetics 13(1): 36–46.

Varga Z, Sinn P, Fritzsche F et al. (2013) Comparison of EndoPredict and Oncotype DX test results in hormone receptor positive invasive breast cancer. PLoS One 8(3): e58483.

Worthey EA, Mayer AN, Syverson GD et al. (2011) Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genetics in Medicine 13(3): 255–262.

Yang Y, Muzny DM, Reid JG et al. (2013) Clinical whole‐exome sequencing for the diagnosis of mendelian disorders. New England Journal of Medicine 369(16): 1502–1511.

Further Reading

Cook‐Deegan RM (1994) The Gene Wars: Science, Politics, and the Human Genome. New York, NY: WW Norton & Company.

Hutchison CA (2007) DNA sequencing: bench to bedside and beyond. Nucleic Acids Research 35(18): 6227–6237.

Kevles DJ and Hood LE (1993) The Code of Codes: Scientific and Social Issues in the Human Genome Project. Cambridge, MA: Harvard University Press.

Shreeve J (2007) The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World. New York, NY: Random House Digital, Inc.


How to Cite
Smith, Todd M, and Porter, Sandra G (Jun 2014) Development and Role of the Human Reference Sequence in Personal Genomics. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0025334]