Whole Genome Resequencing and 1000 Genomes Project

Abstract

Recent advances in sequencing technologies have made it possible to sequence a whole human genome within weeks. Several human diploid genomes have been sequenced to date, and the number is expected to increase in the years to come. A 3‐year international collaborative project, the 1000 Genomes Project, was initiated in 2008 to sequence at least 1000 individual genomes from different populations around the world. Its aim is to create the most detailed and comprehensive map of genetic variation in the human genome for future disease‐association studies and biomedical research. While this ambitious project is under way, several whole genome sequencing studies have already produced exciting results, identifying hundreds of thousands of new single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels). These studies have also addressed many important questions and issues in the experimental design and data analysis of whole genome sequencing.

Key concepts:

  • The arrival of next generation sequencing (NGS) and third generation sequencing technologies has made it possible to sequence a whole human genome within weeks.

  • The first human whole genome sequencing (WGS) study using a next generation sequencer was completed in 2008, marking the beginning of a new era in personalized genome sequencing.

  • To date, several WGS studies have been done using NGS technologies.

  • The NGS technologies used are the Roche® 454 Life Sciences Genome Sequencer FLX (GS FLX), the Illumina® Genome Analyzer (GA) and the Applied Biosystems® (ABI) Sequencing by Oligonucleotide Ligation and Detection (SOLiD) system.

  • The WGS studies have clearly demonstrated that NGS and third generation sequencing technologies can decode the DNA sequence of a human genome efficiently and at an affordable price per genome.

  • These studies have also addressed many important questions and issues surrounding the experimental design and data analysis in whole genome sequencing.

  • The most significant finding from the WGS studies is the richness of genetic variation in the human genome, which is more abundant than previously expected.

  • Several thousand structural variations were found in all the WGS studies, which also identified hundreds of thousands of indels.

  • Personalized genome sequencing can provide the full DNA sequence of an individual and identify the enormous number of genetic variations in the genome. Personalized medicine, however, goes further: it aims to use this genetic variation information to predict an individual's susceptibility to various diseases and response to drug therapies.

  • To achieve personalized medicine, the first steps are to detect and validate all the genetic variations in the human genome in population‐based studies and to catalogue them properly in databases, so that they can be used as genetic markers for future disease association studies.

Keywords: next generation sequencing technologies; whole genome sequencing; 1000 Genomes Project; SNPs; indels; structural variations

Figure 1. The development of sequencing technologies and whole human genome resequencing studies.

Further Reading

Frazer KA, Murray SS, Schork NJ et al. (2009) Human genetic variation and its contribution to complex traits. Nature Reviews. Genetics 10: 241–251.

Mardis ER and Wilson RK (2009) Cancer genome sequencing: a review. Human Molecular Genetics 18: R163–R168.

Tucker T, Marra M and Friedman JM (2009) Massively parallel sequencing: the next big thing in genetic medicine. American Journal of Human Genetics 85: 142–154.

How to Cite

Ku CS, Loy EY, Pawitan Y and Chia KS (April 2010) Whole Genome Resequencing and 1000 Genomes Project. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0022507]