Characterising Structural Variation by Means of Next‐Generation Sequencing

Abstract

A new era of copy number variant (CNV) discovery began when two separate studies, published concurrently in 2004, identified several hundred deletions and duplications in the human genome. Over the past several years, most CNV data have been generated using microarrays. These methods have several shortcomings, such as the inability to detect copy‐neutral variants (e.g. inversions and balanced translocations), limited sensitivity for smaller CNVs and poor resolution in determining CNV breakpoints, especially with lower‐resolution microarrays. The paradigm shift in the discovery of copy‐neutral variants came with the development of a sequencing‐based method known as paired‐end mapping. In this method, both ends of each DNA fragment are sequenced and aligned to a reference genome, and read pairs whose mates map at an unexpected distance or orientation, or to different chromosomes, reveal rearrangements. In 2007, paired‐end mapping combined with next‐generation sequencing technologies was first shown to be powerful for detecting structural variants. Further studies have exploited another important feature of next‐generation sequencing data, namely the several hundred million short reads that next‐generation sequencers produce, to detect CNVs from the abundance, or density, of reads aligned to a reference genome. This approach is known as depth‐of‐coverage. These emerging sequencing‐based methods will continue to play an important role in the discovery of structural variants until de novo genome assembly becomes more feasible.
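The paired‐end mapping idea summarised above can be illustrated with a minimal Python sketch. It assumes an Illumina‐style forward/reverse paired‐end library whose insert size has a known mean and standard deviation (mate‐pair libraries use other orientation conventions). The `ReadPair` type, the `classify_pair` and `cluster_calls` functions and the mean ± 3 SD cut‐off are hypothetical choices for illustration, not the procedure of any published PEM tool. Each pair is first labelled concordant or discordant, and discordant pairs of the same type that lie close together are then reported as candidate structural variants.

```python
from dataclasses import dataclass
from enum import Enum


class PairCall(Enum):
    CONCORDANT = "concordant"
    DELETION = "deletion"            # mates map farther apart than expected
    INSERTION = "insertion"          # mates map closer together than expected
    INVERSION = "inversion"          # mates map in the same orientation
    TRANSLOCATION = "translocation"  # mates map to different chromosomes


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # mapped position of mate 1
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int     # mapped position of mate 2
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> PairCall:
    """Classify one read pair against the expected library insert size."""
    if pair.chrom1 != pair.chrom2:
        return PairCall.TRANSLOCATION
    # Assumes a forward/reverse library: concordant mates lie on opposite strands.
    if pair.strand1 == pair.strand2:
        return PairCall.INVERSION
    # Distance between mapped positions, used here as a proxy for insert size.
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return PairCall.DELETION
    if span < mean_insert - n_sd * sd_insert:
        return PairCall.INSERTION
    return PairCall.CONCORDANT


def cluster_calls(pairs, mean_insert, sd_insert, window=1000, min_support=2):
    """Group discordant pairs of the same type, chaining each pair to the cluster
    if its mate-1 position is within `window` bp of the previous pair; report
    clusters with at least `min_support` pairs as
    (call, chromosome, first position, last position, support)."""
    labelled = sorted(
        ((classify_pair(p, mean_insert, sd_insert), p) for p in pairs),
        key=lambda cp: (cp[0].value, cp[1].chrom1, cp[1].pos1),
    )
    clusters, current = [], []
    for call, p in labelled:
        if call is PairCall.CONCORDANT:
            continue
        if current:
            last_call, last = current[-1]
            if (call is not last_call or p.chrom1 != last.chrom1
                    or p.pos1 - last.pos1 > window):
                clusters.append(current)
                current = []
        current.append((call, p))
    if current:
        clusters.append(current)
    return [(c[0][0], c[0][1].chrom1, c[0][1].pos1, c[-1][1].pos1, len(c))
            for c in clusters if len(c) >= min_support]
```

Requiring several supporting pairs (here `min_support=2`) is one simple way to suppress calls caused by isolated mis‐mapped reads; note that only insertions shorter than the library insert size can be detected from the mapped distance in this way.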

Key Concepts:

  • A new era of copy number variant (CNV) discovery began when two separate studies, published concurrently in 2004, identified several hundred deletions and duplications in the human genome.

  • Both the sample size and the resolution of the microarray are critical factors in the discovery of less common and smaller CNVs.

  • In 2007, the journal Science named ‘Human Genetic Variation’ its ‘Breakthrough of the Year’.

  • Other types of chromosomal rearrangements, particularly inversions and balanced translocations, have received comparatively less attention than CNVs.

  • Inversions and balanced translocations are also known as ‘copy‐neutral variants’ or ‘balanced chromosomal rearrangements’ because they do not involve changes in copy number.

  • Collectively, copy number and copy‐neutral variants are broadly classified as ‘structural variants’.

  • The paradigm shift in the discovery of copy‐neutral variants was driven by the development of the paired‐end mapping (PEM) method and concurrent advances in next‐generation sequencing technologies.

  • Further studies have also leveraged an important feature of next‐generation sequencing data, the several hundred million short reads that are produced, to detect CNVs from the density of reads aligning to the reference genome; this approach is known as depth‐of‐coverage (DOC).

  • Although the PEM and DOC methods have overcome the major shortcomings of microarrays in detecting structural variants, these methods have their own weaknesses.

  • The emerging sequencing‐based methods (PEM and DOC) will continue to play a role in the discovery of structural variants until de novo genome assembly becomes more feasible.
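The depth‐of‐coverage approach described in the key concepts can be sketched in a similar way. This minimal Python example assumes that reads have already been aligned and reduced to start positions on a single chromosome. It counts them in fixed‐width bins, scales each bin by the median count on the assumption that most of the genome is diploid, and merges runs of bins whose estimated copy number falls below 1.5 or above 2.5 into candidate losses and gains. The function names and thresholds are hypothetical; published DOC methods typically also correct for GC content and mappability and use statistical segmentation or testing rather than fixed cut‐offs.

```python
from statistics import median


def bin_read_depth(read_starts, chrom_length, bin_size):
    """Count aligned read start positions falling in each fixed-width bin."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def estimate_copy_number(counts, ploidy=2):
    """Scale each bin's count by the median count, assuming most bins are diploid."""
    med = median(counts)
    if med == 0:
        raise ValueError("median depth is zero; increase bin size or coverage")
    return [ploidy * c / med for c in counts]


def call_segments(copy_numbers, bin_size, loss_below=1.5, gain_above=2.5):
    """Merge consecutive bins with the same state into (state, start, end)
    segments and return only the loss and gain segments."""
    segments = []
    for i, cn in enumerate(copy_numbers):
        if cn < loss_below:
            state = "loss"
        elif cn > gain_above:
            state = "gain"
        else:
            state = "neutral"
        start, end = i * bin_size, (i + 1) * bin_size
        if segments and segments[-1][0] == state:
            # Extend the previous segment of the same state.
            segments[-1] = (state, segments[-1][1], end)
        else:
            segments.append((state, start, end))
    return [s for s in segments if s[0] != "neutral"]
```

The bin size trades resolution against noise: smaller bins can localise smaller CNVs, but each bin then holds fewer reads and its count fluctuates more.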

Keywords: next‐generation sequencing technologies; structural variants; paired‐end mapping; depth‐of‐coverage; mate‐pair mapping

References

1000 Genomes Project Consortium, Durbin RM, Abecasis GR et al. (2010) A map of human genome variation from population‐scale sequencing. Nature 467: 1061–1073.

Ahn SM, Kim TH, Lee S et al. (2009) The first Korean genome sequence and analysis: full genome sequencing for a socio‐ethnic group. Genome Research 19: 1622–1629.

Bentley DR, Balasubramanian S, Swerdlow HP et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59.

Branton D, Deamer DW, Marziali A et al. (2008) The potential and challenges of nanopore sequencing. Nature Biotechnology 26: 1146–1153.

Campbell PJ, Stephens PJ, Pleasance ED et al. (2008) Identification of somatically acquired rearrangements in cancer using genome‐wide massively parallel paired‐end sequencing. Nature Genetics 40: 722–729.

Carter NP (2007) Methods and strategies for analyzing copy number variation using DNA microarrays. Nature Genetics 39: S16–S21.

Conrad DF and Hurles ME (2007) The population genetics of structural variation. Nature Genetics 39: S30–S36.

Conrad DF, Pinto D, Redon R et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712.

Cooper GM, Zerr T, Kidd JM et al. (2008) Systematic assessment of copy number variant detection via genome‐wide SNP genotyping. Nature Genetics 40: 1199–1203.

Drmanac R, Sparks AB, Callow MJ et al. (2010) Human genome sequencing using unchained base reads on self‐assembling DNA nanoarrays. Science 327: 78–81.

Eichler EE, Nickerson DA, Altshuler D et al. (2007) Completing the map of human genetic variation. Nature 447: 161–165.

Estivill X and Armengol L (2007) Copy number variants and common disorders: filling the gaps and exploring complexity in genome‐wide association studies. PLoS Genetics 3: 1787–1799.

Feuk L (2010) Inversion variants in the human genome: role in disease and genome architecture. Genome Medicine 2: 11.

Feuk L, Carson AR and Scherer SW (2006) Structural variation in the human genome. Nature Reviews. Genetics 7: 85–97.

Feuk L, MacDonald JR, Tang T et al. (2005) Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genetics 1: e56.

Gupta PK (2008) Single‐molecule DNA sequencing technologies for future genomics research. Trends in Biotechnology 26: 602–611.

Harismendy O, Ng PC, Strausberg RL et al. (2009) Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biology 10: R32.

Hastings PJ, Lupski JR, Rosenberg SM and Ira G (2009) Mechanisms of change in gene copy number. Nature Reviews. Genetics 10: 551–564.

Iafrate AJ, Feuk L, Rivera MN et al. (2004) Detection of large‐scale variation in the human genome. Nature Genetics 36: 949–951.

Khaja R, Zhang J, MacDonald JR et al. (2006) Genome assembly comparison identifies structural variants in the human genome. Nature Genetics 38: 1413–1418.

Kidd JM, Cooper GM, Donahue WF et al. (2008) Mapping and sequencing of structural variation from eight human genomes. Nature 453: 56–64.

Koboldt DC, Ding L, Mardis ER et al. (2010) Challenges of sequencing human genomes. Briefings in Bioinformatics 11: 484–498.

Korbel JO, Urban AE, Affourtit JP et al. (2007) Paired‐end mapping reveals extensive structural variation in the human genome. Science 318: 420–426.

Korn JM, Kuruvilla FG, McCarroll SA et al. (2008) Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics 40: 1253–1260.

Ku CS, Pawitan Y, Sim X et al. (2010) Genomic copy number variations in three Southeast Asian populations. Human Mutation 31: 851–857.

Lee C, Iafrate AJ and Brothman AR (2007) Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nature Genetics 39: S48–S54.

Levy S, Sutton G, Ng PC et al. (2007) The diploid genome sequence of an individual human. PLoS Biology 5: e254.

Li R, Zhu H, Ruan J et al. (2010a) De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20: 265–272.

Li Y, Hu Y, Bolund L and Wang J (2010b) State of the art de novo assembly of human genomes from massively parallel sequencing data. Human Genomics 4: 271–277.

Li Y and Wang J (2009) Faster human genome sequencing. Nature Biotechnology 27: 820–821.

Mardis ER (2008) Next‐generation DNA sequencing methods. Annual Review of Genomics and Human Genetics 9: 387–402.

Matsuzaki H, Wang PH, Hu J et al. (2009) High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians. Genome Biology 10: R125.

McCarroll SA and Altshuler DM (2007) Copy‐number variation and association studies of human disease. Nature Genetics 39: S37–S42.

McCarroll SA, Kuruvilla FG, Korn JM et al. (2008) Integrated detection and population‐genetic analysis of SNPs and copy number variation. Nature Genetics 40: 1166–1174.

Medvedev P, Fiume M, Dzamba M et al. (2010) Detecting copy number variation with mated short reads. Genome Research September 21 [Epub ahead of print].

Medvedev P, Stanciu M and Brudno M (2009) Computational methods for discovering structural variation with next‐generation sequencing. Nature Methods 6: S13–S20.

Metzker ML (2010) Sequencing technologies – the next generation. Nature Reviews. Genetics 11: 31–46.

Meyerson M, Gabriel S and Getz G (2010) Advances in understanding cancer genomes through second‐generation sequencing. Nature Reviews. Genetics 11: 685–696.

Pang AW, MacDonald JR, Pinto D et al. (2010) Towards a comprehensive structural variation map of an individual human genome. Genome Biology 11: R52.

Park H, Kim JI, Ju YS et al. (2010) Discovery of common Asian copy number variants using integrated high‐resolution array CGH and massively parallel DNA sequencing. Nature Genetics 42: 400–405.

Paszkiewicz K and Studholme DJ (2010) De novo assembly of short sequence reads. Briefings in Bioinformatics 11: 457–472.

Pennisi E (2007) Breakthrough of the year. Human Genetic Variation. Science 318: 1842–1843.

Perry GH, Ben‐Dor A, Tsalenko A et al. (2008) The fine‐scale and complex architecture of human copy‐number variation. American Journal of Human Genetics 82: 685–695.

Redon R, Ishikawa S, Fitch KR et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454.

Robison K (2010) Application of second‐generation sequencing to cancer genomics. Briefings in Bioinformatics 11: 524–534.

Schadt EE, Turner S and Kasarskis A (2010) A window into third‐generation sequencing. Human Molecular Genetics 19: R227–R240.

Sebat J, Lakshmi B, Troge J et al. (2004) Large‐scale copy number polymorphism in the human genome. Science 305: 525–528.

Shendure J and Ji H (2008) Next‐generation DNA sequencing. Nature Biotechnology 26: 1135–1145.

Stankiewicz P and Lupski JR (2010) Structural variation in the human genome and its role in disease. Annual Review of Medicine 61: 437–455.

Stephens PJ, McBride DJ, Lin ML et al. (2009) Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462: 1005–1010.

Sudmant PH, Kitzman JO, Antonacci F et al. (2010) Diversity of human copy number variation and multicopy genes. Science 330: 641–646.

Tuzun E, Sharp AJ, Bailey JA et al. (2005) Fine‐scale structural variation of the human genome. Nature Genetics 37: 727–732.

Wang J, Wang W, Li R et al. (2008) The diploid genome sequence of an Asian individual. Nature 456: 60–65.

Wang K, Li M, Hadley D et al. (2007) PennCNV: an integrated hidden Markov model designed for high‐resolution copy number variation detection in whole‐genome SNP genotyping data. Genome Research 17: 1665–1674.

Wong KK, deLeeuw RJ, Dosanjh NS et al. (2007) A comprehensive analysis of common copy‐number variations in the human genome. American Journal of Human Genetics 80: 91–104.

Yim SH, Kim TM, Hu HJ et al. (2010) Copy number variations in East‐Asian population and their evolutionary and functional implications. Human Molecular Genetics 19: 1001–1008.

Yoon S, Xuan Z, Makarov V et al. (2009) Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 19: 1586–1592.

Zogopoulos G, Ha KC, Naqib F et al. (2007) Germ‐line DNA copy number variation frequencies in a large North American population. Human Genetics 122: 345–353.

Further Reading

Alkan C, Kidd JM, Marques‐Bonet T et al. (2009) Personalized copy number and segmental duplication maps using next‐generation sequencing. Nature Genetics 41: 1061–1067.

Carson AR, Feuk L, Mohammed M and Scherer SW (2006) Strategies for the detection of copy number and other structural variants in the human genome. Human Genomics 2: 403–414.

Hormozdiari F, Alkan C, Eichler EE and Sahinalp SC (2009) Combinatorial algorithms for structural variation detection in high‐throughput sequenced genomes. Genome Research 19: 1270–1278.

Kidd JM, Sampas N, Antonacci F et al. (2010) Characterization of missing human genome sequences and copy‐number polymorphic insertions. Nature Methods 7: 365–371.

Wain LV, Armour JA and Tobin MD (2009) Genomic copy number variation, human health, and disease. Lancet 374: 340–350.

How to Cite

Ku, Chee Seng, Naidoo, Nasheen, Teo, Shu Mei, and Pawitan, Yudi (Feb 2011) Characterising Structural Variation by Means of Next‐Generation Sequencing. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0023399]