Sequence Finishing on New Platforms


In the past five years, next generation sequencing (NGS) platforms have been released and have become the tool of choice for sequencing genomes. Methods for sequence finishing have had to be updated to deal with the issues that NGS platforms create. Finishing begins with assembly of the reads produced by the sequencing machines, after which computational gap closure is attempted. Remaining gaps are then identified and closed using suitable experimental techniques such as the polymerase chain reaction (PCR). Until recently, the aim of many genome sequencing projects has been to finish genomes fully. However, because of the time and cost associated with finishing, most future genomes will be taken only to high‐quality draft status. New genome standards have had to be defined to better reflect the types of sequencing now being undertaken. Improvements to the current generation of sequencing machines, along with the advent of third‐generation technologies, mean that new ways of dealing with the data volume will need to be found.
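The gap-identification step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that gaps in a scaffold are encoded as runs of `N`, and it reports each gap's coordinates together with the sequence flanking it on either side, which is the kind of input a primer-design tool such as Primer3 would need when designing PCR primers across the gap. The function `find_gaps` and its `min_gap` and `flank` parameters are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple


class Gap(NamedTuple):
    start: int        # 0-based index of the first N in the gap
    end: int          # 0-based, exclusive end of the gap
    left_flank: str   # sequence immediately 5' of the gap
    right_flank: str  # sequence immediately 3' of the gap


def find_gaps(scaffold: str, min_gap: int = 1, flank: int = 100) -> List[Gap]:
    """Locate runs of N in a scaffold and return the flanking sequence of each.

    Assumption: gaps are encoded as runs of 'N' or 'n'. The min_gap and
    flank defaults are illustrative values, not taken from the article.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", scaffold):
        start, end = match.start(), match.end()
        if end - start < min_gap:
            continue  # ignore runs shorter than the requested minimum
        left = scaffold[max(0, start - flank):start]
        right = scaffold[end:end + flank]
        gaps.append(Gap(start, end, left, right))
    return gaps


if __name__ == "__main__":
    # Toy scaffold with a 5 bp gap and a 2 bp gap.
    scaffold = "ACGTACGTAA" + "N" * 5 + "GGCCTTAAGG" + "NN" + "TTTT"
    for gap in find_gaps(scaffold, flank=4):
        print(gap.start, gap.end, gap.left_flank, gap.right_flank)
```

On the toy scaffold this reports gaps at 0-based positions 10 to 15 and 25 to 27, each with 4 bp of flanking sequence. In practice the flank length would be chosen to suit the primer-design constraints, and gaps would normally be read from an assembly file rather than from a string.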

Key Concepts:

  • Next generation sequencing platforms have generated new issues for sequence finishing, which have required novel solutions.

  • Genome standards have had to be updated to better reflect NGS data.

Keywords: next generation sequencing; finishing; Roche® 454 Life Sciences Genome Sequencer; Illumina® Genome Analyzer II; Applied Biosystems® SOLiD system; DNA; PCR

Figure 1.

Portion of a gap4 window showing 454 reads with differing numbers of base calls in a homopolymer run. The correct sequence is shown in red (Illumina consensus).
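Figure 1 illustrates a characteristic 454 error: reads disagree on how many bases a homopolymer run contains. The following is a minimal sketch of how such discordance could be detected, not a procedure taken from the article. As a deliberate simplification, it assumes the reads have already been trimmed so that they start at the same reference position (real reads would first need to be aligned). It then compares each read's run length at a chosen offset with the run length in a trusted consensus, which plays the role of the Illumina consensus in Figure 1. `run_length`, `tally_run_lengths` and `flag_discordant` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List


def run_length(seq: str, pos: int) -> int:
    """Return the length of the homopolymer run that contains seq[pos]."""
    base = seq[pos]
    start = pos
    # Extend leftwards while the same base repeats.
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    # Extend rightwards while the same base repeats.
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def tally_run_lengths(reads: List[str], pos: int) -> Dict[int, int]:
    """Count how many reads report each run length at offset pos.

    Hypothetical simplification: every read is already anchored so that
    offset pos falls inside the same homopolymer in all reads.
    """
    return dict(Counter(run_length(read, pos) for read in reads))


def flag_discordant(reads: List[str], consensus: str, pos: int) -> List[int]:
    """Return indices of reads whose run length disagrees with the consensus."""
    expected = run_length(consensus, pos)
    return [i for i, read in enumerate(reads) if run_length(read, pos) != expected]


if __name__ == "__main__":
    # Toy 454-style reads reporting 3, 4 and 5 As in the same homopolymer.
    reads = ["ACGAAAT", "ACGAAAAT", "ACGAAAAAT"]
    consensus = "ACGAAAAT"  # stands in for a short-read (Illumina) consensus
    print(tally_run_lengths(reads, 3))           # {3: 1, 4: 1, 5: 1}
    print(flag_discordant(reads, consensus, 3))  # [0, 2]
```

The sketch only flags reads; it does not correct them. Treating the short-read consensus as correct is itself an assumption, so deciding on the true base count would remain a finishing judgement.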

Figure 2.

(a) Gap5 visualisation window showing all of the Illumina reads present, compared with (b) a gap4 window showing only an Illumina consensus (subset_4800000).

How to Cite

van Tonder, Andries J, and Grafham, Darren (Mar 2011) Sequence Finishing on New Platforms. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0023157]