Sequence Finishing

Abstract

Sequence finishing is the manual improvement of assembled shotgun sequence data to resolve regions of low quality and to close gaps in the sequence. Shotgun sequencing data can be derived from a whole genome or from clone‐based entities such as bacterial artificial chromosomes (BACs) or other plasmid‐based vectors such as fosmids. Finishing combines laboratory and computational techniques to produce a highly accurate and complete deoxyribonucleic acid (DNA) sequence.

Variable coverage and sequence gaps can occur across a whole genome shotgun assembly or a particular BAC or fosmid for two reasons: shotgun sequencing is random by nature, and the composition of a given piece of DNA can be prohibitive to both the cloning and the sequencing processes. A targeted approach provides a more cost‐effective means of improving regions of low quality and closing the gaps that remain after initial shotgun sequencing.
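The effect of chance on coverage can be illustrated with the standard Poisson approximation for uniformly random reads. The sketch below is not taken from the article: the function name, the parameter names and the example numbers (a 150 kb clone, 500 bp reads) are assumptions chosen for illustration. The model also ignores cloning bias and the minimum overlap needed to join reads, so real projects usually leave more gaps than it predicts.

```python
import math


def expected_shotgun_stats(target_length, read_length, num_reads):
    """Idealised estimates for reads placed uniformly at random.

    Returns (coverage, fraction_uncovered, expected_gaps).
    All names here are hypothetical, chosen for this illustration.
    """
    # Average number of reads covering each base (sequence depth).
    coverage = num_reads * read_length / target_length
    # Poisson probability that a given base is covered by no read.
    fraction_uncovered = math.exp(-coverage)
    # Approximate number of gaps: reads whose right end is followed
    # by no overlapping read.
    expected_gaps = num_reads * math.exp(-coverage)
    return coverage, fraction_uncovered, expected_gaps


# Assumed example: a 150 kb BAC sequenced with 2,400 reads of 500 bp.
coverage, fraction_uncovered, expected_gaps = expected_shotgun_stats(
    target_length=150_000, read_length=500, num_reads=2_400
)
print(f"coverage: {coverage:.1f}x")
print(f"fraction of bases uncovered: {fraction_uncovered:.6f}")
print(f"expected number of gaps: {expected_gaps:.2f}")
```

Under this model each extra unit of coverage reduces the uncovered fraction by the same factor, so closing the last few gaps by further random sequencing costs many more reads than closing them with a targeted approach.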

Key Concepts:

  • Gaps exist in shotgun sequence for various reasons, including variation in coverage owing to chance and structural elements that can interrupt the sequencing reaction.

  • Directed finishing to close gaps in shotgun DNA sequence is labour‐intensive compared with high‐throughput shotgun sequence generation.

  • Good‐quality, complete DNA sequences facilitate genome annotation and comparative sequencing studies.

Keywords: subclone; shotgun sequencing; finishing; gap; polymerase chain reaction; short‐insert library; transposon library; sequence improvement

Figure 1.

Gap4 screenshot showing subclones within a contig. Gap4 displays readpair orientation within the contig editor as forward (+) and reverse (−) and can display individual read quality in greyscale. The display in CONSED is similar.

Figure 2.

Example of high‐quality discrepancies displayed in the gap4 diploid plot (in black) and the corresponding readpair coverage (in red) within the same contig. The cluster of peaks (highlighted by the arrows) and the higher than average coverage (compared with the rest of the contig) suggest an assembly problem that requires further investigation.
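The visual check described in this caption can be approximated in code. The following is a minimal sketch, not a gap4 or Consed feature: the function name, the input format (per‐base read depth plus a list of discrepancy positions) and all thresholds are assumptions made for illustration.

```python
def flag_suspect_windows(depth, discrepancies, window=100,
                         depth_factor=1.5, min_discrepancies=3):
    """Return (start, end) windows with raised depth and clustered discrepancies.

    depth          -- per-base read depth along the contig (list of numbers)
    discrepancies  -- 0-based positions of high-quality discrepancies
    The thresholds are illustrative assumptions, not defaults of any tool.
    """
    if not depth:
        return []
    contig_mean = sum(depth) / len(depth)
    flagged = []
    for start in range(0, len(depth), window):
        end = min(start + window, len(depth))
        window_mean = sum(depth[start:end]) / (end - start)
        # Count discrepancies that fall inside this window.
        n_disc = sum(1 for pos in discrepancies if start <= pos < end)
        # Flag only when both signals coincide, as in the figure.
        if (window_mean > depth_factor * contig_mean
                and n_disc >= min_discrepancies):
            flagged.append((start, end))
    return flagged


# Assumed toy contig: a 100 bp region at three times the background depth.
depth = [10] * 400 + [30] * 100 + [10] * 500
discrepancies = [410, 450, 480, 900]
suspects = flag_suspect_windows(depth, discrepancies)
print(suspects)
```

Requiring both signals keeps isolated discrepancies, such as the one at position 900 in the toy data, from being flagged on their own; a flagged window still needs manual inspection of the reads.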

Figure 3.

Screenshot of gap4 trace display illustrating a sudden stop in sequencing owing to a repeat structure.

Further Reading

Bird C and Grafham D (2004) BAC finishing strategies. Methods in Molecular Biology 255: 255–277.

Gibson G and Muse SV (2002) A Primer of Genomic Science. Genome Sequencing and Annotation, chap. 2, pp. 63–122. Massachusetts: Sinauer Associates, Inc.

Hunt AR, Willey DI and Quail MA (2003) Genome Mapping and Sequencing. In: Dunham I (ed) Finishing Genomic Sequence and Dealing with Problem Sequences. Chap. 11, pp. 315–355. United Kingdom: Horizon Scientific Press.

Rice PM, Elliston K and Gribskov M (1992) Sequence Analysis Primer. In: Gribskov M and Devereux J (eds) DNA, chap. 1, pp. 1–23. New York: Oxford University Press.

Web Links

CAP3 and PCAP available for download under licence from http://seq.cs.iastate.edu/

The Phred/Phrap/consed system homepage, containing documentation for the base caller ‘Phred’ and the DNA sequence assembler ‘Phrap’: http://www.phrap.org

How to Cite

Beasley, Helen, Grafham, Darren, and Willey, David (May 2011) Sequence Finishing. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005389.pub2]