Sequence Accuracy and Verification

Analyses of deoxyribonucleic acid (DNA) sequences must take three main accuracy parameters into account: sequence quality, sequence contiguity and sequence fidelity. Sequence quality is the probability of error for any given base call; contiguity is the completeness and correctness with which subsequences have been assembled; and fidelity is how correctly the assembly represents the genome.
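
As an illustration of sequence quality as a per-base error probability, the sketch below assumes the widely used Phred-scaled convention, Q = -10 log10(P), which the abstract itself does not spell out. The function names are hypothetical and chosen only for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score into an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) into a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def expected_errors(quals) -> float:
    """Sum the per-base error probabilities to get the expected number of
    miscalled bases in a stretch of sequence."""
    return sum(phred_to_error_prob(q) for q in quals)


if __name__ == "__main__":
    # Hypothetical quality values for a short read, for illustration only.
    quals = [10, 20, 30, 40]
    for q in quals:
        print(q, phred_to_error_prob(q))
    print("expected errors:", expected_errors(quals))
```

Under this convention, each increase of 10 in the score corresponds to a tenfold lower probability that the base call is wrong.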

Keywords: DNA sequence; assembly; contiguity; fidelity; quality

Figure 1. The PHRAP quality scores of a typical human genome ‘draft’ sequence as available from the EMBL database.
Figure 2. Levels of sequence contiguity. (N)100 indicates a sequence gap in the clone assembly, (N)50000 indicates a bridged sequence gap in the chromosome assembly and (N)100000 indicates an unbridged sequence gap in the chromosome assembly. S indicates switch points between clone sequences in the chromosome assembly. Switch points are chosen arbitrarily within the middle sections of overlapping clone sequences.
Figure 3. EMBL/GENBANK/DDBJ database entry for the draft sequence shown in Figure 1.
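
The gap conventions named in the Figure 2 caption can be sketched as a small scan over an assembled sequence. This is a minimal illustration, not a tool described in the article: the function name and the "unclassified" label for runs of any other length are assumptions, and only the three run lengths (100, 50000 and 100000) come from the caption.

```python
import re

# Run lengths and their meanings, as given in the Figure 2 caption.
GAP_KINDS = {
    100: "sequence gap in clone assembly",
    50_000: "bridged sequence gap in chromosome assembly",
    100_000: "unbridged sequence gap in chromosome assembly",
}


def find_gaps(seq: str):
    """Return (start, length, kind) for every run of N in the sequence.

    Start positions are 0-based. Runs whose length is not one of the three
    conventional values are labelled "unclassified" (an assumption made
    for this example).
    """
    gaps = []
    for match in re.finditer(r"N+", seq.upper()):
        length = match.end() - match.start()
        kind = GAP_KINDS.get(length, "unclassified")
        gaps.append((match.start(), length, kind))
    return gaps


if __name__ == "__main__":
    # Toy sequence for illustration only.
    toy = "ACGT" + "N" * 100 + "GGCC" + "N" * 50_000 + "TTAA"
    for start, length, kind in find_gaps(toy):
        print(start, length, kind)
```

Reporting the gaps by type gives a quick view of contiguity at both the clone and the chromosome level.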
 Further Reading
    Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
    Dunham I, Shimizu N, Roe BA, et al. (1999) The DNA sequence of human chromosome 22. Nature 402: 489–495.
    Gregory SG, Howell GR and Bentley DR (1997) Genome mapping by fluorescent fingerprinting. Genome Research 7: 1162–1168.
    Hattori M, Fujiyama A, Taylor TD, et al. (2000) The DNA sequence of human chromosome 21. Nature 405: 311–319.
    Marra MA, Kucaba TA, Dietrich NL, et al. (1997) High throughput fingerprint analysis of large-insert clones. Genome Research 7: 1072–1084.
    Mullikin JC, Hunt SE, Cole CG, et al. (2000) An SNP map of human chromosome 22. Nature 407: 516–520.
    Osoegawa K, Mammoser AG, Wu C, et al. (2001) A bacterial artificial chromosome library for sequencing the complete human genome. Genome Research 11: 483–496.
    International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
 Web Links
    Ensembl Trace Server http://trace.ensembl.org/
    Genome Sequencing Center. International Finishing Standards for the Human Genome Project (Version September 7, 2001) http://genome.wustl.edu/gsc/Overview/finrules/hgfinrules.html
    National Human Genome Research Institute (NHGRI). NHGRI Standard for Quality of Human Genomic Sequence http://www.nhgri.nih.gov:80/Grant_info/Funding/Statements/RFA/quality_standard.html
    National Center for Biotechnology Information: NCBI News http://www.ncbi.nlm.nih.gov/Web/Newsltr/feb98.html#GenBank
    Project Ensembl. Ensembl Genome Browser http://www.ensembl.org/
    Summary of the Report of the Second International Strategy Meeting on Human Genome Sequencing, Bermuda, 27th February–2nd March 1997, sponsored by the Wellcome Trust http://www.gene.ucl.ac.uk/hugo/bermuda2.htm
    The Phred/Phrap/Consed System home page http://www.phrap.org
    The Wellcome Trust Sanger Institute Human Blast Server http://www.sanger.ac.uk/HGP/blast_server.shtml
    The Wellcome Trust Sanger Institute: software http://www.sanger.ac.uk/Software/
    UCSC Human Genome Project Working Draft http://genome.ucsc.edu/
How to Cite
Beck, Stephan (Sep 2005) Sequence Accuracy and Verification. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005390]