Human Genome: Draft Sequence

Abstract

Researchers have been given free, early access to a nearly complete human genome sequence. The draft contained many gaps and ambiguities; however, a continuous process of finishing over the last few years has resolved most of them.

Keywords: human genome; genome analysis; genome assembly

Figure 1.

The strategy of the International Human Genome Sequencing Consortium is based on selecting a minimal set of overlapping bacterial artificial chromosome (BAC) clones to sequence, from libraries covering the genome many times. Each selected clone then undergoes shotgun sequencing to a certain base coverage and is then ‘finished’ to close the remaining gaps and resolve problems in a semi‐manual fashion. A shotgun coverage of 4X means that, on average, each base in the clone will occur in four different reads.
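The definition of coverage in this caption reduces to simple arithmetic: total bases read divided by the length of the clone. The following is a minimal sketch of that calculation. The function names and the example figures (a 150 kb clone, 500-base reads) are illustrative assumptions, not values taken from the article.

```python
import math


def shotgun_coverage(num_reads: int, read_length: int, clone_length: int) -> float:
    """Average number of reads covering each base of a clone."""
    if clone_length <= 0:
        raise ValueError("clone_length must be positive")
    return num_reads * read_length / clone_length


def reads_needed(target_coverage: float, read_length: int, clone_length: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * clone_length / read_length)


if __name__ == "__main__":
    # Hypothetical clone: 150,000 bases, sequenced with 500-base reads.
    n = reads_needed(4, read_length=500, clone_length=150_000)
    print(n)                                  # 1200
    print(shotgun_coverage(n, 500, 150_000))  # 4.0
```

Because this is an average, individual bases can still be covered by fewer reads or by none at all, which is why a finishing step is needed after the shotgun phase.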

Figure 2.

Fingerprint contigs (FPCs) are positioned and orientated on human chromosomes via Généthon markers and the radiation hybrid (RH) map. Each FPC is an assembly of BAC clones based on similarities between their restriction digest fingerprints.
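The idea of grouping clones by fingerprint similarity can be illustrated with a toy comparison. In the sketch below a fingerprint is assumed to be a list of restriction fragment sizes, and two clones are taken to overlap when a large enough fraction of their bands match within a size tolerance. The scoring rule, the thresholds and the function names are all assumptions made for illustration; the mapping software actually used may score overlaps quite differently.

```python
def shared_bands(fp_a, fp_b, tolerance=5):
    """Count bands of matching size with a two-pointer sweep over sorted sizes."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            # Sizes agree within tolerance: count one shared band.
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def likely_overlap(fp_a, fp_b, min_fraction=0.5, tolerance=5):
    """True when enough bands of the smaller fingerprint are shared."""
    smaller = min(len(fp_a), len(fp_b))
    if smaller == 0:
        return False
    return shared_bands(fp_a, fp_b, tolerance) / smaller >= min_fraction


if __name__ == "__main__":
    # Hypothetical fragment sizes (in bases) for three clones.
    clone_a = [1200, 2500, 3100, 4800, 7600]
    clone_b = [2498, 3103, 4800, 9100]
    clone_c = [500, 900, 6000]
    print(likely_overlap(clone_a, clone_b))  # True: 3 of 4 bands shared
    print(likely_overlap(clone_a, clone_c))  # False: no bands shared
```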

Figure 3.

(a) The raw human genome sequence consists of a mixture of sequences of finished (one continuous sequence) and unfinished (between 2 and 20 fragments) clones, available from the public sequence databases. The initial order of the clones is taken from FPCs. Within an unfinished clone, some fragments are locally ordered on the basis of paired reads, but the rest are unordered. As an unfinished clone is worked on to take it to finished status, it is resubmitted to the public databases, each time keeping its original accession number but incrementing its version number. (b) The current set of sequences in the public databases is defined as a freeze, and its members are identified by their accession and version numbers, so that all subsequent analyses work with the same set of sequences. Sequence homology searches between fragments in this set find overlaps that are consistent, or nearly consistent, with the original clone order (H). (c) A golden path is defined as the ordered set of fragments from the freeze that uniquely defines a nonredundant genome sequence, with strings of N's inserted to mark known gaps between fragments (G). Gaps can occur between ordered or unordered fragments within clones (G1) or between clones (G2) where no significant sequence similarity was found between their fragments.
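The construction described in (b) and (c) can be sketched in a few lines. This is a simplified illustration rather than the consortium's assembly procedure: it assumes the fragments are already ordered and orientated, that each overlap is given as a number of bases shared with the preceding fragment, and that every gap is padded with a fixed number of N's. The class name, the function name, the gap length and the accession values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    accession: str  # stays the same across resubmissions
    version: int    # incremented each time the clone is resubmitted
    sequence: str

    @property
    def versioned_id(self) -> str:
        """Accession and version together pin down one exact sequence."""
        return f"{self.accession}.{self.version}"


def build_golden_path(fragments, overlaps, gap_length=100):
    """Join ordered fragments into one nonredundant sequence.

    overlaps[i] is the number of bases at the start of fragments[i + 1]
    that repeat the end of fragments[i], or None when no significant
    similarity was found, in which case a run of N's marks the gap.
    """
    if not fragments:
        return ""
    if len(overlaps) != len(fragments) - 1:
        raise ValueError("need exactly one overlap entry per adjacent pair")
    path = [fragments[0].sequence]
    for fragment, overlap in zip(fragments[1:], overlaps):
        if overlap is None:
            path.append("N" * gap_length)            # known gap
            path.append(fragment.sequence)
        else:
            path.append(fragment.sequence[overlap:])  # drop redundant bases
    return "".join(path)


if __name__ == "__main__":
    # Made-up accessions and toy sequences, for illustration only.
    fragments = [
        Fragment("AC000001", 3, "ACGTACGT"),
        Fragment("AC000002", 1, "ACGTTTGA"),
        Fragment("AC000003", 2, "CCCC"),
    ]
    freeze = {f.versioned_id for f in fragments}
    print(sorted(freeze))
    print(build_golden_path(fragments, [4, None], gap_length=5))
```

Recording the freeze as a set of accession and version pairs is what makes an analysis repeatable: a clone resubmitted later gets a new version number and so does not silently change the sequences already in the freeze.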

Web Links

Ensembl. Provides free access to an integrated view of the analysis of vertebrate genomes. It is also an open source software project, so all code, as well as all data, is available for download, mirroring and reuse. http://www.ensembl.org/

RefSeq. The NCBI Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non‐redundant set of sequences, including genomic DNA, transcript (RNA) and protein products, for major research organisms. http://www.ncbi.nih.gov/RefSeq/

GNU. The GNU's Not Unix website, home of the Free Software Foundation, which provided much of the legal and philosophical basis for the open source software movement. http://www.gnu.org/

Linux. Home of Linux, the open source Unix-like operating system, which combines the Linux kernel with many tools developed by GNU, hence GNU/Linux. This is just one project of the open source movement. http://www.linux.org/

Sourceforge. Largest repository for open source software, hosting almost 60,000 distinct projects. http://sourceforge.net/

MSF DNDi. Médecins Sans Frontières Drugs for Neglected Diseases initiative. A Médecins Sans Frontières (Doctors Without Borders) project to develop drugs for neglected diseases. http://www.accessmed‐msf.org/dnd/

How to Cite

Hubbard, Tim JP (Sep 2005) Human Genome: Draft Sequence. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005395]