The age of genomics has pushed the cost of sequencing much lower than typical cost/technology curves would predict. Sanger sequencing, next-generation sequencing (ABI SOLiD, Illumina GA, Roche 454) and even whole-chromosome imaging are providing sequence data faster than most laboratories can analyse or store it. The biological data infrastructure that was established in the early 1990s is still in place, largely because it was planned with future needs in mind. The three main biological data centres are NCBI (http://www.ncbi.nlm.nih.gov/), DDBJ (http://www.ddbj.nig.ac.jp/) and EBI/EMBL (http://www.ebi.ac.uk/embl/). These centres are discussed in the context of new types of high-density biological data, such as microarrays of various sorts. This article covers their history, the tools they provide to the public, other biological databases that support and integrate with the sequencing databases, and a projection of the future of biology.
- Biological data is much denser today than it was even 10 years ago.
- Sequencing DNA is faster, cheaper and more accurate today than it was 6 months ago, and far faster than it was 10 years ago. Data repositories have struggled to keep up.
- Three main DNA sequence repositories exist, in the USA, Europe and Japan. They mirror one another but are also distinct.
- Web-based sequence analysis tools are presented.
- Other high-density databases are listed, including GEO, BIND, PDB, HapMap, KEGG and GO.
Keywords: bioinformatics; genomics; Sanger Centre; NCBI; UCSC