Genome Databases

Abstract

All the instructions needed by a cell to direct its activities are contained in its genome. Current advances in biology have a direct impact on genome databases, as the genome is the natural reference frame for mapping information about genes and proteins. Both primary databases, containing the exponentially growing raw information produced by the genomic initiatives, and secondary databases, containing functional annotations of genes and proteins, serve as the main entry points for an increasing number of research communities, from biologists to geneticists, from clinical researchers to pharmacologists and more. Elaborating and extracting new information from this huge amount of genome data is a key challenge that is being met by interdisciplinary teams, including high‐throughput data analysts and high‐performance computing technologists.

Key Concepts

  • Genome databases are the official repositories of the ever‐growing amount of genomic sequences.
  • The genome represents a natural framework for mapping the biological data of an organism.
  • Genome browsers provide integrated and customisable views of the information.
  • Genome databases and their associated tools are the primary entry points for accessing biological information for an increasing number of species.
  • As more data on individual genomes become available, genome databases allow the reconstruction of whole ‘genetic landscapes’ of a pathology, population or species.
  • Future challenges relate not only to the sheer size of the data, but also to the need to protect sensitive information without hampering the exploitation of data for new discoveries.

Keywords: genome; genomics; bioinformatics; genome databases; genetic landscape

Figure 1. Gaining information on a gene of interest using a genome browser (the UCSC Genome Browser in this example). (a) The user selects a gene of interest, in this case retinoblastoma 1 (RB1), and is directed to the genomic region encoding the gene; (b) Clicking on the gene name provides a summary description of known information about the gene and links to external genome databases and tools; (c) Clicking on ‘Protein Structure’ shows the three‐dimensional structure of the protein (if known); (d) Clicking on ‘GO Annotation’ retrieves a table with gene ontology data; (e) Clicking on ‘Microarray’ displays a table of microarray experimental data; (f) Clicking on ‘VisiGene’ opens in situ images showing where a gene is used in an organism, sometimes down to cellular resolution.
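
Step (a) of the figure can also be reached programmatically by building a browser link. The following is a minimal sketch, not part of the original article: the helper name `browser_url` is hypothetical, the default assembly `hg38` is an assumption, and the sketch assumes that the browser's `position` parameter accepts a gene symbol such as RB1 as well as a coordinate range.

```python
from urllib.parse import urlencode

# Base address of the UCSC Genome Browser track display page.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(gene_symbol: str, assembly: str = "hg38") -> str:
    """Build a link that opens the browser at a gene (hypothetical helper).

    The assembly name and the use of a gene symbol as the position are
    assumptions made for illustration.
    """
    query = urlencode({"db": assembly, "position": gene_symbol})
    return f"{UCSC_HGTRACKS}?{query}"


if __name__ == "__main__":
    # Link for the retinoblastoma 1 gene used in Figure 1.
    print(browser_url("RB1"))
```

Opening the resulting link in a web browser should correspond to panel (a); the remaining panels are reached by clicking within the browser as described in the caption.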

Further Reading

Each January issue of Nucleic Acids Research is a special database issue.

Lesk AM (2012) Introduction to Genomics. Oxford: Oxford University Press.

Momand J and McCurdy A (2016) Concepts in Genomics and Bioinformatics. Oxford: Oxford University Press.

Schattner P (2008) Genomes, Browsers and Databases. Cambridge: Cambridge University Press.

How to Cite
Castrignanò, Tiziana, and Chillemi, Giovanni (Nov 2016) Genome Databases. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005314.pub3]