Tiziana Castrignanò, Cineca, SCAI SuperComputing Applications and Innovation Department, Rome, Italy
Giovanni Chillemi, Cineca, SCAI SuperComputing Applications and Innovation Department, Rome, Italy
Published online: November 2016
All the instructions a cell needs to direct its activities are contained in its genome. The major advances in biology
we are currently witnessing have a direct impact on genome databases, because the genome is the natural reference frame for
mapping information about genes and proteins. Primary databases, which hold the exponentially growing raw data produced by
genomic initiatives, and secondary databases, which hold functional annotations of genes and proteins, are the main entry
points for a growing number of research communities: biologists, geneticists, clinical researchers, pharmacologists and
others. Processing this huge amount of genome data and extracting new information from it is a key challenge, one that is
being met by interdisciplinary teams that include high‐throughput data analysts and high‐performance computing technologists.
- Genome databases are the official repositories of the ever‐growing body of genomic sequence data.
- The genome represents a natural framework for mapping the biological data of an organism.
- Genome browsers provide integrated and customisable views of the information.
- Genome databases and their associated tools are the primary entry points for accessing biological information for an increasing number of species.
- As more data on individual genomes become available, genome databases allow the reconstruction of whole ‘genetic landscapes’ of a pathology, population or species.
- Future challenges relate not only to the sheer size of the data but also to the need to protect sensitive information without hampering the use of data for new discoveries.
Keywords: genome; genomics; bioinformatics; genome databases; genetic landscape
Each January issue of Nucleic Acids Research is a special database issue.
Lesk AM (2012) Introduction to Genomics. Oxford: Oxford University Press.
Momand J and McCurdy A (2016) Concepts in Genomics and Bioinformatics. Oxford: Oxford University Press.
Schattner P (2008) Genomes, Browsers and Databases. Cambridge: Cambridge University Press.