Mutation Databases


Mutation databases were developed in the 1990s and have been a hot topic in human molecular genetics ever since. They have evolved constantly, from a single gene to the whole genome and from a few patients to whole populations, confronting the community with multiple challenges, from the ethical and technical aspects of data sharing to the new methods needed to bring precision medicine to the patient. New sequencing technologies, which generate an uninterrupted worldwide flow of variants from patients and from the general population, strain the information technology capacities of genetic centres. Huge databases are now held in multiple copies within institutions, in high‐performance computing environments, and are used to filter, interpret and prioritise variants in disease studies, easing the work of thousands of geneticists overwhelmed by big data.

Key Concepts

  • Mutation databases appeared in the 1990s and were designed to store, organise and share genetic data.
  • The main categories were central databases, which gathered all variants, and locus‐specific databases, which were focused on one gene/disease with an expert curator.
  • The advent of high‐throughput sequencing changed the paradigm: central databases are becoming increasingly expert‐curated, and locus‐specific databases increasingly genomic in scope.
  • High‐throughput sequencing also brought a new category of databases for fast variant interpretation.
  • The main current issue is data sharing: success is linked to data quality, availability and usability.

Keywords: DNA variant; genetic databases; high‐throughput sequencing; locus‐specific databases; data sharing; variant prioritisation

Figure 1. Main categories of genetic databases. Types of genetic databases are labelled in orange, with key examples in yellow. Arrows show examples of data integration in dbSNP. The blue curly brace highlights the UCSC web browser that includes data from all databases.
Figure 2. Evolution in the number of variants reported in dbSNP from 2005 (HapMap project) to 2017. The graph represents the number of submitted (blue) and validated variants (pink) for each dbSNP version. M: millions


Further Reading

Brown SM (2015) Next‐Generation DNA Sequencing Informatics, 2nd edn. Cold Spring Harbor Laboratory Press.

Cotton RGH (2012) The human variome project and the developing world. In: Kumar D (ed.) Genomics and Health in the Developing World, part 1, chap. 2. Oxford University Press.

McElheny VK (2012) Drawing the Map of Life: Inside the Human Genome Project. Basic Books.

Palladino MA (2006) Understanding the Human Genome Project, 2nd edn. Pearson.

How to Cite
Baux, David, Sasorith, Souphatta, Bergougnoux, Anne, and Claustres, Mireille (Sep 2017) Mutation Databases. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0005315.pub2]