National Genomic Databases: Documenting Populations' Genography

Abstract

National Genomic Databases are online repositories that document the innate genetic heterogeneity of human populations and ethnic groups. They record the incidence of genetic diseases in each population and the spectrum of its genomic variation in qualitative and quantitative terms: for example, the variant type, information on the causality of the variant, and the respective allele frequencies. Although these resources serve a very well‐defined niche of human genome informatics, they have grown exponentially in the post‐genomic era and, together with locus‐specific and general databases, now constitute an integral part of the field.

Key Concepts

  • In recent years, human genomics research has progressed very rapidly, and many laboratories now produce very large amounts of data.
  • It is imperative to efficiently integrate all of this information in structured repositories to establish a detailed understanding of how variants in the human genome sequence affect human health.
  • National Genomic Databases are online repositories documenting the observed genetic heterogeneity of human population and ethnic groups.
  • The first National Genomic Databases that appeared online were the National Genetic (or Disease Mutation) Databases and the National Mutation Frequency Databases.
  • FINDbase is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders, pharmacogenomics biomarkers and summaries of genetic diseases that appear in certain population groups, provided in distinct data modules.
  • National Genomic Databases are useful resources not only for population genomics but also for genomic medicine applications.
  • The existence of an ‘off‐the‐shelf’ web application for National Genomic Database development and curation allows interested users, even those with minimal or no (bio)informatics background, either to start a new National Genomic Database or to continue curating an existing one.
  • Despite significant advances in the field of National Genomic Databases, limitations still hold the field back and need to be overcome, chiefly the lack of adequate incentives for data sharing and of funding.
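
The kind of record such a database holds (a variant, the population in which it was observed, its type, its causality and its allele frequency) can be sketched in a few lines of Python. This is only an illustrative sketch: the class, its field names and the grouping helper are assumptions made for this example and do not reflect the actual schema of FINDbase or of any National Genomic Database, and the sample values are placeholders rather than real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantFrequencyRecord:
    """Hypothetical record: one variant observed in one population."""
    population: str          # population or ethnic group
    gene: str                # gene symbol
    variant: str             # variant name, e.g. in HGVS nomenclature
    variant_type: str        # qualitative description of the variant
    causative: bool          # whether the variant is reported as disease-causing
    allele_count: int        # copies of the variant allele observed
    chromosomes_tested: int  # total chromosomes screened

    @property
    def allele_frequency(self) -> float:
        """Variant allele count divided by chromosomes tested."""
        if self.chromosomes_tested <= 0:
            raise ValueError("chromosomes_tested must be positive")
        return self.allele_count / self.chromosomes_tested


def group_by_population(records):
    """Cluster records by population, as a per-population query would."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.population].append(record)
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real observations.
    records = [
        VariantFrequencyRecord("Population A", "GENE1", "c.100A>G",
                               "substitution", True, 10, 200),
        VariantFrequencyRecord("Population B", "GENE1", "c.100A>G",
                               "substitution", True, 3, 300),
    ]
    for population, items in group_by_population(records).items():
        for item in items:
            print(population, item.variant, item.allele_frequency)
```

Storing the raw counts rather than only the computed frequency is a design choice of this sketch: it keeps the sample size visible, so frequencies from differently sized studies are not compared as if they were equally reliable.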

Keywords: Genomic Databases; National Genomic Databases; database management systems; microattribution; genomic variants; pharmacogenomics; allele frequencies

Figure 1. Data content of the FINDbase causative genomic variants (a) and pharmacogenomic biomarkers (b) modules, displayed using Microsoft's PivotViewer and Silverlight technology. The difference in the number of data records between the two modules (7419 in the causative genomic variants module and 2795 in the pharmacogenomic biomarkers module) is reflected in the data cards displayed for each. In each module, the data‐querying interface is provided on the left.
Figure 2. Example of a display item for a causative genomic variant (a) and a pharmacogenomic biomarker (b). Every display item resembles a data card, with the variant name in a prominent position, accompanied by a sidebar textbox with in‐depth data on the particular variant and population (shown on the right of each data card). Each item includes the name of the allele in its official HGVS or other nomenclature, if available, the population for which this information is available (shown by the country's flag) and a chromosomal map indicating the gene's position. Hyperlinks from each gene to related databases (such as HGMD and PharmGKB) give the user easy access to additional information. Finally, each item displays the corresponding PubMed and Researcher IDs, if applicable. Similar display items are also used for the Genetic Disease Summaries module (not shown).
Figure 3. Overview of the next‐generation Greek NGDB, derived from the new version of the ETHNOS software. The home page (a) provides brief information and direct links to the causative genomic variants (b), pharmacogenomic biomarkers (c) and genetic disease summaries (not shown) modules. Data records can be clustered by different querying options (in this example by OMIM ID (b) and gene name (c), respectively).
Figure 4. The map display visualisation tool, available from the FINDbase home page, gives the user a visual impression of the FINDbase data content and, at the same time, quick access to the available data per population and per module. The user can view the whole world, select a continent or zoom in and out of the map using the + and − keys, and can select the desired data module (causative genomic variants, pharmacogenomic biomarkers or disease summaries) from the top left corner of the map. Different shades of blue depict the wealth of data records per population (the darker the colour, the more data are recorded).

Further Reading

Hall JG (2003) A clinician's plea. Nature Genetics 33: 440–442.

Patrinos GP (2006) National and ethnic mutation databases: recording populations' genography. Human Mutation 27: 879–887.

How to Cite
Patrinos, George P (Jul 2018) National Genomic Databases: Documenting Populations' Genography. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0028111]